
Human-object interaction prediction in videos through gaze following

Ni, Zhifan and Valls Mascaró, Esteve and Ahn, Hyemin and Lee, Dongheui (2023) Human-object interaction prediction in videos through gaze following. Computer Vision and Image Understanding, 233, p. 103741. Elsevier. doi: 10.1016/j.cviu.2023.103741. ISSN 1077-3142.

PDF: Preprint version (submitted draft), 7 MB

Official URL: https://www.sciencedirect.com/science/article/abs/pii/S1077314223001212

Abstract

Understanding human-object interactions (HOIs) in a video is essential to fully comprehending a visual scene. This line of research has been addressed by detecting HOIs in images and, more recently, in videos. However, the video-based HOI anticipation task in the third-person view remains understudied. In this paper, we design a framework to detect current HOIs and anticipate future HOIs in videos. We propose to leverage human gaze information, since people often fixate on an object before interacting with it. These gaze features, together with the scene context and the visual appearance of human-object pairs, are fused through a spatio-temporal transformer. To evaluate the model on the HOI anticipation task in a multi-person scenario, we propose a set of person-wise multi-label metrics. Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life and is currently the largest video HOI dataset. Experimental results on the HOI detection task show that our approach improves on the baseline by a relative margin of 36.3%. Moreover, we conduct an extensive ablation study to demonstrate the effectiveness of our modifications and extensions to the spatio-temporal transformer. Our code is publicly available.

Item URL in elib: https://elib.dlr.de/197480/
Document Type: Article
Title: Human-object interaction prediction in videos through gaze following
Authors:
Author | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Ni, Zhifan | TUM | https://orcid.org/0000-0002-1005-7524 | UNSPECIFIED
Valls Mascaró, Esteve | TU Wien | https://orcid.org/0000-0003-4195-8672 | UNSPECIFIED
Ahn, Hyemin | UNSPECIFIED | https://orcid.org/0000-0001-8081-6023 | UNSPECIFIED
Lee, Dongheui | UNSPECIFIED | https://orcid.org/0000-0003-1897-7664 | UNSPECIFIED
Date: 29 May 2023
Journal or Publication Title: Computer Vision and Image Understanding
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: Yes
Volume: 233
DOI: 10.1016/j.cviu.2023.103741
Page Range: p. 103741
Publisher: Elsevier
ISSN: 1077-3142
Status: Published
Keywords: Human–object interaction
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Robotics
DLR - Research area: Raumfahrt
DLR - Program: R RO - Robotics
DLR - Research theme (Project): R - Basic Technologies [RO]
Location: Oberpfaffenhofen
Institutes and Institutions: Institute of Robotics and Mechatronics (since 2013)
Deposited By: Strobl, Dr. Klaus H.
Deposited On: 22 Sep 2023 12:54
Last Modified: 25 Sep 2023 10:28
