Human-Object Interaction Prediction in Videos through Gaze Following

06/06/2023
by Zhifan Ni, et al.

Understanding the human-object interactions (HOIs) in a video is essential to fully comprehend a visual scene. This line of research has been addressed by detecting HOIs from images and, more recently, from videos. However, the video-based HOI anticipation task in the third-person view remains understudied. In this paper, we design a framework to detect current HOIs and anticipate future HOIs in videos. We propose to leverage human gaze information, since people often fixate on an object before interacting with it. These gaze features, together with the scene context and the visual appearances of human-object pairs, are fused through a spatio-temporal transformer. To evaluate the model on the HOI anticipation task in a multi-person scenario, we propose a set of person-wise multi-label metrics. Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life and is currently the largest video HOI dataset. Experimental results on the HOI detection task show that our approach improves the baseline by a large margin of 36.3%. Moreover, we conduct an extensive ablation study to demonstrate the effectiveness of our modifications and extensions to the spatio-temporal transformer. Our code is publicly available at https://github.com/nizhf/hoi-prediction-gaze-transformer.
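The abstract does not spell out how the proposed person-wise multi-label metrics are computed, but a plausible reading is that predicted and ground-truth interaction label sets are compared per person and then averaged. The sketch below is an illustrative assumption, not the paper's implementation; the function name and label sets are hypothetical.

```python
# Hedged sketch (not the paper's code): one plausible form of
# person-wise multi-label precision/recall/F1. Each list entry
# corresponds to one person; each entry is a set of interaction labels.
def person_wise_prf(pred_sets, gt_sets):
    precisions, recalls = [], []
    for pred, gt in zip(pred_sets, gt_sets):
        tp = len(pred & gt)  # labels correct for this person
        # Empty prediction/ground-truth sets score 1.0 by convention here.
        precisions.append(tp / len(pred) if pred else 1.0)
        recalls.append(tp / len(gt) if gt else 1.0)
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

# Hypothetical two-person example: person 1 over-predicts one label,
# person 2 misses one label.
p, r, f1 = person_wise_prf(
    [{"hold", "look_at"}, {"ride"}],
    [{"hold"}, {"ride", "push"}],
)
# → (0.75, 0.75, 0.75)
```

Averaging per person (rather than pooling all labels globally) keeps a single heavily annotated person from dominating the score in multi-person clips, which matches the stated motivation for person-wise evaluation.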


Related research

09/22/2022 · MGTR: End-to-End Mutual Gaze Detection with Transformer
01/04/2018 · Object Referring in Videos with Language and Human Gaze
09/04/2019 · Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
08/08/2022 · In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
10/16/2021 · MAAD: A Model and Dataset for "Attended Awareness" in Driving
07/08/2021 · 4D Attention: Comprehensive Framework for Spatio-Temporal Gaze Mapping
07/24/2022 · Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism
