StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation

04/08/2023
by Francesco Ragusa, et al.

The anticipation problem has been studied from several angles, including predicting humans' locations, predicting hand and object trajectories, and forecasting actions and human-object interactions. In this paper, we study the short-term object interaction anticipation problem from the egocentric point of view and propose a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video to detect and localize next-active objects, predict the verb that describes the future interaction, and determine when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D show that our method outperforms state-of-the-art approaches on the considered task. Our method is ranked first on the public leaderboard of the EGO4D short-term object interaction anticipation challenge 2022. Please see the project web page for code and additional details: https://iplab.dmi.unict.it/stillfast/.
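To make the task output concrete, the following is a minimal sketch of how a single short-term anticipation prediction could be represented and filtered. It is not taken from the StillFast code base: the class `InteractionPrediction`, its field names, the helper functions, and the score threshold are all hypothetical and chosen for illustration only. The fields mirror what the abstract says the model produces (a localized next-active object, a verb, and the time at which the interaction starts); the object label and confidence score are assumed additions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InteractionPrediction:
    """Hypothetical container for one anticipated interaction (illustrative only)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) of the next-active object
    noun: str               # assumed: class label of the detected object
    verb: str               # verb describing the future interaction
    time_to_contact: float  # seconds until the interaction is expected to start
    score: float            # assumed: confidence of the prediction


def top_k(predictions: List[InteractionPrediction], k: int) -> List[InteractionPrediction]:
    """Return the k predictions with the highest confidence score."""
    return sorted(predictions, key=lambda p: p.score, reverse=True)[:k]


def earliest_interaction(
    predictions: List[InteractionPrediction], min_score: float = 0.5
) -> Optional[InteractionPrediction]:
    """Return the confident prediction with the smallest time to contact, if any."""
    confident = [p for p in predictions if p.score >= min_score]
    return min(confident, key=lambda p: p.time_to_contact, default=None)
```

A downstream consumer could call `top_k` to keep only the most confident detections for a frame, then `earliest_interaction` to decide which anticipated interaction to react to first. The 0.5 default threshold is arbitrary.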


