Unified Recurrence Modeling for Video Action Anticipation

06/02/2022
by Tsung-Ming Tai, et al.

Forecasting future events from evidence of current conditions is an innate human skill and is key to predicting the outcome of any decision. In artificial vision, for example, we would like to predict the next human action before it happens, without observing the future video frames associated with it. Computer vision models for action anticipation are expected to collect the subtle evidence in the preamble of the target actions. In prior studies, recurrence modeling often leads to better performance, and strong temporal inference is assumed to be a key element of reasonable prediction. To this end, we propose a unified recurrence modeling for video action anticipation via a message passing framework. The information flow in space-time can be described by the interaction between vertices and edges, and the changes of the vertices for each incoming frame reflect the underlying dynamics. Our model uses self-attention as the building block for each of the message passing functions. In addition, we introduce different edge learning strategies that can be optimized end-to-end to give the connectivity between vertices more flexibility. Our experimental results demonstrate that the proposed method outperforms previous works on the large-scale EPIC-Kitchen dataset.
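The abstract gives no implementation details, so the following is only a minimal sketch of what one recurrent message passing step built on self-attention could look like. It is not the authors' method. The function names (`message_passing_step`, `run`), the additive frame injection, the tanh update, and the random weights are all assumptions made for illustration. Edge weights are computed by scaled dot-product attention between vertex states, and the vertices are updated once per incoming frame.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing_step(vertices, frame, params):
    """Update all vertex states for one incoming frame feature.

    vertices: N rows of length d; frame: list of length d.
    params: four d x d matrices (query, key, value, update); hypothetical.
    """
    w_q, w_k, w_v, w_u = params
    dim = len(frame)
    # Assumption: the new frame is injected into every vertex by addition.
    h = [[v + f for v, f in zip(vertex, frame)] for vertex in vertices]
    q, k, val = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    # Edges: scaled dot-product attention weights between vertices.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    edges = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    # Messages: each vertex aggregates values along its edges.
    messages = matmul(matmul(edges, val), w_u)
    # Assumption: residual update followed by tanh keeps states bounded.
    new_vertices = [
        [math.tanh(v + m) for v, m in zip(v_row, m_row)]
        for v_row, m_row in zip(vertices, messages)
    ]
    return new_vertices, edges


def run(frames, num_vertices=4, seed=0):
    """Apply the step recurrently over a sequence of frame features."""
    rng = random.Random(seed)
    dim = len(frames[0])

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, dim ** -0.5) for _ in range(cols)] for _ in range(rows)]

    # Untrained random weights stand in for learned parameters.
    params = [rand_matrix(dim, dim) for _ in range(4)]
    vertices = rand_matrix(num_vertices, dim)
    edges = None
    for frame in frames:
        vertices, edges = message_passing_step(vertices, frame, params)
    return vertices, edges
```

Calling `run` on a list of per-frame feature vectors returns the final vertex states and the last edge (attention) matrix. In a trained model, the final vertex states would feed a classifier that predicts the upcoming action, and the weights would be learned end-to-end rather than sampled at random.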

research
06/22/2022

NVIDIA-UNIBZ Submission for EPIC-KITCHENS-100 Action Anticipation Challenge 2022

In this report, we describe the technical details of our submission for ...
research
05/09/2021

Dispatcher: A Message-Passing Approach To Language Modelling

This paper proposes a message-passing mechanism to address language mode...
research
12/17/2022

Inductive Attention for Video Action Anticipation

Anticipating future actions based on video observations is an important ...
research
04/03/2020

LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention

Existing LiDAR-based 3D object detectors usually focus on the single-fra...
research
12/14/2022

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

We propose a method that leverages graph neural networks, multi-level me...
research
02/14/2021

Relation-aware Graph Attention Model With Adaptive Self-adversarial Training

This paper describes an end-to-end solution for the relationship predict...
research
04/16/2020

Asynchronous Interaction Aggregation for Action Detection

Understanding interaction is an essential part of video action detection...
