SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

09/02/2021
by Nada Osman, et al.

Action anticipation in egocentric videos is a difficult task because of the inherently multi-modal nature of human actions. In addition, some actions unfold faster or slower than others depending on the actor or the surrounding context, which may vary from one occurrence to the next and lead to different predictions. Building on this observation, we extend the RULSTM architecture, which was designed specifically for anticipating human actions, and propose a novel attention-based technique that simultaneously evaluates slow and fast features extracted from three modalities: RGB, optical flow, and detected objects. Two branches process information at different time scales, i.e., at different frame rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on the EpicKitchens-55 and EGTEA Gaze+ datasets and show that our technique consistently improves the Top-5 accuracy of the RULSTM architecture at different anticipation times.
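The abstract does not give the exact frame rates, sequence lengths, or fusion equations, so the following is only a minimal Python sketch of the general idea under stated assumptions. Two branches observe the same past at different frame rates (a large stride for the "slow" view, a small stride for the "fast" view), their per-class scores are combined with softmax attention weights, similar in spirit to the modality attention used in RULSTM, and predictions are scored with Top-k accuracy. The function names (`sample_indices`, `attention_fusion`, `top_k_accuracy`), the strides, the number of steps, and the fact that attention logits are passed in directly rather than predicted from features are all hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Sequence


def sample_indices(last_frame: int, num_steps: int, stride: int) -> List[int]:
    """Return num_steps frame indices ending at last_frame, spaced stride frames apart.

    A large stride gives a "slow" (low frame-rate) view spanning a long past window;
    a small stride gives a "fast" (high frame-rate) view of the recent past.
    """
    start = last_frame - (num_steps - 1) * stride
    if start < 0:
        raise ValueError("not enough observed frames for this stride")
    return list(range(start, last_frame + 1, stride))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(branch_scores: Sequence[Sequence[float]],
                     attention_logits: Sequence[float]) -> List[float]:
    """Weighted sum of per-branch class scores using softmax-normalised branch weights.

    Hypothetical simplification: a trained model would predict the attention
    logits from the branch features; here they are supplied directly.
    """
    if len(branch_scores) != len(attention_logits):
        raise ValueError("need one attention logit per branch")
    weights = softmax(attention_logits)
    fused = [0.0] * len(branch_scores[0])
    for w, scores in zip(weights, branch_scores):
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def top_k_accuracy(all_scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(all_scores, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Usage with made-up numbers: a slow and a fast view of the same observed past.
slow_idx = sample_indices(last_frame=56, num_steps=8, stride=8)  # frames 0, 8, ..., 56
fast_idx = sample_indices(last_frame=56, num_steps=8, stride=4)  # frames 28, 32, ..., 56
# Illustrative class scores from the slow branch and the fast branch.
fused = attention_fusion([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], attention_logits=[2.0, 0.0])
print(top_k_accuracy([fused], labels=[1], k=1))  # prints 1.0
```

Normalising the branch weights with a softmax keeps the fused output a convex combination of the branch scores, so a branch can be down-weighted smoothly rather than dropped outright. With three modalities and two frame rates, the same function would simply receive six score vectors and six logits.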
