Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos

07/18/2021
by Olga Zatsarynna, et al.

Anticipating human actions is an important task for the development of reliable intelligent agents, such as self-driving cars or robot assistants. While the ability to make accurate future predictions is crucial when designing anticipation approaches, the speed at which inference is performed is no less important. Methods that are accurate but not sufficiently fast introduce high latency into the decision process and thereby increase the reaction time of the underlying system, which is problematic for domains such as autonomous driving, where reaction time is critical. In this work, we propose a simple and effective multi-modal architecture based on temporal convolutions. Our approach stacks a hierarchy of temporal convolutional layers and does not rely on recurrent layers, which ensures fast prediction. We further introduce a multi-modal fusion mechanism that captures the pairwise interactions between the RGB, flow, and object modalities. Results on two large-scale datasets of egocentric videos, EPIC-Kitchens-55 and EPIC-Kitchens-100, show that our approach achieves performance comparable to state-of-the-art methods while being significantly faster.
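To make the architecture described above concrete, the following is a minimal PyTorch sketch of a multi-modal temporal convolutional network of this kind. It is not the authors' released implementation: the module names (TemporalConvStack, PairwiseFusion, MultiModalTCN), the dilation schedule, the hidden width, the concatenation-plus-1x1-convolution fusion of each modality pair, the temporal average pooling, and the feature and class dimensions are all illustrative assumptions.

```python
# Hedged sketch of a multi-modal TCN for action anticipation.
# All layer sizes and the fusion operator are illustrative, not the paper's exact design.
import torch
import torch.nn as nn


class TemporalConvStack(nn.Module):
    """Hierarchy of dilated 1D convolutions over a sequence of frame features."""

    def __init__(self, in_dim, hidden_dim, num_layers=4):
        super().__init__()
        layers = []
        for i in range(num_layers):
            dilation = 2 ** i  # receptive field grows exponentially with depth
            layers.append(
                nn.Sequential(
                    nn.Conv1d(in_dim if i == 0 else hidden_dim, hidden_dim,
                              kernel_size=3, padding=dilation, dilation=dilation),
                    nn.ReLU(),
                )
            )
        self.layers = nn.ModuleList(layers)

    def forward(self, x):  # x: (batch, channels, time)
        for layer in self.layers:
            x = layer(x)
        return x


class PairwiseFusion(nn.Module):
    """Fuses three modality streams via their pairwise combinations."""

    def __init__(self, dim):
        super().__init__()
        # one projection per modality pair: RGB-flow, RGB-object, flow-object
        self.pair_proj = nn.ModuleList(
            [nn.Conv1d(2 * dim, dim, kernel_size=1) for _ in range(3)]
        )

    def forward(self, rgb, flow, obj):
        pairs = [(rgb, flow), (rgb, obj), (flow, obj)]
        fused = [proj(torch.cat(p, dim=1)) for proj, p in zip(self.pair_proj, pairs)]
        return torch.stack(fused, dim=0).sum(dim=0)  # (batch, dim, time)


class MultiModalTCN(nn.Module):
    """Per-modality temporal conv stacks, pairwise fusion, and an action classifier."""

    def __init__(self, feat_dims=(1024, 1024, 352), hidden_dim=256, num_classes=2513):
        super().__init__()
        self.streams = nn.ModuleList(
            [TemporalConvStack(d, hidden_dim) for d in feat_dims]
        )
        self.fusion = PairwiseFusion(hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, rgb, flow, obj):  # each: (batch, feat_dim, time)
        streams = [s(x) for s, x in zip(self.streams, (rgb, flow, obj))]
        fused = self.fusion(*streams)       # (batch, hidden_dim, time)
        pooled = fused.mean(dim=-1)         # temporal average pooling
        return self.classifier(pooled)      # logits for the anticipated action


if __name__ == "__main__":
    model = MultiModalTCN()
    rgb = torch.randn(2, 1024, 8)   # 8 observed time steps of pre-extracted features
    flow = torch.randn(2, 1024, 8)
    obj = torch.randn(2, 352, 8)
    print(model(rgb, flow, obj).shape)  # torch.Size([2, 2513])
```

In this sketch, the dilated convolutions give each modality stream a receptive field that grows exponentially with depth, which is what lets a purely convolutional model cover long observation windows without the sequential computation of recurrent layers.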

Related research

11/11/2022  Multi-modal Fusion Technology based on Vehicle Information: A Survey
Multi-modal fusion is a basic task of autonomous driving system percepti...

06/16/2021  CMF: Cascaded Multi-model Fusion for Referring Image Segmentation
In this work, we address the task of referring image segmentation (RIS),...

09/13/2022  Towards Efficient Architecture and Algorithms for Sensor Fusion
The safety of an automated vehicle hinges crucially upon the accuracy of...

09/02/2021  SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos
Action anticipation in egocentric videos is a difficult task due to the ...

07/02/2022  ReCoAt: A Deep Learning-based Framework for Multi-Modal Motion Prediction in Autonomous Driving Application
This paper proposes a novel deep learning framework for multi-modal moti...

09/17/2022  RGB-Event Fusion for Moving Object Detection in Autonomous Driving
Moving Object Detection (MOD) is a critical vision task for successfully...

10/01/2018  Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
For autonomous agents to successfully operate in the real world, the abi...
