With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

11/01/2021
by Evangelos Kazakos, et al.

In egocentric videos, actions occur in quick succession. We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance. To incorporate the temporal context, we propose a transformer-based multimodal model that ingests video and audio as input modalities, with an explicit language model providing action sequence context to enhance the predictions. We test our approach on the EPIC-KITCHENS and EGTEA datasets, reporting state-of-the-art performance. Our ablations showcase the advantage of utilising temporal context, as well as of incorporating the audio modality and a language model to rescore predictions. Code and models at: https://github.com/ekazakos/MTCN.
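The abstract states that a language model over action sequences is used to rescore the audiovisual predictions. The snippet below is a minimal sketch of that idea under loudly stated assumptions, not the MTCN implementation (see the linked repository for that). It assumes a hypothetical bigram action model `bigram`, per-clip probabilities `clip_probs` from some audiovisual classifier, and a log-linear combination weighted by an assumed hyperparameter `lam`. It exhaustively scores the top-k candidates per clip, which is only practical for short windows.

```python
import math
from itertools import product


def rescore_sequence(clip_probs, bigram, lam=0.5, top_k=3, floor=1e-6):
    """Pick the action sequence maximising model score + lam * LM score.

    Assumptions (hypothetical, for illustration only):
      clip_probs: list of {action: probability} dicts, one per clip in the window
      bigram:     {(prev_action, action): probability}, a toy bigram action LM
      lam:        weight on the language-model term
      floor:      probability used for bigrams the toy LM has never seen
    """
    # Keep only the top-k candidate actions per clip to bound the search.
    candidates = [sorted(p, key=p.get, reverse=True)[:top_k] for p in clip_probs]
    best_seq, best_score = None, -math.inf
    # Exhaustive search over candidate combinations (fine for short windows).
    for seq in product(*candidates):
        model = sum(math.log(clip_probs[t][a]) for t, a in enumerate(seq))
        lm = sum(
            math.log(bigram.get((seq[t - 1], seq[t]), floor))
            for t in range(1, len(seq))
        )
        total = model + lam * lm
        if total > best_score:
            best_seq, best_score = list(seq), total
    return best_seq, best_score


# Toy window of three consecutive clips (made-up numbers).
clips = [
    {"open fridge": 0.9, "close fridge": 0.1},
    {"take milk": 0.45, "take knife": 0.55},
    {"close fridge": 0.8, "open fridge": 0.2},
]
bigram = {
    ("open fridge", "take milk"): 0.6,
    ("take milk", "close fridge"): 0.7,
    ("open fridge", "take knife"): 0.01,
    ("take knife", "close fridge"): 0.05,
}

if __name__ == "__main__":
    print(rescore_sequence(clips, bigram, lam=0.0)[0])  # model scores only
    print(rescore_sequence(clips, bigram, lam=1.0)[0])  # with LM rescoring
```

With `lam=0.0` the middle clip is labelled "take knife" (the classifier's own top choice); with `lam=1.0` the toy sequence prior flips it to "take milk", which illustrates how sequence context can correct an ambiguous clip.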


research
09/11/2022

MAiVAR: Multimodal Audio-Image and Video Action Recognizer

Currently, action recognition is predominately performed on video data a...
research
02/10/2022

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively expl...
research
06/27/2021

Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization

State of the art architectures for untrimmed video Temporal Action Local...
research
07/14/2023

Multimodal Distillation for Egocentric Action Recognition

The focal point of egocentric video understanding is modelling hand-obje...
research
03/26/2022

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Action recognition models have shown a promising capability to classify ...
research
08/14/2023

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

We propose a method named AudioFormer, which learns audio feature represe...
research
05/22/2019

What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

Egocentric action anticipation consists in understanding which objects t...
