Actor Conditioned Attention Maps for Video Action Detection

12/30/2018
by   Oytun Ulutan, et al.

Interactions with surrounding objects and people contain important information for understanding human actions. To model such interactions explicitly, we propose generating attention maps that rank each spatio-temporal region's importance to a detected actor. We refer to these as Actor-Conditioned Attention Maps (ACAM); they serve as weights on the features extracted from the whole scene. The resulting actor-conditioned features help focus the learned model on regions that are relevant to the conditioned actor. Another novelty of our approach is the use of pre-trained object detectors, instead of region proposals, which generalize better to videos from different sources. Detailed experimental results on the AVA 2.1 dataset demonstrate the importance of interactions, with a performance improvement of 5 mAP over state-of-the-art published results.
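To make the weighting idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes scene features of shape (T, H, W, C) and a single actor feature vector of shape (C,). The scaled dot-product scoring, the sigmoid, and the function name `actor_conditioned_features` are illustrative assumptions; the abstract does not specify how the attention maps are computed.

```python
import numpy as np

def actor_conditioned_features(scene_feats, actor_vec):
    """Weight scene features by an attention map conditioned on one actor.

    scene_feats: array of shape (T, H, W, C), features from the whole scene.
    actor_vec:   array of shape (C,), features of one detected actor.
    Both the scoring function and the sigmoid are assumptions for illustration.
    """
    # Hypothetical relevance score: scaled dot product between the actor
    # vector and the feature vector at every spatio-temporal location.
    scores = scene_feats @ actor_vec / np.sqrt(actor_vec.shape[0])  # (T, H, W)
    # Squash scores into [0, 1] so they can act as per-location weights.
    attention = 1.0 / (1.0 + np.exp(-scores))
    # Apply the attention map as a weight on every feature channel.
    weighted = scene_feats * attention[..., None]  # (T, H, W, C)
    return attention, weighted

# Toy inputs: 4 time steps, a 7x7 spatial grid, 16 channels.
rng = np.random.default_rng(0)
scene = rng.standard_normal((4, 7, 7, 16))
actor = rng.standard_normal(16)
attn, feats = actor_conditioned_features(scene, actor)
print(attn.shape, feats.shape)
```

Each detected actor would get its own attention map over the same scene features, so the weighted features differ per actor even within one clip.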

research
07/28/2018

Actor-Centric Relation Network

Current state-of-the-art approaches for spatio-temporal action localizat...
research
06/29/2021

Spatio-Temporal Context for Action Detection

Research in action detection has grown in recent years, as it plays a...
research
07/28/2021

Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection

The dominant paradigm in spatiotemporal action detection is to classify ...
research
04/24/2023

MRSN: Multi-Relation Support Network for Video Action Detection

Action detection is a challenging video understanding task, requiring mo...
research
11/22/2020

We don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos

We propose SSA2D, a simple yet effective end-to-end deep network for act...
research
07/31/2023

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Can we better anticipate an actor's future actions (e.g. mix eggs) by kn...
research
09/02/2021

SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Action anticipation in egocentric videos is a difficult task due to the ...
