DOAD: Decoupled One Stage Action Detection Network

04/01/2023
by Shuning Chang, et al.

Localizing people and recognizing their actions in videos is a challenging task on the path to high-level video understanding. Existing methods are mostly two-stage, with one stage for person bounding box generation and the other for action recognition. However, such two-stage methods are generally inefficient. We observe that directly unifying detection and action recognition normally suffers from (i) inferior learning, because detection and action recognition require context representations with different properties, and (ii) optimization difficulty when training data is insufficient. In this work, we present a decoupled one-stage network, dubbed DOAD, to mitigate the above issues and improve the efficiency of spatio-temporal action detection. To achieve this, we decouple detection and action recognition into two branches: one branch focuses on the detection representation for actor detection, and the other on action recognition. For the action branch, we design a transformer-based module (TransPC) to model pairwise relationships between people and context. Unlike the vector-based dot product commonly used in self-attention, it is built upon a novel matrix-based key and value for Hadamard attention to model person-context information. It not only exploits relationships between person pairs but also takes into account context and relative position information. Results on the AVA and UCF101-24 datasets show that our method is competitive with two-stage state-of-the-art methods while being significantly more efficient.
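To make the contrast with dot-product attention concrete, the following is a minimal sketch of one possible reading of Hadamard attention with a matrix-based key and value. The abstract does not say how the key and value matrices are built or how the weights are normalized, so everything here is an assumption for illustration: the function name `hadamard_attention`, the layout of `keys` and `values` as one feature vector per person pair (standing in for a pair-and-context feature), and the per-channel softmax over partners. It is not the authors' TransPC implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hadamard_attention(queries, keys, values):
    """Hypothetical Hadamard attention over person pairs.

    queries: N x C list, one query vector per person.
    keys, values: N x N x C lists, one vector per ordered person pair (i, j).
        (Assumed layout; in this sketch they stand in for pair-plus-context
        features, which the abstract does not define in detail.)
    Returns an N x C list of attended features.
    """
    n = len(queries)
    c = len(queries[0])
    out = []
    for i in range(n):
        # Element-wise (Hadamard) product instead of a dot product:
        # the score for pair (i, j) is a C-dimensional vector, not a scalar.
        scores = [
            [queries[i][ch] * keys[i][j][ch] for ch in range(c)]
            for j in range(n)
        ]
        row = []
        for ch in range(c):
            # Normalize over partners j separately for each channel.
            weights = softmax([scores[j][ch] for j in range(n)])
            row.append(sum(weights[j] * values[i][j][ch] for j in range(n)))
        out.append(row)
    return out


# Toy input: 2 people, 2 channels. All-zero keys give uniform weights,
# so each output channel is the mean of that person's pair values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
values = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
result = hadamard_attention(queries, keys, values)
```

In this sketch the attention weight for each pair is a vector with one entry per channel, whereas standard dot-product self-attention reduces each query-key pair to a single scalar weight shared by all channels.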

research
12/02/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

First-person action recognition is a challenging task in video understan...
research
02/08/2020

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

Egocentric video recognition is a natural testbed for diverse interactio...
research
09/28/2022

RADACS: Towards Higher-Order Reasoning using Action Recognition in Autonomous Vehicles

When applied to autonomous vehicle settings, action recognition can help...
research
01/14/2020

Actions as Moving Points

The existing action tubelet detectors mainly depend on heuristic anchor ...
research
06/28/2021

Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection

With rapidly evolving internet technologies and emerging tools, sports r...
research
04/25/2018

Actor and Observer: Joint Modeling of First and Third-Person Videos

Several theories in cognitive neuroscience suggest that when people inte...
research
06/06/2018

Action4D: Real-time Action Recognition in the Crowd and Clutter

Recognizing every person's action in a crowded and cluttered environment...
