Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

06/14/2020
by   Junting Pan, et al.

Localizing persons and recognizing their actions in videos is a challenging step toward high-level video understanding. Recent advances have been achieved by modeling either 'actor-actor' or 'actor-context' relations. However, such direct first-order relations are not sufficient for localizing actions in complicated scenes. Some actors might be indirectly related via objects or background context in the scene. Such indirect relations are crucial for determining the action labels but are mostly ignored by existing work. In this paper, we propose to explicitly model the Actor-Context-Actor Relation, which can capture indirect high-order supportive information for effectively reasoning about actors' actions in complex scenes. To this end, we design an Actor-Context-Actor Relation Network (ACAR-Net), which builds upon a novel High-order Relation Reasoning Operator to model indirect relations for spatio-temporal action localization. Moreover, to make use of more temporal context, we extend our framework with an Actor-Context Feature Bank for reasoning about long-range high-order relations. Extensive experiments on the AVA dataset validate the effectiveness of our ACAR-Net. Ablation studies show the advantages of modeling high-order relations over existing first-order relation reasoning methods. The proposed ACAR-Net is also the core module of our 1st place solution in the AVA-Kinetics Crossover Challenge 2020. Training code and models will be available at https://github.com/Siyu-C/ACAR-Net.
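
To make the distinction between first-order and high-order relations concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the operator, so the function names, the tensor shapes, the single linear projection, and the use of dot-product attention across actors at each spatial location are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def first_order_relations(actors, context, w):
    """Hypothetical actor-context (first-order) relation features.

    actors:  (N, C) one feature vector per detected actor
    context: (H, W, C) spatial context feature map
    w:       (2C, D) assumed projection weights
    returns: (N, H, W, D) one relation map per actor
    """
    n, c = actors.shape
    h, wd, _ = context.shape
    # Pair every actor vector with every context cell.
    tiled = np.broadcast_to(actors[:, None, None, :], (n, h, wd, c))
    ctx = np.broadcast_to(context[None], (n, h, wd, c))
    pairs = np.concatenate([tiled, ctx], axis=-1)  # (N, H, W, 2C)
    return np.maximum(pairs @ w, 0.0)  # linear projection + ReLU


def high_order_relations(f):
    """Hypothetical actor-context-actor (high-order) relation features.

    At each spatial location, every actor's first-order relation vector
    attends to the other actors' relation vectors at that same location,
    so two actors can be linked through shared context.
    f: (N, H, W, D) -> (N, H, W, D)
    """
    d = f.shape[-1]
    scores = np.einsum("ihwd,jhwd->hwij", f, f) / np.sqrt(d)
    attn = softmax(scores, axis=-1)  # (H, W, N, N), rows sum to 1
    mixed = np.einsum("hwij,jhwd->ihwd", attn, f)
    return f + mixed  # residual connection (an assumption)


# Toy usage: 3 actors, a 4x4 context map, 16 channels, 8 relation channels.
rng = np.random.default_rng(0)
actors = rng.normal(size=(3, 16))
context = rng.normal(size=(4, 4, 16))
w = rng.normal(size=(32, 8))
first = first_order_relations(actors, context, w)  # shape (3, 4, 4, 8)
high = high_order_relations(first)                 # shape (3, 4, 4, 8)
```

In this sketch the second step never compares raw actor features directly; it only mixes actor-context relation vectors, which is one way to read the "indirect" relations the abstract describes.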


research
03/28/2023

CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

The relation modeling between actors and scene context advances video ac...
research
06/16/2020

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020

This technical report introduces our winning solution to the spatio-temp...
research
08/18/2021

Target Adaptive Context Aggregation for Video Scene Graph Generation

This paper deals with a challenging task of video scene graph generation...
research
04/24/2023

MRSN: Multi-Relation Support Network for Video Action Detection

Action detection is a challenging video understanding task, requiring mo...
research
07/28/2018

Actor-Centric Relation Network

Current state-of-the-art approaches for spatio-temporal action localizat...
research
06/15/2021

Relation Modeling in Spatio-Temporal Action Localization

This paper presents our solution to the AVA-Kinetics Crossover Challenge...
research
07/27/2021

Enriching Local and Global Contexts for Temporal Action Localization

Effectively tackling the problem of temporal action localization (TAL) n...
