Long Short-Term Relation Networks for Video Action Detection

03/31/2020
by Dong Li, et al.

It is well recognized that modeling human-object and object-object relations helps detection tasks. Nevertheless, the problem is not trivial, especially when exploring the interactions among the human actor, objects and the scene (collectively referred to as human-context) to boost video action detectors. The difficulty arises because reliable relations in a video depend not only on the short-term human-context relations in the present clip but also on the temporal dynamics distilled over a long-range span of the video. This motivates us to capture both short-term and long-term relations in a video. In this paper, we present Long Short-Term Relation Networks, dubbed LSTR, a new approach that aggregates and propagates relations to augment features for video action detection. Technically, a Region Proposal Network (RPN) is remoulded to first produce 3D bounding boxes, i.e., tubelets, in each video clip. LSTR then models short-term human-context interactions within each clip through a spatio-temporal attention mechanism, and reasons about long-term temporal dynamics across video clips via Graph Convolutional Networks (GCN), in a cascaded manner. Extensive experiments are conducted on four benchmark datasets, and superior results are reported compared with state-of-the-art methods.
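The cascade described above (short-term attention within each clip, followed by long-term graph reasoning across clips) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the abstract does not specify projection layers, graph construction or normalization, so this sketch assumes raw feature vectors serve as queries, keys and values, one actor node per clip, a fully connected graph over clips, a row-normalized GCN layer and an identity weight matrix. All function names, features and inputs are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matmul(a, b):
    # Plain-Python matrix product: a is n x k, b is k x m.
    return [[dot(row, col) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def short_term_relation(actors, context):
    """Within one clip: each actor tubelet feature attends over the context
    (object/scene) features via scaled dot-product attention, and the attended
    context is added back as a residual. Assumption: no learned projections."""
    scale = math.sqrt(len(actors[0]))
    weights = [softmax([dot(a, c) / scale for c in context]) for a in actors]
    attended = matmul(weights, context)
    return [[x + y for x, y in zip(a, att)] for a, att in zip(actors, attended)]


def long_term_relation(nodes, adjacency, weight):
    """Across clips: one GCN layer, ReLU(D^-1 (A + I) H W), i.e. each node
    averages itself and its neighbours before the linear map.
    Assumption: row normalization rather than the symmetric variant."""
    n = len(nodes)
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    a_norm = [[v / sum(row) for v in row] for row in a_hat]
    out = matmul(matmul(a_norm, nodes), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy input: 3 clips, one actor tubelet per clip, 2-d features.
clips = [
    ([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]),  # (actor features, context features)
    ([[0.5, 0.5]], [[1.0, 0.0]]),
    ([[0.0, 1.0]], [[0.0, 0.0], [2.0, 0.0]]),
]
# Stage 1 (short-term): refine the actor feature inside each clip.
per_clip = [short_term_relation(actors, ctx)[0] for actors, ctx in clips]
# Stage 2 (long-term): hypothetical fully connected graph over clips, identity weight.
full_graph = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
identity = [[1.0, 0.0], [0.0, 1.0]]
refined = long_term_relation(per_clip, full_graph, identity)
```

With a fully connected graph and an identity weight, every clip node becomes the mean of all clip features, which is the simplest possible form of long-range propagation; a learned weight matrix and a sparser or similarity-weighted adjacency would let a model treat clips differently.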
