Graph Convolutional Module for Temporal Action Localization in Videos

12/01/2021
by Runhao Zeng, et al.

Temporal action localization is a long-standing problem in computer vision. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each unit individually, without explicitly exploiting the relations between units during learning. In this paper, we argue that the relations between action units play an important role in action localization, and that a more powerful action detector should not only capture the local content of each action unit but also have a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, covering both the two-stage and one-stage paradigms. Specifically, we first construct a graph in which each action unit is represented as a node and the relation between two action units as an edge. We use two types of relations: one captures the temporal connections between different action units, and the other characterizes their semantic relationship. For the temporal connections in two-stage methods, we further explore two kinds of edges: one connects overlapping action units, and the other connects surrounding but disjoint units. On the resulting graph, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which yields more informative representations that enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR and R-C3D) and one-stage methods (e.g., D-SSAD), verifying the generality and effectiveness of our GCM.
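
The graph construction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`tiou`, `build_adjacency`, `gcn_layer`), the thresholds (`tiou_thresh`, `gap_thresh`, `sim_thresh`), the use of cosine similarity for semantic edges, and the mean-aggregation layer with ReLU are all assumptions chosen for illustration. The abstract specifies only that nodes are action units, that edges encode temporal (overlapping or surrounding-but-disjoint) and semantic relations, and that a GCN is applied on the graph.

```python
import numpy as np

def tiou(a, b):
    """Temporal IoU of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def build_adjacency(segments, features,
                    tiou_thresh=0.5, gap_thresh=2.0, sim_thresh=0.8):
    """Binary adjacency over action units (all thresholds are assumed values).

    An edge is added when two units (a) overlap strongly, (b) are disjoint
    but temporally close, or (c) have similar features.
    """
    n = len(segments)
    adj = np.eye(n)  # self-loops keep each unit's own content
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)
    cos = unit @ unit.T  # pairwise cosine similarity
    for i in range(n):
        for j in range(i + 1, n):
            iou = tiou(segments[i], segments[j])
            # Temporal gap between the two intervals (negative if they overlap).
            gap = (max(segments[i][0], segments[j][0])
                   - min(segments[i][1], segments[j][1]))
            overlapping = iou > tiou_thresh
            surrounding = iou == 0.0 and 0.0 <= gap < gap_thresh
            semantic = cos[i, j] > sim_thresh
            if overlapping or surrounding or semantic:
                adj[i, j] = adj[j, i] = 1.0
    return adj

def gcn_layer(adj, features, weight):
    """One graph convolution: average neighbor features, project, apply ReLU."""
    deg = adj.sum(axis=1, keepdims=True)  # >= 1 because of self-loops
    return np.maximum((adj / deg) @ features @ weight, 0.0)
```

In this sketch, each row of the layer's output mixes a unit's own features with those of its temporal and semantic neighbors, which is one simple way to give a detector the wider contextual view the abstract argues for. A detector would then feed these enriched features to its usual classification and boundary-regression heads.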

research
09/07/2019

Graph Convolutional Networks for Temporal Action Localization

Most state-of-the-art action localization systems process each action pr...
research
03/17/2017

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

Temporal Action Proposal (TAP) generation is an important problem, as fa...
research
02/18/2020

Constraining Temporal Relationship for Action Localization

Recently, temporal action localization (TAL), i.e., finding specific act...
research
07/03/2019

Deformable Tube Network for Action Detection in Videos

We address the problem of spatio-temporal action detection in videos. Ex...
research
03/09/2020

Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network

Accurate temporal action proposals play an important role in detecting a...
research
11/26/2018

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

We propose novel Stacked Spatio-Temporal Graph Convolutional Networks (S...
research
08/07/2021

Temporal Action Localization Using Gated Recurrent Units

Temporal Action Localization (TAL) task in which the aim is to predict t...
