Gaussian Temporal Awareness Networks for Action Localization

09/09/2019
by Fuchen Long, et al.

Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches draw inspiration from image object detection and extend its advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from a robustness problem due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits the utility in detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize the temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN), a new architecture that integrates the exploitation of temporal structure into a one-stage action localization framework. Technically, GTAN models the temporal structure by learning a set of Gaussian kernels, one for each cell in the feature maps. Each Gaussian kernel corresponds to a particular interval of an action proposal, and a mixture of Gaussian kernels can further characterize action proposals of various lengths. Moreover, the values in each Gaussian curve reflect the contextual contributions to the localization of an action proposal. Extensive experiments are conducted on both the THUMOS14 and ActivityNet v1.3 datasets, and superior results are reported in comparison with state-of-the-art approaches. More remarkably, GTAN achieves 1.9 of the two datasets.
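The idea of one Gaussian kernel per feature-map cell can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`gaussian_weights`, `gaussian_pool`, `proposal_interval`), the normalization of positions to [0, 1), and the rule that a proposal spans the center plus or minus `scale * sigma` are all assumptions made for illustration. In a real network the standard deviation `sigma` would be predicted per cell rather than passed in by hand.

```python
import numpy as np


def gaussian_weights(num_cells, center, sigma):
    """Normalized Gaussian weights over all cells of a 1D feature map.

    Positions are scaled to [0, 1) (an assumption of this sketch);
    `center` is the index of the cell that owns the kernel.
    """
    positions = np.arange(num_cells) / num_cells
    mu = center / num_cells
    w = np.exp(-((positions - mu) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


def gaussian_pool(features, center, sigma):
    """Weighted sum of cell features under the kernel of one cell.

    `features` has shape (num_cells, channels); each weight plays the role
    of a contextual contribution of a neighboring cell.
    """
    w = gaussian_weights(features.shape[0], center, sigma)
    return w @ features


def proposal_interval(num_cells, center, sigma, scale=1.0):
    """Hypothetical mapping from a kernel to a (start, end) interval.

    The interval is center +/- scale * sigma, clipped to [0, 1];
    `scale` is an illustrative parameter, not taken from the source.
    """
    mu = center / num_cells
    half = scale * sigma
    return max(0.0, mu - half), min(1.0, mu + half)


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=(8, 4))
    print(gaussian_weights(8, center=4, sigma=0.1))
    print(gaussian_pool(feats, center=4, sigma=0.1).shape)
    print(proposal_interval(8, center=4, sigma=0.1))
```

A larger `sigma` spreads the weights over more cells and widens the interval, which is how a learned scale could stand in for a fixed, predetermined anchor length in this simplified picture.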


research
07/21/2017

Temporal Convolution Based Action Proposal: Submission to ActivityNet 2017

In this notebook paper, we describe our approach in the submission to th...
research
04/20/2018

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

We propose TAL-Net, an improved approach to temporal action localization...
research
04/16/2019

Decoupling Localization and Classification in Single Shot Temporal Action Detection

Video temporal action detection aims to temporally localize and recogniz...
research
08/31/2020

Learning to Localize Actions from Moments

With the knowledge of action moments (i.e., trimmed video clips that eac...
research
07/29/2019

Multi-Granularity Fusion Network for Proposal and Activity Localization: Submission to ActivityNet Challenge 2019 Task 1 and Task 2

This technical report presents an overview of our solution used in the s...
research
07/03/2019

Deformable Tube Network for Action Detection in Videos

We address the problem of spatio-temporal action detection in videos. Ex...
research
01/11/2022

Representing Videos as Discriminative Sub-graphs for Action Recognition

Human actions are typically of combinatorial structures or patterns, i.e...
