Exploring Frame Segmentation Networks for Temporal Action Localization

02/14/2019
by   Ke Yang, et al.

Temporal action localization is an important task in computer vision. Although many methods have been proposed, how to predict the temporal location of action segments precisely remains an open question. Most state-of-the-art methods train action classifiers on video segments pre-determined by action proposals. However, recent work has found that a desirable model should move beyond the segment level and make dense predictions at a fine temporal granularity to determine precise temporal boundaries. In this paper, we propose a Frame Segmentation Network (FSN) that places a temporal CNN on top of 2D spatial CNNs. The spatial CNNs abstract semantics in the spatial dimension, while the temporal CNN introduces temporal context information and performs dense predictions. The proposed FSN can make dense frame-level predictions for a video clip using both spatial and temporal context information. FSN is trained end to end, so the model can be optimized jointly in the spatial and temporal domains. We also adapt FSN to a weakly supervised scenario (WFSN), where only video-level labels are provided during training. Experimental results on a public dataset show that FSN achieves superior performance in both frame-level action localization and temporal action localization.
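
To make the link between dense frame-level predictions and temporal action localization concrete, the sketch below groups consecutive frames that share a predicted label into time segments. It is only an illustration: the abstract does not describe how FSN converts frame-level outputs into segments, so the function name `frames_to_segments`, the `fps` and `background` parameters, and the run-grouping rule are all assumptions made here.

```python
from itertools import groupby


def frames_to_segments(frame_labels, fps=25.0, background=0):
    """Group consecutive frames with the same non-background label.

    Hypothetical post-processing step, not taken from the paper.
    Returns a list of (label, start_sec, end_sec) tuples, where
    end_sec is the time at which the last frame of the run ends.
    """
    segments = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != background:
            segments.append((label, index / fps, (index + length) / fps))
        index += length
    return segments


# Per-frame class ids for a 10-frame clip sampled at 2 frames per second
# (made-up values; 0 is assumed to mean "no action").
labels = [0, 0, 3, 3, 3, 3, 0, 5, 5, 0]
segments = frames_to_segments(labels, fps=2.0)
print(segments)  # [(3, 1.0, 3.0), (5, 3.5, 4.5)]
```

Because the segment boundaries come directly from per-frame labels, their precision is limited only by the frame sampling rate, which is the motivation the abstract gives for predicting densely in time rather than classifying pre-determined segments.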

research
08/10/2017

Exploring Temporal Preservation Networks for Precise Temporal Action Localization

Temporal action localization is an important task of computer vision. Th...
research
03/04/2017

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Temporal action localization is an important yet challenging problem. Gi...
research
12/21/2021

ACGNet: Action Complement Graph Network for Weakly-supervised Temporal Action Localization

Weakly-supervised temporal action localization (WTAL) in untrimmed video...
research
03/01/2019

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

Previous spatial-temporal action localization methods commonly follow th...
research
11/28/2018

Multi-granularity Generator for Temporal Action Proposal

Temporal action proposal generation is an important task, aiming to loca...
research
11/15/2019

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Spatiotemporal action localization requires incorporation of two sources...
research
11/23/2020

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

Understanding videos is challenging in computer vision. In particular, t...
