SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection

06/29/2021
by   Ranyu Ning, et al.

Temporal action detection (TAD) is a challenging task that aims to temporally localize and recognize human actions in untrimmed videos. Current mainstream one-stage TAD approaches localize and classify action proposals using pre-defined anchors, whose locations and scales are set by the designers. Such anchor-based methods have limited generalization capability and suffer performance degradation when videos contain rich action variation. In this study, we explore removing the requirement for pre-defined anchors in TAD methods. We develop a novel TAD model, the Selective Receptive Field Network (SRF-Net), in which the location offsets and classification scores at each temporal location are estimated directly from the feature map, and which is trained end-to-end. Its key building block, Selective Receptive Field Convolution (SRFC), adaptively adjusts its receptive field size according to multiple scales of input information at each temporal location in the feature map. Extensive experiments on the THUMOS14 dataset show superior results compared with state-of-the-art TAD approaches.
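To make the anchor-free idea concrete, the following is a minimal sketch of how per-location offsets and classification scores could be decoded into action segments. It is not the SRF-Net implementation: the function name `decode_segments`, the `stride` and `score_threshold` parameters, and the convention that each location predicts its distances to the segment start and end are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, int, float]  # (start, end, class index, score)


def decode_segments(
    offsets: Sequence[Tuple[float, float]],
    scores: Sequence[Sequence[float]],
    stride: float = 1.0,
    score_threshold: float = 0.5,
) -> List[Segment]:
    """Turn per-location predictions into segments without any anchors.

    Assumed convention (hypothetical, not taken from the paper):
    offsets[t] = (distance to segment start, distance to segment end),
    scores[t]  = per-class scores at temporal location t,
    stride     = time units covered by one feature-map step.
    """
    segments: List[Segment] = []
    for t, ((d_start, d_end), class_scores) in enumerate(zip(offsets, scores)):
        # Pick the highest-scoring class at this temporal location.
        best_class = max(range(len(class_scores)), key=class_scores.__getitem__)
        best_score = class_scores[best_class]
        if best_score < score_threshold:
            continue  # treat low-confidence locations as background
        center = t * stride  # map the feature-map index back to video time
        start = max(0.0, center - d_start)
        end = center + d_end
        segments.append((start, end, best_class, best_score))
    return segments


example = decode_segments(
    offsets=[(0.0, 1.0), (1.0, 2.0), (0.5, 0.5)],
    scores=[[0.1, 0.2], [0.9, 0.05], [0.3, 0.6]],
    stride=4.0,
)
```

In this sketch every temporal location proposes its own segment, so no anchor locations or scales need to be chosen in advance. A full detector would typically follow this step with non-maximum suppression to merge overlapping segments.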

research
10/18/2019

AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection

Temporal action detection is a fundamental yet challenging task in video...
research
08/02/2019

Scale Matters: Temporal Scale Aggregation Network for Precise Action Localization in Untrimmed Videos

Temporal action localization is a recently-emerging task, aiming to loca...
research
08/22/2020

Revisiting Anchor Mechanisms for Temporal Action Localization

Most of the current action localization methods follow an anchor-based p...
research
04/16/2019

Decoupling Localization and Classification in Single Shot Temporal Action Detection

Video temporal action detection aims to temporally localize and recogniz...
research
04/16/2019

Response of Selective Attention in Middle Temporal Area

The primary visual cortex processes a large amount of visual information...
research
06/03/2019

RF-Net: An End-to-End Image Matching Network based on Receptive Field

This paper proposes a new end-to-end trainable matching network based on...
research
11/16/2020

LAP-Net: Adaptive Features Sampling via Learning Action Progression for Online Action Detection

Online action detection is a task with the aim of identifying ongoing ac...
