CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization

08/19/2020
by Yuxi Li, et al.

Most current pipelines for spatio-temporal action localization link frame-wise or clip-wise detection results to generate action proposals. Such pipelines exploit only local information, and their efficiency is limited by dense per-frame localization. In this paper, we propose the Coarse-to-Fine Action Detector (CFAD), an original end-to-end trainable framework for efficient spatio-temporal action localization. CFAD introduces a new paradigm that first estimates coarse spatio-temporal action tubes from video streams and then refines the tubes' locations based on key timestamps. This concept is implemented by two key components of our framework, the Coarse Module and the Refine Module. The parameterized modeling of long temporal information in the Coarse Module helps obtain an accurate initial tube estimate, while the Refine Module selectively adjusts the tube location under the guidance of key timestamps. Compared with other methods, the proposed CFAD achieves competitive results on the action detection benchmarks UCF101-24, UCFSports and JHMDB-21, with an inference speed that is 3.3x faster than the nearest competitors.
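
The coarse-then-refine idea can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not specify how tubes are parameterized, so linear interpolation between boxes is used here purely as a stand-in, and the names `coarse_tube`, `refine_tube` and `lerp_box` are hypothetical. The sketch only shows the control flow of estimating a whole tube first and then correcting it at a few key timestamps, rather than localizing every frame independently.

```python
from typing import Dict, List, Tuple

# A box is (x1, y1, x2, y2); a tube is one box per frame.
Box = Tuple[float, float, float, float]


def lerp_box(a: Box, b: Box, w: float) -> Box:
    """Linearly blend two boxes; w=0 returns a, w=1 returns b."""
    return tuple((1.0 - w) * x + w * y for x, y in zip(a, b))


def coarse_tube(start_box: Box, end_box: Box, num_frames: int) -> List[Box]:
    """Toy coarse stage (assumption): a tube described by two endpoint boxes,
    with all intermediate frames obtained by linear interpolation."""
    if num_frames < 2:
        raise ValueError("a tube needs at least two frames")
    return [
        lerp_box(start_box, end_box, t / (num_frames - 1))
        for t in range(num_frames)
    ]


def refine_tube(tube: List[Box], key_boxes: Dict[int, Box]) -> List[Box]:
    """Toy refine stage (assumption): boxes at key timestamps override the
    coarse estimate, and frames between neighbouring keys are re-interpolated."""
    last = len(tube) - 1
    if any(t < 0 or t > last for t in key_boxes):
        raise ValueError("key timestamp outside the tube")
    # The tube endpoints act as anchors unless a key box replaces them.
    anchors = {0: tube[0], last: tube[-1]}
    anchors.update(key_boxes)
    keys = sorted(anchors)
    refined = list(tube)
    for k0, k1 in zip(keys, keys[1:]):
        for t in range(k0, k1 + 1):
            refined[t] = lerp_box(anchors[k0], anchors[k1], (t - k0) / (k1 - k0))
    return refined


if __name__ == "__main__":
    tube = coarse_tube((0, 0, 10, 10), (4, 4, 14, 14), num_frames=5)
    # Pretend a refinement step produced a corrected box at key timestamp 2.
    refined = refine_tube(tube, {2: (4, 4, 14, 14)})
    for t, box in enumerate(refined):
        print(t, box)
```

In this toy version only the key timestamps require a detailed box prediction, which is the general intuition behind avoiding dense per-frame localization; the actual modules in CFAD are learned networks and are not reproduced here.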
