An Empirical Study of End-to-End Temporal Action Detection

04/06/2022
by Xiaolong Liu, et al.

Temporal action detection (TAD) is an important yet challenging task in video understanding. It aims to simultaneously predict the semantic label and the temporal interval of every action instance in an untrimmed video. Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, in which the video encoder is pre-trained for action classification and only the detection head on top of the encoder is optimized for TAD. The effect of end-to-end learning has not been systematically evaluated. Moreover, there is no in-depth study of the efficiency-accuracy trade-off in end-to-end TAD. In this paper, we present an empirical study of end-to-end temporal action detection. We validate the advantage of end-to-end learning over head-only learning and observe a performance improvement of up to 11%. We also study the effects of multiple design choices that affect TAD performance and speed, including the detection head, the video encoder, and the input video resolution. Based on these findings, we build a mid-resolution baseline detector that achieves state-of-the-art performance among end-to-end methods while running more than 4× faster. We hope that this paper can serve as a guide for end-to-end learning and inspire future research in this field. Code and models are available at <https://github.com/xlliu7/E2E-TAD>.
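The difference between head-only and end-to-end learning comes down to which parameters the detection loss is allowed to update. The snippet below is a minimal toy sketch of that distinction, not the paper's implementation: `ToyDetector`, `train_step`, and the scalar weights `w_enc` and `w_head` are hypothetical stand-ins for the video encoder and detection head, and the loss is a plain squared error rather than a real TAD objective.

```python
from dataclasses import dataclass


@dataclass
class ToyDetector:
    """Hypothetical scalar model: feature = w_enc * x, prediction = w_head * feature."""
    w_enc: float   # stand-in for the video encoder (pre-trained for classification)
    w_head: float  # stand-in for the detection head

    def forward(self, x: float) -> float:
        return self.w_head * (self.w_enc * x)


def train_step(model: ToyDetector, x: float, target: float,
               lr: float, end_to_end: bool) -> float:
    """One SGD step on (prediction - target)^2; returns the pre-update loss."""
    feature = model.w_enc * x
    err = model.forward(x) - target
    loss = err ** 2
    # Both gradients are computed before any weight changes.
    grad_head = 2 * err * feature
    grad_enc = 2 * err * model.w_head * x
    model.w_head -= lr * grad_head  # the head is trained in both paradigms
    if end_to_end:
        # End-to-end: the detection loss also updates the encoder.
        model.w_enc -= lr * grad_enc
    # Head-only: the encoder keeps its pre-trained weight (frozen).
    return loss


head_only = ToyDetector(w_enc=0.5, w_head=1.0)
e2e = ToyDetector(w_enc=0.5, w_head=1.0)
head_only_losses, e2e_losses = [], []
for _ in range(3):
    head_only_losses.append(train_step(head_only, 1.0, 2.0, lr=0.1, end_to_end=False))
    e2e_losses.append(train_step(e2e, 1.0, 2.0, lr=0.1, end_to_end=True))

print(head_only.w_enc)   # 0.5: the frozen encoder is untouched
print(e2e.w_enc != 0.5)  # True: the encoder moved toward the detection objective
```

In a real detector the same switch would typically be implemented by excluding the encoder's parameters from the optimizer or disabling their gradients. Backpropagating through the video encoder is what makes end-to-end training more expensive, which is why the efficiency-accuracy trade-off matters. The toy model's faster loss drop under end-to-end training only reflects its extra free parameter and is not evidence for the paper's reported gains.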

research
06/18/2021

End-to-end Temporal Action Detection with Transformer

Temporal action detection (TAD) aims to determine the semantic label and...
research
05/14/2022

ETAD: A Unified Framework for Efficient Temporal Action Detection

Untrimmed video understanding such as temporal action detection (TAD) of...
research
08/31/2022

An Empirical Study and Analysis of Learning Generalizable Manipulation Skill in the SAPIEN Simulator

This paper provides a brief overview of our submission to the no interac...
research
05/05/2022

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Temporal action detection (TAD) is extensively studied in the video unde...
research
10/07/2021

MGPSN: Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection

Head detection in real-world videos is an important research topic in co...
research
07/13/2023

Automated Deception Detection from Videos: Using End-to-End Learning Based High-Level Features and Classification Approaches

Deception detection is an interdisciplinary field attracting researchers...
research
04/16/2020

Asynchronous Interaction Aggregation for Action Detection

Understanding interaction is an essential part of video action detection...