Deep Learning-based Action Detection in Untrimmed Videos: A Survey

09/30/2021
by Elahe Vahdani, et al.

Understanding human behavior and activity facilitates the advancement of numerous real-world applications and is critical for video analysis. Despite the progress of action recognition algorithms on trimmed videos, most real-world videos are long and untrimmed, with sparse segments of interest. Temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions and classify their action categories. The task has been investigated under full and limited supervision, depending on the availability of action annotations. This paper provides an extensive overview of deep learning-based algorithms for temporal action detection in untrimmed videos at different supervision levels: fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised. It also reviews advances in spatio-temporal action detection, where actions are localized in both the temporal and spatial dimensions. Moreover, the commonly used action detection benchmark datasets and evaluation metrics are described, and the performance of state-of-the-art methods is compared. Finally, real-world applications of temporal action detection in untrimmed videos and a set of future directions are discussed.
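The abstract mentions evaluation metrics without detailing them. A building block commonly used in temporal action detection benchmarks is the temporal intersection over union (tIoU) between a predicted segment and a ground-truth segment: a detection typically counts as correct only if its tIoU with an unmatched ground-truth segment of the same class reaches a threshold, and mean average precision (mAP) is then computed from these matches. The sketch below is a minimal illustration of that idea for a single class, not code from the survey; the function names `temporal_iou` and `match_detections`, the `(start, end)` tuple layout, and the 0.5 default threshold are hypothetical choices.

```python
from typing import List, Tuple

# Hypothetical representation: a segment is (start, end) in seconds.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    detections: List[Tuple[Segment, float]],
    ground_truth: List[Segment],
    threshold: float = 0.5,
) -> List[bool]:
    """Greedily label each (segment, score) detection as a true positive.

    Detections are visited in descending score order; each one claims the
    unmatched ground-truth segment with the highest tIoU, provided that
    tIoU is at least `threshold`. Each ground-truth segment can be matched
    at most once, so duplicate detections become false positives.
    """
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i][1], reverse=True)
    used = [False] * len(ground_truth)
    is_tp = [False] * len(detections)
    for i in order:
        seg, _score = detections[i]
        best_j, best_iou = -1, threshold
        for j, gt in enumerate(ground_truth):
            iou = temporal_iou(seg, gt)
            if not used[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            used[best_j] = True
            is_tp[i] = True
    return is_tp
```

For example, with ground truth `[(10.0, 20.0), (30.0, 40.0)]` and detections `[((12.0, 22.0), 0.9), ((11.0, 19.0), 0.8), ((50.0, 60.0), 0.7)]`, `match_detections` returns `[True, False, False]`: the highest-scoring detection (tIoU 2/3) claims the first ground-truth segment, so the second detection becomes a false positive even though its tIoU (0.8) is higher. Running the matching at several thresholds and averaging precision over recall levels yields the mAP@tIoU figures commonly reported on these benchmarks.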

research
08/03/2023

A Survey on Deep Learning-based Spatio-temporal Action Detection

Spatio-temporal action detection (STAD) aims to classify the actions pre...
research
04/16/2019

Weakly Supervised Gaussian Networks for Action Detection

Detecting temporal extents of human actions in videos is a challenging c...
research
09/16/2021

A Survey on Temporal Sentence Grounding in Videos

Temporal sentence grounding in videos (TSGV), which aims to localize one ...
research
12/14/2016

Temporal-Needle: A view and appearance invariant video descriptor

The ability to detect similar actions across videos can be very useful f...
research
12/03/2019

A Context-Aware Loss Function for Action Spotting in Soccer Videos

Action spotting is an important element of general activity understandin...
research
09/19/2018

Detect, anticipate and generate: Semi-supervised recurrent latent variable models for human activity modeling

Successful Human-Robot collaboration requires a predictive model of huma...
research
09/21/2022

An Overview of Violence Detection Techniques: Current Challenges and Future Directions

The Big Video Data generated in today's smart cities has raised concerns...
