Boundary-Denoising for Video Activity Localization

04/06/2023
by Mengmeng Xu, et al.

Video activity localization aims to understand the semantic content of long untrimmed videos and retrieve actions of interest. A retrieved action, together with its start and end locations, can be used for highlight generation, temporal action detection, and similar tasks. Unfortunately, learning the exact boundary locations of activities is highly challenging because temporal activities are continuous in time, and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of an event is subjective, which may confuse the model. To alleviate this boundary ambiguity, we propose to study video activity localization from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. The model then learns to reverse this process by boundary denoising, which allows the localizer to predict activities with precise boundaries and leads to faster convergence. Experiments show that DenoiseLoc achieves a gain of +12.36 mAP@0.5 on the THUMOS'14 dataset over the baseline. Moreover, DenoiseLoc achieves state-of-the-art performance on the TACoS and MAD datasets, with far fewer predictions than other current methods.
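
The training-time step described above (generating noisy action spans from the ground truth with a controlled noise scale) can be illustrated with a minimal sketch. The abstract does not specify the noise model, so everything here is an assumption made for illustration: the function name `make_noisy_spans`, the Gaussian jitter proportional to span length, and the clipping to the video duration are hypothetical and are not taken from DenoiseLoc itself.

```python
import random


def make_noisy_spans(gt_spans, noise_scale=0.2, copies=3, duration=None, seed=0):
    """Return (noisy_span, target_span) pairs for boundary-denoising training.

    Hypothetical helper: the noise model below is an assumption, not the
    paper's method. Spans are (start, end) tuples in seconds.
    """
    rng = random.Random(seed)
    pairs = []
    for start, end in gt_spans:
        length = end - start
        for _ in range(copies):
            # Assumption: Gaussian jitter whose std grows with the span length,
            # so noise_scale controls the relative boundary perturbation.
            s = start + rng.gauss(0.0, noise_scale * length)
            e = end + rng.gauss(0.0, noise_scale * length)
            s, e = min(s, e), max(s, e)  # keep start <= end
            s = max(0.0, s)              # a span cannot begin before the video
            e = max(s, e)
            if duration is not None:
                e = min(duration, e)     # nor end after it
                s = min(s, e)
            # The localizer would take the noisy span as input and be
            # supervised to recover the clean ground-truth span.
            pairs.append(((s, e), (start, end)))
    return pairs


if __name__ == "__main__":
    for noisy, target in make_noisy_spans([(10.0, 20.0)], duration=60.0):
        print(noisy, "->", target)
```

In this sketch a larger `noise_scale` produces spans that drift further from the annotated boundaries, which is one plausible reading of "controlled noise scale"; with `noise_scale=0` the inputs coincide with the ground truth.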

research
01/03/2021

A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization

Weakly supervised temporal action localization is a challenging vision t...
research
11/21/2020

Boundary-sensitive Pre-training for Temporal Localization in Videos

Many video analysis tasks require temporal localization thus detection o...
research
09/11/2023

Temporal Action Localization with Enhanced Instant Discriminability

Temporal action detection (TAD) aims to detect all action boundaries and...
research
08/07/2021

Temporal Action Localization Using Gated Recurrent Units

Temporal Action Localization (TAL) task in which the aim is to predict t...
research
03/27/2023

DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion

We propose a new formulation of temporal action detection (TAD) with den...
research
03/27/2017

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

Manual annotations of temporal bounds for object interactions (i.e. star...
research
06/28/2019

Localizing Unseen Activities in Video via Image Query

Action localization in untrimmed videos is an important topic in the fie...
