Boundary-sensitive Pre-training for Temporal Localization in Videos

11/21/2020
by Mengmeng Xu, et al.

Many video analysis tasks require temporal localization and thus the detection of content changes. However, most existing models developed for these tasks are pre-trained on general video action classification tasks, because large-scale annotation of temporal boundaries in untrimmed videos is expensive. As a result, no suitable datasets exist for temporal boundary-sensitive pre-training. In this paper, for the first time, we investigate model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task. Instead of relying on costly manual annotations of temporal boundaries, we propose to synthesize temporal boundaries in existing video action classification datasets. With the synthesized boundaries, BSP can be conducted simply by classifying the boundary types. This enables the learning of video representations that are much more transferable to downstream temporal localization tasks. Extensive experiments show that the proposed BSP is superior and complementary to its existing action-classification-based pre-training counterpart, and achieves new state-of-the-art performance on several temporal localization tasks.
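The boundary-synthesis idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the boundary types, so the two labels below (`NO_BOUNDARY`, `DIFFERENT_CLASS`), the function name `synthesize_sample`, and the 50/50 sampling are all assumptions made for illustration. Frames are stand-in Python objects rather than decoded video.

```python
import random

# Hypothetical boundary-type labels; the abstract does not enumerate the types.
NO_BOUNDARY = 0
DIFFERENT_CLASS = 1


def synthesize_sample(video_a, video_b, clip_len, rng):
    """Build one pretext-task sample from two trimmed videos.

    video_a, video_b: frame sequences assumed to come from two different
    action classes, each at least clip_len frames long.
    Returns (frames, boundary_type, boundary_index); boundary_index is
    None when the clip contains no synthesized boundary.
    """
    if rng.random() < 0.5:
        # No boundary: a contiguous clip taken from a single video.
        start = rng.randrange(len(video_a) - clip_len + 1)
        return list(video_a[start:start + clip_len]), NO_BOUNDARY, None

    # Synthesized boundary: stitch a segment of video_a to a segment of video_b.
    cut = rng.randrange(1, clip_len)  # boundary position inside the clip
    a_start = rng.randrange(len(video_a) - cut + 1)
    b_len = clip_len - cut
    b_start = rng.randrange(len(video_b) - b_len + 1)
    frames = (list(video_a[a_start:a_start + cut])
              + list(video_b[b_start:b_start + b_len]))
    return frames, DIFFERENT_CLASS, cut
```

In such a setup, a video encoder would be trained to predict `boundary_type` from `frames`, so the supervision comes from the stitching procedure rather than from manual boundary annotation.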


research
04/26/2022

Contrastive Language-Action Pre-training for Temporal Localization

Long-form video understanding requires designing approaches that are abl...
research
03/25/2022

Unsupervised Pre-training for Temporal Action Localization Tasks

Unsupervised video representation learning has made remarkable achieveme...
research
03/28/2021

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization

Temporal action localization (TAL) is a fundamental yet challenging task...
research
09/11/2023

Temporal Action Localization with Enhanced Instant Discriminability

Temporal action detection (TAD) aims to detect all action boundaries and...
research
04/06/2023

Boundary-Denoising for Video Activity Localization

Video activity localization aims at understanding the semantic content i...
research
03/27/2017

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

Manual annotations of temporal bounds for object interactions (i.e. star...
research
06/05/2023

Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Localization of the narrowest position of the vessel and corresponding v...
