Self-supervised Learning for Semi-supervised Temporal Language Grounding

09/23/2021
by Fan Luo, et al.

Given a text description, Temporal Language Grounding (TLG) aims to localize the temporal boundaries of the segments that contain the specified semantics in an untrimmed video. TLG is inherently challenging, as it requires a comprehensive understanding of both video content and text sentences. Previous works tackle this task either in a fully-supervised setting, which requires a large amount of manual annotation, or in a weakly-supervised setting, which cannot achieve satisfactory performance. To achieve good performance with limited annotations, we tackle this task in a semi-supervised way and propose a unified Semi-supervised Temporal Language Grounding (STLG) framework. STLG consists of two parts: (1) a pseudo label generation module that produces adaptive instant pseudo labels for unlabeled data based on predictions from a teacher model; (2) a self-supervised feature learning module with two sequential perturbations, i.e., time lagging and time scaling, for improving the video representation by inter-modal and intra-modal contrastive learning. We conduct experiments on the ActivityNet-CD-OOD and Charades-CD-OOD datasets, and the results demonstrate that our proposed STLG framework achieves performance competitive with fully-supervised state-of-the-art methods using only a small portion of the temporal annotations.
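
The abstract names the two sequential perturbations, time lagging and time scaling, without defining them. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a video is a (T, D) array of clip features, that time lagging shifts the sequence along the time axis with edge padding, and that time scaling resamples it to a different length by nearest-neighbor indexing. The function names `time_lag` and `time_scale` and their parameters are hypothetical.

```python
import numpy as np


def time_lag(features: np.ndarray, shift: int) -> np.ndarray:
    """Shift a (T, D) feature sequence by `shift` steps along time.

    Assumption: positions that fall outside the sequence are filled by
    repeating the nearest edge frame.
    """
    num_steps = features.shape[0]
    idx = np.clip(np.arange(num_steps) - shift, 0, num_steps - 1)
    return features[idx]


def time_scale(features: np.ndarray, factor: float) -> np.ndarray:
    """Resample a (T, D) feature sequence to about T * factor steps.

    Assumption: nearest-neighbor resampling on an evenly spaced grid.
    """
    num_steps = features.shape[0]
    new_steps = max(1, int(round(num_steps * factor)))
    grid = np.linspace(0, num_steps - 1, new_steps)
    idx = np.clip(np.round(grid).astype(int), 0, num_steps - 1)
    return features[idx]


# Toy sequence: 8 time steps, 1 feature dimension, values 0..7.
video = np.arange(8).reshape(8, 1)
lagged = time_lag(video, shift=2)        # delayed by two steps
scaled = time_scale(video, factor=0.5)   # half as many steps
```

In a contrastive setup of the kind the abstract describes, the original and perturbed sequences would presumably serve as different views of the same video; how the views are paired and which loss is used are not specified here, so the sketch stops at the perturbations themselves.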


