Unsupervised Temporal Video Grounding with Deep Semantic Clustering

01/14/2022
by   Daizong Liu, et al.

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query. Although existing works have made notable progress on this task, they rely heavily on abundant video-query paired data, which is expensive and time-consuming to collect in real-world scenarios. In this paper, we explore whether a video grounding model can be learned without any paired annotations. To the best of our knowledge, this is the first work to address TVG in an unsupervised setting. Since no paired supervision is available, we propose a novel Deep Semantic Clustering Network (DSCNet) that leverages semantic information from the whole query set to compose the possible activity in each video for grounding. Specifically, we first develop a language semantic mining module, which extracts implicit semantic features from the whole query set. These language semantic features then serve as guidance to compose the activity in the video via a video-based semantic aggregation module. Finally, we use a foreground attention branch to filter out redundant background activities and refine the grounding results. To validate the effectiveness of DSCNet, we conduct experiments on the ActivityNet Captions and Charades-STA datasets. The results demonstrate that DSCNet achieves competitive performance and even outperforms most weakly-supervised approaches.
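
As a rough illustration of the three stages named in the abstract, the sketch below clusters query embeddings into semantic centers, soft-assigns video frames to those centers, and picks the longest run of high-scoring frames as the grounded segment. It is not the DSCNet implementation. The abstract does not specify the clustering method, the aggregation mechanism, or how foreground scores are computed, so plain k-means, dot-product attention, and a fixed threshold are stand-in assumptions, and every function name here (kmeans, aggregate_frames, foreground_segment) is hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for language semantic mining).
    x: (n, d) query embeddings. Returns (k, d) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every center: (n, k)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_frames(frames, centers):
    """Soft-assign each frame to the semantic centers (assumed attention form).
    frames: (T, d). Returns attention (T, k) and composed features (T, d)."""
    scores = frames @ centers.T / np.sqrt(frames.shape[1])
    attn = softmax(scores, axis=1)
    return attn, attn @ centers

def foreground_segment(fg_scores, threshold=0.5):
    """Longest run of frames with score >= threshold, as inclusive
    (start, end) frame indices, or None if no frame passes."""
    best, start = None, None
    # A trailing sentinel closes a run that reaches the last frame.
    for t, s in enumerate(list(fg_scores) + [-np.inf]):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            if best is None or (t - start) > (best[1] - best[0] + 1):
                best = (start, t - 1)
            start = None
    return best
```

In this sketch the per-frame foreground scores are passed in directly; how DSCNet's foreground attention branch actually produces them is described in the full paper, not in the abstract.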


research
05/06/2023

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

This paper addresses the temporal sentence grounding (TSG). Although exi...
research
09/23/2020

A Simple Yet Effective Method for Video Temporal Grounding with Cross-Modality Attention

The task of language-guided video temporal grounding is to localize the ...
research
10/22/2022

Weakly-Supervised Temporal Article Grounding

Given a long untrimmed video and natural language queries, video groundi...
research
03/16/2020

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

The task of temporally grounding textual queries in videos is to localiz...
research
08/14/2023

Temporal Sentence Grounding in Streaming Videos

This paper aims to tackle a novel task - Temporal Sentence Grounding in ...
research
09/23/2021

Self-supervised Learning for Semi-supervised Temporal Language Grounding

Given a text description, Temporal Language Grounding (TLG) aims to loca...
research
02/20/2023

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

Temporal sentence grounding aims to detect the event timestamps describe...
