Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

11/19/2019
by Zhijie Lin, et al.

Video moment retrieval aims to find the moment in a video that is most relevant to a given natural language query. Existing methods are mostly trained in a fully-supervised setting, which requires temporal boundary annotations for each query. However, labeling these annotations manually is time-consuming and expensive. In this paper, we propose a novel weakly-supervised moment retrieval framework that requires only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates context information to generate and score all candidate proposals in a single pass. We then devise an algorithm that considers both exploitation and exploration to select the top-K proposals. Next, we build a semantic completion module that measures the semantic similarity between the selected proposals and the query, computes a reward, and provides feedback to the proposal generation module for scoring refinement. Experiments on the ActivityCaptions and Charades-STA datasets demonstrate the effectiveness of our proposed method.
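The abstract does not say how exploitation and exploration are combined when selecting the top-K proposals. As a rough illustration only, the sketch below uses an epsilon-greedy rule, which is an assumption and not necessarily the authors' algorithm: each pick takes the highest-scoring remaining proposal with probability 1 - epsilon and a uniformly random remaining proposal otherwise. The function name `select_top_k` and its parameters are hypothetical.

```python
import random


def select_top_k(scores, k, epsilon=0.1, seed=None):
    """Pick up to k distinct proposal indices from a list of scores.

    Assumed epsilon-greedy rule (not taken from the paper): each pick
    exploits the best remaining proposal with probability 1 - epsilon
    and explores a random remaining proposal with probability epsilon.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    rng = random.Random(seed)
    # Proposal indices ordered from highest to lowest score.
    remaining = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = []
    while remaining and len(selected) < k:
        if rng.random() < epsilon:
            pick = rng.randrange(len(remaining))  # explore: random proposal
        else:
            pick = 0  # exploit: best remaining proposal
        selected.append(remaining.pop(pick))
    return selected


if __name__ == "__main__":
    proposal_scores = [0.1, 0.9, 0.4, 0.7]  # made-up scores for illustration
    print(select_top_k(proposal_scores, k=2, epsilon=0.0))
```

With `epsilon=0.0` the selection is purely greedy; larger values let lower-scored proposals receive a reward signal during training, which is the usual motivation for mixing in exploration.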

research
11/04/2021

Multi-scale 2D Representation Learning for weakly-supervised moment retrieval

Video moment retrieval aims to search the moment most relevant to a give...
research
08/24/2020

VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval

Video Moment Retrieval (VMR) is a task to localize the temporal moment i...
research
08/19/2020

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

Video moment retrieval aims to localize the target moment in a video ac...
research
08/10/2023

Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization

Video moment localization aims to retrieve the target segment of an untr...
research
07/18/2023

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

3D visual grounding involves finding a target object in a 3D scene that ...
research
09/22/2021

Natural Language Video Localization with Learnable Moment Proposals

Given an untrimmed video and a natural language query, Natural Language ...
research
01/20/2021

Online Active Proposal Set Generation for Weakly Supervised Object Detection

To reduce the manpower consumption on box-level annotations, many weakly...
