Relation-aware Video Reading Comprehension for Temporal Language Grounding

10/12/2021
by Jialin Gao, et al.

Temporal language grounding in videos aims to localize the temporal span relevant to a given query sentence. Previous methods treat it either as a boundary regression task or as a span extraction task. This paper formulates temporal language grounding as video reading comprehension and proposes a Relation-aware Network (RaNet) to address it. The framework selects a video moment choice from a predefined answer set with the aid of coarse-and-fine choice-query interaction and choice-choice relation construction. A choice-query interactor matches the visual and textual information simultaneously at the sentence-moment and token-moment levels, leading to a coarse-and-fine cross-modal interaction. Moreover, a novel multi-choice relation constructor leverages graph convolution to capture the dependencies among video moment choices for the best choice selection. Extensive experiments on ActivityNet-Captions, TACoS, and Charades-STA demonstrate the effectiveness of our solution. Code will be released soon.
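To make the multi-choice framing concrete, the toy sketch below enumerates a predefined answer set of candidate moments, smooths per-choice scores over related choices, and selects the best one. It is not the RaNet implementation: the abstract does not say how the answer set is built or how the graph is defined, so the function names, the IoU-based neighborhood, and the plain neighbor averaging (a stand-in for learned graph convolution) are all assumptions made for illustration.

```python
def enumerate_choices(num_clips):
    """Hypothetical answer set: every (start, end) clip span, end exclusive."""
    return [(s, e) for s in range(num_clips) for e in range(s + 1, num_clips + 1)]


def temporal_iou(a, b):
    """Intersection over union of two spans; 0.0 when they do not overlap."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union


def propagate(scores, choices, threshold=0.5):
    """One round of score averaging over choices whose IoU >= threshold.

    Assumption: this fixed averaging only imitates the idea of relating
    choices to each other; it has no learned weights.
    """
    smoothed = []
    for ci in choices:
        # Every choice is its own neighbor (IoU 1.0), so the list is never empty.
        neighbors = [scores[j] for j, cj in enumerate(choices)
                     if temporal_iou(ci, cj) >= threshold]
        smoothed.append(sum(neighbors) / len(neighbors))
    return smoothed


def select_choice(scores, choices):
    """Return the choice with the highest score."""
    best = max(range(len(choices)), key=lambda i: scores[i])
    return choices[best]
```

In a real model the per-choice scores would come from matching moment features against the query; here they are simply passed in, so `select_choice(propagate(scores, choices), choices)` shows only the selection step.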

research
05/25/2022

You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos

Moment retrieval in videos is a challenging task that aims to retrieve t...
research
07/29/2022

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

Temporal grounding aims to locate a target video moment that semanticall...
research
03/14/2023

Generation-Guided Multi-Level Unified Network for Video Grounding

Video grounding aims to locate the timestamps best matching the query de...
research
08/11/2023

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

The video grounding (VG) task aims to locate the queried action or event...
research
08/14/2023

Knowing Where to Focus: Event-aware Transformer for Video Grounding

Recent DETR-based video grounding models have made the model directly pr...
research
01/08/2022

Learning Sample Importance for Cross-Scenario Video Temporal Grounding

The task of temporal grounding aims to locate video moment in an untrimm...
research
03/13/2022

Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video

The temporal answering grounding in the video (TAGV) is a new task natur...
