GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

06/27/2023
by   Zhijian Hou, et al.

In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. Accurately grounding a query in a video requires both an effective egocentric feature extractor and a powerful grounding model. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and then fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal multi-scale grounding module to fuse video and text effectively and to handle temporal intervals of various lengths, especially in long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, and surpasses all other teams by a noticeable margin. Our code will be released at <https://github.com/houzhijian/GroundNLQ>.
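For readers unfamiliar with the reported metrics, the following is a minimal sketch of how R1@IoU=θ is commonly computed for temporal grounding: the top-ranked predicted interval for each query counts as a hit when its temporal IoU with the ground-truth interval reaches the threshold θ. The function names (`temporal_iou`, `recall_at_1`), the `>=` comparison, and the toy intervals are illustrative assumptions, not the official Ego4D evaluation code or the authors' implementation.

```python
from typing import List, Tuple

# An interval is (start_seconds, end_seconds); this representation is an assumption.
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two 1-D temporal intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Interval], gts: List[Interval],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction has IoU >= threshold."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data (hypothetical): one exact match, one partial overlap, one miss.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (100.0, 110.0)]
r1_at_03 = recall_at_1(preds, gts, 0.3)
r1_at_05 = recall_at_1(preds, gts, 0.5)
```

In this toy example the second prediction overlaps its ground truth with IoU 1/3, so it counts at the 0.3 threshold but not at 0.5, which illustrates why the score at IoU=0.5 is never higher than the score at IoU=0.3.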


