A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

11/16/2022
by Sicheng Mo, et al.

This report describes Badgers@UW-Madison, our submission to the Ego4D Natural Language Queries (NLQ) Challenge. Our solution inherits the point-based event representation from our prior work on temporal action localization and develops a Transformer-based model for video grounding. It further integrates several strong video features, including SlowFast, Omnivore, and EgoVLP. Without bells and whistles, our submission based on a single model achieves 12.64%. Meanwhile, our method garners 28.45%, surpassing the top-ranked solution by up to 5.5 absolute percentage points.
