
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

by Guo Chen, et al.

In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, our video foundation model, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D surpasses both the baseline methods and the CVPR 2022 champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at



