InternVideo: General Video Foundation Models via Generative and Discriminative Learning

12/06/2022
by Yi Wang, et al.

Foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models focus only on image-level pretraining and adaptation, which limits them on dynamic and complex video-level understanding tasks. To fill this gap, we present InternVideo, a family of general video foundation models that takes advantage of both generative and discriminative self-supervised video learning. Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as pretraining objectives, and selectively coordinates the video representations of these two complementary frameworks in a learnable manner to boost various video applications. Without bells and whistles, InternVideo achieves state-of-the-art performance on 39 video datasets spanning a wide range of tasks, including video action recognition/detection, video-language alignment, and open-world video applications. In particular, our methods obtain 91.1% and 77.2% top-1 accuracy on the challenging Kinetics-400 and Something-Something V2 benchmarks, respectively. These results demonstrate the generality of InternVideo for video understanding. The code will be released at https://github.com/OpenGVLab/InternVideo .
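The abstract names two pretraining objective families: masked video modeling (generative) and video-language contrastive learning (discriminative). The following is a minimal, generic sketch of each, not InternVideo's implementation. It assumes VideoMAE-style "tube" masking (the same spatial patches hidden in every frame) and a CLIP-style symmetric InfoNCE loss; the function names `tube_mask` and `video_text_infonce`, the 0.9 mask ratio, and the 0.07 temperature are illustrative assumptions. Only the masking step of masked video modeling is shown, not the encoder, decoder, or reconstruction loss.

```python
import math
import random
from typing import List, Sequence, Set

# Hypothetical helpers illustrating the two objective families; not InternVideo code.


def tube_mask(num_frames: int, num_patches: int, mask_ratio: float,
              seed: int = 0) -> List[Set[int]]:
    """Return, per frame, the set of masked spatial patch indices.

    Assumption: VideoMAE-style tube masking, i.e. the same spatial positions
    are hidden in every frame, so a masked patch cannot simply be copied
    from a neighboring frame.
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_patches))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [set(masked) for _ in range(num_frames)]


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy(logits: Sequence[float], target: int) -> float:
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def video_text_infonce(video: Sequence[Sequence[float]],
                       text: Sequence[Sequence[float]],
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch where video[i] is paired with text[i]."""
    v = [_normalize(x) for x in video]
    t = [_normalize(x) for x in text]
    n = len(v)
    # sim[i][j] = cosine(video_i, text_j) / temperature
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Video-to-text: each row should pick its own caption.
    v2t = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-video: each column should pick its own clip.
    t2v = sum(_cross_entropy([sim[j][i] for j in range(n)], i)
              for i in range(n)) / n
    return (v2t + t2v) / 2
```

With correctly paired embeddings the diagonal similarities dominate and the contrastive loss approaches zero, while swapping the captions makes it large. Averaging both retrieval directions is the usual CLIP-style choice; how InternVideo weights or combines these objectives is not specified here.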

09/15/2022

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

This paper presents OmniVL, a new foundation model to support both image...
11/17/2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

In this report, we present our champion solutions to five tracks at Ego4...
08/29/2022

Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment

Vision and Language Pretraining has become the prevalent approach for ta...
03/10/2023

HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

Human-centric perceptions include a variety of vision tasks, which have ...
10/10/2022

HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining

The self-supervised ultrasound (US) video model pretraining can use a sm...
01/20/2022

Self-supervised Video Representation Learning with Cascade Positive Retrieval

Self-supervised video representation learning has been shown to effectiv...
11/18/2021

PyTorchVideo: A Deep Learning Library for Video Understanding

We introduce PyTorchVideo, an open-source deep-learning library that pro...

Code Repositories

InternVideo

InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
