VideoLT: Large-scale Long-tailed Video Recognition

05/06/2021
by Xing Zhang, et al.

Real-world label distributions are often long-tailed and imbalanced, which biases models toward dominant labels. While long-tailed recognition has been studied extensively for image classification, limited effort has been devoted to the video domain. In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition. VideoLT contains 256,218 untrimmed videos annotated into 1,004 classes with a long-tailed distribution. Through extensive studies, we demonstrate that state-of-the-art methods for long-tailed image recognition do not perform well in the video domain because of the additional temporal dimension in video data. This motivates us to propose FrameStack, a simple yet effective method for long-tailed video recognition. In particular, FrameStack performs sampling at the frame level to balance class distributions, and the sampling ratio is determined dynamically using knowledge derived from the network during training. Experimental results demonstrate that FrameStack can improve classification performance without sacrificing overall accuracy.
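
To make the idea of frame-level sampling with a dynamically determined ratio more concrete, here is a minimal Python sketch. It is not the authors' FrameStack implementation: the abstract does not specify the sampling rule, so the linear mapping from a running per-class score to a frame budget, the exponential moving average update, and all function names (`frames_to_keep`, `sample_frames`, `update_score`) are assumptions made purely for illustration. It also assumes one label per video for simplicity.

```python
import math
import random


def frames_to_keep(class_score, total_frames, min_frames=1):
    """Map a running per-class score in [0, 1] to a frame budget.

    Assumption (not from the paper): classes the network already handles
    well (high score, typically head classes) keep fewer frames, while
    poorly handled classes (typically tail classes) keep more.
    """
    score = min(max(class_score, 0.0), 1.0)  # clamp to [0, 1]
    budget = math.ceil(total_frames * (1.0 - score))
    return max(min_frames, min(total_frames, budget))


def sample_frames(frames, class_score, rng=None, min_frames=1):
    """Randomly keep a subset of frames, preserving temporal order."""
    if not frames:
        return []
    rng = rng or random.Random(0)
    k = frames_to_keep(class_score, len(frames), min_frames)
    kept = sorted(rng.sample(range(len(frames)), k))
    return [frames[i] for i in kept]


def update_score(old_score, correct, momentum=0.9):
    """Hypothetical running estimate of how well a class is recognized,
    updated during training as an exponential moving average."""
    return momentum * old_score + (1.0 - momentum) * (1.0 if correct else 0.0)
```

Under these assumptions, a video with 8 frames from a class with running score 0.75 keeps 2 frames, while one from a class with score 0.25 keeps 6, so frames from poorly recognized classes make up a larger share of each training batch.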

