Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

11/24/2022
by WonJun Moon, et al.

The dramatic growth in real-world video volume, spanning extremely diverse and emerging topics, naturally yields a long-tailed distribution over video categories and highlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model for high-quality features, (2) acquiring frame-wise labels is costly, and (3) long-tailed data triggers biased training. As a result, most existing VLTR methods unavoidably rely on image-level features extracted from pretrained models, which are task-irrelevant, and learn from video-level labels. To deal with such (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. The learnable layers in each aggregator produce task-relevant representations, and each aggregator assembles snippet-wise knowledge into a video-level representation. We then propose Minority-Oriented Vicinity Expansion (MOVE), which explicitly uses class frequency when approximating the vicinity distributions to alleviate (3) biased training. By combining these solutions, our approach achieves state-of-the-art results on large-scale VideoLT and synthetically induced Imbalanced-MiniKinetics200. With VideoLT features from ResNet-50, it attains 18 classes over the previous state-of-the-art method, respectively.
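To make the idea of assembling snippet-wise features into one video-level vector concrete, here is a minimal sketch of generic softmax attention pooling. It is an illustration under stated assumptions, not the paper's aggregators: the function name `attentive_pool`, the single scoring vector `score_weights`, and the toy inputs are all hypothetical, and the abstract does not specify how the two aggregators are actually built.

```python
import math


def attentive_pool(frame_feats, score_weights):
    """Pool per-frame features into one vector using softmax attention.

    frame_feats:   list of T feature vectors, each of length D
                   (assumed to come from a frozen pretrained backbone).
    score_weights: length-D scoring vector; hypothetical stand-in for
                   a learnable layer that rates each frame.
    Returns (pooled feature of length D, attention weights of length T).
    """
    # One scalar relevance score per frame (dot product with the scorer).
    scores = [sum(f * w for f, w in zip(frame, score_weights))
              for frame in frame_feats]

    # Numerically stable softmax over the T frames.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Attention-weighted sum of frame features.
    dim = len(frame_feats[0])
    pooled = [sum(a * frame[d] for a, frame in zip(attn, frame_feats))
              for d in range(dim)]
    return pooled, attn


# Toy usage: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, attn = attentive_pool(frames, [0.0, 0.0])
```

With an all-zero scoring vector every frame receives the same weight, so the pooled vector reduces to the plain average of the frames. A trained scorer would instead shift weight toward frames that are informative for the video-level label, which is the only supervision assumed here.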

