VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

11/26/2021
by Changyao Tian, et al.

Deep learning-based models struggle with the long-tailed data found in the real world. Existing solutions usually rely on balancing strategies or transfer learning to address class imbalance, and operate on the image modality alone. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits: (1) it learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) it effectively uses the learned visual-linguistic representations to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2 on ImageNet-LT, which outperforms the previous best method by over 17 points and is close to the prevailing performance of training on the full ImageNet. Code shall be released.
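
As a rough illustration of how class-level text can support recognition, the sketch below builds one text prototype per class by averaging sentence embeddings, then assigns an image to the class whose prototype is most similar to the image embedding. This is not the VL-LTR implementation: the function names, the averaging step, the cosine-similarity scoring, and the toy three-dimensional embeddings are all assumptions made for illustration, and a real system would obtain the embeddings from trained visual and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_text_prototype(sentence_embeddings):
    """Average the normalized embeddings of all sentences describing one class.

    Averaging is an assumed, simple way to summarize several noisy
    descriptions; it is not taken from the paper.
    """
    dim = len(sentence_embeddings[0])
    normalized = [l2_normalize(e) for e in sentence_embeddings]
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def classify(image_embedding, prototypes):
    """Return the class whose text prototype has the highest cosine similarity."""
    img = l2_normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(img, proto))
        for name, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-written embeddings (hypothetical values standing in for encoder outputs).
text_embeddings = {
    "head_class": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "tail_class": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.0]],
}
prototypes = {name: class_text_prototype(embs) for name, embs in text_embeddings.items()}

predicted, scores = classify([0.05, 0.9, 0.1], prototypes)
print(predicted, scores)
```

Because the prototypes come from text rather than from image counts, a class with few training images still gets a usable reference point, which is the intuition behind using the text modality for tail classes.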

research
10/21/2019

Decoupling Representation and Classifier for Long-Tailed Recognition

The long-tail distribution of the visual world poses great challenges fo...
research
08/31/2022

Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images

Camera traps, unmanned observation devices, and deep learning-based imag...
research
09/09/2021

Self Supervision to Distillation for Long-Tailed Visual Recognition

Deep learning has achieved remarkable progress for visual recognition on...
research
11/29/2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

The visual world naturally exhibits a long-tailed distribution of open c...
research
07/05/2022

DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation for Long-Tailed Visual Recognition

There is a growing interest in the challenging visual perception task of...
research
02/19/2023

Mutual Exclusive Modulator for Long-Tailed Recognition

The long-tailed recognition (LTR) is the task of learning high-performan...
research
11/24/2022

Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

A dramatic increase in real-world video volume with extremely diverse an...
