Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

04/11/2023
by Ahmet Iscen, et al.

Retrieval-augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems. The goal is to enhance the recognition capabilities of the model by retrieving examples similar to the visual input from an external memory set. In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query. We also thoroughly study various ways of constructing the memory dataset. Our experiments show the benefit of using a massive-scale memory dataset of 1B image-text pairs, and demonstrate the performance of different memory representations. We evaluate our method on three different classification tasks, namely long-tailed recognition, learning with noisy labels, and fine-grained classification, and show that it achieves state-of-the-art accuracies on the ImageNet-LT, Places-LT, and WebVision datasets.
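To make the idea of weighting retrieved examples concrete, here is a minimal sketch of attention over a small retrieved memory. It is not the paper's module: the abstract gives no architectural details, so the function name `attend_to_memory`, the `temperature` parameter, the use of plain scaled dot-product attention without learned projections, and the residual fusion step are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_to_memory(query, keys, values, temperature=1.0):
    """Weight retrieved memory entries by similarity to the query (hypothetical helper).

    query:  embedding of the input image
    keys:   embeddings of the k retrieved memory entries
    values: representations to aggregate, same dimension as the query
    Returns (weights, fused), where fused = query + weighted sum of values.
    """
    # Assumption: untrained scaled dot-product scores; a learned module
    # would apply trained projections to the query, keys, and values first.
    scale = temperature * math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    weights = softmax(scores)

    # Weighted sum of the values: low-weight entries contribute almost nothing.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

    # Assumption: residual fusion of the query with the attended context.
    fused = [q + c for q, c in zip(query, context)]
    return weights, fused


# Toy example: two retrieved entries resemble the query, one points the opposite way.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
values = keys
weights, fused = attend_to_memory(query, keys, values, temperature=0.1)
```

In this toy setup the third entry is dissimilar to the query and receives a weight close to zero, which is the behavior the abstract describes as removing the influence of irrelevant retrieved examples. A trained module would learn the scoring function rather than rely on raw dot products.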


research
03/20/2020

Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net)

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the...
research
02/22/2022

Retrieval Augmented Classification for Long-Tail Visual Recognition

We introduce Retrieval Augmented Classification (RAC), a generic approac...
research
02/11/2021

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Pre-trained representations are becoming crucial for many NLP and percep...
research
06/18/2020

Semi-Supervised Recognition under a Noisy and Fine-grained Dataset

Semi-Supervised Recognition Challenge-FGVC7 is a challenging fine-graine...
research
06/20/2021

Solution for Large-scale Long-tailed Recognition with Noisy Labels

This is a technical report for CVPR 2021 AliProducts Challenge. AliProdu...
research
01/24/2022

Learning Semantics for Visual Place Recognition through Multi-Scale Attention

In this paper we address the task of visual place recognition (VPR), whe...
research
05/13/2022

ImageSig: A signature transform for ultra-lightweight image recognition

This paper introduces a new lightweight method for image recognition. Im...
