Retrieval Augmented Classification for Long-Tail Visual Recognition

02/22/2022
by Alexander Long, et al.

We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over the previous state of the art on Places365-LT and iNaturalist-2018 (14.5 respectively), despite using only the training datasets themselves as the external information source. We show that RAC's retrieval module, without prompting, learns to achieve high accuracy on tail classes. This, in turn, frees the base encoder to focus on common classes and improve its performance on them. RAC offers an alternative way to use large pretrained models without fine-tuning them, as well as a first step towards making more effective use of external memory within common computer vision architectures.
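The abstract describes the retrieval branch only at a high level. The sketch below illustrates the general idea of fusing a base classifier with a non-parametric memory lookup. It is not RAC's actual design (which also retrieves and encodes text snippets). Several choices here are our own assumptions: image embeddings stored as plain vectors with integer class labels, cosine-similarity k-nearest-neighbour search, a similarity-weighted label vote as the retrieval branch's output, and additive logit fusion. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, memory, k):
    """Return the k (similarity, label) pairs from memory closest to query.

    Assumption: memory is a list of (pre-encoded embedding, class label) pairs.
    """
    scored = [(cosine(query, emb), label) for emb, label in memory]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def retrieval_logits(query, memory, num_classes, k=3):
    """Hypothetical retrieval branch: similarity-weighted vote over neighbour labels."""
    logits = [0.0] * num_classes
    for sim, label in retrieve(query, memory, k):
        logits[label] += sim
    return logits


def fused_prediction(base_logits, query, memory, k=3, weight=1.0):
    """Add (weighted) retrieval logits to the base encoder's logits; return argmax and scores."""
    ret = retrieval_logits(query, memory, len(base_logits), k)
    fused = [b + weight * r for b, r in zip(base_logits, ret)]
    return max(range(len(fused)), key=fused.__getitem__), fused


# Toy example: the base encoder slightly prefers class 0 (a "head" class),
# but the query's nearest memory entries both belong to class 2 (a "tail" class).
memory = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 2), ([0.1, 0.9], 2)]
query = [0.05, 1.0]
base_logits = [0.5, 0.4, 0.1]
pred, fused = fused_prediction(base_logits, query, memory, k=2)
print(pred, [round(s, 3) for s in fused])
```

In this toy case the retrieved neighbours outvote the base encoder's preference. This mirrors, in a very simplified form, the abstract's observation that the retrieval module handles tail classes while the base encoder concentrates on common ones.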

research
04/11/2023

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Retrieval augmented models are becoming increasingly popular for compute...
research
10/06/2022

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text

While language models store a massive amount of world knowledge implicit...
research
02/09/2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...
research
09/29/2022

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

Research on text-to-image generation has witnessed significant progress ...
research
10/30/2022

An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

Access to external knowledge is essential for many natural language proc...
research
04/25/2022

Retrieval-Augmented Diffusion Models

Generative image synthesis with diffusion models has recently achieved e...
research
07/27/2023

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

Training an image captioner without annotated image-sentence pairs has g...