On Guiding Visual Attention with Language Specification

02/17/2022
by   Suzanne Petryk, et al.
3

While real world challenges typically define visual categories with language words or phrases, most visual classification methods define categories with numerical indices. However, the language specification of the classes provides an especially useful prior for biased and noisy datasets, where it can help disambiguate what features are task-relevant. Recently, large-scale multimodal models have been shown to recognize a wide variety of high-level concepts from a language specification even without additional image training data, but they are often unable to distinguish classes for more fine-grained tasks. CNNs, in contrast, can extract subtle image features that are required for fine-grained discrimination, but will overfit to any bias or noise in datasets. Our insight is to use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors. To do this, we ground task-relevant words or phrases with attention maps from a pretrained large-scale model. We then use this grounding to supervise a classifier's spatial attention away from distracting context. We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data, including about 3-15 worst-group accuracy improvements and 41-45 metrics.

READ FULL TEXT

page 1

page 6

page 7

page 8

page 14

page 16

research
10/09/2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Multimodal representation learning has shown promising improvements on v...
research
05/21/2022

Fine-Grained Visual Classification using Self Assessment Classifier

Extracting discriminative features plays a crucial role in the fine-grai...
research
07/18/2022

Few-shot Fine-grained Image Classification via Multi-Frequency Neighborhood and Double-cross Modulation

Traditional fine-grained image classification typically relies on large-...
research
11/14/2015

Learning Fine-grained Features via a CNN Tree for Large-scale Classification

We propose a novel approach to enhance the discriminability of Convoluti...
research
09/15/2023

Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions

The challenge in fine-grained visual categorization lies in how to explo...
research
08/16/2021

WikiChurches: A Fine-Grained Dataset of Architectural Styles with Real-World Challenges

We introduce a novel dataset for architectural style classification, con...
research
11/14/2018

Neural Based Statement Classification for Biased Language

Biased language commonly occurs around topics which are of controversial...

Please sign up or login with your details

Forgot password? Click here to reset