More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes

08/02/2023
by Bang An, et al.

CLIP, as a foundational vision-language model, is widely used in zero-shot image classification due to its ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better zero-shot classification remains an open question. This paper draws inspiration from the human visual perception process: a modern neuroscience view suggests that, when classifying an object, humans first infer its class-independent attributes (e.g., background and orientation), which help separate the foreground object from the background, and then make decisions based on this information. Inspired by this, we observe that providing CLIP with contextual attributes improves zero-shot classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer these attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method named PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioned on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interpretability. For example, PerceptionCLIP with ViT-L/14 improves the worst-group accuracy by 16.5%.
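The two-step procedure described above can be sketched in Python. This is only an illustrative toy, not the authors' implementation: the `clip_score` function and its hard-coded similarity table are hypothetical stand-ins for real CLIP image-text similarities, and the prompt templates, class names, and attribute values are invented for the example.

```python
import math

# Hypothetical stand-in for CLIP image-text similarity scores for a single
# image. A real implementation would embed the image and each prompt with
# CLIP and take their cosine similarity.
SIMILARITY = {
    "a photo of a bird, on land": 0.18,
    "a photo of a bird, on water": 0.31,
    "a photo of a landbird, on land": 0.10,
    "a photo of a landbird, on water": 0.22,
    "a photo of a waterbird, on land": 0.05,
    "a photo of a waterbird, on water": 0.29,
}

def clip_score(image, prompt):
    # Stand-in for CLIP's image-text similarity.
    return SIMILARITY[prompt]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def perception_clip(image, classes, attribute_values, temperature=0.01):
    # Step 1: infer the contextual attribute (here, background) by scoring
    # class-agnostic prompts and turning the scores into a distribution.
    attr_scores = [clip_score(image, f"a photo of a bird, {a}")
                   for a in attribute_values]
    attr_probs = softmax([s / temperature for s in attr_scores])

    # Step 2: classify conditioned on the inferred attribute by weighting
    # each class's attribute-specific scores by the inferred distribution.
    class_scores = []
    for c in classes:
        scores = [clip_score(image, f"a photo of a {c}, {a}")
                  for a in attribute_values]
        class_scores.append(sum(p * s for p, s in zip(attr_probs, scores)))
    return classes[max(range(len(classes)), key=lambda i: class_scores[i])]

prediction = perception_clip(None,
                             classes=["landbird", "waterbird"],
                             attribute_values=["on land", "on water"])
print(prediction)  # the "on water" background dominates, so: waterbird
```

Conditioning on the inferred background means the class decision is made within the most likely context, which is how providing contextual attributes can reduce reliance on spurious background features.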


Related research

09/11/2023 · Zero-Shot Co-salient Object Detection Framework
Co-salient Object Detection (CoSOD) endeavors to replicate the human vis...

06/05/2023 · Visually-Grounded Descriptions Improve Zero-Shot Image Classification
Language-vision models like CLIP have made significant progress in zero-...

03/26/2023 · ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
Background subtraction (BGS) aims to extract all moving objects in the v...

07/29/2017 · Zero-Shot Activity Recognition with Verb Attribute Induction
In this paper, we investigate large-scale zero-shot activity recognition...

11/28/2016 · Gaze Embeddings for Zero-Shot Image Classification
Zero-shot image classification using auxiliary information, such as attr...

07/19/2018 · Selective Zero-Shot Classification with Augmented Attributes
In this paper, we introduce a selective zero-shot classification problem...

07/14/2022 · Contrastive Adapters for Foundation Model Group Robustness
While large pretrained foundation models (FMs) have shown remarkable zer...
