Learning Concise and Descriptive Attributes for Visual Recognition

08/07/2023
by An Yan, et al.

Recent advances in foundation models present new opportunities for interpretable visual recognition – one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes. Pioneering work shows that querying thousands of attributes can achieve performance competitive with image features. However, our further investigation on 8 datasets reveals that a large number of LLM-generated attributes perform almost the same as random words. This surprising finding suggests that significant noise may be present in these attributes. We hypothesize that there exist much smaller subsets of attributes that maintain the classification performance, and propose a novel learning-to-search method to discover those concise sets of attributes. As a result, on the CUB dataset, our method achieves performance close to that of massive LLM-generated attribute sets (e.g., 10k attributes for CUB), while using only 32 attributes in total to distinguish 200 bird species. Furthermore, our new paradigm demonstrates several additional benefits: higher interpretability and interactivity for humans, and the ability to summarize knowledge for a recognition task.
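
To make the attribute-based pipeline in the abstract concrete, here is a minimal sketch of classifying an image through attribute similarity scores. It is an illustration under stated assumptions, not the authors' learning-to-search method: the embeddings are made-up toy vectors standing in for the outputs of a vision-language model, and the attribute names, class names, and weights in `class_weights` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def attribute_scores(image_emb, attribute_embs):
    """Score an image against every attribute: one similarity per attribute."""
    return [cosine(image_emb, a) for a in attribute_embs]

def classify(image_emb, attribute_embs, class_weights):
    """Predict a class from a linear combination of attribute scores."""
    scores = attribute_scores(image_emb, attribute_embs)
    logits = {
        cls: sum(w * s for w, s in zip(weights, scores))
        for cls, weights in class_weights.items()
    }
    return max(logits, key=logits.get), scores

# Toy stand-ins (assumptions): three attribute "text embeddings" and
# hand-picked per-class weights over those attributes.
attribute_names = ["red head", "long beak", "webbed feet"]
attribute_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
class_weights = {
    "woodpecker": [1.0, 1.0, 0.0],
    "duck": [0.0, 0.2, 1.0],
}

image_emb = [0.9, 0.4, 0.1]  # toy "image embedding"
prediction, scores = classify(image_emb, attribute_embs, class_weights)
print(prediction)
for name, score in zip(attribute_names, scores):
    print(f"{name}: {score:.3f}")
```

Because the prediction depends only on a short vector of attribute scores, each decision can be read off directly from which attributes scored highly, which is the interpretability benefit the abstract points to when the attribute set is small.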

research
05/05/2023

COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?

Compositional reasoning is a hallmark of human visual intelligence; yet ...
research
12/16/2014

Discovering beautiful attributes for aesthetic image analysis

Aesthetic image analysis is the study and assessment of the aesthetic pr...
research
06/12/2023

Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

The visual classification performance of vision-language models such as ...
research
03/23/2023

Fairness-guided Few-shot Prompting for Large Language Models

Large language models have demonstrated surprising ability to perform in...
research
09/16/2021

Efficient Attribute Injection for Pretrained Language Models

Metadata attributes (e.g., user and product IDs from reviews) can be inc...
research
03/22/2021

Intersection Regularization for Extracting Semantic Attributes

We consider the problem of supervised classification, such that the feat...
research
10/13/2022

Visual Classification via Description from Large Language Models

Vision-language models (VLMs) such as CLIP have shown promising performa...
