Text-To-Concept (and Back) via Cross-Model Alignment

05/10/2023
by Mazda Moayeri, et al.

We observe that the mapping from an image's representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose text-to-concept, where features from a fixed pretrained model are aligned linearly to the CLIP space, so that text embeddings from CLIP's text encoder become directly comparable to the aligned features. With text-to-concept, we convert fixed off-the-shelf vision encoders into surprisingly strong zero-shot classifiers for free, with accuracy that at times even surpasses CLIP's, despite these encoders being much smaller and trained on a small fraction of CLIP's data. We show other immediate use cases of text-to-concept, such as building concept bottleneck models with no concept supervision, diagnosing distribution shifts in terms of human concepts, and retrieving images satisfying a set of text-based constraints. Lastly, we demonstrate the feasibility of concept-to-text, where vectors in a model's feature space are decoded by first aligning them to the CLIP space and then feeding them to a GPT-based generative model. Our work suggests that existing deep models, despite diverse architectures and training, represent input samples relatively similarly, and that two-way communication across model representation spaces and with humans (through language) is viable.
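To make the alignment idea concrete, below is a minimal PyTorch sketch of text-to-concept: fit a linear layer mapping an encoder's features onto CLIP image embeddings, then classify by comparing the aligned features against CLIP text embeddings. This is an illustrative sketch under stated assumptions, not the authors' released implementation; it assumes paired features are already computed (`feats` from the off-the-shelf encoder and `clip_feats` from CLIP's image encoder, for the same images), and the full-batch cosine-distance objective, hyperparameters, and function names (`fit_alignment`, `zero_shot_predict`) are hypothetical choices.

```python
import torch
import torch.nn as nn


def fit_alignment(feats: torch.Tensor, clip_feats: torch.Tensor,
                  epochs: int = 100, lr: float = 1e-3) -> nn.Linear:
    """Learn a linear map W so that W(f(x)) approximates CLIP_image(x).

    feats:      (N, d_model) features from a fixed off-the-shelf encoder.
    clip_feats: (N, d_clip) CLIP image embeddings of the same N images.
    """
    align = nn.Linear(feats.shape[1], clip_feats.shape[1])
    opt = torch.optim.Adam(align.parameters(), lr=lr)
    target = nn.functional.normalize(clip_feats, dim=-1)
    for _ in range(epochs):
        opt.zero_grad()
        pred = nn.functional.normalize(align(feats), dim=-1)
        # Cosine distance between aligned features and CLIP embeddings
        # (an assumed objective; any regression loss could stand in here).
        loss = (1.0 - (pred * target).sum(dim=-1)).mean()
        loss.backward()
        opt.step()
    return align


@torch.no_grad()
def zero_shot_predict(align: nn.Linear, feats: torch.Tensor,
                      text_embs: torch.Tensor) -> torch.Tensor:
    """Zero-shot classification: nearest CLIP text embedding by cosine similarity.

    text_embs: (C, d_clip) CLIP text embeddings, one per class prompt.
    Returns predicted class indices of shape (N,).
    """
    img = nn.functional.normalize(align(feats), dim=-1)
    txt = nn.functional.normalize(text_embs, dim=-1)
    return (img @ txt.T).argmax(dim=-1)
```

In use, `text_embs` would come from CLIP's text encoder applied to class prompts such as "a photo of a {class}", so the fixed vision encoder classifies new label sets without any additional label supervision, which is the "for free" zero-shot behavior the abstract describes.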
