Localization vs. Semantics: How Can Language Benefit Visual Representation Learning?

12/01/2022
by Zhuowan Li, et al.

Despite the superior performance brought by vision-and-language pretraining, it remains unclear whether learning with multi-modal data can help understand each individual modality. In this work, we investigate how language can help with visual representation learning from a probing perspective. Specifically, we compare vision-and-language and vision-only models by probing their visual representations on a broad range of tasks, in order to assess the quality of the learned representations in a fine-grained manner. Interestingly, our probing results suggest that vision-and-language models are better at label prediction tasks like object and attribute prediction, while vision-only models are stronger at dense prediction tasks that require more localized information. With further analysis using detailed metrics, our study suggests that language helps vision models learn better semantics, but not localization. Code is released at https://github.com/Lizw14/visual_probing.
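
To make the probing idea concrete, here is a minimal sketch of a linear probe on frozen features. It is an illustration under stated assumptions, not the authors' released code: the probe is a closed-form ridge regression on one-hot labels, the "encoders" are synthetic stand-ins (class means plus Gaussian noise), and every name in it (`fit_linear_probe`, `probe_accuracy`, `make_toy_features`, `encoder_a`, `encoder_b`) is hypothetical.

```python
import numpy as np


def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a ridge-regression probe on frozen features (closed form)."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    Y = np.eye(num_classes)[labels]  # one-hot targets
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def probe_accuracy(W, features, labels):
    """Top-1 accuracy of the probe; the features themselves are never updated."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    preds = (X @ W).argmax(axis=1)
    return float((preds == labels).mean())


def make_toy_features(rng, means, n, signal):
    """Synthetic stand-in for encoder outputs (assumption): scaled class means + noise."""
    labels = rng.integers(0, means.shape[0], size=n)
    feats = signal * means[labels] + rng.normal(size=(n, means.shape[1]))
    return feats, labels


rng = np.random.default_rng(0)
means = rng.normal(size=(5, 32))  # 5 classes, 32-dim features

# Two hypothetical encoders whose features carry more or less label information.
results = {}
for name, signal in [("encoder_a", 3.0), ("encoder_b", 0.1)]:
    train_x, train_y = make_toy_features(rng, means, 2000, signal)
    test_x, test_y = make_toy_features(rng, means, 500, signal)
    W = fit_linear_probe(train_x, train_y, num_classes=5)
    results[name] = probe_accuracy(W, test_x, test_y)

print(results)
```

Because only the lightweight probe is trained, a gap in probe accuracy between two encoders can be read as a difference in how much task-relevant information their frozen representations expose. A dense prediction probe would follow the same pattern with per-location features and labels.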

research
12/08/2020

LAMP: Label Augmented Multimodal Pretraining

Multi-modal representation learning by pretraining has become an increas...
research
02/10/2023

Is multi-modal vision supervision beneficial to language?

Vision (image and video) - Language (VL) pre-training is the recent popu...
research
04/08/2022

Contextual Representation Learning beyond Masked Language Modeling

How do masked language models (MLMs) such as BERT learn contextual repre...
research
04/16/2021

Effect of Vision-and-Language Extensions on Natural Language Understanding in Vision-and-Language Models

Extending language models with structural modifications and vision-and-l...
research
07/31/2022

Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics

Language modality within the vision language pretraining framework is in...
research
05/09/2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

We present an interactive visual framework named InternGPT, or iGPT for ...
research
04/03/2023

Probabilistic Prompt Learning for Dense Prediction

Recent progress in deterministic prompt learning has become a promising ...
