What Vision-Language Models 'See' when they See Scenes

09/15/2021
by Michele Cafagna, et al.

Images can be described in terms of the objects they contain, or in terms of the type of scene or place they instantiate. In this paper, we investigate to what extent pretrained Vision and Language (V&L) models can learn to align descriptions of both types with images. We compare three state-of-the-art models: VisualBERT, LXMERT and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining, and (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.
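
To make "aligning a description with an image" concrete for a dual-encoder model like CLIP, the sketch below scores captions against an image by the cosine similarity of their embeddings, scaled by a temperature-like factor and passed through a softmax. This is a minimal illustration under stated assumptions, not the paper's evaluation protocol: the three-dimensional embeddings, the caption labels and the `logit_scale` value of 100 are all hypothetical stand-ins for what real image and text encoders would produce.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def caption_probs(image_emb: Vector,
                  caption_embs: Dict[str, Vector],
                  logit_scale: float = 100.0) -> Dict[str, float]:
    """CLIP-style scoring: softmax over scaled image-caption cosine similarities."""
    logits = {name: logit_scale * cosine(image_emb, emb)
              for name, emb in caption_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(l - m) for name, l in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical toy embeddings standing in for real encoder outputs.
image = [0.9, 0.4, 0.1]
captions = {
    "object-level": [0.8, 0.5, 0.0],  # e.g. a caption listing objects in the image
    "scene-level": [0.7, 0.6, 0.2],   # e.g. a caption naming the type of place
    "foil": [-0.2, 0.1, 0.9],         # a mismatched caption
}

probs = caption_probs(image, captions)
for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.3f}")
```

A model that "sees" both object and scene content should assign high probability to both matching captions relative to the foil; because CLIP compares normalized embeddings directly, the same scoring applies whether the caption is object- or scene-level.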
