Probing Conceptual Understanding of Large Visual-Language Models

We present a novel framework for probing and improving the relational, compositional, and contextual understanding of large visual-language (V+L) models. While large V+L models have achieved success in various downstream tasks, it is not clear whether they have a conceptual grasp of the content. We propose a novel benchmarking dataset for probing these three aspects of content understanding. Our probes are grounded in cognitive science and test whether a V+L model can, for example, recognize that snow garnished with a man is implausible, or identify beach furniture by knowing that it is located on a beach. We experimented with five well-known models, such as CLIP and ViLT, and found that they mostly fail to demonstrate conceptual understanding. That said, we report some interesting findings, for example that cross-attention helps in learning conceptual understanding. We use these insights to propose a new finetuning technique that rewards the three conceptual understanding measures we propose. We hope that the presented benchmarks will help the community assess and improve the conceptual understanding capabilities of large V+L models.


