Transferring Knowledge from Vision to Language: How to Achieve It and How to Measure It?

09/23/2021
by Tobias Norlund, et al.

Large language models are known to suffer from hallucination: they are prone to producing statements that are false or inconsistent, indicating a lack of knowledge. One proposed remedy is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text in uni- or multimodal language models. The method consists of two steps: 1) a novel task that queries for knowledge of memory colors, i.e., the typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully measure visual knowledge transfer capabilities in models, and that our novel architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.
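As a rough illustration of the memory-color evaluation step described above, the sketch below scores a model on color queries for well-known objects against gold "memory colors." The `predict_color` stub, the object list, and the prompt idea are hypothetical stand-ins for illustration only, not the paper's actual benchmark or prompts.

```python
# Hedged sketch: evaluating memory-color knowledge in a language model.
# All names and data here are illustrative assumptions, not the paper's setup.

MEMORY_COLORS = {  # gold memory colors of well-known objects (toy sample)
    "banana": "yellow",
    "grass": "green",
    "snow": "white",
    "sky": "blue",
}

def predict_color(obj: str) -> str:
    """Placeholder for a uni- or multimodal LM answering a cloze-style
    query such as 'What is the color of a <obj>?'. A real model would
    be queried here; this stub always answers 'yellow'."""
    return "yellow"

def memory_color_accuracy(model=predict_color) -> float:
    """Fraction of objects whose predicted color matches the gold color."""
    correct = sum(model(obj) == color for obj, color in MEMORY_COLORS.items())
    return correct / len(MEMORY_COLORS)

print(memory_color_accuracy())  # the stub only gets 'banana' right -> 0.25
```

In the paper's setting, the second step (filtering the training data) would additionally ensure the queried color facts are absent from the text corpus, so that correct answers can be attributed to visual rather than textual knowledge.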


Related research:

- What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge (05/14/2022)
- Visually-Augmented Language Modeling (05/20/2022)
- Evaluating Object Hallucination in Large Vision-Language Models (05/17/2023)
- Controlling for Stereotypes in Multimodal Language Model Evaluation (02/03/2023)
- Online Representation Learning in Recurrent Neural Language Models (08/16/2015)
- The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color (10/15/2021)
- Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction (10/24/2022)
