World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

06/14/2023
by Ziqiao Ma, et al.

The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with its grounded meanings, and how grounding may further bootstrap new word learning. To this end, we introduce Grounded Open Vocabulary Acquisition (GOVA) to examine grounding and bootstrapping in open-world language learning. As an initial attempt, we propose object-oriented BERT (OctoBERT), a novel visually-grounded language model pre-trained on image-text pairs with an emphasis on grounding as an objective. Through extensive experiments and analysis, we demonstrate that OctoBERT is a more coherent and faster grounded word learner, and that the grounding ability acquired during pre-training helps the model learn unseen words more rapidly and robustly. Our code is available at https://github.com/sled-group/world-to-words.
