Learning the meanings of function words from grounded language using a visual question answering model

by Eva Portelance, et al.

Interpreting a seemingly simple function word like "or", "behind", or "more" can require logical, numerical, and relational reasoning. How do children learn such words? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network-based visual question answering models apparently can learn to use function words as part of answering questions about complex visual scenes. In this paper, we study what these models learn about function words, in the hope of better understanding how the meanings of these words can be learned by both models and children. We show that recurrent models trained on visually grounded language learn gradient semantics for function words requiring spatial and numerical reasoning. Furthermore, we find that these models can learn the meanings of the logical connectives "and" and "or" without any prior knowledge of logical reasoning, and we find early evidence that they can develop the ability to reason about alternative expressions when interpreting language. Finally, we show that word learning difficulty depends on the frequency of words in the models' input. Our findings offer evidence that it is possible to learn the meanings of function words in visually grounded contexts by using non-symbolic general statistical learning algorithms, without any prior knowledge of linguistic meaning.




iParaphrasing: Extracting Visually Grounded Paraphrases via an Image

A paraphrase is a restatement of the meaning of a text in other words. P...

Modelling word learning and recognition using visually grounded speech

Background: Computational models of speech recognition often assume that...

Understanding Grounded Language Learning Agents

Neural network-based systems can now learn to locate the referents of wo...

Curriculum Q-Learning for Visual Vocabulary Acquisition

The structure of curriculum plays a vital role in our learning process, ...

Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

This work aims at modeling how the meaning of gradable adjectives of siz...

Pay Attention to Those Sets! Learning Quantification from Images

Major advances have recently been made in merging language and vision re...

Bayesian artificial brain with ChatGPT

This paper aims to investigate the mathematical problem-solving capabili...
