
From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

10/11/2016
by Lieke Gelderloos et al.

We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
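To make the setup concrete, the sketch below shows one way a phoneme-to-visual-features model of this kind could be wired up in PyTorch: a stack of gated recurrent (GRU) layers reads an integer-coded phoneme sequence, and the top layer's final hidden state is projected onto an image-feature vector. The layer sizes, number of layers, embedding dimension, and mean-squared-error objective are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a stacked-GRU model that maps a phoneme sequence to
# visual features. All hyperparameters below are illustrative, not the
# paper's reported settings.
import torch
import torch.nn as nn

class PhonemeToVisual(nn.Module):
    def __init__(self, n_phonemes, emb_dim=64, hidden_dim=512,
                 n_layers=3, visual_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, emb_dim)
        # Stacked gated recurrent layers over the phoneme sequence.
        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=n_layers,
                          batch_first=True)
        # Map the top layer's final state into the visual feature space.
        self.project = nn.Linear(hidden_dim, visual_dim)

    def forward(self, phoneme_ids):
        # phoneme_ids: (batch, seq_len) integer-coded phonemes
        embedded = self.embed(phoneme_ids)
        _, hidden = self.gru(embedded)    # hidden: (n_layers, batch, hidden_dim)
        return self.project(hidden[-1])   # predicted visual features

# Toy usage: predict visual features for two phonetically transcribed captions.
model = PhonemeToVisual(n_phonemes=50)
phonemes = torch.randint(0, 50, (2, 30))
target_visual = torch.randn(2, 4096)      # e.g. CNN features of the paired images
loss = nn.functional.mse_loss(model(phonemes), target_visual)
loss.backward()
```

Because the recurrent stack is explicit, per-layer hidden states can be read out directly, which is what makes it possible to probe lower versus higher layers for form-sensitive versus meaning-sensitive representations as described above.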

Related research

02/07/2017 · Representations of language in a model of visually grounded speech signal
We present a visually grounded model of speech perception which projects...

06/12/2018 · iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
A paraphrase is a restatement of the meaning of a text in other words. P...

11/15/2022 · Pragmatics in Grounded Language Learning: Phenomena, Tasks, and Modeling Approaches
People rely heavily on context to enrich meaning beyond what is literall...

06/12/2017 · Encoding of phonology in a recurrent neural model of grounded speech
We study the representation and encoding of phonemes in a recurrent neur...

03/27/2017 · Where to put the Image in an Image Caption Generator
When a neural language model is used for caption generation, the image i...

02/15/2018 · Teaching Machines to Code: Neural Markup Generation with Visual Attention
We present a deep recurrent neural network model with soft visual attent...

05/04/2020 · What is Learned in Visually Grounded Neural Syntax Acquisition
Visual features are a promising signal for learning bootstrap textual mo...