Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

04/27/2021
by Grzegorz Chrupała, et al.

This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques.
