Learning Word-Like Units from Joint Audio-Visual Analysis

01/25/2017
by David Harwath, et al.

Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions. For example, our model is able to detect spoken instances of the word 'lighthouse' within an utterance and associate them with image regions containing lighthouses. We do not use any form of conventional automatic speech recognition, nor do we use any text transcriptions or conventional linguistic annotations. Our model effectively implements a form of spoken language acquisition, in which the computer learns not only to recognize word categories by sound, but also to enrich the words it learns with semantics by grounding them in images.

research
12/31/2020

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

In this paper we present the first model for directly synthesizing fluen...
research
04/04/2018

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

In this paper, we explore neural network models that learn to associate ...
research
02/25/2022

Learning English with Peppa Pig

Attempts to computationally simulate the acquisition of spoken language ...
research
12/15/2019

Computational Induction of Prosodic Structure

The present study has two goals relating to the grammar of prosody, unde...
research
08/21/2000

Processing Self Corrections in a speech to speech system

Speech repairs occur often in spontaneous spoken dialogues. The ability ...
research
07/14/2023

SGGNet^2: Speech-Scene Graph Grounding Network for Speech-guided Navigation

The spoken language serves as an accessible and efficient interface, ena...
research
11/11/2015

Deep Multimodal Semantic Embeddings for Speech and Images

In this paper, we present a model which takes as input a corpus of image...
