Semantic keyword spotting by learning from images and speech

10/05/2017
by Herman Kamper, et al.

We consider the problem of representing semantic concepts in speech by learning from untranscribed speech paired with images of scenes. This setting is relevant in low-resource speech processing, robotics, and human language acquisition research. We use an external image tagger to generate soft labels, which serve as targets for training a neural model that maps speech to keyword labels. We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic keyword spotting, where the goal is to search for spoken utterances that are semantically relevant to a given text query. Without seeing any text, the model trained on parallel speech and images achieves a precision of almost 60%. Compared to a model trained on transcriptions, our model matches human judgements better by some measures, especially in retrieving non-verbatim semantic matches.
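
As a rough illustration of the setup described above, the following Python sketch shows how soft labels from an image tagger could serve as training targets for a speech model, and how the resulting keyword scores could be used to rank utterances for a text query. Everything here is an assumption made for illustration: the function names, the choice of a summed binary cross-entropy loss, and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def soft_label_loss(predicted, targets, eps=1e-12):
    """Summed binary cross-entropy between the speech model's keyword
    probabilities and the image tagger's soft labels (both in [0, 1]).
    The loss choice is an assumption, not the paper's stated objective."""
    return -sum(
        t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
        for p, t in zip(predicted, targets)
    )


def rank_utterances(scores, keyword):
    """Order utterance ids by the model's score for the query keyword.
    `scores` maps utterance id -> {keyword: probability} (hypothetical layout)."""
    return sorted(scores, key=lambda utt: scores[utt].get(keyword, 0.0), reverse=True)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved utterances judged relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for utt in top if utt in relevant) / len(top)


# Toy example: scores a trained model might assign to the keyword "dog".
toy_scores = {
    "utt1": {"dog": 0.9},
    "utt2": {"dog": 0.2},
    "utt3": {"dog": 0.6},
}
ranked = rank_utterances(toy_scores, "dog")
```

In this sketch, semantic relevance would enter only through the `relevant` set passed to `precision_at_k`, for example utterances that human annotators judged relevant to the query even when the keyword itself is never spoken.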


