Training Keyword Spotters with Limited and Synthesized Speech Data

01/31/2020
by James Lin, et al.

With the rise of low-power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts of the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or on low-level features such as MFCCs, we use a speech embedding model pre-trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model that detects 10 keywords, when trained on only synthetic speech, reaches the same accuracy as a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy.
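
The general pattern described in the abstract, a frozen embedding with a small trainable model on top, can be illustrated with a toy sketch. This is not the authors' implementation. The "embedding" here is a hypothetical stand-in (a fixed random projection with a ReLU), the head is a plain softmax classifier far smaller than the roughly 400k-parameter models in the paper, and the data are Gaussian clusters rather than real or synthesized speech. All names (`embed`, `train_head`, `predict`, `sample`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, EMBED_DIM, N_KEYWORDS = 40, 32, 10

# Hypothetical stand-in for a pre-trained speech embedding: a fixed random
# projection plus ReLU. Its weights are never updated below.
W_EMBED = rng.normal(size=(N_FEATURES, EMBED_DIM)) / np.sqrt(N_FEATURES)


def embed(features):
    """Map low-level feature vectors to frozen embeddings."""
    return np.maximum(features @ W_EMBED, 0.0)


def train_head(features, labels, n_classes, lr=0.05, epochs=5000):
    """Fit a softmax classifier on frozen embeddings by gradient descent."""
    emb = embed(features)
    mean = emb.mean(axis=0)
    scale = (emb - mean).std()  # one global scale, keeps step size stable
    z = (emb - mean) / scale
    w = np.zeros((z.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = z @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(labels)  # d(cross-entropy)/d(logits)
        w -= lr * (z.T @ grad)
        b -= lr * grad.sum(axis=0)
    return {"w": w, "b": b, "mean": mean, "scale": scale}


def predict(head, features):
    """Return the predicted keyword index for each feature vector."""
    z = (embed(features) - head["mean"]) / head["scale"]
    return np.argmax(z @ head["w"] + head["b"], axis=1)


# Toy data: one Gaussian cluster per keyword (not real or synthesized speech).
centres = 3.0 * rng.normal(size=(N_KEYWORDS, N_FEATURES))


def sample(n_per_class, noise=0.5):
    """Draw n_per_class noisy points around each keyword's cluster centre."""
    labels = np.repeat(np.arange(N_KEYWORDS), n_per_class)
    x = centres[labels] + noise * rng.normal(size=(labels.size, N_FEATURES))
    return x, labels


x_train, y_train = sample(20)
x_test, y_test = sample(20)
head = train_head(x_train, y_train, N_KEYWORDS)
accuracy = float(np.mean(predict(head, x_test) == y_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Only `w` and `b` are learned, so the amount of labelled data needed depends on how well the frozen features already separate the keywords. That is the property the abstract attributes to the pre-trained speech embedding.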

research
06/04/2021

Teaching keyword spotters to spot new keywords with limited examples

Learning to recognize new keywords with just a few examples is essential...
research
03/29/2023

AraSpot: Arabic Spoken Command Spotting

Spoken keyword spotting (KWS) is the task of identifying a keyword in an...
research
06/25/2018

Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring

We use dynamic time warping (DTW) as supervision for training a convolut...
research
05/28/2023

Spot keywords from very noisy and mixed speech

Most existing keyword spotting research focuses on conditions with sligh...
research
10/05/2017

Semantic keyword spotting by learning from images and speech

We consider the problem of representing semantic concepts in speech by l...
research
01/12/2019

Prototypical Metric Transfer Learning for Continuous Speech Keyword Spotting With Limited Training Data

Continuous Speech Keyword Spotting (CSKS) is the problem of spotting key...
research
01/27/2021

Low-Power Audio Keyword Spotting using Tsetlin Machines

The emergence of Artificial Intelligence (AI) driven Keyword Spotting (K...
