Towards Data-efficient Modeling for Wake Word Spotting

10/13/2020
by Yixin Gao, et al.

Wake word (WW) spotting is challenging in far-field conditions, not only because of interference in signal transmission but also because of the complexity of acoustic environments. Traditional WW model training requires a large amount of in-domain, WW-specific data with substantial human annotation, so it is hard to build WW models without such data. In this paper we present data-efficient solutions that address challenges in WW modeling such as domain mismatch, noisy conditions, and limited annotation. Our proposed system combines two components: a multi-condition training pipeline with stratified data augmentation, which improves model robustness to a variety of predefined acoustic conditions, and a semi-supervised learning pipeline that accurately extracts WW and confusable examples from an untranscribed speech corpus. Starting from only 10 hours of domain-mismatched WW audio, we are able to enlarge and enrich the training dataset by 20-100 times to capture the acoustic complexity. Our experiments on real user data show that the proposed solutions achieve performance comparable to that of a production-grade model while saving 97% of the WW-specific data collection and 86% of the annotation bandwidth.
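The abstract does not spell out how the stratified data augmentation is implemented. As a rough illustration only, the sketch below assumes that each stratum is a predefined acoustic condition given by a target signal-to-noise ratio (SNR), and that every source utterance receives a fixed share of augmented copies per stratum. The stratum names, SNR values, shares, and the function names `mix_at_snr` and `stratified_augment` are all hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical strata: (name, target SNR in dB or None for clean, share of copies).
STRATA = [("clean", None, 0.2), ("mild_noise", 20.0, 0.4), ("heavy_noise", 5.0, 0.4)]


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so speech power / noise power equals snr_db, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def stratified_augment(utterances, noise, copies, strata=STRATA, seed=0):
    """Return (stratum_name, waveform) pairs, with copies split across strata.

    Per-stratum counts are rounded, so they sum to `copies` only when the
    shares divide it evenly (an assumption of this sketch).
    """
    rng = random.Random(seed)
    augmented = []
    for utt in utterances:
        for name, snr_db, share in strata:
            for _ in range(round(copies * share)):
                if snr_db is None:
                    augmented.append((name, list(utt)))
                    continue
                # Pick a random noise segment of the same length as the utterance.
                start = rng.randrange(len(noise) - len(utt) + 1)
                segment = noise[start:start + len(utt)]
                augmented.append((name, mix_at_snr(utt, segment, snr_db)))
    return augmented


# Toy data standing in for real audio: a 440 Hz tone and Gaussian noise.
_rng = random.Random(1)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [_rng.gauss(0.0, 1.0) for _ in range(2000)]
augmented = stratified_augment([speech], noise, copies=10)
```

Fixing the share per stratum, rather than sampling conditions freely, is what would keep every predefined condition represented in the enlarged set; a real pipeline would likely also cover reverberation and device effects, which this sketch omits.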


research
06/25/2019

Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

Automatic lyrics to polyphonic audio alignment is a challenging task not...
research
08/19/2019

Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

In automatic speech recognition, often little training data is available...
research
10/25/2022

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Most previous neural text-to-speech (TTS) methods are mainly based on su...
research
08/30/2018

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

Although end-to-end text-to-speech (TTS) models such as Tacotron have sh...
research
04/24/2019

Realizing Petabyte Scale Acoustic Modeling

Large scale machine learning (ML) systems such as the Alexa automatic sp...
research
06/20/2019

Semi-supervised acoustic model training for five-lingual code-switched ASR

This paper presents recent progress in the acoustic modelling of under-r...
research
09/15/2021

3D Annotation Of Arbitrary Objects In The Wild

Recent years have produced a variety of learning based methods in the co...
