Data Augmentation for Robust Keyword Spotting under Playback Interference

08/01/2018
by   Anirudh Raju, et al.

Accurate on-device keyword spotting (KWS) with low false accept and false reject rates is crucial to customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain a low false reject rate in real-world conditions where there is (a) ambient noise from external sources such as TV, household appliances, or other speech that is not directed at the device, and (b) imperfect cancellation of the audio playback from the device, resulting in residual echo after processing by the Acoustic Echo Cancellation (AEC) system. In this paper, we propose a data augmentation strategy to improve keyword spotting performance under these challenging conditions. The training set audio is artificially corrupted by mixing in music and TV/movie audio at different signal-to-interference ratios. Our results show that we get around 30-45% relative reduction in false reject rates, at a range of false alarm rates, under audio playback from such devices.
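The abstract describes corrupting training audio by mixing in music and TV/movie audio at different signal-to-interference ratios. The following is a minimal sketch of that mixing step, not the authors' implementation: the function name `mix_at_sir`, the choice to loop the interference clip to the utterance length, and the SIR values in the usage example are all assumptions made for illustration.

```python
import numpy as np


def mix_at_sir(speech, interference, sir_db):
    """Add `interference` to `speech` so that the speech-to-interference
    power ratio of the mixture equals `sir_db` decibels.

    Hypothetical helper for illustration; the abstract does not specify
    how the mixing is carried out.
    """
    speech = np.asarray(speech, dtype=np.float64)
    interference = np.asarray(interference, dtype=np.float64)
    if speech.size == 0 or interference.size == 0:
        raise ValueError("speech and interference must be non-empty")

    # Assumption: loop the interference clip, then trim to the utterance length.
    reps = int(np.ceil(speech.size / interference.size))
    interference = np.tile(interference, reps)[: speech.size]

    p_speech = np.mean(speech ** 2)
    p_interf = np.mean(interference ** 2)
    if p_interf == 0.0:
        # Silent interference cannot be scaled to any SIR; return clean audio.
        return speech.copy()

    # Solve p_speech / (gain^2 * p_interf) = 10^(sir_db / 10) for gain.
    gain = np.sqrt(p_speech / (p_interf * 10.0 ** (sir_db / 10.0)))
    return speech + gain * interference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.standard_normal(16000)  # stand-in for a keyword utterance
    music = rng.standard_normal(4000)       # stand-in for a music or TV clip
    for sir in (0.0, 5.0, 10.0):            # assumed SIR values, in dB
        mixed = mix_at_sir(utterance, music, sir)
        residual = mixed - utterance
        achieved = 10.0 * np.log10(np.mean(utterance ** 2) / np.mean(residual ** 2))
        print(f"target {sir:.1f} dB, achieved {achieved:.2f} dB")
```

In an augmentation pipeline, each training utterance would typically be paired with a randomly chosen interference clip and a randomly drawn SIR, so the model sees the same keyword under many playback conditions.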

research
11/20/2021

Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

In many speech-enabled human-machine interaction scenarios, user speech ...
research
04/06/2023

To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

Keyword spotting systems continuously process audio streams to detect ke...
research
01/29/2021

Speech Enhancement for Wake-Up-Word detection in Voice Assistants

Keyword spotting and in particular Wake-Up-Word (WUW) detection is a ver...
research
05/21/2020

Training Keyword Spotting Models on Non-IID Data with Federated Learning

We demonstrate that a production-quality keyword-spotting model can be t...
research
03/30/2022

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

We address the problem of detecting speech directed to a device that doe...
research
10/09/2021

Streaming on-device detection of device directed speech from voice and touch-based invocation

When interacting with smart devices such as mobile phones or wearables, ...
research
03/29/2023

AraSpot: Arabic Spoken Command Spotting

Spoken keyword spotting (KWS) is the task of identifying a keyword in an...
