Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

03/30/2022
by   Vineet Garg, et al.

We address the problem of detecting speech that is directed to a device but does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistant (VA) activation due to accidental button presses is critical for user experience. While the majority of approaches to false trigger mitigation (FTM) are designed to detect the presence of a target keyword, inferring user intent in the absence of a keyword is difficult. This also poses a challenge when creating the training/evaluation data for such systems due to inherent ambiguity in the user's data. To this end, we propose a novel FTM approach that uses weakly-labeled training data obtained with a newly introduced data sampling strategy. While this sampling strategy reduces data annotation efforts, the data labels are noisy as the data are not annotated manually. We use these data to train an acoustics-only model for the FTM task by regularizing its loss function via knowledge distillation from an ASR-based (LatticeRNN) model. This improves the model decisions, resulting in a 66% gain in accuracy over the base acoustics-only model. We also show that the ensemble of the LatticeRNN and acoustic-distilled models brings a further accuracy improvement of 20%.
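As a rough illustration of what regularizing a loss via knowledge distillation can look like, here is a minimal sketch in plain Python. It is not the authors' formulation, which the abstract does not spell out. It assumes a binary task (device-directed vs. not), a student (acoustics-only) posterior, a teacher posterior standing in for the LatticeRNN output, and a mixing weight `alpha`. The names `distillation_regularized_loss`, `ensemble_score`, `alpha`, and `weight` are hypothetical, and the simple score averaging used for the ensemble is likewise an assumption.

```python
import math


def binary_cross_entropy(p, target, eps=1e-7):
    """Cross-entropy between a predicted probability p and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_regularized_loss(student_prob, weak_label, teacher_prob, alpha=0.5):
    """Weighted sum of a weak-label term and a teacher-matching term.

    alpha is a hypothetical mixing weight: 0 trusts only the noisy weak
    label, 1 trusts only the teacher's soft posterior.
    """
    hard_term = binary_cross_entropy(student_prob, weak_label)
    soft_term = binary_cross_entropy(student_prob, teacher_prob)
    return (1.0 - alpha) * hard_term + alpha * soft_term


def ensemble_score(student_prob, teacher_prob, weight=0.5):
    """Assumed ensemble: a weighted average of the two model posteriors."""
    return weight * student_prob + (1.0 - weight) * teacher_prob


if __name__ == "__main__":
    # A noisy weak label says "directed" (1), but the teacher disagrees (0.1).
    print(distillation_regularized_loss(0.2, 1, 0.1, alpha=0.9))
    print(distillation_regularized_loss(0.9, 1, 0.1, alpha=0.9))
```

With a large `alpha`, a student that follows the teacher on an example with a suspect weak label is penalized less than one that follows the label, which is the intuition behind using the teacher as a regularizer against label noise.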

research
10/09/2021

Streaming on-device detection of device directed speech from voice and touch-based invocation

When interacting with smart devices such as mobile phones or wearables, ...
research
11/20/2021

Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

In many speech-enabled human-machine interaction scenarios, user speech ...
research
10/20/2020

Knowledge Transfer for Efficient On-device False Trigger Mitigation

In this paper, we address the task of determining whether a given uttera...
research
10/27/2021

Temporal Knowledge Distillation for On-device Audio Classification

Improving the performance of on-device audio classification models remai...
research
04/11/2022

Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

We trained a keyword spotting model using federated learning on real use...
research
08/01/2018

Data Augmentation for Robust Keyword Spotting under Playback Interference

Accurate on-device keyword spotting (KWS) with low false accept and fals...
research
06/15/2022

Latency Control for Keyword Spotting

Conversational agents commonly utilize keyword spotting (KWS) to initiat...
