Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

11/20/2021
by Samuele Cornell, et al.

In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device's playback audio. In these cases, the performance of tasks such as keyword spotting (KWS) and device-directed speech detection (DDD) can degrade significantly. To address this problem, we propose an implicit acoustic echo cancellation (iAEC) framework in which a neural network is trained to exploit the additional information from a reference microphone channel, learning to ignore the interfering signal and improve detection performance. We study this framework for KWS on an augmented version of Google Speech Commands v2 and for DDD on a real-world Alexa device dataset. Notably, we show a 56% reduction in false-reject rate for the DDD task under device playback conditions. For the KWS task, we also show comparable or superior performance to a strong end-to-end neural echo cancellation + KWS baseline, at an order of magnitude lower computational cost.
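The abstract does not specify how the reference channel is presented to the network, so the following is only a minimal sketch under stated assumptions: per-frame features of the microphone and reference channels are concatenated along the feature axis before reaching a detector. The function names, feature sizes, and the example error rates are hypothetical and are not taken from the paper. The second helper shows the arithmetic behind a relative false-reject-rate reduction such as the reported 56%.

```python
from typing import List

Frame = List[float]  # one feature vector per time frame


def stack_mic_and_reference(mic: List[Frame], ref: List[Frame]) -> List[Frame]:
    """Concatenate microphone and reference features frame by frame.

    Assumption: both channels are time-aligned and have the same number
    of frames. This is one plausible way to feed a reference channel to
    a detector, not necessarily the method used in the paper.
    """
    if len(mic) != len(ref):
        raise ValueError("mic and reference must have the same number of frames")
    return [m + r for m, r in zip(mic, ref)]


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Relative reduction of an error rate (e.g. false-reject rate)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical toy input: 3 frames with 2 features per channel.
mic_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
ref_feats = [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]
stacked = stack_mic_and_reference(mic_feats, ref_feats)

# Hypothetical rates chosen only to illustrate a 56% relative reduction.
reduction = relative_reduction(baseline_rate=0.10, new_rate=0.044)
```

Here each stacked frame carries four features (two per channel), so a downstream model sees the playback reference alongside the microphone signal and can, in principle, learn to discount it without an explicit echo-cancellation stage.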


research
08/01/2018

Data Augmentation for Robust Keyword Spotting under Playback Interference

Accurate on-device keyword spotting (KWS) with low false accept and fals...
research
10/09/2021

Streaming on-device detection of device directed speech from voice and touch-based invocation

When interacting with smart devices such as mobile phones or wearables, ...
research
03/30/2022

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

We address the problem of detecting speech directed to a device that doe...
research
09/29/2020

Improving Device Directedness Classification of Utterances with Semantic Lexical Features

User interactions with personal assistants like Alexa, Google Home and S...
research
03/03/2023

Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Keyword spotting (KWS) is a core human-machine-interaction front-end tas...
research
08/29/2022

Streaming Intended Query Detection using E2E Modeling for Continued Conversation

In voice-enabled applications, a predetermined hotword is usually used to...
research
05/21/2023

DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

Real-world complex acoustic environments especially the ones with a low ...
