Progressive Voice Trigger Detection: Accuracy vs Latency

10/29/2020
by Siddharth Sigtia, et al.

We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in the words that immediately follow the trigger phrase. We first demonstrate that including more audio context after a detected trigger phrase does yield a more accurate decision. However, waiting to listen to more audio each time increases latency. Progressive Voice Trigger Detection allows us to trade off latency and accuracy by accepting clear trigger candidates quickly while waiting for more context before deciding whether to accept more marginal examples. Using a two-stage architecture, we show that by delaying the decision for just 3% of detected true triggers in the test set, we obtain a relative improvement of 66% in false rejection rate while incurring only a negligible increase in latency.
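The accept-quickly-or-wait idea in the abstract can be illustrated with a minimal sketch. It assumes the detector produces an early confidence score in [0, 1] and can compute a second score once more audio has arrived. The function name, the threshold values, and the early-reject branch are hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    accepted: bool  # final accept/reject outcome
    delayed: bool   # True if extra audio context was needed


def progressive_decision(
    early_score: float,
    late_score_fn: Callable[[], float],
    accept_threshold: float = 0.9,  # hypothetical value
    reject_threshold: float = 0.3,  # hypothetical value
    late_threshold: float = 0.5,    # hypothetical value
) -> Decision:
    """Decide on a trigger candidate, waiting for more audio only if needed.

    early_score: confidence computed from the trigger phrase alone.
    late_score_fn: called only for marginal candidates; assumed to return
        a confidence computed with additional audio after the phrase.
    """
    # Clear candidate: accept immediately, no added latency.
    if early_score >= accept_threshold:
        return Decision(accepted=True, delayed=False)
    # Clearly not a trigger: reject immediately (an assumption of this sketch).
    if early_score < reject_threshold:
        return Decision(accepted=False, delayed=False)
    # Marginal candidate: pay the latency cost and use the extra context.
    late_score = late_score_fn()
    return Decision(accepted=late_score >= late_threshold, delayed=True)


if __name__ == "__main__":
    print(progressive_decision(0.95, lambda: 0.0))
    print(progressive_decision(0.60, lambda: 0.8))
```

Only candidates falling between the two early thresholds pay the latency cost, so the gap between those thresholds controls how often a decision is delayed.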


