Streaming Intended Query Detection using E2E Modeling for Continued Conversation

08/29/2022
by   Shuo-yiin Chang, et al.
0

In voice-enabled applications, a predetermined hotword isusually used to activate a device in order to attend to the query.However, speaking queries followed by a hotword each timeintroduces a cognitive burden in continued conversations. Toavoid repeating a hotword, we propose a streaming end-to-end(E2E) intended query detector that identifies the utterancesdirected towards the device and filters out other utterancesnot directed towards device. The proposed approach incor-porates the intended query detector into the E2E model thatalready folds different components of the speech recognitionpipeline into one neural network.The E2E modeling onspeech decoding and intended query detection also allows us todeclare a quick intended query detection based on early partialrecognition result, which is important to decrease latencyand make the system responsive. We demonstrate that theproposed E2E approach yields a 22 and 600 mslatency improvement compared with an independent intendedquery detector. In our experiment, the proposed model detectswhether the user is talking to the device with a 8.7 user starts speaking.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/17/2020

Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

In this paper, we propose a streaming model to distinguish voice queries...
research
10/09/2021

Streaming on-device detection of device directed speech from voice and touch-based invocation

When interacting with smart devices such as mobile phones or wearables, ...
research
08/08/2020

Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection

We propose a stacked 1D convolutional neural network (S1DCNN) for end-to...
research
05/02/2018

How Flajolet Processed Streams with Coin Flips

This article is a historical introduction to data streaming algorithms t...
research
11/20/2021

Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

In many speech-enabled human-machine interaction scenarios, user speech ...
research
10/25/2022

Dynamic Speech Endpoint Detection with Regression Targets

Interactive voice assistants have been widely used as input interfaces i...
research
10/11/2018

A Blended Deep Learning Approach for Predicting User Intended Actions

User intended actions are widely seen in many areas. Forecasting these a...

Please sign up or login with your details

Forgot password? Click here to reset