Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed

02/01/2019
by Atta Norouzian, et al.

Voice-controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user must first "wake up" the VA by saying a particular word or phrase every time he or she wants the VA to do something. Eliminating the need to say the wake-up word for every interaction could improve the user experience. This would require the VA to detect speech that is directed at it and respond accordingly. In other words, the challenge is to distinguish between system-directed and non-system-directed speech utterances. In this paper, we present a number of neural network architectures for tackling this classification problem using only acoustic features. These architectures are built from convolutional, recurrent, and feed-forward layers. In addition, we investigate the use of an attention mechanism applied to the output of the convolutional and recurrent layers. We show that incorporating the proposed attention mechanism into the models always leads to a significant improvement in classification accuracy. The best model achieved equal error rates of 16.25% and 15.62% on two distinct realistic datasets.
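The abstract does not spell out the form of the attention mechanism, so the following is only a minimal sketch of one common variant: attention pooling, in which each frame-level output of a convolutional or recurrent layer receives a scalar score, the scores are normalized with a softmax, and the frames are averaged with those weights into a single utterance-level vector. The function names, the scoring vector `w`, and the toy input are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Collapse T frame vectors into one utterance-level vector.

    frames: list of T frame-level feature vectors, each of length D
            (assumed to be the output of a conv or recurrent layer).
    w:      scoring vector of length D (hypothetical learned parameter).
    Returns (context, weights): the weighted average of the frames and
    the attention weight assigned to each frame.
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frames, computed per feature dimension.
    context = [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights


# Toy input: 3 frames with 2 features each (illustrative values only).
frames = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]
w = [1.0, -0.5]
context, weights = attention_pool(frames, w)
```

In this sketch the utterance-level `context` vector would then be passed to feed-forward layers for the system-directed versus non-system-directed decision. With an all-zero `w` the weights become uniform and the pooling reduces to a plain average over frames, which is a convenient baseline to compare against.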

research
02/28/2020

Temporal Convolutional Attention-based Network For Sequence Modeling

With the development of feed-forward models, the default model for seque...
research
08/07/2018

Device-directed Utterance Detection

In this work, we propose a classifier for distinguishing device-directed...
research
12/23/2017

Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

We investigate whether infant-directed speech (IDS) could facilitate wor...
research
06/24/2015

Attention-Based Models for Speech Recognition

Recurrent sequence generators conditioned on input data through an atten...
research
03/31/2016

Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

Recurrent neural network architectures combining with attention mechanis...
research
09/29/2020

Improving Device Directedness Classification of Utterances with Semantic Lexical Features

User interactions with personal assistants like Alexa, Google Home and S...
research
07/17/2020

Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

In this paper, we propose a streaming model to distinguish voice queries...
