Improving Device Directedness Classification of Utterances with Semantic Lexical Features

09/29/2020
by Kellen Gillespie, et al.

User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword. Several personal assistants feature "follow-up" modes that let users continue interacting without repeating the wakeword. For the system to respond only when appropriate and to ignore speech not intended for it, utterances must be classified as device-directed or non-device-directed. State-of-the-art systems have largely used acoustic features for this task, while others have used only lexical features or have added language-model (LM)-based lexical features. We propose a directedness classifier that combines semantic lexical features with a lightweight acoustic feature and show that it is effective in classifying directedness. The mixed-domain lexical and acoustic feature model is able to achieve 14 […] baseline model. Finally, we successfully apply transfer learning and semi-supervised learning to the model to improve accuracy even further.
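The abstract combines semantic lexical features with a lightweight acoustic feature in a single directedness classifier but does not say how the two are merged or what model sits on top. The sketch below assumes the simplest option: early fusion (concatenating the lexical and acoustic vectors) followed by a logistic-regression classifier trained with batch gradient descent. The feature meanings, toy values, `fuse_features`, and `DirectednessClassifier` are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence


def fuse_features(lexical: Sequence[float], acoustic: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate lexical and acoustic feature vectors."""
    return list(lexical) + list(acoustic)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DirectednessClassifier:
    """Hypothetical binary classifier: 1 = device-directed, 0 = non-device-directed."""

    def __init__(self, n_features: int, lr: float = 1.0) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        # Probability that the utterance is device-directed.
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def fit(self, X: List[List[float]], y: List[int], epochs: int = 1000) -> "DirectednessClassifier":
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            # Accumulate the gradient of the mean log loss over the batch.
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> int:
        return int(self.predict_proba(x) >= threshold)


# Toy data (hypothetical). Lexical: [semantic-parse confidence, command-like flag];
# acoustic: [one normalized acoustic score].
directed = [([0.9, 1.0], [0.8]), ([0.8, 1.0], [0.9]), ([0.7, 0.0], [0.9])]
non_directed = [([0.2, 0.0], [0.3]), ([0.1, 0.0], [0.2]), ([0.3, 1.0], [0.1])]

X = [fuse_features(lex, ac) for lex, ac in directed + non_directed]
y = [1] * len(directed) + [0] * len(non_directed)

model = DirectednessClassifier(n_features=len(X[0])).fit(X, y)
print([model.predict(x) for x in X])
```

Early fusion keeps the model small, in the spirit of a "lightweight" acoustic addition; a real system could instead fuse learned embeddings or combine per-domain scores late.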

research
03/02/2018

Lexico-acoustic Neural-based Models for Dialog Act Classification

Recent works have proposed neural models for dialog act classification i...
research
11/20/2021

Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

In many speech-enabled human-machine interaction scenarios, user speech ...
research
04/05/2020

Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

We present an analysis of semi-supervised acoustic and language model tr...
research
08/07/2018

Device-directed Utterance Detection

In this work, we propose a classifier for distinguishing device-directed...
research
04/13/2021

Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion

Textual escalation detection has been widely applied to e-commerce compa...
research
02/01/2019

Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed

Voice controlled virtual assistants (VAs) are now available in smartphon...
research
07/04/2022

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

Several recent studies have tested the use of transformer language model...
