Device Directedness with Contextual Cues for Spoken Dialog Systems

11/23/2022
by Dhanush Bekal, et al.

In this work, we define barge-in verification as a supervised learning task in which audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infuse lexical information directly into speech representations to improve the domain-specific language information implicitly learned during pre-training. Experiments conducted on spoken dialog data show that our proposed model, trained to validate barge-in entirely from speech representations, is 38% faster than a baseline LSTM model that uses both audio and Automatic Speech Recognition (ASR) 1-best hypotheses. On top of this, our best proposed model, with lexically infused representations along with contextual features, provides a further relative improvement of 5.7% while remaining faster than the baseline.
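The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes frame-level speech representations are already available as plain lists of floats, averages them into one utterance-level vector, and scores that vector with a logistic classifier. The function names, the `weights` and `bias` parameters, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors into a single utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def barge_in_probability(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Score an utterance with a logistic classifier over the pooled vector.

    `weights` and `bias` are hypothetical learned parameters; in practice
    they would come from supervised training on labeled barge-ins.
    """
    pooled = mean_pool(frames)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_true_barge_in(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,  # assumed decision threshold
) -> bool:
    """Label the utterance a true barge-in when the score reaches the threshold."""
    return barge_in_probability(frames, weights, bias) >= threshold
```

Mean pooling followed by a linear layer is only one simple way to turn variable-length frame sequences into a fixed-size input for a binary decision; the abstract does not specify the classifier head the paper uses.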


research
11/06/2022

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Spoken language understanding (SLU) is a task aiming to extract high-lev...
research
05/19/2023

Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment

Recently, speech-text pre-training methods have shown remarkable success...
research
11/10/2022

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Collecting sufficient labeled data for spoken language understanding (SL...
research
06/24/2022

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

In this work, we analyzed and compared speech representations extracted ...
research
01/28/2020

Joint Contextual Modeling for ASR Correction and Language Understanding

The quality of automatic speech recognition (ASR) is critical to Dialogu...
research
10/12/2021

Multi-Modal Pre-Training for Automated Speech Recognition

Traditionally, research in automated speech recognition has focused on l...
research
08/07/2022

When can I Speak? Predicting initiation points for spoken dialogue agents

Current spoken dialogue systems initiate their turns after a long period...
