Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

02/12/2021
by   Milind Rao, et al.

Spoken language understanding (SLU) systems extract transcriptions, as well as the semantics of intents or named entities, from speech, and are essential components of voice-activated systems. SLU models, which either extract semantics directly from audio or are composed of pipelined automatic speech recognition (ASR) and natural language understanding (NLU) models, are typically trained with differentiable cross-entropy losses, even when the performance metrics of interest are word or semantic error rates. In this work, we propose non-differentiable sequence losses based on SLU metrics as a proxy for semantic error, and use the REINFORCE trick to train ASR and SLU models with this loss. We show that custom sequence loss training achieves state-of-the-art results on open SLU datasets and improves both ASR and NLU performance metrics on large proprietary datasets. We also demonstrate how the semantic sequence loss training paradigm can be used to update ASR and SLU models without transcripts, using semantic feedback alone.
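The core idea, training against a non-differentiable error metric via the REINFORCE trick, can be sketched in a toy example. This is not the authors' implementation: the edit-distance reward below stands in for the paper's SLU/semantic error metrics, and the per-position categorical policy stands in for an ASR/SLU decoder; all names and parameter values are illustrative.

```python
import numpy as np

def edit_distance(hyp, ref):
    """Levenshtein distance: a non-differentiable proxy for word/semantic error."""
    m, n = len(hyp), len(ref)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + (hyp[i - 1] != ref[j - 1]))  # substitution
    return int(d[m, n])

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
vocab_size, seq_len = 5, 4
reference = [1, 3, 0, 2]                  # hypothetical target token sequence
logits = np.zeros((seq_len, vocab_size))  # toy per-position "decoder" policy

lr, n_samples = 0.5, 16
reward_history = []
for step in range(300):
    probs = softmax(logits)
    samples, rewards = [], []
    for _ in range(n_samples):
        # sample a hypothesis from the current policy
        seq = [int(rng.choice(vocab_size, p=probs[t])) for t in range(seq_len)]
        samples.append(seq)
        rewards.append(-edit_distance(seq, reference))  # reward = negative error
    baseline = float(np.mean(rewards))    # variance-reducing baseline
    reward_history.append(baseline)
    grad = np.zeros_like(logits)
    for seq, r in zip(samples, rewards):
        for t, tok in enumerate(seq):
            # REINFORCE: (reward - baseline) * d/dlogits log p(tok)
            # where d/dlogits log softmax(logits)[tok] = one_hot(tok) - probs
            g = -probs[t]
            g[tok] += 1.0
            grad[t] += (r - baseline) * g
    logits += lr * grad / n_samples       # gradient ascent on expected reward

greedy = [int(t) for t in np.argmax(logits, axis=-1)]
```

In the paper this reward would be a semantic error rate computed on hypotheses sampled from the ASR/SLU model; the baseline subtraction shown here is the standard variance-reduction device that makes REINFORCE-style training usable in practice.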


