Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

04/21/2021
by Qian Chen, et al.

In the traditional cascaded architecture for spoken language understanding (SLU), automatic speech recognition (ASR) errors have been observed to degrade the performance of natural language understanding. End-to-end (E2E) SLU models have been proposed to map speech input directly to the desired semantic frame with a single model, thereby mitigating ASR error propagation. Recently, pre-training techniques have been explored for these E2E models. In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors. We explore phoneme labels as high-level speech features, and we design and compare pre-training tasks based on conditional masked language model objectives and inter-sentence relation objectives. We also investigate the efficacy of combining textual and phonetic information during fine-tuning. Experimental results on two SLU benchmarks, Fluent Speech Commands and SNIPS, show that the proposed approach significantly outperforms strong baseline models and improves the robustness of SLU to ASR errors.
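To make the idea of a conditional masked language model objective over paired text and phonemes more concrete, here is a minimal sketch of how one training example might be built. The abstract does not specify the input layout, so everything here is an assumption for illustration: the function name `build_conditional_mlm_example`, the BERT-style `[CLS]`/`[SEP]`/`[MASK]` tokens, the 15% masking rate, the choice to mask words while conditioning on phonemes, and the ARPAbet-style phoneme labels. It is not the authors' implementation.

```python
import random

MASK = "[MASK]"  # assumed BERT-style mask token, not taken from the paper


def build_conditional_mlm_example(words, phonemes, mask_prob=0.15, seed=0):
    """Mask word tokens while leaving the paired phoneme sequence intact.

    Returns (inputs, labels). `labels` holds the original word at each
    masked position and None everywhere else, including all phoneme and
    special-token positions, so a loss would only be computed on masked words.
    """
    rng = random.Random(seed)
    masked_words, word_labels = [], []
    for word in words:
        if rng.random() < mask_prob:
            masked_words.append(MASK)   # hide the word from the model
            word_labels.append(word)    # the model must recover it
        else:
            masked_words.append(word)
            word_labels.append(None)    # no prediction target here

    # Hypothetical layout: [CLS] words [SEP] phonemes [SEP]
    inputs = ["[CLS]"] + masked_words + ["[SEP]"] + list(phonemes) + ["[SEP]"]
    labels = [None] + word_labels + [None] + [None] * len(phonemes) + [None]
    return inputs, labels


if __name__ == "__main__":
    # Illustrative utterance with hand-written ARPAbet-style phonemes.
    words = ["turn", "on", "the", "lights"]
    phonemes = ["T", "ER", "N", "AA", "N", "DH", "AH", "L", "AY", "T", "S"]
    inputs, labels = build_conditional_mlm_example(words, phonemes, mask_prob=0.5)
    print(inputs)
    print(labels)
```

In this sketch the unmasked phoneme sequence acts as the condition: a model that sees it could in principle recover a masked word from its pronunciation. The symmetric variant, masking phonemes while keeping the words visible, would follow the same pattern.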


