Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding

04/21/2023
by Mohan Li, et al.

This paper presents non-autoregressive (NAR) approaches for joint automatic speech recognition (ASR) and spoken language understanding (SLU). The proposed NAR systems employ a Conformer encoder that uses connectionist temporal classification (CTC) to transcribe the speech utterance into raw ASR hypotheses, which are then refined by a decoder modeled on BERT (bidirectional encoder representations from Transformers). The same decoder also predicts the intent and slot labels of the utterance at the same time. Both Mask-CTC and self-conditioned CTC (SC-CTC) approaches are explored in this study. Experiments on the SLURP dataset show that, compared with a Conformer-Transformer based autoregressive (AR) model, the proposed SC-Mask-CTC NAR system achieves a 3.7 absolute gain in SLU metrics and a competitive level of ASR accuracy. Additionally, the NAR systems decode 6x faster than the AR baseline.
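The abstract gives no implementation details, so the following is only a minimal Python sketch of the general Mask-CTC idea it builds on. It shows greedy (best-path) CTC decoding, masking of low-confidence tokens, and easy-first filling of the masked positions by a masked-language-model decoder. The blank index `0`, the mask id `-1`, the `0.9` threshold, and the `predict` callable that stands in for the BERT-like decoder are all hypothetical. Self-conditioning and the intent/slot outputs are not shown.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol
MASK = -1  # hypothetical placeholder id for a masked token


def ctc_greedy_collapse(
    frame_ids: Sequence[int], frame_probs: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Best-path CTC decoding: merge repeated frame labels, then drop blanks.

    Returns token ids and a per-token confidence, taken here as the
    maximum frame probability within each run (an assumption)."""
    tokens: List[int] = []
    confs: List[float] = []
    prev = None
    for idx, p in zip(frame_ids, frame_probs):
        if idx != BLANK and idx != prev:
            tokens.append(idx)       # start of a new non-blank run
            confs.append(p)
        elif idx != BLANK and idx == prev:
            confs[-1] = max(confs[-1], p)  # same run, keep best probability
        prev = idx
    return tokens, confs


def mask_low_confidence(
    tokens: List[int], confs: List[float], threshold: float = 0.9
) -> List[int]:
    """Replace tokens whose CTC confidence is below the threshold with MASK."""
    return [t if c >= threshold else MASK for t, c in zip(tokens, confs)]


def refine(
    tokens: List[int],
    predict: Callable[[List[int]], List[Tuple[int, float]]],
    iterations: int = 2,
) -> List[int]:
    """Easy-first iterative refinement of masked positions.

    `predict` is a hypothetical stand-in for the BERT-like decoder: it
    returns a (token_id, probability) pair for every position."""
    tokens = list(tokens)
    for it in range(iterations):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predict(tokens)
        # Fill a share of the remaining masks; the last iteration fills all.
        k = max(1, len(masked) // (iterations - it))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    frame_ids = [0, 5, 5, 0, 7, 0, 0, 9, 9, 0]
    frame_probs = [0.99, 0.95, 0.97, 0.99, 0.40, 0.99, 0.99, 0.92, 0.85, 0.99]
    toks, cs = ctc_greedy_collapse(frame_ids, frame_probs)
    masked = mask_low_confidence(toks, cs)
    # Toy decoder that always proposes token 8 with probability 0.8.
    toy_decoder = lambda ts: [(8, 0.8)] * len(ts)
    print(toks, masked, refine(masked, toy_decoder))  # [5, 7, 9] [5, -1, 9] [5, 8, 9]
```

Here only the uncertain middle token is handed to the decoder, while the confident CTC outputs are kept. This reflects the general motivation for CTC-plus-refinement NAR decoding: most tokens are emitted in one parallel pass, and only a few positions need extra decoder iterations.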
