Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

07/13/2023
by   He Huang, et al.

We study speech intent classification and slot filling (SICSF) by proposing to use an encoder pretrained on speech recognition (ASR) to initialize an end-to-end (E2E) Conformer-Transformer model, which achieves new state-of-the-art results on the SLURP dataset, with 90.14% intent accuracy and 82.27% SLURP-F1. We compare our model with encoders pretrained via self-supervised learning (SSL), and show that ASR pretraining is much more effective than SSL for SICSF. To explore parameter efficiency, we freeze the encoder and add Adapter modules, and show that parameter efficiency is only achievable with an ASR-pretrained encoder, while the SSL encoder requires full finetuning to achieve comparable results. In addition, we provide an in-depth comparison of end-to-end models versus cascading models (ASR+NLU), and show that E2E models outperform cascaded models unless an oracle ASR model is provided. Last but not least, our model is the first E2E model to match the performance of cascading models with oracle ASR. Code, checkpoints, and configs are available.
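For readers unfamiliar with Adapter modules, the parameter-efficient setup described above can be sketched as follows: the pretrained encoder weights stay frozen, and only small bottleneck layers inserted into the encoder are trained. This is a minimal illustrative sketch of a residual bottleneck adapter in PyTorch; the class and parameter names here are assumptions for illustration, not the paper's released code.

```python
# Sketch of a bottleneck Adapter (down-project, nonlinearity, up-project,
# residual connection). Inserted into a frozen encoder, only these small
# modules are trained, giving parameter-efficient finetuning.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # project to small dim
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)    # project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen layer's output,
        # so an untrained adapter starts close to the identity mapping.
        return x + self.up(self.act(self.down(x)))
```

Because the trainable parameter count scales with the bottleneck width rather than the encoder size, only a small fraction of the model's parameters are updated during finetuning.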


