Improving End-to-End SLU performance with Prosodic Attention and Distillation

05/14/2023
by Shangeth Rajaa, et al.

Most End-to-End SLU methods depend on pretrained ASR or language model features for intent prediction, while other essential information in speech, such as prosody, is often ignored. Recent research has shown improved results in classifying dialogue acts by incorporating prosodic information, but the margins of improvement in these methods are minimal because the neural models tend to ignore the concatenated prosodic features. In this work, we propose prosody-attention, which uses the prosodic features not as additional inputs but to generate attention maps across the time frames of the utterance. We then propose prosody-distillation, which explicitly learns the prosodic information in the acoustic encoder rather than concatenating implicit prosodic features. Both proposed methods improve on the baseline results, and prosody-distillation gives intent classification accuracy improvements of 8% and 2% on the SLURP and STOP datasets, respectively, over the prosody baseline.
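The abstract only outlines the two mechanisms, so the following is a minimal, illustrative PyTorch sketch of how prosody-attention (prosodic features producing attention weights over encoder time frames) and prosody-distillation (an auxiliary prosody-regression head on the acoustic encoder) could be wired up. The module names, the GRU encoder, the mean/attention pooling, and the loss weight alpha are assumptions made for illustration, not the paper's actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProsodyAttentionPooling(nn.Module):
    # Pools acoustic encoder frames with attention weights derived from
    # frame-level prosodic features, instead of concatenating those
    # features to the encoder output. (Illustrative sketch.)
    def __init__(self, acoustic_dim: int, prosody_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(prosody_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, acoustic_feats, prosody_feats):
        # acoustic_feats: (batch, frames, acoustic_dim)
        # prosody_feats:  (batch, frames, prosody_dim), time-aligned with the encoder output
        scores = self.score(prosody_feats)           # (batch, frames, 1)
        attn = torch.softmax(scores, dim=1)          # attention map over time frames
        pooled = (attn * acoustic_feats).sum(dim=1)  # (batch, acoustic_dim)
        return pooled, attn

class ProsodyDistillationSLU(nn.Module):
    # Acoustic encoder with an intent head plus an auxiliary head that
    # regresses frame-level prosodic features, so the encoder is pushed
    # to learn prosody explicitly. (Illustrative sketch.)
    def __init__(self, input_dim, encoder_dim, prosody_dim, num_intents):
        super().__init__()
        self.encoder = nn.GRU(input_dim, encoder_dim, batch_first=True)
        self.intent_head = nn.Linear(encoder_dim, num_intents)
        self.prosody_head = nn.Linear(encoder_dim, prosody_dim)

    def forward(self, feats):
        hidden, _ = self.encoder(feats)              # (batch, frames, encoder_dim)
        intent_logits = self.intent_head(hidden.mean(dim=1))
        prosody_pred = self.prosody_head(hidden)     # frame-level prosody prediction
        return intent_logits, prosody_pred

def training_loss(intent_logits, intent_labels, prosody_pred, prosody_targets, alpha=0.5):
    # Joint objective: intent classification plus prosody regression,
    # weighted by an assumed hyperparameter alpha.
    intent_loss = F.cross_entropy(intent_logits, intent_labels)
    prosody_loss = F.mse_loss(prosody_pred, prosody_targets)
    return intent_loss + alpha * prosody_loss

In the actual work the acoustic encoder would likely be a pretrained ASR encoder and the prosodic targets would be features such as pitch and energy; the sketch only shows where the attention map and the distillation-style auxiliary loss enter the pipeline.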


research
02/15/2021

Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification

Intent classification is a task in spoken language understanding. An int...
research
08/05/2021

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification

End-to-end intent classification using speech has numerous advantages co...
research
02/16/2022

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

End-to-end speech recognition is a promising technology for enabling com...
research
07/13/2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

We study speech intent classification and slot filling (SICSF) by propos...
research
05/11/2022

A neural prosody encoder for end-to-end dialogue act classification

Dialogue act classification (DAC) is a critical task for spoken language...
research
05/09/2023

Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition

Attention-based contextual biasing approaches have shown significant imp...
research
04/08/2022

A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

SLU combines ASR and NLU capabilities to accomplish speech-to-intent und...
