Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification

02/15/2021
by Bidisha Sharma, et al.

Intent classification is a task in spoken language understanding. An intent classification system is usually implemented as a pipeline, with a speech recognition module followed by a text processing module that classifies the intents. There are also studies of end-to-end systems that take acoustic features as input and classify the intents directly. Such systems do not take advantage of relevant linguistic information and suffer from limited training data. In this work, we propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model. We use a knowledge distillation technique to map the acoustic embeddings towards the linguistic embeddings. We fuse the acoustic and linguistic embeddings through a cross-attention approach to classify intents. With the proposed method, we achieve 90.86 speech corpus, respectively.
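
The two mechanisms named in the abstract, distillation of acoustic embeddings towards linguistic embeddings and cross-attention fusion, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the mean-squared-error distillation loss, the choice of linguistic tokens as queries attending over acoustic frames, mean pooling, the random stand-in embeddings, and all function names and dimensions are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_pool(seq):
    """Average a sequence of vectors into one utterance-level vector."""
    n = len(seq)
    return [sum(vec[j] for vec in seq) / n for j in range(len(seq[0]))]

def distillation_loss(student, teacher):
    """Assumed loss: MSE pulling the acoustic (student) vector towards the linguistic (teacher) vector."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values,
    which serve as both keys and values (no learned projections in this sketch)."""
    dim = len(queries[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        fused.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                      for j in range(dim)])
    return fused

def classify(fused, class_weights):
    """Pool the fused sequence and apply a linear layer followed by softmax."""
    pooled = mean_pool(fused)
    return softmax([dot(pooled, w) for w in class_weights])

random.seed(0)
DIM, N_INTENTS = 8, 3  # hypothetical sizes
# Random stand-ins for embeddings from a pretrained ASR model and a pretrained language model.
acoustic = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(20)]   # 20 frames
linguistic = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]  # 6 tokens
class_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_INTENTS)]

kd_loss = distillation_loss(mean_pool(acoustic), mean_pool(linguistic))
fused = cross_attention(linguistic, acoustic)  # tokens query the acoustic frames
probs = classify(fused, class_weights)
```

In a trained system the distillation loss and the classification loss would be minimized jointly and the attention would use learned projections; this sketch only shows the data flow.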

research
07/03/2020

Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

Spoken language understanding is typically based on pipeline architectur...
research
08/05/2021

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification

End-to-end intent classification using speech has numerous advantages co...
research
05/14/2023

Improving End-to-End SLU performance with Prosodic Attention and Distillation

Most End-to-End SLU methods depend on the pretrained ASR or language mod...
research
04/08/2022

Transducer-based language embedding for spoken language identification

The acoustic and linguistic features are important cues for the spoken l...
research
10/14/2021

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

Many mispronunciation detection and diagnosis (MD&D) research approach...
research
02/20/2019

Audio-Linguistic Embeddings for Spoken Sentences

We propose spoken sentence embeddings which capture both acoustic and li...
