Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

07/03/2020
by Pavel Denisov, et al.

Spoken language understanding is typically based on pipeline architectures that combine speech recognition and natural language understanding steps. These components are optimized independently so that each can use the data available for it, but the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings to process acoustic features. In particular, we extend the embedding model with the encoder of a pretrained speech recognition system in order to construct end-to-end spoken language understanding systems. Our proposed method is based on the teacher-student framework across speech and text modalities and aligns the acoustic and semantic latent spaces. Experimental results on three benchmarks show that our system reaches performance comparable to the pipeline architecture without using any training data and outperforms it after fine-tuning with ten examples per class on two out of three benchmarks.
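The cross-modal teacher-student idea can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss or the model sizes, so the mean-squared-error distance, the single linear "student" layer, the mean pooling, and the hand-picked vectors below are all assumptions chosen for illustration. The frozen teacher stands in for a pretrained text embedding model, and the student maps acoustic frames of the same utterance into the teacher's semantic space.

```python
import random

def mean_pool(frames):
    """Average a sequence of frame vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def student_forward(weights, pooled):
    """Hypothetical student: one linear map from acoustic to semantic space."""
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors (assumed loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_student(pairs, in_dim, out_dim, lr=0.1, epochs=200, seed=0):
    """Fit the student so its output matches the frozen teacher embedding."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    for _ in range(epochs):
        for frames, teacher_vec in pairs:
            pooled = mean_pool(frames)
            pred = student_forward(weights, pooled)
            # Gradient of the MSE loss w.r.t. weights[j][i] is
            # 2 / out_dim * (pred[j] - teacher_vec[j]) * pooled[i].
            for j in range(out_dim):
                grad_scale = 2.0 * (pred[j] - teacher_vec[j]) / out_dim
                for i in range(in_dim):
                    weights[j][i] -= lr * grad_scale * pooled[i]
    return weights

# Toy paired data: (acoustic frames, teacher text embedding of the transcript).
# Both the frames and the teacher vectors are made up for illustration.
pairs = [
    ([[1.0, 0.0], [1.0, 0.0]], [0.5, -0.5]),
    ([[0.0, 1.0], [0.0, 1.0]], [-0.3, 0.8]),
]

weights = train_student(pairs, in_dim=2, out_dim=2)
final_loss = sum(
    mse(student_forward(weights, mean_pool(frames)), teacher_vec)
    for frames, teacher_vec in pairs
) / len(pairs)
```

Only the student is updated; the teacher vectors stay fixed, which is what lets a downstream text-based classifier be reused on speech input once the two spaces are aligned. In a realistic setting the student would be a pretrained speech recognition encoder rather than a linear layer.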

research
02/15/2021

Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification

Intent classification is a task in spoken language understanding. An int...
research
06/29/2021

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Decomposable tasks are complex and comprise of a hierarchy of sub-tasks....
research
04/11/2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) ...
research
02/01/2021

End2End Acoustic to Semantic Transduction

In this paper, we propose a novel end-to-end sequence-to-sequence spoken...
research
05/17/2020

Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

Speech is one of the most effective means of communication and is full o...
research
04/04/2022

Analysis of Joint Speech-Text Embeddings for Semantic Matching

Embeddings play an important role in many recent end-to-end solutions fo...
research
05/22/2023

Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training

End-to-end (E2E) spoken language understanding (SLU) is constrained by t...
