Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

10/08/2020
by Yinghui Huang, et al.

Training an end-to-end (E2E) neural network speech-to-intent (S2I) system that directly extracts intents from speech requires large amounts of intent-labeled speech data, which is time-consuming and expensive to collect. Initializing the S2I model with an ASR model trained on copious speech data can alleviate data sparsity. In this paper, we attempt to leverage NLU text resources. We implemented a CTC-based S2I system that matches the performance of a state-of-the-art, traditional cascaded SLU system. We performed controlled experiments with varying amounts of speech and text training data. When only a tenth of the original data is available, intent classification accuracy degrades by 7.6% absolute. Assuming additional text-to-intent data (without speech) is available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system. The proposed approaches recover 80% of the performance lost due to using limited intent-labeled speech.
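
As a rough illustration of technique (1), the sketch below shows one way acoustic embeddings could be "tied" to text embeddings: an intent classification loss is combined with a penalty that pulls the acoustic embedding toward the text embedding of the same utterance. The abstract does not specify the loss, so the mean-squared-error penalty, the weighting term `tie_weight`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def embedding_tie_penalty(acoustic_emb, text_emb):
    """Mean squared difference between two embeddings (assumed tying penalty)."""
    if len(acoustic_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    return sum((a - t) ** 2 for a, t in zip(acoustic_emb, text_emb)) / len(acoustic_emb)


def tied_intent_loss(acoustic_emb, text_emb, intent_logits, intent_label, tie_weight=1.0):
    """Intent classification loss plus a weighted embedding-tying penalty.

    `tie_weight` is a hypothetical hyperparameter balancing the two terms.
    """
    ce = softmax_cross_entropy(intent_logits, intent_label)
    tie = embedding_tie_penalty(acoustic_emb, text_emb)
    return ce + tie_weight * tie
```

When the two embeddings coincide, the penalty vanishes and only the classification term remains; the further the acoustic embedding drifts from the text embedding, the larger the total loss.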

