Pre-training for low resource speech-to-intent applications

03/30/2021
by Pu Wang, et al.

Designing a speech-to-intent (S2I) agent that maps users' spoken commands to the agent's desired task actions can be challenging because of the diverse grammatical and lexical preferences of different users. As a remedy, we discuss a user-taught S2I system in this paper. The user-taught system learns from scratch from the users' spoken input paired with action demonstrations, which ensures it is fully matched to the users' way of formulating intents and to their articulation habits. The main issue is the scarcity of training data, which follows from the user effort involved. Existing state-of-the-art approaches in this setting are based on non-negative matrix factorization (NMF) and capsule networks. In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether pre-training can reduce the training data requirements of the NMF and capsule network decoders. Experimental results show that the pre-trained ASR-NMF framework significantly outperforms the other models. We also discuss the limitations of pre-training for different types of command-and-control (C&C) applications.
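
To make the idea of an NMF-based user-taught decoder more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each utterance is already summarised as a non-negative feature vector (in the paper's setting this would come from a pre-trained ASR encoder; here it is synthetic toy data), that demonstrations are one-hot intent labels, and that a Frobenius-norm cost with standard multiplicative updates is acceptable; the abstract does not state the cost function. All function names (`make_toy_data`, `train_nmf`, `decode_intents`) are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def make_toy_data(n_per_intent, seed):
    """Synthetic stand-in for encoder features: 3 intents, 12 features."""
    rng = np.random.default_rng(seed)
    n_intents, block = 3, 4
    protos = np.full((n_intents * block, n_intents), 0.05)
    for k in range(n_intents):
        protos[k * block:(k + 1) * block, k] = 1.0  # intent k excites its own block
    ids = np.repeat(np.arange(n_intents), n_per_intent)
    noise = 1.0 + 0.1 * rng.random((n_intents * block, ids.size))
    return np.eye(n_intents)[:, ids], protos[:, ids] * noise, ids


def train_nmf(labels, feats, n_iter=300, seed=0):
    """Factorise the stacked matrix [labels; feats] ~= W @ H (Frobenius cost).

    labels: (n_intents, n_utts) one-hot demonstrations.
    feats:  (n_feats, n_utts) non-negative utterance features.
    """
    rng = np.random.default_rng(seed)
    n_intents = labels.shape[0]
    V = np.vstack([labels, feats])
    # Assumption: one component per intent, tied to it by a diagonal label
    # block. Multiplicative updates keep exact zeros at zero, so the tie holds.
    W = np.vstack([np.eye(n_intents),
                   rng.random((feats.shape[0], n_intents)) + 0.1])
    # Assumption: activations start from the demonstrated labels (kept positive).
    H = labels + 0.01
    err_start = np.linalg.norm(V - W @ H)
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    err_end = np.linalg.norm(V - W @ H)
    return W[:n_intents], W[n_intents:], err_start, err_end


def decode_intents(W_label, W_acoustic, feats, n_iter=200, seed=1):
    """Estimate activations from features only, then map them to intent scores."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_acoustic.shape[1], feats.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W_acoustic.T @ feats) / (W_acoustic.T @ W_acoustic @ H + EPS)
    return W_label @ H


train_labels, train_feats, _ = make_toy_data(n_per_intent=10, seed=0)
_, test_feats, test_ids = make_toy_data(n_per_intent=5, seed=42)
W_label, W_acoustic, err_start, err_end = train_nmf(train_labels, train_feats)
scores = decode_intents(W_label, W_acoustic, test_feats)
predicted = scores.argmax(axis=0)
accuracy = float((predicted == test_ids).mean())
print(f"reconstruction error {err_start:.3f} -> {err_end:.3f}, "
      f"toy accuracy {accuracy:.2f}")
```

In this sketch the label rows and the feature rows share one activation matrix during training, so at test time the activations recovered from the features alone can be projected back onto the label rows. Swapping the synthetic features for real encoder outputs would require those outputs to be non-negative, which is an additional assumption.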


research
05/08/2018

Capsule Networks for Low Resource Spoken Language Understanding

Designing a spoken language understanding system for command-and-control...
research
04/07/2022

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

In spoken language understanding (SLU), what the user says is converted ...
research
08/05/2020

Improving End-to-End Speech-to-Intent Classification with Reptile

End-to-end spoken language understanding (SLU) systems have many advanta...
research
10/18/2021

Intent Classification Using Pre-Trained Embeddings For Low Resource Languages

Building Spoken Language Understanding (SLU) systems that do not rely on...
research
05/02/2022

Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

We introduce Wav2Seq, the first self-supervised approach to pre-train bo...
research
03/27/2023

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Human-robot interaction relies on a noise-robust audio processing module...
research
10/27/2018

Reagent: Converting Ordinary Webpages into Interactive Software Agents

We introduce Reagent, a technology that readily converts ordinary webpag...
