Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

04/13/2021
by Di Wu, et al.

Spoken language understanding (SLU) systems usually consist of several pipeline components, each of which relies heavily on the output of its upstream components. For example, intent detection (ID) and slot filling (SF) require upstream automatic speech recognition (ASR) to transcribe speech into text. Upstream perturbations, e.g. ASR errors, environmental noise, and careless user speech, therefore propagate to the ID and SF models and degrade system performance. Well-performing SF and ID models are thus expected to be noise-resistant to some extent. However, existing models are trained on clean data, which creates a gap between clean-data training and real-world inference. To bridge this gap, we propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space. We also design a denoising generation model to reduce the impact of the low-quality samples. Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also improves robustness, that is, it produces high-quality results in a noisy environment. The source code will be released.
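
The abstract does not specify the loss or the encoder, so the following is only a toy illustration of the general idea of pulling clean and noisy versions of an utterance toward a similar vector space. It is not the authors' method. The function names (perturb, embed, alignment_loss), the drop/repeat noise model, the mean-of-token-vectors encoder, and the squared-distance penalty are all assumptions made for this sketch.

```python
import random


def perturb(tokens, rate, rng):
    """Simulate ASR-style noise by randomly dropping or repeating tokens.

    This noise model is an assumption for illustration only.
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < rate / 2:
            continue              # token deleted
        out.append(tok)
        if r > 1 - rate / 2:
            out.append(tok)       # token repeated
    # Never return an empty utterance for a non-empty input.
    return out or tokens[:1]


def embed(tokens, table):
    """Mean of per-token vectors; a stand-in for a real sentence encoder."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for tok in tokens:
        for i, v in enumerate(table[tok]):
            vec[i] += v
    return [v / len(tokens) for v in vec]


def alignment_loss(clean_vec, noisy_vec):
    """Squared Euclidean distance between clean and noisy embeddings."""
    return sum((c - n) ** 2 for c, n in zip(clean_vec, noisy_vec))


rng = random.Random(0)
tokens = "play some jazz music".split()
# Hypothetical 4-dimensional token vectors.
table = {t: [rng.uniform(-1, 1) for _ in range(4)] for t in tokens}

clean_vec = embed(tokens, table)
noisy_vec = embed(perturb(tokens, 0.5, rng), table)
print(alignment_loss(clean_vec, noisy_vec))
```

In a trainable model, a penalty of this kind would be added to the ID and SF task losses so that the encoder is encouraged to map a noisy transcript close to its clean counterpart; here the vectors are fixed, so the script only measures the gap.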

research
04/07/2019

Spoken Language Intent Detection using Confusion2Vec

Decoding speaker's intent is a crucial part of spoken language understan...
research
07/22/2023

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

End-to-end (E2E) spoken language understanding (SLU) systems that genera...
research
06/26/2022

Meta Auxiliary Learning for Low-resource Spoken Language Understanding

Spoken language understanding (SLU) treats automatic speech recognition ...
research
05/25/2018

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

This paper presents the machine learning architecture of the Snips Voice...
research
07/13/2017

Predicting Causes of Reformulation in Intelligent Assistants

Intelligent assistants (IAs) such as Siri and Cortana conversationally i...
research
05/23/2022

Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

The past ten years have witnessed the rapid development of text-based in...
research
10/22/2019

Robust Neural Machine Translation for Clean and Noisy Speech Transcripts

Neural machine translation models have shown to achieve high quality whe...
