Enhancing Out-Of-Domain Utterance Detection with Data Augmentation Based on Word Embeddings

11/24/2019
by Yueqi Feng, et al.

For most intelligent assistant systems, it is essential to have a mechanism that automatically detects out-of-domain (OOD) utterances so that noisy input is handled properly. One typical approach is to add a separate class to the classifier that contains OOD utterance examples alongside the in-domain text samples. However, since OOD utterances are usually unseen in the training data, detection performance depends largely on the quality of the attached OOD text data, whose sample size is restricted by computing limits. In this paper, we study how sampling-based augmentation of OOD data affects OOD utterance detection when the sample size is small. We hypothesize that randomly chosen OOD utterance samples can increase coverage of the unknown OOD utterance space and improve detection accuracy if they are more dispersed. Experiments show that, given the same dataset and the same OOD sample size, OOD utterance detection performance improves when the OOD samples are more spread out.
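The hypothesis above turns on how dispersed a set of OOD samples is in word-embedding space. The sketch below makes that idea concrete under assumptions that are not taken from the paper: each utterance is represented by the average of its word vectors, "dispersion" is measured as the mean pairwise cosine distance, and a greedy farthest-point heuristic is swapped in as the way of picking a spread-out subset. The toy vectors and the helper names `sentence_embedding`, `dispersion` and `select_dispersed` are hypothetical; this is not the authors' augmentation procedure.

```python
import math
from itertools import combinations


def sentence_embedding(tokens, word_vectors):
    """Hypothetical utterance representation: mean of the known word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("utterance has no tokens with a known word vector")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def dispersion(embeddings):
    """Assumed spread metric: mean pairwise cosine distance (higher = more spread out)."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def select_dispersed(embeddings, k):
    """Hypothetical selector: greedy farthest-point sampling of k indices.

    Starts from index 0, then repeatedly adds the candidate whose nearest
    already-chosen sample is farthest away.
    """
    if not 1 <= k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of candidates")
    chosen = [0]
    while len(chosen) < k:
        remaining = (i for i in range(len(embeddings)) if i not in chosen)
        best = max(
            remaining,
            key=lambda i: min(cosine_distance(embeddings[i], embeddings[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Toy 2-D "word vectors", made up for illustration (not trained embeddings).
word_vectors = {
    "weather": [1.0, 0.0],
    "rain": [0.9, 0.1],
    "sunny": [0.95, 0.05],
    "pizza": [0.0, 1.0],
    "song": [-1.0, 0.2],
}
# A hypothetical pool of candidate OOD utterances, already tokenized.
candidate_ood = [["weather"], ["rain", "weather"], ["sunny"], ["pizza"], ["song"]]
embs = [sentence_embedding(u, word_vectors) for u in candidate_ood]

picked = select_dispersed(embs, 3)
print("picked indices:", picked)
print("dispersion of picked subset:", round(dispersion([embs[i] for i in picked]), 3))
print("dispersion of first three:  ", round(dispersion(embs[:3]), 3))
```

Farthest-point selection is deterministic once the first sample is fixed. Comparing the `dispersion` of such a subset with that of a random subset of the same size mirrors, in miniature, the "more spread out" comparison described above, though the paper's own sampling procedure and spread measure may differ.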
