Pattern-aware Data Augmentation for Query Rewriting in Voice Assistant Systems

12/21/2020
by Yunmo Chen, et al.

Query rewriting (QR) systems are widely used to reduce the friction caused by errors in a spoken language understanding pipeline. However, the underlying supervised models require a large number of labeled pairs, which are hard and costly to collect. We therefore propose an augmentation framework that learns patterns from existing training pairs and works inversely from rewrite labels to generate rewrite candidates, compensating for insufficient QR training data. The proposed framework casts the augmentation problem as a sequence-to-sequence generation task and drives the optimization process with a policy gradient technique for controllable rewarding. This approach goes beyond traditional heuristic or rule-based augmentation methods and is not constrained to predefined patterns of swapping or replacing words. Our experimental results show its effectiveness compared with a fully trained QR baseline and demonstrate its potential for boosting QR performance on low-resource domains or locales.
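To make the policy gradient idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a heavy simplification: the sequence-to-sequence generator is replaced by a single-step softmax policy over a fixed list of hypothetical rewrite candidates, and the reward values are invented for illustration. The function name `reinforce`, the learning rate, and the moving-average baseline are all assumptions that do not come from the abstract.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Train a single-step softmax policy with REINFORCE.

    rewards[i] is a hypothetical scalar reward for emitting candidate i
    (an assumption; the abstract does not specify the reward function).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one candidate from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical candidates generated inversely from one rewrite label,
    # paired with made-up rewards for illustration only.
    candidates = ["play hello by a dell", "play hello", "play yellow by adele"]
    rewards = [1.0, 0.3, 0.6]
    for text, p in zip(candidates, reinforce(rewards)):
        print(f"{p:.3f}  {text}")
```

In this toy setting the policy shifts probability mass toward the candidate with the highest reward. A full system along the lines the abstract describes would instead sample token sequences from a sequence-to-sequence model and apply the same log-probability-times-advantage update to the model parameters.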


