Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

09/03/2021
by Yingmei Guo, et al.

Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy and thus impede the performance of SLU models. In this paper, we focus on mitigating noise in augmented data. We develop a denoising training approach in which multiple models are trained on data produced by different augmentation methods and provide supervision signals to one another. Experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
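
The abstract does not spell out how the models supervise each other, so the following is only a generic mutual-learning sketch, not the authors' implementation. It assumes two classifiers, each trained on a different noisy augmented data set, where each model's loss combines cross-entropy on its own noisy label with a KL term pulling it toward its peer's prediction. The function names and the weight `alpha` are hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the (possibly noisy) label."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_loss(own_logits, peer_logits, noisy_label, alpha=0.5):
    """Loss for one model: fit its own noisy label and agree with its peer.

    alpha is an assumed weight balancing the two terms; the peer's
    prediction is treated as a fixed target.
    """
    own = softmax(own_logits)
    peer = softmax(peer_logits)
    supervised = cross_entropy(own, noisy_label)
    agreement = kl_divergence(peer, own)
    return (1 - alpha) * supervised + alpha * agreement
```

In this sketch, a model whose prediction disagrees with its peer pays an extra penalty, so a label that only one augmentation method corrupted has less influence on training. Each model would compute `mutual_loss` with the roles of `own_logits` and `peer_logits` swapped.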


