MultiMix: A Robust Data Augmentation Strategy for Cross-Lingual NLP

04/28/2020
by   M Saiful Bari, et al.

Transfer learning has yielded state-of-the-art results in many supervised natural language processing tasks. However, annotated data for every target task in every target language is rare, especially for low-resource languages. In this work, we propose MultiMix, a novel data augmentation method for semi-supervised learning in zero-shot transfer scenarios. In particular, MultiMix aims to solve cross-lingual adaptation from a source (language) distribution to an unknown target (language) distribution, assuming no training labels are available for the target-language task. At its heart, MultiMix performs simultaneous self-training with data augmentation and unsupervised sample selection. To demonstrate its effectiveness, we perform extensive experiments on zero-shot transfer for cross-lingual named entity recognition (XNER) and natural language inference (XNLI). MultiMix yields sizeable improvements in both tasks, outperforming the baselines by a clear margin.
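The abstract describes the core loop as self-training combined with unsupervised sample selection. As a rough illustration only, the sketch below shows one generic self-training round with confidence-based selection; the toy model, the threshold value, and all function names are illustrative assumptions, not the actual MultiMix algorithm or its augmentation step.

```python
# Generic sketch: one self-training round with confidence-based sample
# selection. Pseudo-labels are assigned to unlabeled target-language data,
# and only predictions above a confidence threshold are kept for retraining.

def self_training_round(model, labeled, unlabeled, threshold=0.9):
    """Pseudo-label unlabeled samples; keep only confident ones."""
    selected = []
    for x in unlabeled:
        label, confidence = model(x)     # model returns (label, confidence)
        if confidence >= threshold:      # unsupervised sample selection
            selected.append((x, label))
    # A real system would retrain the model on labeled + selected here;
    # this sketch just returns the augmented training set.
    return labeled + selected

# Toy "model": classifies by sign, with confidence growing with magnitude.
def toy_model(x):
    return ("pos" if x >= 0 else "neg", min(1.0, abs(x)))

labeled = [(2.0, "pos"), (-3.0, "neg")]
unlabeled = [0.95, -1.2, 0.1]            # 0.1 is low-confidence, dropped
augmented = self_training_round(toy_model, labeled, unlabeled)
```

In practice the threshold (or a ranking-based alternative) governs the precision/recall trade-off of the pseudo-labels: too low and noisy labels pollute training, too high and little target-language data is added.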


Related research

05/27/2019
XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering
While natural language processing systems often focus on a single langua...

09/11/2023
Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition
Transfer learning has led to large gains in performance for nearly all N...

01/30/2020
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
Most combinations of NLP tasks and language varieties lack in-domain exa...

04/29/2022
Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer
The current state-of-the-art for few-shot cross-lingual transfer learnin...

03/05/2023
WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks
Intimacy is an essential element of human relationships and language is ...

02/27/2022
Variational Autoencoder with Disentanglement Priors for Low-Resource Task-Specific Natural Language Generation
In this paper, we propose a variational autoencoder with disentanglement...

09/14/2021
Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
Zero-shot cross-lingual information extraction (IE) describes the constr...
