Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

09/03/2019
by Haoyang Huang, et al.

We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT and XLM, three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together can bring further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, a 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, a 5.5% averaged accuracy improvement (on French and German) is obtained.
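To make the cross-lingual masked language model task concrete, here is a minimal illustrative sketch (not the authors' implementation): a parallel sentence pair is concatenated into one sequence and tokens are masked across both languages, so that a masked word can be recovered from the context of its translation. Function names, special tokens, and the example data are assumptions for illustration.

```python
import random

def cross_lingual_mlm_example(src_tokens, tgt_tokens, mask_token="[MASK]",
                              mask_prob=0.15, seed=0):
    """Build one cross-lingual masked-LM training example.

    Concatenates a parallel sentence pair with a separator, then randomly
    masks tokens in both halves. Labels hold the original token at each
    masked position (None elsewhere), which is what the model predicts.
    """
    rng = random.Random(seed)
    tokens = src_tokens + ["[SEP]"] + tgt_tokens
    labels = [None] * len(tokens)  # None = no prediction at this position
    for i, tok in enumerate(tokens):
        if tok == "[SEP]":
            continue  # never mask the separator
        if rng.random() < mask_prob:
            labels[i] = tok        # model must recover the original token
            tokens[i] = mask_token
    return tokens, labels

# Illustrative English-French parallel pair.
inp, lbl = cross_lingual_mlm_example(
    ["the", "cat", "sleeps"], ["le", "chat", "dort"], mask_prob=0.5, seed=1)
```

Because both languages share one input sequence, attention can flow between a masked token and its translation, which is the intuition behind using parallel data in this pre-training task.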

