On the ability of monolingual models to learn language-agnostic representations

Pretrained multilingual models have become the de facto default approach for zero-shot cross-lingual transfer. Previous work has shown that these models learn cross-lingual representations when pretrained on two or more languages with shared parameters. In this work, we provide evidence that a model can learn language-agnostic representations even when pretrained on a single language. Specifically, we find that monolingual models pretrained on one language and finetuned on another achieve performance competitive with models pretrained and finetuned on the same target language. Surprisingly, the models perform similarly on the same task regardless of the pretraining language. For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.


