Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

09/14/2022
by Suhyune Son et al.

As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows a power-law distribution, and most languages belong to the long tail of that distribution. Several research directions attempt to mitigate this problem. For example, cross-lingual transfer learning and multilingual training aim to benefit long-tail languages through knowledge acquired from resource-rich languages. Although successful, existing work has mainly focused on experimenting on as many languages as possible, so targeted in-depth analysis is mostly absent. In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT). To make the transfer scenario challenging, we choose Korean as the target language, as it is a language isolate and thus shares almost no typology with English. Results show that XPT not only outperforms or performs on par with monolingual models trained on orders of magnitude more data, but is also highly efficient in the transfer process.
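As a rough illustration of the kind of transfer the abstract describes, the sketch below adapts an English pre-trained encoder to Korean by swapping in a Korean vocabulary and continuing masked-language-model training. The specific checkpoints (roberta-base, klue/roberta-base) and the two-stage freeze/unfreeze schedule are assumptions for illustration, not necessarily the exact XPT procedure used in the paper.

```python
# Hedged sketch: adapting an English pre-trained encoder to Korean in the
# spirit of cross-lingual post-training. Model names and the two-stage
# schedule are illustrative assumptions, not the paper's exact recipe.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

SOURCE_MODEL = "roberta-base"           # English pre-trained encoder (assumption)
TARGET_TOKENIZER = "klue/roberta-base"  # a Korean tokenizer (assumption)

model = AutoModelForMaskedLM.from_pretrained(SOURCE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(TARGET_TOKENIZER)

# Swap in the Korean vocabulary: resize the embedding table to the new
# vocabulary size and re-initialize it, while the transformer body keeps
# its English-pre-trained weights.
model.resize_token_embeddings(len(tokenizer))
model.get_input_embeddings().weight.data.normal_(mean=0.0, std=0.02)

# Stage 1 (assumed schedule): train only the embeddings and LM head on
# Korean masked-language-modeling data, keeping the transformer body frozen.
for name, param in model.named_parameters():
    param.requires_grad = ("embeddings" in name) or ("lm_head" in name)

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=5e-5
)

# ... run MLM training on a Korean corpus here ...

# Stage 2 (assumed schedule): unfreeze the full model and continue
# training end-to-end on the target language.
for param in model.parameters():
    param.requires_grad = True
```

The appeal of this style of transfer, as the abstract notes, is efficiency: only a small fraction of parameters is trained from scratch, so far less Korean data is needed than for monolingual pre-training.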
