In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages

02/23/2023
by   Asım Ersoy, et al.
0

Multilingual generative language models (LMs) are increasingly fluent in a large variety of languages. Trained on the concatenation of corpora in multiple languages, they enable powerful transfer from high-resource languages to low-resource ones. However, it is still unknown what cultural biases are induced in the predictions of these models. In this work, we focus on one language property highly influenced by culture: formality. We analyze the formality distributions of XGLM and BLOOM's predictions, two popular generative multilingual language models, in 5 languages. We classify 1,200 generations per language as formal, informal, or incohesive and measure the impact of the prompt formality on the predictions. Overall, we observe a diversity of behaviors across the models and languages. For instance, XGLM generates informal text in Arabic and Bengali when conditioned with informal prompts, much more than BLOOM. In addition, even though both models are highly biased toward the formal style when prompted neutrally, we find that the models generate a significant amount of informal predictions even when prompted with formal text. We release with this work 6,000 annotated samples, paving the way for future work on the formality of generative multilingual LMs.

READ FULL TEXT
research
05/01/2020

Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi

Building natural language processing systems for non standardized and lo...
research
05/13/2023

The Geometry of Multilingual Language Models: An Equality Lens

Understanding the representations of different languages in multilingual...
research
07/03/2023

Multilingual Language Models are not Multicultural: A Case Study in Emotion

Emotions are experienced and expressed differently across the world. In ...
research
10/24/2020

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models

Transfer learning based on pretraining language models on a large amount...
research
06/07/2021

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study

Recent research in multilingual language models (LM) has demonstrated th...
research
07/16/2019

Modeling competitive evolution of multiple languages

Increasing evidence demonstrates that in many places language coexistenc...
research
05/03/2023

evaluating bert and parsbert for analyzing persian advertisement data

This paper discusses the impact of the Internet on modern trading and th...

Please sign up or login with your details

Forgot password? Click here to reset