GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

05/24/2022
by Da Yin, et al.

Recent work has shown that Pre-trained Language Models (PLMs) can store relational knowledge from their pre-training data in their parameters. However, it is unclear to what extent PLMs store geo-diverse commonsense knowledge, i.e., knowledge that is tied to a culture and shared only locally. For instance, the color of the bridal dress is white at American weddings but red at Chinese weddings. Here, we wish to probe whether PLMs can predict white and red as the bridal dress color when queried about American and Chinese weddings, respectively. To this end, we introduce a framework for geo-diverse commonsense probing on multilingual PLMs (mPLMs) and a corresponding benchmark, the Geo-diverse Commonsense Multilingual Language Model Analysis (GeoMLAMA) dataset. GeoMLAMA contains 3125 prompts in English, Chinese, Hindi, Persian, and Swahili, with wide coverage of concepts shared by people from American, Chinese, Indian, Iranian, and Kenyan cultures. We benchmark 11 standard mPLMs, including variants of mBERT, XLM, mT5, and XGLM, on GeoMLAMA. Interestingly, we find that 1) larger mPLM variants do not necessarily store geo-diverse concepts better than their smaller counterparts; 2) mPLMs are not intrinsically biased toward knowledge from Western countries (the United States); 3) a country's native language may not be the best language for probing knowledge about that country; and 4) a language may probe knowledge about a non-native country better than about the language's native country.
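To make the probing setup concrete, here is a minimal sketch of a cloze-style query against a multilingual masked LM, assuming the HuggingFace Transformers API and bert-base-multilingual-cased. The prompt wording and the four-color candidate set are illustrative stand-ins, not GeoMLAMA's actual templates or answer sets.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load a multilingual masked LM (an mBERT variant of the kind benchmarked here).
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# Hypothetical cloze prompt and candidate answers, not GeoMLAMA's own.
prompt = f"The color of the bridal dress at a Chinese wedding is {tokenizer.mask_token}."
candidates = ["red", "white", "black", "green"]

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and score each single-token candidate at that slot.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
scores = {
    c: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(c)].item()
    for c in candidates
}
print(max(scores, key=scores.get))  # the model's preferred color for this prompt
```

Repeating the same query with the prompt translated into Chinese, Hindi, Persian, or Swahili is what allows the benchmark to compare how well each probing language surfaces each country's knowledge.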

