Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

02/01/2021
by Nora Kassner, et al.

Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structured knowledge base queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT's performance as a knowledge base language-independent, or does it vary from language to language? (iii) A multilingual model is trained on more text; e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages and that pooling predictions across languages improves performance. At the same time, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.
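To make the idea of pooling predictions across languages concrete, here is a minimal sketch. The abstract does not say how pooling is done, so averaging per-language candidate probabilities is an assumption here, not necessarily the authors' method. The function name `pool_predictions` and the toy scores are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from collections import defaultdict


def pool_predictions(per_language_scores):
    """Average candidate probabilities over languages; return (best, pooled).

    per_language_scores maps a language code to a dict of
    {candidate answer: probability} for the same masked query.
    Averaging is an assumed pooling rule, used here for illustration only.
    """
    totals = defaultdict(float)
    for scores in per_language_scores.values():
        for candidate, prob in scores.items():
            totals[candidate] += prob
    n_languages = len(per_language_scores)
    # A candidate missing in some language implicitly contributes 0 there.
    pooled = {c: total / n_languages for c, total in totals.items()}
    best = max(pooled, key=pooled.get)
    return best, pooled


# Hypothetical scores for "Paris is the capital of [MASK]" in three languages.
# The Italian query is made to favor "Italy" to mimic a language bias.
toy_scores = {
    "en": {"France": 0.6, "Italy": 0.1},
    "it": {"France": 0.3, "Italy": 0.5},
    "de": {"France": 0.7, "Italy": 0.1},
}

best, pooled = pool_predictions(toy_scores)
print(best, pooled)
```

In this toy setup the Italian query alone would pick "Italy", while the pooled scores pick "France", which illustrates how pooling could offset a single language's bias.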

