
X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

by Zhengbao Jiang, et al.

Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as "Punta Cana is located in _." However, while knowledge is both written and queried in many languages, studies on LMs' factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages. Benchmark data and code have been released at
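The abstract mentions expanding cloze probes from single-word to multi-word entities. Below is a minimal sketch of one way to do this. It assumes LAMA-style templates with a subject slot `[X]` and an object slot `[Y]`. The slot names, the `[MASK]` string, `max_masks`, and both helper functions are hypothetical illustrations, not the authors' released code.

The object slot is expanded into 1 to `max_masks` mask tokens, giving one probe per candidate answer length. After an LM fills each probe, the candidates of different lengths have to be compared. Here that comparison uses mean per-token log-probability, a simple length normalization chosen for illustration. The paper's own decoding algorithms may differ. A real system would take the log-probabilities from a masked LM; in this sketch the caller supplies them.

```python
from typing import List, Sequence, Tuple

# Assumed mask-token string; real tokenizers (e.g. XLM-R uses "<mask>") differ.
MASK = "[MASK]"


def build_probes(template: str, subject: str, max_masks: int = 5) -> List[str]:
    """Fill the subject slot [X] and expand the object slot [Y] into 1..max_masks masks.

    Hypothetical helper: returns one probe string per candidate answer length.
    """
    if "[X]" not in template or "[Y]" not in template:
        raise ValueError("template must contain both [X] and [Y]")
    if max_masks < 1:
        raise ValueError("max_masks must be at least 1")
    filled = template.replace("[X]", subject)
    return [filled.replace("[Y]", " ".join([MASK] * k)) for k in range(1, max_masks + 1)]


def pick_best(candidates: Sequence[Tuple[List[str], List[float]]]) -> List[str]:
    """Return the predicted token sequence with the highest mean log-probability.

    Hypothetical helper: each candidate is (predicted_tokens, per_token_log_probs),
    one per probe length. Averaging instead of summing keeps the comparison from
    automatically favoring the shortest answer, since every extra token adds a
    negative log-probability to a sum.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    best_tokens, _ = max(candidates, key=lambda c: sum(c[1]) / len(c[1]))
    return best_tokens


if __name__ == "__main__":
    for probe in build_probes("[X] is located in [Y].", "Punta Cana", max_masks=3):
        print(probe)
    # Toy, made-up scores used only to exercise pick_best (not model output).
    demo = [(["Mexico"], [-2.3]), (["Dominican", "Republic"], [-0.9, -0.2])]
    print(pick_best(demo))
```

With `max_masks=3`, the template "[X] is located in [Y]." yields three probes, from "Punta Cana is located in [MASK]." up to three masks. With the toy scores, the two-token candidate wins because its mean log-probability (-0.55) is higher than -2.3.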


Language Models are Multilingual Chain-of-Thought Reasoners

We evaluate the reasoning abilities of large language models in multilin...

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

The ability to correctly model distinct meanings of a word is crucial fo...

A multilabel approach to morphosyntactic probing

We introduce a multilabel probing task to assess the morphosyntactic rep...

CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums

In contrast to many decades of research on oral code-switching, the stud...

How Can We Know What Language Models Know?

Recent work has presented intriguing results examining the knowledge con...

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

Code-switching (CS), a ubiquitous phenomenon due to the ease of communic...