XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

10/13/2020
by Alessandro Raganato, et al.

The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing benchmarks for assessing this ability are tied to sense inventories (usually WordNet), which restricts their use to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) removes the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem, but it is limited to English. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.
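The binary formulation mentioned above can be illustrated with a minimal Python sketch. It assumes each instance pairs one target word with two contexts and a gold label saying whether the word carries the same meaning in both. The `WiCInstance` class, its field names, the `always_true` baseline, and the two example sentences are all hypothetical and invented for illustration; they do not reflect the actual XL-WiC file format or any model from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class WiCInstance:
    """One hypothetical WiC-style example: a target word in two contexts."""
    target: str
    context_1: str
    context_2: str
    same_meaning: bool  # gold label: True if the target means the same in both contexts


def accuracy(predict: Callable[[WiCInstance], bool], data: List[WiCInstance]) -> float:
    """Fraction of instances where the predicted label matches the gold label."""
    if not data:
        raise ValueError("data must not be empty")
    correct = sum(predict(x) == x.same_meaning for x in data)
    return correct / len(data)


def always_true(instance: WiCInstance) -> bool:
    """Trivial baseline that ignores both contexts and always answers True."""
    return True


# Invented examples for illustration only; they are not taken from XL-WiC.
examples = [
    WiCInstance("bank", "She sat on the bank of the river.",
                "He deposited cash at the bank.", False),
    WiCInstance("run", "I run every morning.",
                "They run along the beach at dawn.", True),
]

print(accuracy(always_true, examples))
```

Under this framing, a zero-shot cross-lingual evaluation would amount to fitting `predict` on English instances only and calling `accuracy` on instances from another language, since no sense inventory is needed to score a prediction.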

research
04/26/2023

Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models

Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and...
research
05/03/2021

Scalar Adjective Identification and Multilingual Ranking

The intensity relationship that holds between scalar adjectives (e.g., n...
research
10/13/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Language models (LMs) have proven surprisingly successful at capturing f...
research
04/30/2020

WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context

In this paper, we present WiC-TSV (Target Sense Verification for Words i...
research
04/17/2021

AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples

Capturing word meaning in context and distinguishing between corresponde...
research
02/22/2023

Impact of Subword Pooling Strategy on Cross-lingual Event Detection

Pre-trained multilingual language models (e.g., mBERT, XLM-RoBERTa) have...
research
06/25/2021

Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy

This paper presents a multilingual study of word meaning representations...
