AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples

04/17/2021
by Qianchu Liu, et al.

Capturing word meaning in context and distinguishing between correspondences and variations across languages is key to building successful multilingual and cross-lingual text representation models. However, existing multilingual evaluation datasets that evaluate lexical semantics "in-context" have several limitations. In particular, (1) their language coverage is restricted to high-resource languages and skewed in favor of only a few language families and areas; (2) their design allows the task to be solved via superficial cues, which results in artificially inflated (and sometimes super-human) performance of pretrained encoders on many target languages and limits their usefulness for model probing and diagnostics; and (3) they offer no support for cross-lingual evaluation. To address these gaps, we present AM2iCo (Adversarial and Multilingual Meaning in Context), a wide-coverage cross-lingual and multilingual evaluation set; it aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts for 14 language pairs. We conduct a series of experiments in a wide range of setups and demonstrate the challenging nature of AM2iCo. The results reveal that current SotA pretrained encoders substantially lag behind human performance, with the largest gaps observed for low-resource languages and languages dissimilar to English.
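The evaluation described above asks whether a target word carries the same meaning in two contexts, possibly in different languages. A minimal sketch of this setup is shown below; the instance format and the `embed` function are hypothetical illustrations (the paper does not prescribe this interface), and in practice `embed` would be a contextual encoder such as mBERT or XLM-R producing a vector for the target word in its context.

```python
from dataclasses import dataclass
import math


@dataclass
class Instance:
    # Hypothetical AM2iCo-style instance: one target word occurring in two
    # contexts (possibly in two different languages), plus a gold label that
    # is True when both occurrences share the same meaning.
    context_a: str
    context_b: str
    same_meaning: bool


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def evaluate(instances, embed, threshold=0.5):
    """Accuracy of a similarity-threshold classifier on the binary task.

    `embed` maps a context string to the target word's vector; here it is a
    stand-in for a pretrained contextual encoder. An instance is predicted
    "same meaning" when the cosine similarity of the two contextual vectors
    reaches the threshold.
    """
    correct = 0
    for inst in instances:
        sim = cosine(embed(inst.context_a), embed(inst.context_b))
        correct += (sim >= threshold) == inst.same_meaning
    return correct / len(instances)
```

With a real encoder behind `embed`, the threshold would typically be tuned on a development split; the paper's finding is that even strong SotA encoders scored this way fall well short of human accuracy, especially on low-resource language pairs.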


Related research

03/24/2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Much recent progress in applications of machine learning models to NLP h...

11/14/2015
Learning to Represent Words in Context with Multilingual Supervision
We present a neural network architecture based on bidirectional LSTMs to...

08/01/2022
BabelBERT: Massively Multilingual Transformers Meet a Massively Multilingual Lexical Resource
While pretrained language models (PLMs) primarily serve as general purpo...

04/21/2021
PALI at SemEval-2021 Task 2: Fine-Tune XLM-RoBERTa for Word in Context Disambiguation
This paper presents the PALI team's winning system for SemEval-2021 Task...

04/28/2020
Synonymy = Translational Equivalence
Synonymy and translational equivalence are the relations of sameness of ...

10/13/2020
XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization
The ability to correctly model distinct meanings of a word is crucial fo...

08/10/2022
The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network
A sememe is defined as the minimum semantic unit of human languages. Sem...
