LINSPECTOR: Multilingual Probing Tasks for Word Representations

03/22/2019
by Gözde Gül Şahin, et al.

Despite an ever-growing number of word representation models introduced for a large number of languages, there is no standardized technique for gaining insight into what these models capture. Such insights would help the community estimate downstream task performance and design more informed neural architectures, while avoiding extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is the use of simple classification tasks, also called probing tasks, that test for a single linguistic feature such as part of speech. Existing studies mostly focus on exploring the information encoded by sentence-level representations for English. However, from a typological perspective, the morphologically poor English is rather an outlier: the information encoded by word order and function words in English is often stored at a subword, morphological level in other languages. To address this, we introduce 15 word-level probing tasks, such as case marking, possession, word length, morphological tag count, and pseudoword identification, for 24 languages. We present experiments on several state-of-the-art word embedding models, in which we relate the probing task performance for a diverse set of languages to a range of classic NLP tasks such as semantic role labeling and natural language inference. We find that a number of probing tests have a significantly high positive correlation with the downstream tasks, especially for morphologically rich languages. We show that our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting. We release the probing datasets and the evaluation suite at https://github.com/UKPLab/linspector.
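To make the idea of a probing task concrete, here is a minimal sketch of a linear probe trained on frozen word vectors to predict a single binary feature. Everything in it is an assumption for illustration: the vectors are synthetic toy data in which dimension 0 is made to carry the feature, the function names (`make_toy_embeddings`, `train_probe`, `probe_accuracy`) are hypothetical, and the perceptron is a simple stand-in rather than the classifier used in the paper or in the released evaluation suite.

```python
import random


def make_toy_embeddings(n_words, dim, seed):
    """Build synthetic "word vectors" with a binary label (toy data, not real embeddings)."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for _ in range(n_words):
        label = rng.randint(0, 1)
        vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        # Assumption: dimension 0 encodes the probed feature; the rest is noise.
        vec[0] = (1.0 if label == 1 else -1.0) + rng.uniform(-0.2, 0.2)
        vectors.append(vec)
        labels.append(label)
    return vectors, labels


def predict(weights, bias, vec):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1 if score > 0 else 0


def train_probe(vectors, labels, epochs=20, lr=0.1):
    """Train a perceptron probe; the word vectors themselves are never updated."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for vec, label in zip(vectors, labels):
            error = label - predict(weights, bias, vec)
            if error != 0:
                weights = [w + lr * error * x for w, x in zip(weights, vec)]
                bias += lr * error
    return weights, bias


def probe_accuracy(weights, bias, vectors, labels):
    """Fraction of words whose feature the probe predicts correctly."""
    correct = sum(predict(weights, bias, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


train_vectors, train_labels = make_toy_embeddings(200, 5, seed=0)
test_vectors, test_labels = make_toy_embeddings(100, 5, seed=1)

weights, bias = train_probe(train_vectors, train_labels)
train_accuracy = probe_accuracy(weights, bias, train_vectors, train_labels)
heldout_accuracy = probe_accuracy(weights, bias, test_vectors, test_labels)

print(f"train accuracy: {train_accuracy:.2f}")
print(f"held-out accuracy: {heldout_accuracy:.2f}")
```

High held-out accuracy from such a simple classifier suggests the feature is readily recoverable from the vectors. With real embeddings, the toy generator would be replaced by vectors looked up for labeled words in one of the probing datasets.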
