How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

06/16/2020
by Steffen Eger, et al.

Sentence encoders map sentences to real-valued vectors for use in downstream applications. To peek into these representations (e.g., to increase the interpretability of their results), probing tasks have been designed which query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these languages often lack the large-scale annotated data or (high-quality) dependency parsers that probing task design in English relies on. To find out how to probe sentence embeddings in such cases, we investigate the sensitivity of probing task results to structural design choices, conducting the first such large-scale study. We show that design choices like the size of the annotated probing dataset and the type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as identified for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.
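
As a rough illustration of the two design choices named above (probing dataset size and classifier type), the following minimal sketch trains two simple probes on training sets of different sizes and reports test accuracy for each combination. Everything in it is an assumption made for illustration: the "embeddings" are synthetic Gaussian vectors rather than outputs of a real sentence encoder, the binary label is an artificial property, and the two classifiers (logistic regression and nearest centroid) as well as all function names are hypothetical stand-ins, not the setup used in the paper.

```python
import math
import random


def make_probing_data(n, dim, rng):
    """Synthetic stand-in for (sentence embedding, label) pairs.

    Assumption: the probed property is a noisy function of the first
    embedding coordinate. Real probing data would come from an encoder.
    """
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = 1 if x[0] + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0
        data.append((x, y))
    return data


def train_logreg(train, dim, epochs=50, lr=0.1):
    """Logistic regression probe trained with plain SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in train:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() stable
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_centroid(train, dim):
    """Nearest-centroid probe: predict the class with the closer mean."""
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
    centroids = {c: [s / max(counts[c], 1) for s in sums[c]] for c in sums}

    def predict(x):
        dist = {
            c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c]))
            for c in centroids
        }
        return min(dist, key=dist.get)

    return predict


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


def run_probe_grid(sizes=(20, 200, 1000), dim=8, seed=0):
    """Evaluate each (training size, classifier) pair on one fixed test set."""
    rng = random.Random(seed)
    test = make_probing_data(500, dim, rng)
    pool = make_probing_data(max(sizes), dim, rng)
    results = {}
    for n in sizes:
        train = pool[:n]  # smaller sets are prefixes of the larger ones
        results[(n, "logreg")] = accuracy(train_logreg(train, dim), test)
        results[(n, "centroid")] = accuracy(train_centroid(train, dim), test)
    return results


if __name__ == "__main__":
    for (n, clf), acc in sorted(run_probe_grid().items()):
        print(f"train size={n:5d}  classifier={clf:9s}  accuracy={acc:.3f}")
```

Comparing accuracies across rows of such a grid is one simple way to see whether a conclusion about an encoder depends on the probing setup rather than on the encoder itself. With real data, the synthetic generator would be replaced by encoder outputs paired with annotated labels.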
