Lost in Embedding Space: Explaining Cross-Lingual Task Performance with Eigenvalue Divergence

01/30/2020
by Haim Dubossarsky, et al.

Performance in cross-lingual NLP tasks is affected by the (dis)similarity of the languages at hand: for example, previous work has suggested a connection between the expected success of bilingual lexicon induction (BLI) and the assumption of (approximate) isomorphism between monolingual embedding spaces. In this work, we present a large-scale study of the correlations between language similarity and task performance, covering thousands of language pairs and four different tasks: BLI, machine translation, parsing, and POS tagging. We propose a novel language distance measure, Eigenvalue Divergence (EVD), which quantifies the degree of isomorphism between two monolingual spaces. We empirically show that 1) language similarity scores derived from embedding-based EVD distances are strongly associated with performance observed in different cross-lingual tasks, and 2) EVD outperforms other standard embedding-based language distance measures across the board, while also being computationally more tractable and easier to interpret. Finally, we demonstrate that EVD captures information that is complementary to typologically driven language distance measures, and we report that combining the two yields even higher correlations with performance levels in all cross-lingual tasks.
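
The abstract does not give the formula for EVD, so the following is only an illustrative sketch of what a spectrum-based distance between two embedding spaces could look like. It assumes that each space is summarized by the eigenvalues of its covariance matrix and that the two spectra are compared by the sum of squared differences of their logarithms. That formula, the function names `covariance_spectrum` and `eigenvalue_divergence`, and the `eps` smoothing term are assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def covariance_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Eigenvalues of the embedding covariance matrix, largest first.

    `embeddings` has shape (n_words, dim). The eigenvalues are obtained
    from the singular values of the mean-centered matrix.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (embeddings.shape[0] - 1)


def eigenvalue_divergence(space_a: np.ndarray, space_b: np.ndarray,
                          eps: float = 1e-12) -> float:
    """Assumed distance: sum of squared differences of log-eigenvalues.

    This is a hypothetical stand-in for EVD, not the paper's formula.
    Only the top-k eigenvalues shared by both spectra are compared.
    """
    spec_a = covariance_spectrum(space_a)
    spec_b = covariance_spectrum(space_b)
    k = min(len(spec_a), len(spec_b))
    diff = np.log(spec_a[:k] + eps) - np.log(spec_b[:k] + eps)
    return float(np.sum(diff ** 2))
```

Because the measure in this sketch depends only on the spectra, it is unchanged when one space is rotated, which fits the intuition that two spaces related by an orthogonal map are isomorphic. It needs no bilingual dictionary or word alignment, only the two monolingual embedding matrices.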


