Interactive Refinement of Cross-Lingual Word Embeddings

11/08/2019
by   Michelle Yuan, et al.
0

Cross-lingual word embeddings transfer knowledge between languages: models trained for a high-resource language can be used in a low-resource language. These embeddings are usually trained on general-purpose corpora but used for a domain-specific task. We introduce CLIME, an interactive system that allows a user to quickly adapt cross-lingual word embeddings for a given classification problem. First, words in the vocabulary are ranked by their salience to the downstream task. Then, salient keywords are displayed on an interface. Users mark the similarity between each keyword and its nearest neighbors in the embedding space. Finally, CLIME updates the embeddings using the annotations. We experiment clime on a cross-lingual text classification benchmark for four low-resource languages: Ilocano, Sinhalese, Tigrinya, and Uyghur. Embeddings refined by CLIME capture more nuanced word semantics and have higher test accuracy than the original embeddings. CLIME also improves test accuracy faster than an active learning baseline, and a simple combination of CLIME with active learning has the highest test accuracy.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/01/2017

Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary

Cross-lingual model transfer is a compelling and popular method for pred...
research
06/05/2019

A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity

Cross-lingual word embeddings encode the meaning of words from different...
research
05/17/2020

Cross-Lingual Word Embeddings for Turkic Languages

There has been an increasing interest in learning cross-lingual word emb...
research
03/23/2019

Expanding the Text Classification Toolbox with Cross-Lingual Embeddings

Most work in text classification and Natural Language Processing (NLP) f...
research
09/09/2021

Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph

In cross-lingual text classification, it is required that task-specific ...
research
04/10/2023

On Evaluation of Bangla Word Analogies

This paper presents a high-quality dataset for evaluating the quality of...
research
06/02/2021

Evaluating Word Embeddings with Categorical Modularity

We introduce categorical modularity, a novel low-resource intrinsic metr...

Please sign up or login with your details

Forgot password? Click here to reset