Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding

04/21/2018
by Lifu Huang, et al.

We construct a multilingual common semantic space based on distributional semantics, where words from multiple languages are projected into a shared space to enable knowledge and resource transfer across languages. Beyond word alignment, we introduce multiple cluster-level alignments and enforce the word clusters to be consistently distributed across multiple languages. We exploit three signals for clustering: (1) neighbor words in the monolingual word embedding space; (2) character-level information; and (3) linguistic properties (e.g., apposition, locative suffix) derived from linguistic structure knowledge bases available for thousands of languages. We introduce a new cluster-consistent correlational neural network to construct the common semantic space by aligning words as well as clusters. Intrinsic evaluation on monolingual and multilingual QVEC tasks shows our approach achieves significantly higher correlation with linguistic features than state-of-the-art multilingual embedding learning methods do. Using low-resource language name tagging as a case study for extrinsic evaluation, our approach achieves up to 24.5% absolute F-score gain over the state of the art.
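The abstract combines two levels of supervision: word-level alignment between translation pairs and a cluster-level consistency constraint across languages. The sketch below is not the authors' cluster-consistent correlational neural network; it is a minimal illustration of how such a combined objective could be set up, assuming two language-specific linear projections into a shared space, a word-alignment term over paired embeddings, and a term that pulls per-cluster centroids of the two languages together. All names, shapes, and the loss weighting are assumptions.

```python
# Minimal sketch (not the authors' implementation) of word-level alignment
# plus a cluster-consistency term in a shared semantic space.
import torch
import torch.nn as nn

class SharedSpaceProjector(nn.Module):
    def __init__(self, dim_src, dim_tgt, dim_shared):
        super().__init__()
        # One linear map per language into the shared semantic space.
        self.proj_src = nn.Linear(dim_src, dim_shared, bias=False)
        self.proj_tgt = nn.Linear(dim_tgt, dim_shared, bias=False)

    def forward(self, x_src, x_tgt):
        return self.proj_src(x_src), self.proj_tgt(x_tgt)

def alignment_loss(z_src, z_tgt):
    # Word-level alignment: row i of each batch is an aligned word pair,
    # so the two projections should land close to each other.
    return ((z_src - z_tgt) ** 2).sum(dim=1).mean()

def cluster_consistency_loss(z_src, z_tgt, c_src, c_tgt, n_clusters):
    # Cluster-level alignment: the centroid of cluster k computed from
    # source-language words should match the centroid computed from
    # target-language words assigned to the same cluster.
    loss = z_src.new_zeros(())
    for k in range(n_clusters):
        src_k = z_src[c_src == k]
        tgt_k = z_tgt[c_tgt == k]
        if len(src_k) and len(tgt_k):
            loss = loss + ((src_k.mean(0) - tgt_k.mean(0)) ** 2).sum()
    return loss / n_clusters

# Toy usage with random stand-ins for pre-trained monolingual embeddings.
torch.manual_seed(0)
n, d_src, d_tgt, d_shared, n_clusters = 64, 300, 300, 200, 8
x_src, x_tgt = torch.randn(n, d_src), torch.randn(n, d_tgt)
c_src = torch.randint(0, n_clusters, (n,))
c_tgt = c_src.clone()  # assume shared cluster ids purely for the sketch

model = SharedSpaceProjector(d_src, d_tgt, d_shared)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    z_src, z_tgt = model(x_src, x_tgt)
    loss = alignment_loss(z_src, z_tgt) + 0.1 * cluster_consistency_loss(
        z_src, z_tgt, c_src, c_tgt, n_clusters)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper, the cluster assignments come from the three signals listed above (monolingual embedding neighborhoods, character-level information, and linguistic properties); the random cluster ids here are placeholders only to make the sketch runnable.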
