Linear Cross-Lingual Mapping of Sentence Embeddings

05/23/2023
by Oleg Vasilyev, et al.

The semantics of a sentence are defined with much less ambiguity than the semantics of a single word, and should therefore be better preserved by translation into another language. If multilingual sentence embeddings are intended to represent sentence semantics, then the similarity between the embeddings of any two sentences must be invariant with respect to translation. Based on this premise, we consider a simple linear cross-lingual mapping as a possible improvement of multilingual embeddings. We also consider deviation from orthogonality conditions as a measure of the deficiency of the embeddings.
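
To make the idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the linear mapping is fitted by ordinary least squares on embeddings of parallel sentences, and it uses the normalized Frobenius norm of W^T W - I as one possible measure of deviation from orthogonality. The abstract does not specify either choice. The function names (`fit_linear_map`, `orthogonality_deviation`, `cosine_matrix`) and the synthetic data are hypothetical and stand in for real embeddings.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Least-squares matrix W such that src @ W approximates tgt.

    src, tgt: arrays of shape (n_sentences, dim) holding embeddings of
    parallel sentences in two languages (assumed setup).
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def orthogonality_deviation(W):
    """Frobenius norm of W^T W - I, divided by sqrt(dim).

    Zero means W is orthogonal, i.e. it preserves all dot products.
    This particular normalization is an assumption of the sketch.
    """
    d = W.shape[1]
    return np.linalg.norm(W.T @ W - np.eye(d)) / np.sqrt(d)


def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


# Synthetic stand-in for real embeddings: the "target language" vectors
# are an exact rotation Q of the "source language" vectors.
rng = np.random.default_rng(0)
dim, n = 8, 200
src = rng.normal(size=(n, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ Q

W = fit_linear_map(src, tgt)
dev = orthogonality_deviation(W)

# Similarities between sentence pairs before and after the mapping.
cos_before = cosine_matrix(src)
cos_after = cosine_matrix(src @ W)
```

In this idealized case the fitted map is orthogonal, so pairwise similarities are unchanged by the mapping. With real multilingual embeddings the fitted matrix would generally not be exactly orthogonal, and a deviation measure of this kind gives one number for how far it departs.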

research
03/08/2019

Context-Aware Cross-Lingual Mapping

Cross-lingual word vectors are typically obtained by fitting an orthogon...
research
09/15/2019

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

We present Emu, a system that semantically enhances multilingual sentenc...
research
04/28/2023

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Dense vector representations for textual data are crucial in modern NLP....
research
09/30/2019

Simple and Effective Paraphrastic Similarity from Parallel Translations

We present a model and methodology for learning paraphrastic sentence em...
research
06/17/2022

Statistical and Neural Methods for Cross-lingual Entity Label Mapping in Knowledge Graphs

Knowledge bases such as Wikidata amass vast amounts of named entity info...
research
06/25/2021

ParaLaw Nets – Cross-lingual Sentence-level Pretraining for Legal Text Processing

Ambiguity is a characteristic of natural language, which makes expressio...
research
05/30/2023

Research on Multilingual News Clustering Based on Cross-Language Word Embeddings

Classifying the same event reported by different countries is of signifi...
