Aligning Cross-lingual Sentence Representations with Dual Momentum Contrast

09/01/2021
by Liang Wang, et al.

In this paper, we propose to align sentence representations from different languages into a unified embedding space, where semantic similarities (both cross-lingual and monolingual) can be computed with a simple dot product. We fine-tune pre-trained language models on the translation ranking task. Existing work (Feng et al., 2020) uses sentences within the same batch as negatives, which can suffer from easy negatives: candidates that are trivially distinguishable from the true translation and therefore provide little training signal. We adapt MoCo (He et al., 2020) to further improve the quality of alignment. As the experimental results show, the sentence representations produced by our model achieve new state-of-the-art results on several tasks, including Tatoeba en-zh similarity search (Artetxe and Schwenk, 2019b), BUCC en-zh bitext mining, and semantic textual similarity on 7 datasets.
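
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the general idea behind a MoCo-style translation ranking loss, not the authors' implementation. It assumes L2-normalized embeddings (so the dot product equals cosine similarity), a FIFO queue of past key embeddings used as extra negatives, and an exponential moving average for the key encoder. The function names, the temperature of 0.05, and the momentum of 0.999 are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def info_nce_with_queue(queries, keys, queue, temperature=0.05):
    """Contrastive (InfoNCE) loss with queued negatives.

    queries: (B, D) embeddings of source sentences (query encoder).
    keys:    (B, D) embeddings of their translations (momentum encoder).
    queue:   (K, D) embeddings from earlier batches, used as negatives.
    The temperature value is an assumption, not taken from the paper.
    """
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    neg = l2_normalize(queue)

    pos_logits = np.sum(q * k, axis=1, keepdims=True)  # (B, 1)
    neg_logits = q @ neg.T                             # (B, K)
    logits = np.concatenate([pos_logits, neg_logits], axis=1) / temperature

    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder (EMA)."""
    return [m * kp + (1.0 - m) * qp for kp, qp in zip(key_params, query_params)]


def enqueue(queue, new_keys):
    """Append the newest keys and drop the oldest so the queue size stays K."""
    size = queue.shape[0]
    return np.concatenate([queue, l2_normalize(new_keys)], axis=0)[-size:]
```

Because the queue is decoupled from the batch, the number of negatives per step can be much larger than the batch size, which is the property that motivates replacing in-batch negatives.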


research
06/01/2023

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

Previous work has shown that the representations output by contextual la...
research
12/31/2020

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

Recent studies have demonstrated that pre-trained cross-lingual models a...
research
06/12/2023

Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization

Multilingual sentence representations are the foundation for similarity-...
research
10/15/2020

Explicit Alignment Objectives for Multilingual Bidirectional Encoders

Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) a...
research
04/30/2021

Paraphrastic Representations at Scale

We present a system that allows users to train their own state-of-the-ar...
research
04/08/2021

A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders

Powerful sentence encoders trained for multiple languages are on the ris...
research
03/16/2020

HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment

Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-...
