English Contrastive Learning Can Learn Universal Cross-lingual Sentence Embeddings

11/11/2022
by   Yau-Shian Wang, et al.
0

Universal cross-lingual sentence embeddings map semantically similar cross-lingual sentences into a shared embedding space. Aligning cross-lingual sentence embeddings usually requires supervised cross-lingual parallel sentences. In this work, we propose mSimCSE, which extends SimCSE to multilingual settings and reveal that contrastive learning on English data can surprisingly learn high-quality universal cross-lingual sentence embeddings without any parallel data. In unsupervised and weakly supervised settings, mSimCSE significantly improves previous sentence embedding methods on cross-lingual retrieval and multilingual STS tasks. The performance of unsupervised mSimCSE is comparable to fully supervised methods in retrieving low-resource languages and multilingual STS. The performance can be further enhanced when cross-lingual NLI data is available. Our code is publicly available at https://github.com/yaushian/mSimCSE.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/03/2023

Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

Recently multi-lingual pre-trained language models (PLM) such as mBERT a...
research
10/10/2022

Multilingual Representation Distillation with Contrastive Learning

Multilingual sentence representations from large models can encode seman...
research
05/09/2022

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

We present EASE, a novel method for learning sentence embeddings via con...
research
04/08/2020

Cross-lingual Emotion Intensity Prediction

Emotion intensity prediction determines the degree or intensity of an em...
research
10/19/2020

The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification

Relation classification is one of the key topics in information extracti...
research
05/23/2023

Detecting and Mitigating Hallucinations in Multilingual Summarisation

Hallucinations pose a significant challenge to the reliability of neural...
research
09/16/2023

Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding

Learning multi-lingual sentence embeddings is a fundamental and signific...

Please sign up or login with your details

Forgot password? Click here to reset