Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

12/21/2022
by John Wieting, et al.

Contrastive learning has been used successfully to retrieve semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings, which can be used to retrieve or score sentence pairs. Our model operates on parallel data in N languages and, through an approximation we introduce, efficiently encourages source separation in this multilingual setting, separating semantic information that is shared between translations from stylistic or language-specific variation. We present careful large-scale comparisons between contrastive and generation-based approaches for learning multilingual text embeddings; to the best of our knowledge, such a comparison has not been made before, despite the popularity of these approaches. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval, the last of which we introduce in this paper. Overall, our Variational Multilingual Source-Separation Transformer (VMSST) model outperforms both a strong contrastive baseline and a strong generative baseline on these tasks.
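To make the batch-size remark concrete, here is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) objective over translation pairs, plus cosine-similarity retrieval of the kind used in bitext mining. This is an illustration of the general technique only, not the VMSST model and not necessarily the exact contrastive baseline used in the paper. The function names `in_batch_contrastive_loss` and `retrieve`, and the temperature value of 0.05, are hypothetical choices made for this sketch. For each source sentence, the other `n - 1` targets in the batch serve as negatives, which is why larger batches supply more negatives.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(
    src: List[Vector], tgt: List[Vector], temperature: float = 0.05
) -> float:
    """Mean InfoNCE loss (hypothetical helper) where tgt[i] is the translation
    of src[i] and every other tgt[j] in the batch is a negative for src[i].
    The temperature of 0.05 is an assumed value, not taken from the paper."""
    n = len(src)
    total = 0.0
    for i in range(n):
        # Scaled similarities of src[i] to every target in the batch.
        logits = [cosine(src[i], tgt[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the true translation.
        total += log_sum_exp - logits[i]
    return total / n


def retrieve(query: Vector, candidates: List[Vector]) -> int:
    """Index of the candidate embedding most cosine-similar to the query
    (hypothetical helper illustrating nearest-neighbour bitext mining)."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))
```

With aligned embeddings such as `src = tgt = [[1.0, 0.0], [0.0, 1.0]]`, the loss is close to zero; if the targets are swapped, it becomes large. Because each example only sees the negatives in its own batch, small batches give a weak training signal, which is one reason contrastive setups tend to favour large batches.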

research
10/10/2022

Multilingual Representation Distillation with Contrastive Learning

Multilingual sentence representations from large models can encode seman...
research
10/12/2022

Language Agnostic Multilingual Information Retrieval with Contrastive Learning

Multilingual information retrieval is challenging due to the lack of tra...
research
05/09/2022

EASE: Entity-Aware Contrastive Learning of Sentence Embedding

We present EASE, a novel method for learning sentence embeddings via con...
research
09/15/2019

Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

We present Emu, a system that semantically enhances multilingual sentenc...
research
11/10/2019

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences int...
research
04/18/2022

GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Due to high data demands of current methods, attention to zero-shot cros...
research
04/30/2019

Model Comparison for Semantic Grouping

We introduce a probabilistic framework for quantifying the semantic simi...