
Language Agnostic Multilingual Information Retrieval with Contrastive Learning

by Xiyang Hu, et al.

Multilingual information retrieval is challenging because training datasets are lacking for many low-resource languages. We present an effective method that leverages parallel and non-parallel corpora to improve the cross-lingual transfer ability of pretrained multilingual language models for information retrieval. We design a semantic contrastive loss, in the form of regular contrastive learning, to improve the cross-lingual alignment of parallel sentence pairs, and we propose a new contrastive loss, the language contrastive loss, that leverages both parallel and non-parallel corpora to further improve multilingual representation learning. We train our model on an English information retrieval dataset and test its zero-shot transfer ability to other languages. Our experimental results show that our method significantly improves retrieval performance over prior work while requiring much less computational effort. Our model works well even with a small number of parallel corpora. It can also be used as an add-on module with any backbone and for other tasks. Our code is available at:
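For readers unfamiliar with the term, "regular contrastive learning" on parallel sentence pairs is commonly realized as an InfoNCE-style loss with in-batch negatives. The sketch below is an illustration under that assumption only. It is not the authors' released code, and the function names, the cosine similarity, and the temperature of 0.1 are all hypothetical choices. The language contrastive loss is not sketched, since the abstract does not define it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_contrastive_loss(src, tgt, temperature=0.1):
    """InfoNCE-style loss over a batch of parallel sentence embeddings.

    src[i] and tgt[i] are assumed to embed translations of the same
    sentence (the positive pair); every other tgt[j] in the batch
    serves as an in-batch negative for src[i].
    The temperature value is an illustrative assumption.
    """
    losses = []
    for i, u in enumerate(src):
        logits = [cosine(u, v) / temperature for v in tgt]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true translation.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d "embeddings": two English sentences and two candidate
# sets of translations, one well aligned and one swapped.
en = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
misaligned = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = semantic_contrastive_loss(en, aligned)
loss_misaligned = semantic_contrastive_loss(en, misaligned)
```

In this toy setup the loss is lower when each English vector sits closest to its own translation, which is the alignment pressure such a loss is meant to apply to a multilingual encoder.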




English Contrastive Learning Can Learn Universal Cross-lingual Sentence Embeddings

Universal cross-lingual sentence embeddings map semantically similar cro...

A Multilingual Parallel Corpora Collection Effort for Indian Languages

We present sentence aligned parallel corpora across 10 Indian Languages ...

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Contrastive learning has been successfully used for retrieval of semanti...

Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning

Contrastive vision-language models (e.g. CLIP) are typically created by ...

Does Transliteration Help Multilingual Language Modeling?

As there is a scarcity of large representative corpora for most language...

Multilingual Search with Subword TF-IDF

Multilingual search can be achieved with subword tokenization. The accur...