Language Agnostic Multilingual Information Retrieval with Contrastive Learning

10/12/2022
by   Xiyang Hu, et al.

Multilingual information retrieval is challenging due to the lack of training datasets for many low-resource languages. We present an effective method that leverages parallel and non-parallel corpora to improve the cross-lingual transfer ability of pretrained multilingual language models for information retrieval. We design a semantic contrastive loss, a standard contrastive objective that improves the cross-lingual alignment of parallel sentence pairs, and we propose a new objective, the language contrastive loss, which leverages both parallel and non-parallel corpora to further improve multilingual representation learning. We train our model on an English information retrieval dataset and test its zero-shot transfer ability to other languages. Our experimental results show that our method significantly improves retrieval performance over prior work while requiring much less computational effort. Our model works well even with a small amount of parallel data, and it can be used as an add-on module with any backbone and on other tasks. Our code is available at: https://github.com/xiyanghu/multilingualIR.
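The semantic contrastive loss the abstract describes aligns parallel sentence pairs, with the other sentences in a batch serving as negatives. A minimal sketch of this standard in-batch contrastive (InfoNCE-style) objective is shown below; the function name, the NumPy formulation, and the temperature value are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def semantic_contrastive_loss(src, tgt, temperature=0.05):
    """In-batch contrastive loss aligning parallel sentence embeddings.

    src, tgt: (batch, dim) arrays; row i of src is parallel to row i of tgt.
    The other rows in the batch act as negatives. This is an illustrative
    sketch of a standard InfoNCE objective, not the paper's exact code.
    """
    # L2-normalize so dot products are cosine similarities
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (the true parallel pair) as the target
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Usage: a correctly aligned batch yields a lower loss than a misaligned one
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned_loss = semantic_contrastive_loss(x, x)
misaligned_loss = semantic_contrastive_loss(x, x[::-1].copy())
```

Minimizing this loss pulls each sentence toward its translation and pushes it away from the other sentences in the batch, which is what yields the cross-lingual alignment the paper relies on for zero-shot transfer.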


Related research:

- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer (09/19/2023)
  Zero-shot cross-lingual transfer is a central task in multilingual NLP, ...

- Learning Multilingual Embeddings for Cross-Lingual Information Retrieval in the Presence of Topically Aligned Corpora (04/12/2018)
  Cross-lingual information retrieval is a challenging task in the absence...

- A Multilingual Parallel Corpora Collection Effort for Indian Languages (07/15/2020)
  We present sentence aligned parallel corpora across 10 Indian Languages ...

- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval (12/21/2022)
  Contrastive learning has been successfully used for retrieval of semanti...

- Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning (03/21/2023)
  Contrastive vision-language models (e.g. CLIP) are typically created by ...

- Multilingual Search with Subword TF-IDF (09/28/2022)
  Multilingual search can be achieved with subword tokenization. The accur...

- OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval (05/17/2022)
  Aligning parallel sentences in multilingual corpora is essential to cura...
