Multilingual ColBERT-X

09/03/2022
by Dawn Lawrie, et al.

ColBERT-X is a dense retrieval model for Cross-Language Information Retrieval (CLIR). In CLIR, documents are written in one natural language while queries are expressed in another. A related task is Multilingual Information Retrieval (MLIR), in which the system produces a single ranked list over documents written in many languages. Because ColBERT-X relies on a pretrained multilingual neural language model (XLM-R) to rank documents, a multilingual training procedure can produce a version of ColBERT-X well-suited to MLIR. This paper describes that training procedure. An important factor for good MLIR ranking is fine-tuning XLM-R with mixed-language batches, in which the same query is matched with documents in different languages within a single batch. Neural machine translations of MS MARCO passages are used to fine-tune the model.
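The mixed-language batching idea can be sketched as follows. This is a minimal illustration, not the authors' code: the triple format, the per-language parallel indexing of translated MS MARCO triples, and the function name are all assumptions.

```python
import random

def mixed_language_batches(triples_by_lang, batch_size, seed=0):
    """Yield fine-tuning batches that mix document languages.

    triples_by_lang maps a language code to a list of
    (query, positive_passage, negative_passage) triples, where index i in
    every list corresponds to the same MS MARCO query, with passages
    machine-translated into that language.
    """
    rng = random.Random(seed)
    langs = list(triples_by_lang)
    n = len(next(iter(triples_by_lang.values())))
    order = list(range(n))
    rng.shuffle(order)
    for start in range(0, n, batch_size):
        batch = []
        for i in order[start:start + batch_size]:
            # Choosing the passage language per example means a single
            # batch contains documents in several languages, so the same
            # query can be matched against passages in different languages.
            lang = rng.choice(langs)
            batch.append((lang, *triples_by_lang[lang][i]))
        yield batch
```

Keeping the language choice inside the batch loop (rather than per batch) is what makes each batch mixed-language, which the paper identifies as important for MLIR ranking quality.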

Related research

- Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters (12/20/2022)
- Cross-language Information Retrieval (11/10/2021)
- Synthetic Cross-language Information Retrieval Training Data (04/29/2023)
- Enhancing Model Performance in Multilingual Information Retrieval with Comprehensive Data Engineering Techniques (02/14/2023)
- Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement (12/27/2021)
- A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus (03/26/2019)
- C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval (04/25/2022)
