Cross-lingual Knowledge Transfer via Distillation for Multilingual Information Retrieval

02/26/2023
by Zhiqi Huang, et al.
This paper describes our submission to the MIRACL challenge, a WSDM 2023 Cup competition on ad-hoc retrieval across 18 diverse languages. Our solution combines two neural models. The first is a bi-encoder re-ranker, to which we apply cross-lingual distillation to transfer ranking knowledge from English into the target-language space. The second is a cross-encoder re-ranker trained on multilingual retrieval data generated with neural machine translation. We further fine-tune both models on the MIRACL training data and ensemble multiple rank lists to obtain the final result. According to the MIRACL leaderboard, our approach ranks 8th on the Test-A set and 2nd on the Test-B set among the 16 known languages.
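
The abstract does not say how the rank lists are ensembled. As a minimal sketch of one common option, the snippet below fuses two ranked lists with reciprocal rank fusion (RRF). The choice of RRF, the constant k=60, the function name, and the toy document ids are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rank_lists, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are then sorted by their summed score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the two re-rankers for a single query.
bi_encoder_run = ["d3", "d1", "d2"]
cross_encoder_run = ["d1", "d2", "d3"]

fused = reciprocal_rank_fusion([bi_encoder_run, cross_encoder_run])
print(fused)
```

Rank-based fusion of this kind needs no score calibration between the two models, which is why it is a frequent default when combining bi-encoder and cross-encoder outputs; score-level interpolation would be an equally plausible reading of "ensemble".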


research
01/29/2023

Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation

Benefiting from transformer-based pre-trained language models, neural ra...
research
05/15/2023

Soft Prompt Decoding for Multilingual Dense Retrieval

In this work, we explore a Multilingual Information Retrieval (MLIR) tas...
research
02/14/2023

Enhancing Model Performance in Multilingual Information Retrieval with Comprehensive Data Engineering Techniques

In this paper, we present our solution to the Multilingual Information R...
research
05/23/2023

Revisiting Machine Translation for Cross-lingual Classification

Machine Translation (MT) has been widely used for cross-lingual classifi...
research
12/15/2021

Learning Cross-Lingual IR from an English Retriever

We present a new cross-lingual information retrieval (CLIR) model traine...
research
06/01/2023

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

Research in multilingual speech-to-text translation is topical. Having a...
research
11/03/2021

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval

Interactive and non-interactive model are the two de-facto standard fram...
