Learning Cross-Lingual IR from an English Retriever

12/15/2021
by   Yulong Li, et al.
0

We present a new cross-lingual information retrieval (CLIR) model trained using multi-stage knowledge distillation (KD). The teacher and the student are heterogeneous systems-the former is a pipeline that relies on machine translation and monolingual IR, while the latter executes a single CLIR operation. We show that the student can learn both multilingual representations and CLIR by optimizing two corresponding KD objectives. Learning multilingual representations from an English-only retriever is accomplished using a novel cross-lingual alignment algorithm that greedily re-positions the teacher tokens for alignment. Evaluation on the XOR-TyDi benchmark shows that the proposed model is far more effective than the existing approach of fine-tuning with cross-lingual labeled IR data, with a gain in accuracy of 25.4 Recall@5kt.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...
research
12/07/2021

Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation

Current state-of-the-art cross-lingual summarization models employ multi...
research
05/30/2023

Research on Multilingual News Clustering Based on Cross-Language Word Embeddings

Classifying the same event reported by different countries is of signifi...
research
04/03/2023

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

The advent of multilingual language models has generated a resurgence of...
research
04/12/2018

Learning Multilingual Embeddings for Cross-Lingual Information Retrieval in the Presence of Topically Aligned Corpora

Cross-lingual information retrieval is a challenging task in the absence...
research
10/10/2019

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

Learning multilingual representations of text has proven a successful me...
research
02/26/2023

Cross-lingual Knowledge Transfer via Distillation for Multilingual Information Retrieval

In this paper, we introduce the approach behind our submission for the M...

Please sign up or login with your details

Forgot password? Click here to reset