Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval

05/06/2023
by   Shengyao Zhuang, et al.

Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) must be trained on both the relevance matching task and the cross-language alignment task. However, cross-lingual training data is often scarce. In this paper, rather than using more cross-lingual data for training, we propose to use cross-lingual query generation to augment passage representations with queries in languages other than the original passage language. These augmented representations are used at inference time so that each passage representation encodes more information across the different target languages. Training the cross-lingual query generator requires no training data beyond that used for the dense retriever. Training the query generator is also effective because the generator's pre-training task (T5 text-to-text training) is very similar to its fine-tuning task (generating a query). Using the generator does not increase query latency at inference, and the approach can be combined with any cross-lingual dense retrieval method. Experiments on a benchmark cross-lingual information retrieval dataset show that our approach can improve the effectiveness of existing cross-lingual dense retrieval methods. The implementation of our methods, along with all generated query files, is publicly available at https://github.com/ielab/xQG4xDR.
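To make the idea of an augmented passage representation concrete, here is a minimal sketch in Python. The abstract does not say how the generated queries are combined with the passage embedding, so the linear interpolation below is an assumption for illustration only, not the authors' method. The function names, the `alpha` weight, the language codes, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment_passage(
    passage_vec: Sequence[float],
    generated_query_vecs: Dict[str, List[Vector]],
    alpha: float = 0.5,
) -> Vector:
    """Blend a passage embedding with embeddings of generated queries.

    ASSUMPTION: the combination rule (alpha * passage + (1 - alpha) * mean
    of all generated-query embeddings) is illustrative only; the abstract
    does not specify the aggregation.
    """
    # Flatten the per-language lists of generated-query embeddings.
    all_queries = [q for qs in generated_query_vecs.values() for q in qs]
    if not all_queries:
        return list(passage_vec)  # nothing to add: keep the passage as-is
    query_mean = mean_vector(all_queries)
    return [alpha * p + (1.0 - alpha) * q for p, q in zip(passage_vec, query_mean)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product relevance score, as commonly used in dense retrieval."""
    return sum(x * y for x, y in zip(a, b))


# Toy example: a 2-d passage embedding plus one generated query per language.
passage = [1.0, 0.0]
generated = {"de": [[0.0, 1.0]], "fr": [[0.0, 3.0]]}
augmented = augment_passage(passage, generated, alpha=0.5)
```

In this sketch only the passage vectors change, so the augmentation can be computed before any query arrives and the query-side work stays the same as scoring with `dot` against unaugmented vectors.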


