Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations

06/08/2019
by Rui Zhang, et al.

In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.
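The reranking idea in the abstract can be illustrated with a small sketch. This is not the authors' model: the paper uses trained term interaction-based neural networks, whereas the code below substitutes a fixed max-similarity pooling for each matching component and a hand-set linear combination for the learned reranker. The four query-document pairings, the function names, the dictionary keys, and the weights are all assumptions made for illustration. Embeddings are plain lists of floats, assumed to live in a shared cross-lingual space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_score(query_vecs, doc_vecs):
    """Toy stand-in for one matching component (assumption, not the paper's network).

    Builds the query-term by document-term similarity matrix, takes the best
    document term for each query term, and averages over query terms.
    """
    matrix = [[cosine(q, d) for d in doc_vecs] for q in query_vecs]
    return sum(max(row) for row in matrix) / len(matrix)


def rerank_score(component_scores, ql_score, weights, ql_weight):
    """Combine the component scores with a query likelihood feature.

    A fixed linear combination is assumed here; in the paper the combination
    is learned from relevance labels.
    """
    return sum(w * s for w, s in zip(weights, component_scores)) + ql_weight * ql_score


def rerank(candidates, weights, ql_weight):
    """Return document ids sorted by descending combined score."""
    scored = []
    for cand in candidates:
        # Four hypothetical pairings: each query view against each document view.
        components = [
            match_score(q, d)
            for q in (cand["query_src"], cand["query_tgt"])
            for d in (cand["doc_src"], cand["doc_tgt"])
        ]
        score = rerank_score(components, cand["ql"], weights, ql_weight)
        scored.append((score, cand["doc_id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Made-up two-dimensional embeddings and query likelihood scores.
query = [[1.0, 0.0]]
candidates = [
    {"doc_id": "d1", "query_src": query, "query_tgt": query,
     "doc_src": [[0.0, 1.0]], "doc_tgt": [[0.0, 1.0]], "ql": -1.0},
    {"doc_id": "d2", "query_src": query, "query_tgt": query,
     "doc_src": [[1.0, 0.0]], "doc_tgt": [[1.0, 0.2]], "ql": -1.5},
]
ranking = rerank(candidates, weights=[0.25] * 4, ql_weight=0.1)
```

In this toy input, `d2` has the lower query likelihood score but its term embeddings align with the query in both views, so the combined score places it ahead of `d1`. That is the kind of correction a reranker over an initial retrieval list is meant to make.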

research
01/29/2023

Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation

Benefiting from transformer-based pre-trained language models, neural ra...
research
12/22/2018

Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification

Text classification must sometimes be applied in situations with no trai...
research
07/29/2021

The Cross-Lingual Arabic Information REtrieval (CLAIRE) System

Despite advances in neural machine translation, cross-lingual retrieval ...
research
11/02/2020

Cross-Lingual Document Retrieval with Smooth Learning

Cross-lingual document search is an information retrieval task in which ...
research
12/27/2021

Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement

Cross-Lingual Information Retrieval (CLIR) aims to rank the documents wr...
research
05/17/2020

Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce

With the prosperous of cross-border e-commerce, there is an urgent deman...
research
05/21/2018

Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction

Cross-lingual information extraction (CLIE) is an important and challeng...