Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

05/02/2018
by Robert Litschko, et al.

We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) that requires no bilingual data at all. The framework leverages shared cross-lingual word embedding spaces in which terms, queries, and documents can be represented irrespective of their actual language. The shared embedding spaces are induced solely from monolingual corpora in two languages through an iterative process based on adversarial neural networks. Our experiments on the standard CLEF CLIR collections for three language pairs of varying degrees of language similarity (English-Dutch/Italian/Finnish) demonstrate the usefulness of the proposed fully unsupervised approach. Our CLIR models with unsupervised cross-lingual embeddings outperform baselines that use cross-lingual embeddings induced from word-level and document-level alignments. We then demonstrate that further improvements can be achieved by unsupervised ensemble CLIR models. We believe that the proposed framework is the first step towards the development of effective CLIR models for language pairs and domains where parallel data are scarce or non-existent.
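To make the idea of representing queries and documents in one shared embedding space concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the shared space has already been induced, uses invented 3-dimensional toy vectors, and assumes plain averaging of word vectors plus cosine similarity for ranking, neither of which is confirmed by the abstract. The names `SHARED_SPACE`, `embed_text`, `cosine`, and `rank` are hypothetical.

```python
import math

# Hypothetical toy shared embedding space: English and Dutch words live in the
# same 3-d space. The vectors are invented for illustration only.
SHARED_SPACE = {
    "en": {"dog": [0.9, 0.1, 0.0], "house": [0.0, 0.9, 0.1], "music": [0.1, 0.0, 0.9]},
    "nl": {"hond": [0.85, 0.15, 0.05], "huis": [0.05, 0.9, 0.1], "muziek": [0.1, 0.05, 0.9]},
}


def embed_text(tokens, lang):
    """Average the vectors of known tokens (assumed aggregation); None if none are known."""
    vectors = [SHARED_SPACE[lang][t] for t in tokens if t in SHARED_SPACE[lang]]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_tokens, query_lang, docs, doc_lang):
    """Score every document against the query in the shared space, best first."""
    q = embed_text(query_tokens, query_lang)
    scored = []
    for doc_id, tokens in docs.items():
        d = embed_text(tokens, doc_lang)
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# An English query ranked against Dutch documents, with no translation step.
docs = {"d1": ["hond", "huis"], "d2": ["muziek"], "d3": ["huis"]}
ranking = rank(["dog"], "en", docs, "nl")
print(ranking)
```

Because both languages share one space, the English query "dog" can be compared directly with Dutch documents. In this toy setup the document containing "hond" ranks first.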


research
05/26/2020

A Study of Neural Matching Models for Cross-lingual IR

In this study, we investigate interaction-based neural matching models f...
research
10/11/2017

Word Translation Without Parallel Data

State-of-the-art methods for learning cross-lingual word embeddings have...
research
09/04/2019

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?

Recent efforts in cross-lingual word embedding (CLWE) learning have pred...
research
04/11/2019

Strong Baselines for Complex Word Identification across Multiple Languages

Complex Word Identification (CWI) is the task of identifying which words...
research
01/30/2020

Lost in Embedding Space: Explaining Cross-Lingual Task Performance with Eigenvalue Divergence

Performance in cross-lingual NLP tasks is impacted by the (dis)similarit...
research
07/29/2021

The Cross-Lingual Arabic Information REtrieval (CLAIRE) System

Despite advances in neural machine translation, cross-lingual retrieval ...
research
05/11/2021

Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora

Cross-lingual text representations have gained popularity lately and act...
