Learning Multilingual Embeddings for Cross-Lingual Information Retrieval in the Presence of Topically Aligned Corpora

04/12/2018
by Mitodru Niyogi, et al.

Cross-lingual information retrieval is a challenging task in the absence of aligned parallel corpora. In this paper, we address this problem by considering topically aligned corpora designed for evaluating an IR setup. We emphasize that we use neither sentence-aligned nor document-aligned corpora, nor any language-specific resources such as a dictionary, thesaurus, or grammar rules. Instead, we embed the languages into a common space and learn word correspondences directly from it. We test our proposed approach for bilingual IR on standard FIRE datasets for Bangla, Hindi, and English. The proposed method outperforms the state-of-the-art method not only on IR evaluation measures but also in terms of time requirements. We also extend our method successfully to the trilingual setting.
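The retrieval step the abstract describes, comparing queries and documents through a shared embedding space, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes word vectors from all languages have already been placed in one common space (the part the paper actually contributes), and the tiny shared_embeddings table, the embed and retrieve helpers, and the toy documents are hypothetical placeholders.

```python
# Illustrative sketch: once words from different languages share one vector
# space, cross-lingual retrieval reduces to comparing averaged query and
# document vectors. All embeddings and documents below are toy placeholders.
import numpy as np

# Hypothetical shared space: English and (transliterated) Hindi words
# embedded into the same 3-dimensional space.
shared_embeddings = {
    "election": np.array([0.90, 0.10, 0.00]),
    "chunav":   np.array([0.88, 0.12, 0.05]),  # Hindi word for "election"
    "cricket":  np.array([0.10, 0.90, 0.00]),
    "match":    np.array([0.15, 0.85, 0.10]),
}

def embed(text):
    """Average the vectors of known words (a simple bag-of-embeddings baseline)."""
    vecs = [shared_embeddings[w] for w in text.lower().split() if w in shared_embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def retrieve(query, documents):
    """Rank documents by cosine similarity to the query in the shared space."""
    q = embed(query)
    scored = []
    for doc_id, text in documents.items():
        d = embed(text)
        denom = np.linalg.norm(q) * np.linalg.norm(d)
        scored.append((doc_id, float(q @ d / denom) if denom else 0.0))
    return sorted(scored, key=lambda x: x[1], reverse=True)

docs = {
    "doc_hi": "chunav",   # Hindi-language document (transliterated)
    "doc_en": "cricket match",
}
print(retrieve("election", docs))  # the Hindi election document ranks first
```

The same scoring works for bilingual or trilingual settings; what changes in practice is how the shared space is learned, which in the paper is done from topically aligned corpora rather than from sentence- or document-aligned data.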

Related research

- Language Agnostic Multilingual Information Retrieval with Contrastive Learning (10/12/2022): Multilingual information retrieval is challenging due to the lack of tra...
- Pair-Wise Cluster Analysis (09/19/2010): This paper studies the problem of learning clusters which are consistent...
- Learning Cross-Lingual IR from an English Retriever (12/15/2021): We present a new cross-lingual information retrieval (CLIR) model traine...
- Trans-gram, Fast Cross-lingual Word-embeddings (01/11/2016): We introduce Trans-gram, a simple and computationally-efficient method t...
- The Cross-Lingual Arabic Information REtrieval (CLAIRE) System (07/29/2021): Despite advances in neural machine translation, cross-lingual retrieval ...
- Simple and Effective Paraphrastic Similarity from Parallel Translations (09/30/2019): We present a model and methodology for learning paraphrastic sentence em...