LAReQA: Language-agnostic answer retrieval from a multilingual pool

04/11/2020
by   Uma Roy, et al.

We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective and improves significantly over using mBERT out of the box. Interestingly, the embedding baseline that performs best on LAReQA falls short of competing baselines on zero-shot variants of our task that only target "weak" alignment. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
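The "strong" alignment criterion can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors, the candidate names, and the helper functions `cosine` and `rank_pool` are all hypothetical, chosen only to show the condition that a relevant answer in another language should rank above an irrelevant candidate in the query's own language.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_pool(query_vec, pool):
    """Return candidate ids sorted by descending cosine similarity to the query."""
    scores = {cid: cosine(query_vec, vec) for cid, vec in pool.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, made up for illustration (not produced by mBERT).
query_en = np.array([1.0, 0.0])
pool = {
    "answer_de": np.array([0.9, 0.1]),      # relevant answer, different language
    "distractor_en": np.array([0.2, 0.9]),  # irrelevant candidate, same language
}

ranking = rank_pool(query_en, pool)

# Strong alignment holds for this query if the cross-language answer
# outranks the same-language distractor.
strongly_aligned = ranking.index("answer_de") < ranking.index("distractor_en")
```

Under this reading, an encoder that clusters text by language would place `distractor_en` closer to the English query and fail the check, even if it ranked answers well within a single language.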

