Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings

05/04/2022
by   Georgios Sidiropoulos, et al.

Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text, and the performance of state-of-the-art dense retrievers can deteriorate substantially on such input. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when it encounters typos, and we explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question answering datasets show that our proposed approach outperforms competing approaches. Additionally, we perform a thorough analysis of robustness. Finally, we provide insights into how different types of typos affect the robustness of embeddings differently, and how our method alleviates the effect of some typos but not of others.
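
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of character-level typo augmentation and a contrastive loss over question-passage scores. It is not the authors' implementation. The abstract does not specify the typo types, the scoring function, or the exact loss, so the four typo operations, the dot-product scoring, and the helper names `add_typo` and `contrastive_loss` are all assumptions made for illustration.

```python
import math
import random


def add_typo(question, rng):
    """Return a copy of `question` with one random character-level typo.

    Assumption: the four operations below are common typo models, not
    necessarily the ones used in the paper. The result may occasionally
    equal the input (e.g. swapping two identical characters).
    """
    if len(question) < 2:
        return question
    op = rng.choice(["swap", "delete", "insert", "substitute"])
    i = rng.randrange(len(question) - 1)
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    if op == "swap":  # transpose two neighbouring characters
        return question[:i] + question[i + 1] + question[i] + question[i + 2:]
    if op == "delete":  # drop one character
        return question[:i] + question[i + 1:]
    if op == "insert":  # add a random letter
        return question[:i] + letter + question[i:]
    return question[:i] + letter + question[i + 1:]  # substitute one character


def contrastive_loss(query_vec, passage_vecs, positive_idx):
    """Negative log-likelihood of the positive passage.

    Scores are dot products between the question embedding and each passage
    embedding (assumed here); the loss is a softmax cross-entropy over them.
    """
    scores = [sum(q * p for q, p in zip(query_vec, vec)) for vec in passage_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum - scores[positive_idx]


# Hypothetical usage: augment a question, then score toy 2-d embeddings.
rng = random.Random(0)
noisy_question = add_typo("what is dense retrieval", rng)
loss = contrastive_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_idx=0)
```

In a real training setup, the embeddings would come from the question and passage encoders, and the typo-augmented questions would be fed through the question encoder alongside the clean ones.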


research
10/16/2020

RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

In open-domain question answering, dense passage retrieval has become a ...
research
04/06/2023

Noise-Robust Dense Retrieval via Contrastive Alignment Post Training

The success of contextual word representations and advances in neural in...
research
10/11/2022

Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

Given its effectiveness on knowledge-intensive natural language processi...
research
06/05/2023

SamToNe: Improving Contrastive Loss for Dual Encoder Retrieval Models with Same Tower Negatives

Dual encoders have been used for retrieval tasks and representation lear...
research
12/17/2022

Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation

Dense retrievers have made significant strides in obtaining state-of-the...
research
04/06/2023

Revisiting Dense Retrieval with Unanswerable Counterfactuals

The retriever-reader framework is popular for open-domain question answe...
research
08/01/2023

On the Effects of Regional Spelling Conventions in Retrieval Models

One advantage of neural ranking models is that they are meant to general...
