Dealing with Typos for BERT-based Passage Retrieval and Ranking

08/27/2021
by Shengyao Zhuang, et al.

Passage retrieval and ranking is a key task in open-domain question answering and information retrieval. Current effective approaches mostly rely on retrievers and rankers built on pre-trained deep language models. These methods have been shown to effectively model the semantic matching between queries and passages, even in the presence of keyword mismatch, i.e., when passages are relevant to a query but do not contain important query keywords. In this paper we consider the Dense Retriever (DR), a passage retrieval method, and the BERT re-ranker, a popular passage re-ranking method. In this context, we formally investigate how these models respond and adapt to a specific type of keyword mismatch: the mismatch caused by keyword typos in queries. Through empirical investigation, we find that typos can lead to a significant drop in retrieval and ranking effectiveness. We then propose a simple typos-aware training framework for DR and BERT re-ranker to address this issue. Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker become robust to typos in queries, and are significantly more effective than models trained without appropriately accounting for typos.
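
The abstract does not spell out how typos-aware training is implemented. One plausible reading is that training queries are randomly perturbed with synthetic typos before being fed to the model. The sketch below illustrates that idea only. The typo operations (insert, delete, substitute, swap), the helper names `add_typo` and `augment_queries`, and the parameters `typo_prob` and `min_len` are all assumptions made for illustration, not the authors' method.

```python
import random
import string

# Hypothetical character-level typo operations (assumed, not from the paper).
def insert_char(word, rng):
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]

def delete_char(word, rng):
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def substitute_char(word, rng):
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]

def swap_adjacent(word, rng):
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

TYPO_OPS = (insert_char, delete_char, substitute_char, swap_adjacent)

def add_typo(query, rng, min_len=4):
    """Apply one random typo to one sufficiently long alphabetic word."""
    words = query.split()
    candidates = [i for i, w in enumerate(words)
                  if len(w) >= min_len and w.isalpha()]
    if not candidates:
        return query  # nothing suitable to perturb
    i = rng.choice(candidates)
    words[i] = rng.choice(TYPO_OPS)(words[i], rng)
    return " ".join(words)

def augment_queries(queries, typo_prob=0.5, seed=0):
    """Return a copy of `queries` where each query gets a typo with probability `typo_prob`."""
    rng = random.Random(seed)
    return [add_typo(q, rng) if rng.random() < typo_prob else q
            for q in queries]

if __name__ == "__main__":
    for q in augment_queries(["what is information retrieval"], typo_prob=1.0):
        print(q)
```

In a training loop, something like `augment_queries` would be applied to each batch of queries while the passages and relevance labels stay unchanged, so the model sees both clean and misspelled versions of the same information need.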


