Complementing Lexical Retrieval with Semantic Residual Embedding

04/29/2020
by Luyu Gao, et al.

Information retrieval has traditionally relied on lexical matching signals, but lexical matching cannot handle vocabulary mismatch or topic-level matching. Neural embedding-based retrieval models can match queries and documents in a latent semantic space, but they lose token-level matching information that is critical to IR. This paper presents CLEAR, a deep retrieval model that seeks to complement lexical retrieval with semantic embedding retrieval. Importantly, CLEAR uses a residual-based embedding learning framework, which focuses the embedding on the deep language structures and semantics that lexical retrieval fails to capture. Empirical evaluation demonstrates the advantages of CLEAR over classic bag-of-words retrieval models, recent BERT-enhanced lexical retrieval models, and a BERT-based embedding retrieval model. Full-collection retrieval with CLEAR can be as effective as a BERT-based reranking system, substantially narrowing the gap between full-collection retrieval and cost-prohibitive reranking systems.
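
The abstract does not spell out the residual objective, so the following is only a minimal sketch of one way such a scheme could look, not the paper's implementation. It assumes a pairwise hinge loss whose margin shrinks when the lexical model already ranks the relevant document above the non-relevant one, and a final score that adds a weighted lexical score to an embedding dot product. The function names, `base_margin`, and `lex_weight` are all hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def residual_margin(lex_pos, lex_neg, base_margin=1.0, lex_weight=0.1):
    """Assumed margin: small when the lexical scores already separate the
    relevant document (lex_pos) from the non-relevant one (lex_neg),
    large when the lexical model ranks the pair the wrong way round."""
    return base_margin - lex_weight * (lex_pos - lex_neg)


def residual_hinge_loss(q, d_pos, d_neg, lex_pos, lex_neg,
                        base_margin=1.0, lex_weight=0.1):
    """Pairwise hinge loss on embedding scores with the residual margin.
    The embedding is pushed hardest on pairs the lexical model gets wrong."""
    margin = residual_margin(lex_pos, lex_neg, base_margin, lex_weight)
    return max(0.0, margin - dot(q, d_pos) + dot(q, d_neg))


def combined_score(lex_score, q, d, lex_weight=0.1):
    """Assumed retrieval-time score: weighted lexical score plus embedding score."""
    return lex_weight * lex_score + dot(q, d)


if __name__ == "__main__":
    q, d_pos, d_neg = [1.0, 0.0], [0.5, 0.0], [0.5, 0.0]
    # Lexical model already prefers the relevant document: low loss.
    print(residual_hinge_loss(q, d_pos, d_neg, lex_pos=10.0, lex_neg=2.0))
    # Lexical model prefers the wrong document: higher loss.
    print(residual_hinge_loss(q, d_pos, d_neg, lex_pos=2.0, lex_neg=10.0))
```

In this sketch the embedding receives little training signal on pairs that lexical matching already handles, which is one reading of "focusing the embedding on what lexical retrieval fails to capture".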


research
04/15/2021

COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

Classical information retrieval systems such as BM25 rely on exact lexic...
research
10/02/2020

Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Search engines often follow a two-phase paradigm where in the first stag...
research
10/11/2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers

Tokenization is a crucial step in information retrieval, especially for ...
research
12/10/2021

Match Your Words! A Study of Lexical Matching in Neural Information Retrieval

Neural Information Retrieval models hold the promise to replace lexical ...
research
05/08/2019

On the Feasibility of Automated Detection of Allusive Text Reuse

The detection of allusive text reuse is particularly challenging due to ...
research
09/22/2020

Embedding-based Zero-shot Retrieval through Query Generation

Passage retrieval addresses the problem of locating relevant passages, u...
research
06/29/2023

Exploring the Representation Power of SPLADE Models

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective ap...
