SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

09/13/2022
by Eunseong Choi, et al.

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Because they rely on a pre-computed inverted index, they support fast ad-hoc search but suffer from the vocabulary mismatch problem. Although recent neural ranking models using pre-trained language models can address this problem, they usually incur high query inference costs, implying a trade-off between effectiveness and efficiency. To address this trade-off, we propose a novel uni-encoder ranking model, Sparse retriever using a Dual document Encoder (SpaDE), which learns document representations via a dual encoder. The two encoders respectively (i) adjust the importance of terms to improve lexical matching and (ii) expand additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively and prevents the two encoders from unnecessarily interfering with each other during training. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.
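
To make the idea of a sparse document representation concrete, the following is a minimal sketch, not the SpaDE implementation. It assumes that each document already has two term-to-weight maps (one of re-weighted original terms, one of expanded terms), that the two maps are combined by a plain per-term sum, and that a query is scored by summing the weights of its matching terms from an inverted index. All function names, weights, and the combination rule are hypothetical and chosen only for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def merge_representations(weighted_terms, expanded_terms):
    """Combine two sparse term->weight maps into one.

    Assumption: a plain per-term sum. The abstract does not state
    how the two encoder outputs are actually combined.
    """
    merged = defaultdict(float)
    for terms in (weighted_terms, expanded_terms):
        for term, weight in terms.items():
            merged[term] += weight
    return dict(merged)


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_terms):
    """Score documents by summing the weights of matching query terms."""
    scores = defaultdict(float)
    # dict.fromkeys removes duplicate query terms while keeping their order.
    for term in dict.fromkeys(query_terms):
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights standing in for the outputs of the two encoders.
docs = {
    "d1": merge_representations(
        {"car": 1.5, "engine": 0.8},            # re-weighted original terms
        {"vehicle": 0.6, "automobile": 0.4},    # expanded terms
    ),
    "d2": merge_representations(
        {"bike": 1.2, "engine": 0.3},
        {"vehicle": 0.5},
    ),
}
index = build_inverted_index(docs)

# "vehicle" appears in neither document's original terms; it matches
# only because of the (hypothetical) expansion step.
print(search(index, ["vehicle", "engine"]))
```

In this sketch the query needs no neural inference at search time: only the posting lists for its terms are read, which is the efficiency argument the abstract makes for pre-computed sparse representations.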

research
03/11/2021

Composite Re-Ranking for Efficient Document Search with BERT

Although considerable efforts have been devoted to transformer-based ran...
research
10/20/2020

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval...
research
06/29/2023

Exploring the Representation Power of SPLADE Models

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective ap...
research
05/23/2023

NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

Neural document rerankers are extremely effective in terms of accuracy. ...
research
04/15/2021

COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

Classical information retrieval systems such as BM25 rely on exact lexic...
research
04/23/2022

Dual Skipping Guidance for Document Retrieval with Learned Sparse Representations

This paper proposes a dual skipping guidance scheme with hybrid scoring ...
research
10/12/2021

Fast Forward Indexes for Efficient Document Ranking

Neural approaches, specifically transformer models, for ranking document...