SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

09/21/2021
by   Thibault Formal, et al.

In neural Information Retrieval (IR), ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval with efficient approximate nearest neighbor methods has proven to work well. Meanwhile, there has been growing interest in learning sparse representations for documents and queries, which could inherit desirable properties of bag-of-words models such as exact term matching and the efficiency of inverted indexes. The recently introduced SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches. In this paper, we build on SPLADE and propose several significant improvements in effectiveness and/or efficiency. More specifically, we modify the pooling mechanism, benchmark a model based solely on document expansion, and introduce models trained with distillation. We also report results on the BEIR benchmark. Overall, SPLADE is considerably improved, with gains of more than 9% on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
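The modified pooling mechanism mentioned in the abstract refers to SPLADE's term-importance aggregation: each input token's MLM logits over the vocabulary are passed through a log-saturating ReLU, and per-term weights are pooled across positions (v2 uses a max rather than v1's sum). A minimal sketch of that pooling step, on toy logits rather than a real transformer's output, might look like this:

```python
import numpy as np

def splade_max_pool(mlm_logits: np.ndarray) -> np.ndarray:
    """Sketch of SPLADE v2 term-importance pooling.

    mlm_logits: (seq_len, vocab_size) MLM logits for one input text.
    Returns a (vocab_size,) weight vector where, for each vocabulary
    term j, w_j = max_i log(1 + relu(logit_ij)) over positions i.
    SPLADE v1 summed over positions; v2 replaces the sum with a max.
    """
    saturated = np.log1p(np.maximum(mlm_logits, 0.0))  # log-saturated ReLU
    return saturated.max(axis=0)                       # max pooling over positions

# Toy example: a 3-token input over a 5-term vocabulary
# (real SPLADE uses a BERT MLM head over a ~30k-term vocabulary).
logits = np.array([[ 2.0, -1.0,  0.5, 0.0, -3.0],
                   [ 0.1,  4.0, -2.0, 0.0,  1.0],
                   [-0.5,  0.0,  3.0, 0.0,  0.2]])
weights = splade_max_pool(logits)
print(weights)
```

Terms whose pooled weight is zero are simply dropped, so the resulting vector is sparse and can be stored in a standard inverted index; the ReLU drives that sparsity, and the log saturation keeps a few dominant terms from dwarfing the rest.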

Related research

- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking (07/12/2021)
- From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective (05/10/2022)
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval (05/08/2021)
- A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques (06/28/2021)
- UHD-BERT: Bucketed Ultra-High Dimensional Sparse Representations for Full Ranking (04/15/2021)
- Early Stage Sparse Retrieval with Entity Linking (08/09/2022)
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List (04/15/2021)
