Exploring the Representation Power of SPLADE Models

06/29/2023
by Joel Mackenzie, et al.

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models. During training, SPLADE applies regularization to ensure postings lists are kept sparse – with the aim of mimicking the properties of natural term distributions – allowing efficient and effective lexical matching and ranking. However, we hypothesize that SPLADE may encode additional signals into common postings lists to further improve effectiveness. To explore this idea, we perform a number of empirical analyses where we re-train SPLADE with different, controlled vocabularies and measure how effective it is at ranking passages. Our findings suggest that SPLADE can effectively encode useful ranking signals in documents even when the vocabulary is constrained to terms that are not traditionally useful for ranking, such as stopwords or even random words.
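To make the idea of impact-scored postings lists concrete, here is a minimal sketch of how a learned sparse index might be queried. It assumes each document has already been encoded as a term-to-impact mapping; the toy weights, the `build_index` and `rank` helpers, and the dot-product scoring are illustrative assumptions, not the authors' implementation or the SPLADE library API.

```python
from collections import defaultdict


def build_index(doc_vectors):
    """Map each term to a postings list of (doc_id, impact) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, impact in vector.items():
            if impact > 0:  # zero-weight terms get no posting, keeping lists sparse
                index[term].append((doc_id, impact))
    return index


def rank(index, query_vector, k=10):
    """Score documents by summing query weight * document impact per shared term."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, impact in index.get(term, []):
            scores[doc_id] += q_weight * impact
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical impact scores; a trained model would produce these.
docs = {
    "d1": {"sparse": 2.0, "retrieval": 1.5, "the": 0.25},
    "d2": {"dense": 2.0, "retrieval": 1.0, "the": 0.5},
}
index = build_index(docs)
results = rank(index, {"sparse": 1.0, "retrieval": 0.5})
```

Nothing in this scoring scheme requires a term to be semantically meaningful: a posting for a stopword such as "the" contributes to the score exactly like any other, which is why a model could in principle place useful ranking signal in such lists.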


