
UnifieR: A Unified Retriever for Large-Scale Retrieval

by Tao Shen, et al.

Large-scale retrieval aims to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. Depending on the encoding space, recent retrieval methods based on pre-trained language models (PLMs) can be coarsely categorized into dense-vector and lexicon-based paradigms. The two paradigms expose the PLMs' representation capability at different granularities: global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. We further present a uni-retrieval scheme that yields even better retrieval quality. Lastly, we evaluate the model on the BEIR benchmark to verify its transferability.
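
To make the two paradigms concrete, here is a minimal toy sketch in Python. It is not the UnifieR model or its uni-retrieval scheme, whose details the abstract does not give. The function names, the hand-written vectors and term weights, and the simple weighted sum controlled by a hypothetical `alpha` parameter are all assumptions chosen for illustration. In a real system both representations would come from a PLM encoder.

```python
from typing import Dict, List, Tuple

def dense_score(query_vec: List[float], doc_vec: List[float]) -> float:
    """Dense-vector paradigm: dot product of two sequence-level embeddings."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def lexicon_score(query_weights: Dict[str, float],
                  doc_weights: Dict[str, float]) -> float:
    """Lexicon-based paradigm: dot product over the terms both sides share."""
    return sum(w * doc_weights[term]
               for term, w in query_weights.items()
               if term in doc_weights)

def combined_rank(query_vec: List[float],
                  query_weights: Dict[str, float],
                  docs: Dict[str, Tuple[List[float], Dict[str, float]]],
                  alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Rank documents by dense score plus alpha times lexicon score.

    The weighted sum is an illustrative assumption, not the paper's scheme.
    """
    scored = []
    for doc_id, (doc_vec, doc_weights) in docs.items():
        score = (dense_score(query_vec, doc_vec)
                 + alpha * lexicon_score(query_weights, doc_weights))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-written toy data: each document has a dense vector and term weights.
docs = {
    "d1": ([0.9, 0.1], {"retrieval": 1.2, "dense": 0.8}),
    "d2": ([0.2, 0.8], {"cooking": 1.5, "recipe": 1.0}),
}
query_vec = [1.0, 0.0]
query_weights = {"retrieval": 1.0, "lexicon": 0.5}

ranking = combined_rank(query_vec, query_weights, docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy setup the dense score captures global similarity between whole sequences, while the lexicon score rewards only terms that appear in both the query and the document, which is one way to read the complementary global-local views the abstract describes.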




Multi-View Document Representation Learning for Open-Domain Dense Retrieval

Dense retrieval has achieved impressive advances in first-stage retrieva...

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Retrieval models based on dense representations in semantic space have b...

ConTextual Mask Auto-Encoder for Dense Passage Retrieval

Dense passage retrieval aims to retrieve the relevant passages of a quer...

Sparse, Dense, and Attentional Representations for Text Retrieval

Dual encoder architectures perform retrieval by encoding documents and q...

LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

In large-scale retrieval, the lexicon-weighting paradigm, learning weigh...

RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

To better support retrieval applications such as web search and question...

Semantic Search in Millions of Equations

Given the increase of publications, search for relevant papers becomes t...