ConTextual Mask Auto-Encoder for Dense Passage Retrieval

08/16/2022
by   Xing Wu, et al.

Dense passage retrieval aims to retrieve the passages relevant to a query from a large corpus, based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Specifically, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantic correlation between text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the effectiveness of CoT-MAE.
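
To make the retrieval setting concrete, the following is a minimal sketch of ranking passages by the similarity of their dense vectors to a query vector. It is not CoT-MAE itself: the abstract does not state which similarity function is used, so the inner product here is an assumption, and the vectors, the `dot` helper, and the `retrieve` function are hypothetical illustrations rather than anything from the paper.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return the indices and scores of the k highest-scoring passages."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    # Sort by score, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for encoder outputs (hypothetical values).
passages = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.6, 0.1],
]
query = [1.0, 0.0, 0.0]

top = retrieve(query, passages, k=2)
top_ids = [i for i, _ in top]
print(top_ids)
```

In a real system the vectors would come from the pre-trained encoder, and an approximate nearest-neighbor index would usually replace the exhaustive scan shown here.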

research
05/22/2023

Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

Recently, various studies have been directed towards exploring dense pas...
research
06/07/2023

ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems

Dialogue response selection aims to select an appropriate response from ...
research
12/19/2022

Query-as-context Pre-training for Dense Passage Retrieval

This paper presents a pre-training technique called query-as-context tha...
research
04/20/2023

CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval

Passage retrieval aims to retrieve relevant passages from large collecti...
research
11/16/2022

RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

To better support retrieval applications such as web search and question...
research
05/23/2022

UnifieR: A Unified Retriever for Large-Scale Retrieval

Large-scale retrieval is to recall relevant documents from a huge collec...
research
04/22/2022

Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction

Dense retrieval has shown promising results in many information retrieva...
