Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

05/22/2023
by Zehan Li, et al.

Recently, various studies have explored dense passage retrieval with pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on the decoder's passage reconstruction to bolster the encoder's text representation ability, thereby enhancing the performance of the resulting dense retrieval system. Since the encoder's representation ability is built up through the decoder's passage reconstruction, it is reasonable to postulate that a "more demanding" decoder will necessitate a correspondingly stronger encoder. To this end, we propose a novel token-importance-aware masking strategy based on pointwise mutual information to intensify the challenge posed to the decoder. Importantly, our approach can be implemented in an unsupervised manner, without adding extra cost to the pre-training phase. Our experiments verify that the proposed method is both effective and robust on large-scale supervised passage retrieval datasets and out-of-domain zero-shot retrieval benchmarks.
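To make the idea concrete, the sketch below shows one way a pointwise-mutual-information-based, token-importance-aware mask could be computed. The paper's exact scoring and masking procedure may differ; the corpus statistics (token_freq, cooc_freq), the averaging of pairwise PMI within a passage, and the top-k mask selection here are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter

def pmi_token_scores(tokens, token_freq, cooc_freq, total):
    """Score each token by its average pointwise mutual information (PMI)
    with the other tokens in the passage. token_freq and cooc_freq are
    corpus-level unigram and co-occurrence counts (assumed precomputed);
    total normalizes counts into probabilities."""
    scores = []
    for i, t in enumerate(tokens):
        pmis = []
        for j, u in enumerate(tokens):
            if i == j:
                continue
            p_t = token_freq[t] / total
            p_u = token_freq[u] / total
            p_tu = cooc_freq[(t, u)] / total
            if p_t > 0 and p_u > 0 and p_tu > 0:
                pmis.append(math.log(p_tu / (p_t * p_u)))
        scores.append(sum(pmis) / len(pmis) if pmis else 0.0)
    return scores

def importance_aware_mask(tokens, scores, mask_ratio=0.5):
    """Mask the highest-scoring (most informative) tokens, so the decoder
    must reconstruct the hardest parts of the passage."""
    k = int(len(tokens) * mask_ratio)
    top = set(sorted(range(len(tokens)), key=scores.__getitem__, reverse=True)[:k])
    return ["[MASK]" if i in top else t for i, t in enumerate(tokens)]

# Toy usage with hand-built counts (hypothetical values):
tokens = ["dense", "passage", "retrieval", "uses", "a", "bi", "encoder"]
token_freq = Counter({t: 10 for t in tokens})
cooc_freq = Counter({(t, u): 2 for t in tokens for u in tokens if t != u})
scores = pmi_token_scores(tokens, token_freq, cooc_freq, total=1000)
print(importance_aware_mask(tokens, scores, mask_ratio=0.4))
```

Because the importance scores come purely from co-occurrence statistics, the masking can be computed fully unsupervised before pre-training, consistent with the abstract's claim of no extra pre-training cost.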


Related research

05/24/2022 · RetroMAE: Pre-training Retrieval-oriented Transformers via Masked Auto-Encoder
Pre-trained models have demonstrated superior power on many important ta...

08/16/2022 · ConTextual Mask Auto-Encoder for Dense Passage Retrieval
Dense passage retrieval aims to retrieve the relevant passages of a quer...

07/28/2021 · Domain-matched Pre-training Tasks for Dense Retrieval
Pre-training on larger datasets with ever increasing model size is now a...

08/21/2022 · A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval
Dense retrieval (DR) has shown promising results in information retrieva...

04/22/2022 · Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction
Dense retrieval has shown promising results in many information retrieva...

08/31/2022 · LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
In large-scale retrieval, the lexicon-weighting paradigm, learning weigh...

04/20/2023 · CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval
Passage retrieval aims to retrieve relevant passages from large collecti...
