CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval

04/20/2023
by Guangyuan Ma, et al.

Passage retrieval aims to retrieve relevant passages from large-scale open-domain corpora. Contextual Masked Auto-Encoding has proven effective for representation-bottleneck pre-training of a monolithic dual-encoder for passage retrieval. Siamese or fully separated dual-encoders are often adopted as the basic retrieval architecture in the pre-training and fine-tuning stages for encoding queries and passages into their latent embedding spaces. However, simply sharing or separating the parameters of the dual-encoder results in imbalanced discrimination of the embedding spaces. In this work, we propose to pre-train a Contextual Masked Auto-Encoder with Mixture-of-Textual-Experts (CoT-MoTE). Specifically, we incorporate textual-specific experts that individually encode the distinct properties of queries and passages, while a shared self-attention layer is kept for unified attention modeling. Results on large-scale passage retrieval benchmarks show steady improvements in retrieval performance, and quantitative analysis shows a more balanced discrimination of the latent embedding spaces.
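The core architectural idea described in the abstract, a shared self-attention layer combined with separate feed-forward experts for queries and passages, can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name MoTEBlock, the boolean is_query routing flag, the residual/LayerNorm ordering, and the layer dimensions are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of a Mixture-of-Textual-Experts block:
# one shared self-attention sub-layer, plus two text-type-specific feed-forward
# "experts" -- one for queries, one for passages.
import torch
import torch.nn as nn


class MoTEBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        # Shared self-attention keeps unified attention modeling for both text types.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        # Separate feed-forward experts encode the distinct properties of
        # queries and passages.
        self.query_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.passage_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, is_query: bool) -> torch.Tensor:
        # Shared attention sub-layer with a residual connection and LayerNorm
        # (post-norm ordering here is illustrative).
        attn_out, _ = self.attn(x, x, x)
        x = self.attn_norm(x + attn_out)
        # Route the token representations to the matching textual expert.
        ffn = self.query_ffn if is_query else self.passage_ffn
        return self.ffn_norm(x + ffn(x))


if __name__ == "__main__":
    block = MoTEBlock()
    queries = torch.randn(2, 16, 768)    # (batch, query tokens, hidden)
    passages = torch.randn(2, 128, 768)  # (batch, passage tokens, hidden)
    print(block(queries, is_query=True).shape)    # torch.Size([2, 16, 768])
    print(block(passages, is_query=False).shape)  # torch.Size([2, 128, 768])
```

The intuition behind this layout is that sharing the attention parameters keeps query and passage representations in a comparable attention space, while the per-type feed-forward experts let each side specialize, which is consistent with the more balanced embedding-space discrimination the abstract reports.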


