Multi-View Document Representation Learning for Open-Domain Dense Retrieval

03/16/2022
by Shunyu Zhang, et al.

Dense retrieval has made impressive advances in first-stage retrieval from large-scale document collections. It is built on a bi-encoder architecture that produces a single vector representation for each query and each document. However, a document can usually answer multiple potential queries from different views, so a single vector representation of a document is hard to match with multi-view queries and suffers from a semantic mismatch problem. This paper proposes a multi-view document representation learning framework that produces multi-view embeddings to represent documents and enforces their alignment with different queries. First, we propose a simple yet effective method of generating multiple embeddings through viewers. Second, to prevent the multi-view embeddings from collapsing into the same one, we further propose a global-local loss with annealed temperature to encourage the multiple viewers to better align with different potential queries. Experiments show that our method outperforms recent work and achieves state-of-the-art results.
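To make the multi-view idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a document is scored by its best-matching view (max over dot products) and that a temperature-scaled softmax over the views is what gets annealed; the abstract specifies neither detail. The names `multi_view_score` and `view_distribution`, and the toy vectors, are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_view_score(query, doc_views):
    """Score a document by its best-matching view.

    Assumption: max aggregation over views; the abstract does not state
    how the per-view scores are combined.
    """
    return max(dot(query, view) for view in doc_views)


def view_distribution(query, doc_views, temperature):
    """Softmax over per-view scores, scaled by a temperature.

    A lower temperature concentrates the mass on the best-matching view,
    which is one plausible reading of "annealed temperature".
    """
    scores = [dot(query, view) / temperature for view in doc_views]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: one document with two views, and two queries that each
# match a different view.
doc_views = [[1.0, 0.0], [0.0, 1.0]]
query_a = [0.9, 0.1]
query_b = [0.2, 0.8]

score_a = multi_view_score(query_a, doc_views)
score_b = multi_view_score(query_b, doc_views)

# A single averaged vector matches both queries less well than the
# best individual view does.
single_vector = [0.5, 0.5]
single_a = dot(query_a, single_vector)
single_b = dot(query_b, single_vector)

warm = view_distribution(query_a, doc_views, temperature=1.0)
cold = view_distribution(query_a, doc_views, temperature=0.1)
```

In this toy setup each query scores higher against its best view than against the averaged single vector, and lowering the temperature shifts the softmax mass toward the view that matches the query.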

06/28/2020

RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Although exact term match between queries and documents is the dominant ...
05/23/2022

UnifieR: A Unified Retriever for Large-Scale Retrieval

Large-scale retrieval is to recall relevant documents from a huge collec...
07/08/2017

Efficient Vector Representation for Documents through Corruption

We present an efficient document representation learning framework, Docu...
08/11/2022

On the Value of Behavioral Representations for Dense Retrieval

We consider text retrieval within dense representational space in real-w...
08/11/2016

Multi-View Product Image Search Using Deep ConvNets Representations

Multi-view product image queries can improve retrieval performance over ...
08/29/2022

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Retrieval models based on dense representations in semantic space have b...
09/06/2018

Multi-view Factorization AutoEncoder with Network Constraints for Multi-omic Integrative Analysis

Multi-omic data provides multiple views of the same patients. Integrativ...