More Robust Dense Retrieval with Contrastive Dual Learning

07/16/2021
by Yizhi Li, et al.

Dense retrieval conducts text retrieval in the embedding space and has shown many advantages over sparse retrieval. Existing dense retrievers optimize representations of queries and documents with contrastive training and map them to the embedding space. The embedding space is optimized by aligning matched query-document pairs and pushing negative documents away from the query. However, in such a training paradigm, the queries are only optimized to align to the documents and are coarsely positioned, leading to an anisotropic query embedding space. In this paper, we analyze the embedding space distributions and propose an effective training paradigm, Contrastive Dual Learning for Approximate Nearest Neighbor (DANCE), to learn fine-grained query representations for dense retrieval. DANCE incorporates an additional dual training objective of query retrieval, inspired by the classic information retrieval training axiom of query likelihood. With contrastive learning, the dual training objective of DANCE learns more tailored representations for queries and documents, keeping the embedding space smooth and uniform and improving the ranking performance of DANCE on the MS MARCO document retrieval task. Different from ANCE, which is optimized only with the document retrieval task, DANCE concentrates the query embeddings closer to the document representations while making the document distribution more discriminative. Such a concentrated query embedding distribution assigns more uniform negative sampling probabilities to queries and helps to sufficiently optimize query representations in the query retrieval task. Our code is released at https://github.com/thunlp/DANCE.
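To make the dual objective concrete, the sketch below implements a symmetric contrastive (InfoNCE-style) loss over query and document embeddings: the usual query-to-document term is paired with a dual document-to-query (query retrieval) term. This is a minimal illustration with in-batch negatives, not the released DANCE training code, which selects hard negatives via ANCE-style approximate nearest neighbor search; the function name `dual_contrastive_loss` and the `temperature` parameter are illustrative assumptions.

```python
# Minimal sketch of a dual contrastive objective with in-batch negatives.
# Assumes query/document embeddings are precomputed; names are illustrative,
# not the DANCE repository's API.
import torch
import torch.nn.functional as F


def dual_contrastive_loss(query_emb: torch.Tensor,
                          doc_emb: torch.Tensor,
                          temperature: float = 0.05) -> torch.Tensor:
    """Combine the document-retrieval loss (query -> document) with the
    dual query-retrieval loss (document -> query).

    query_emb, doc_emb: (batch_size, dim) tensors where row i of each
    tensor forms a matched query-document pair; all other rows in the
    batch serve as negatives.
    """
    # Similarity matrix between every query and every document in the batch.
    scores = query_emb @ doc_emb.t() / temperature            # (B, B)
    labels = torch.arange(scores.size(0), device=scores.device)

    # Document retrieval: each query should rank its matched document first.
    loss_doc = F.cross_entropy(scores, labels)
    # Query retrieval (the dual task): each document should rank its matched query first.
    loss_query = F.cross_entropy(scores.t(), labels)

    return loss_doc + loss_query


if __name__ == "__main__":
    # Toy usage: a batch of 8 matched pairs with 768-dim normalized embeddings.
    q = F.normalize(torch.randn(8, 768), dim=-1)
    d = F.normalize(torch.randn(8, 768), dim=-1)
    print(dual_contrastive_loss(q, d).item())
```

Training with both terms is what positions the queries more finely: the document-to-query direction forces query embeddings to be distinguishable from one another, rather than merely close to their matched documents.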

Related research

04/06/2023  Noise-Robust Dense Retrieval via Contrastive Alignment Post Training
The success of contextual word representations and advances in neural in...

05/10/2023  Unsupervised Dense Retrieval Training with Web Anchors
In this work, we present an unsupervised retrieval method with contrasti...

04/27/2023  Multivariate Representation Learning for Information Retrieval
Dense retrieval models use bi-encoder network architectures for learning...

10/14/2021  Exposing Query Identification for Search Transparency
Search systems control the exposure of ranked content to searchers. In m...

05/06/2022  Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder
Dense retrievers encode texts and map them in an embedding space using p...

10/05/2022  Contextualized Generative Retrieval
The text retrieval task is mainly performed in two ways: the bi-encoder ...

10/07/2021  Adversarial Retriever-Ranker for dense text retrieval
Current dense text retrieval models face two typical challenges. First, ...
