Unsupervised Dense Retrieval Training with Web Anchors

05/10/2023
by   Yiqing Xie, et al.

In this work, we present an unsupervised retrieval method with contrastive learning on web anchors. Anchor text describes the content of the linked page it references, which makes it similar to a search query that aims to retrieve pertinent information from relevant documents. Building on this commonality, we train an unsupervised dense retriever, Anchor-DR, with a contrastive learning task that matches anchor text to the linked document. To filter out uninformative anchors (such as “homepage” or other functional anchors), we present a novel filtering technique that selects only anchors containing the same types of information as search queries. Experiments show that Anchor-DR outperforms state-of-the-art methods on unsupervised dense retrieval by a large margin (e.g., by 5.3), and the gain is especially significant for search and question answering tasks. Our analysis further reveals that the pattern of anchor-document pairs is similar to that of search query-document pairs. Code is available at https://github.com/Veronicium/AnchorDR.
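The contrastive matching task described above can be illustrated with a small, dependency-free sketch. It is not taken from the Anchor-DR code base: the abstract does not specify the loss, so the in-batch-negative (InfoNCE-style) formulation, the function name `contrastive_loss`, the cosine similarity, and the temperature of 0.05 are all assumptions made for illustration. Plain Python lists stand in for the encoder outputs of anchor texts and linked documents.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(anchor_embs, doc_embs, temperature=0.05):
    """Hypothetical in-batch contrastive loss.

    anchor_embs[i] is assumed to be the embedding of an anchor text and
    doc_embs[i] the embedding of the document it links to. Every other
    document in the batch acts as a negative for anchor i.
    """
    anchors = [normalize(a) for a in anchor_embs]
    docs = [normalize(d) for d in doc_embs]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over all documents in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability of the linked (positive) document.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: three anchors whose linked documents share the same embedding.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Same documents in the wrong order, so every anchor is paired with a non-linked page.
shuffled = aligned[1:] + aligned[:1]

loss_aligned = contrastive_loss(aligned, aligned)
loss_shuffled = contrastive_loss(aligned, shuffled)
```

In this toy batch the loss is close to zero when each anchor sits next to its linked document and large when the pairs are shuffled, which is the signal a retriever trained this way would follow. A real setup would replace the hand-written vectors with the outputs of a trainable text encoder.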
