Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

08/22/2023
by   Xiaojie Sun, et al.

Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching; for example, the aspect category may take values such as Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not sufficiently capture the various semantic similarities between the values. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training, so that their semantic similarities can be naturally captured by the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the text of the item aspects and the item content. In this way, our model makes fuller use of aspect information than conducting undifferentiated masked language modeling (MLM) on the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach outperforms competitive baselines that either treat aspect values as classes or apply the same MLM to aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
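To make the contrast with undifferentiated MLM concrete, the following is a minimal sketch of what an aspect-content mutual prediction masking scheme could look like. This is an illustrative assumption, not the authors' implementation: the function name, the use of full-side masking, and the `[SEP]`/`[MASK]` token conventions are all hypothetical; the actual method would sample masked positions and operate on PLM token IDs.

```python
# Hypothetical sketch: build the two mutual prediction examples from one item.
# Unlike undifferentiated MLM (which masks random tokens across the whole
# concatenated sequence), each objective masks only one side, so the model
# must predict aspect text from content and content text from aspects.
MASK = "[MASK]"
SEP = "[SEP]"


def build_mutual_prediction_examples(aspect_tokens, content_tokens):
    """Return two (input, labels) pairs:
    1. aspect -> content: aspect text visible, content tokens masked;
    2. content -> aspect: content visible, aspect tokens masked.
    Labels are the original tokens at masked positions, None elsewhere."""
    # Objective 1: predict content given the aspect strings.
    inp1 = aspect_tokens + [SEP] + [MASK] * len(content_tokens)
    lab1 = [None] * (len(aspect_tokens) + 1) + content_tokens
    # Objective 2: predict the aspect strings given the content.
    inp2 = [MASK] * len(aspect_tokens) + [SEP] + content_tokens
    lab2 = aspect_tokens + [None] * (1 + len(content_tokens))
    return (inp1, lab1), (inp2, lab2)


# Toy usage with an aspect string (not a class ID) and item content.
(ex1, ex2) = build_mutual_prediction_examples(
    ["category:", "Electronics"], ["wireless", "mouse"]
)
```

Because the aspect side is kept as text, the PLM's token embeddings carry the semantic similarity between values (e.g., "Electronics" vs. "Computers"), which a from-scratch value-ID embedding table would have to relearn.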


Related research

10/27/2022 · Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval
Pre-trained language model (PTM) has been shown to yield powerful text r...

03/16/2021 · LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Multimodal pre-training has propelled great advancement in vision-and-la...

12/02/2021 · DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Recent progress has shown that large-scale pre-training using contrastiv...

12/15/2022 · MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers
Dense retrieval aims to map queries and passages into low-dimensional ve...

07/31/2022 · Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval
Pre-trained transformers has declared its success in many NLP tasks. One...

09/25/2020 · RecoBERT: A Catalog Language Model for Text-Based Recommendations
Language models that utilize extensive self-supervised pre-training from...

03/21/2018 · Multiple Models for Recommending Temporal Aspects of Entities
Entity aspect recommendation is an emerging task in semantic search that...
