Exploring the Viability of Synthetic Query Generation for Relevance Prediction

05/19/2023
by Aditi Chaudhary, et al.

Query-document relevance prediction is a critical problem in Information Retrieval systems. It is increasingly tackled with pretrained transformer-based models that are fine-tuned on large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain datasets. To address this paucity, recent methods use these powerful models to generate high-quality, task- and domain-specific synthetic data. Prior work has largely explored synthetic data generation, or query generation (QGen), for Question Answering (QA) and binary (yes/no) relevance prediction; for instance, a QGen model is given a document and trained to generate a query relevant to that document. In many problems, however, relevance is more fine-grained than a simple yes/no label. In this work, we therefore conduct a detailed study of how QGen approaches can be leveraged for nuanced relevance prediction. We demonstrate that, contrary to claims from prior work, current QGen approaches fall short of more conventional cross-domain transfer-learning approaches. Through empirical studies spanning three public e-commerce benchmarks, we identify new shortcomings of existing QGen approaches, including their inability to distinguish between different grades of relevance. To address this, we introduce label-conditioned QGen models that incorporate knowledge about the different relevance grades. While our experiments show that these modifications help improve the performance of QGen techniques, we also find that QGen approaches struggle to capture the full nuance of the relevance label space; as a result, the generated queries are not faithful to the desired relevance label.
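The abstract does not say how the relevance label is injected into the query generator. One common pattern is to prepend the label to the model's input text, so a seq2seq generator learns to produce a query that matches the requested grade. The minimal Python sketch below assumes that pattern; the four-grade label set, the `relevance: <grade> | document: <text>` input template, and the helper names are all hypothetical illustrations, not the authors' implementation. It also shows one simple way to quantify the faithfulness problem the abstract reports: the fraction of generated queries whose grade, as judged by some separate relevance classifier, matches the grade they were generated for.

```python
from dataclasses import dataclass

# Hypothetical graded relevance scale (assumption); real benchmarks may use other labels.
RELEVANCE_GRADES = ("irrelevant", "partially relevant", "relevant", "exact")


@dataclass(frozen=True)
class QGenExample:
    source: str  # text fed to a seq2seq QGen model
    target: str  # query the model should learn to generate


def make_label_conditioned_example(document: str, query: str, grade: str) -> QGenExample:
    """Prefix the document with its relevance grade (hypothetical template) so the
    generator can condition on the label when producing a query."""
    if grade not in RELEVANCE_GRADES:
        raise ValueError(f"unknown relevance grade: {grade!r}")
    source = f"relevance: {grade} | document: {document}"
    return QGenExample(source=source, target=query)


def label_faithfulness(requested: list[str], predicted: list[str]) -> float:
    """Fraction of generated queries whose predicted grade (e.g. from a separate
    relevance classifier) equals the grade the query was generated for."""
    if len(requested) != len(predicted):
        raise ValueError("requested and predicted must have the same length")
    if not requested:
        return 0.0
    matches = sum(r == p for r, p in zip(requested, predicted))
    return matches / len(requested)


if __name__ == "__main__":
    ex = make_label_conditioned_example(
        "Stainless steel water bottle, 750 ml", "steel bottle 750ml", "exact"
    )
    print(ex.source)  # relevance: exact | document: Stainless steel water bottle, 750 ml
    # Two of three generated queries land on their requested grade.
    print(label_faithfulness(["exact", "irrelevant", "relevant"],
                             ["exact", "relevant", "relevant"]))
```

Prefixing the label keeps the generator architecture unchanged, which is why it is a convenient baseline for conditioning; a low faithfulness score under this kind of check would correspond to the abstract's finding that generated queries do not reliably reflect the desired label.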

11/15/2022

Generative Long-form Question Answering: Relevance, Faithfulness and Succinctness

In this thesis, we investigated the relevance, faithfulness, and succinc...
06/16/2023

GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval

Recent studies show that Generative Relevance Feedback (GRF), using text...
12/13/2022

Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

Although neural information retrieval has witnessed great improvements, ...
12/21/2020

Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study

Domain specific search has always been a challenging information retriev...
04/07/2021

Distantly Supervised Transformers For E-Commerce Product QA

We propose a practical instant question answering (QA) system on product...
03/30/2021

An In-depth Analysis of Passage-Level Label Transfer for Contextual Document Ranking

Recently introduced pre-trained contextualized autoregressive models lik...
04/23/2023

A Lightweight Constrained Generation Alternative for Query-focused Summarization

Query-focused summarization (QFS) aims to provide a summary of a documen...
