Generating Synthetic Documents for Cross-Encoder Re-Rankers: A Comparative Study of ChatGPT and Human Experts

05/03/2023
by Arian Askari, et al.

We investigate the usefulness of generative Large Language Models (LLMs) for generating training data for cross-encoder re-rankers in a novel direction: generating synthetic documents instead of synthetic queries. We introduce a new dataset, ChatGPT-RetrievalQA, and compare the effectiveness of models fine-tuned on LLM-generated versus human-generated data. Data generated with generative LLMs can augment training data, especially in domains with smaller amounts of labeled data. We build ChatGPT-RetrievalQA on an existing dataset, the Human ChatGPT Comparison Corpus (HC3), which consists of public question collections with both human responses and answers from ChatGPT. We fine-tune a range of cross-encoder re-rankers on either human-generated or ChatGPT-generated data. Our evaluation on MS MARCO DEV, TREC DL'19, and TREC DL'20 demonstrates that cross-encoder re-ranking models trained on ChatGPT responses are statistically significantly more effective zero-shot re-rankers than those trained on human responses. In a supervised setting, however, the human-trained re-rankers outperform the LLM-trained re-rankers. These novel findings suggest that generative LLMs have high potential for generating training data for neural retrieval models. Further work is needed to determine the effect of factually incorrect information in the generated responses and to test the generalizability of our findings with open-source LLMs. We release our data, code, and cross-encoder checkpoints for future work.
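The training-and-evaluation pipeline described in the abstract lends itself to a compact illustration. Below is a minimal sketch of fine-tuning a cross-encoder re-ranker on (question, response) pairs and then using it as a zero-shot re-ranker on an unseen query. It assumes the sentence-transformers library (the classic CrossEncoder.fit API); the base model, the toy training pairs, and the hyperparameters are illustrative placeholders, not the authors' exact configuration.

```python
# Minimal sketch: fine-tune a cross-encoder re-ranker on (question, response)
# pairs, then score candidate documents for an unseen query. Assumes the
# sentence-transformers library; the base model, training pairs, and
# hyperparameters below are illustrative placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Toy training pairs with binary relevance labels. In ChatGPT-RetrievalQA,
# the positive responses would come from either ChatGPT or human respondents,
# yielding two separately fine-tuned re-rankers.
train_samples = [
    InputExample(
        texts=["what is a cross-encoder?",
               "A cross-encoder scores a query and a document jointly in one forward pass."],
        label=1.0,
    ),
    InputExample(
        texts=["what is a cross-encoder?",
               "Paris is the capital of France."],
        label=0.0,
    ),
]

# Initialize the re-ranker from a pretrained transformer (placeholder choice).
model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)

# fit() attaches its own collate function, so a plain DataLoader over
# InputExample objects is sufficient.
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=10)

# Zero-shot-style re-ranking: score candidate passages for a query the model
# was never trained on and sort them by predicted relevance.
query = "who wrote the iliad?"
candidates = [
    "The Iliad is an ancient Greek epic poem attributed to Homer.",
    "The Nile is the longest river in Africa.",
]
scores = model.predict([(query, c) for c in candidates])
for passage, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}\t{passage}")
```

In the paper's setting, the same recipe would be run twice, once on the ChatGPT responses and once on the human responses from ChatGPT-RetrievalQA, and the two resulting checkpoints compared on MS MARCO DEV and the TREC DL test sets.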

Related research
05/26/2023

Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks

Recently, large language models (LLMs) like ChatGPT have shown impressive...
11/14/2021

Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery

Human medical data can be challenging to obtain due to data privacy conc...
02/02/2023

Creating a Large Language Model of a Philosopher

Can large language models be trained to produce philosophical texts that...
05/23/2023

LLM-powered Data Augmentation for Enhanced Crosslingual Performance

This paper aims to explore the potential of leveraging Large Language Mo...
11/09/2020

Efficient Training Data Generation for Phase-Based DOA Estimation

Deep learning (DL) based direction of arrival (DOA) estimation is an act...
01/18/2023

How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both ac...
07/19/2023

Are We Ready to Embrace Generative AI for Software Q&A?

Stack Overflow, the world's largest software Q&A (SQA) website, is fac...
