Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent

04/19/2023
by Weiwei Sun, et al.

Large Language Models (LLMs) have demonstrated a remarkable ability to generalize zero-shot to a wide range of language tasks. This paper explores generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR). Surprisingly, our experiments reveal that properly instructed ChatGPT and GPT-4 can deliver results competitive with, and even superior to, those of supervised methods on popular IR benchmarks. Notably, GPT-4 outperforms monoT5-3B, fully fine-tuned on MS MARCO, by an average of 2.7 nDCG on the TREC datasets, an average of 2.3 nDCG on eight BEIR datasets, and an average of 2.7 nDCG on ten low-resource languages of Mr.TyDi. We then investigate the potential of distilling the ranking capabilities of ChatGPT into a specialized model. Our small specialized model, trained on 10K ChatGPT-generated data, outperforms monoT5 trained on 400K annotated MS MARCO data on BEIR. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.
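The abstract does not spell out how an instructed LLM's answer becomes a ranking, nor how the nDCG gains are computed. The Python sketch below is a hedged illustration, not the authors' implementation. It assumes a listwise setup in which the candidate passages are labeled [1]..[n] in the prompt and the model answers with an ordering such as "[2] > [4] > [1] > [3]". The helper names `parse_permutation`, `dcg_at_k`, and `ndcg_at_k` are hypothetical. The sketch turns such an answer into a re-ranked list and scores it with nDCG@k using linear gains (gain equals the graded relevance label); other nDCG variants use a 2^rel - 1 gain instead.

```python
import math
import re


def parse_permutation(response: str, num_passages: int) -> list[int]:
    """Turn an LLM answer such as "[2] > [4] > [1]" into 0-based passage indices.

    Assumed format (hypothetical): passages are labeled [1]..[n] in the prompt
    and the model lists identifiers from most to least relevant. Duplicates and
    out-of-range identifiers are dropped; passages the model omitted are
    appended in their original order so the result is a full permutation.
    """
    order: list[int] = []
    seen: set[int] = set()
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in seen:
            order.append(idx)
            seen.add(idx)
    order.extend(i for i in range(num_passages) if i not in seen)
    return order


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain with linear gain: sum of rel / log2(rank + 1)."""
    # enumerate() starts at 0, so 1-based rank r corresponds to log2(r + 1) = log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: list[float], k: int = 10) -> float:
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    labels = [0, 2, 0, 1]  # hypothetical graded labels, in first-stage order
    response = "[2] > [4] > [1] > [3]"  # hypothetical LLM answer
    order = parse_permutation(response, len(labels))
    reranked = [labels[i] for i in order]
    print(order, round(ndcg_at_k(labels), 4), round(ndcg_at_k(reranked), 4))
```

With the hypothetical labels above, the answer yields the order [1, 3, 0, 2] and lifts nDCG@10 from about 0.64 to 1.0. The gains quoted in the abstract are differences in scores of this kind, which IR benchmarks commonly report multiplied by 100.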



research
08/31/2021

mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset

The MS MARCO ranking dataset has been widely used for training deep lear...
research
08/29/2023

Improving Neural Ranking Models with Traditional IR Methods

Neural ranking methods based on large transformer models have recently g...
research
05/12/2023

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Information retrieval (IR) plays a crucial role in locating relevant res...
research
09/07/2023

Evaluating ChatGPT as a Recommender System: A Rigorous Approach

Recent popularity surrounds large AI language models due to their impres...
research
08/21/2023

Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis

The rapid expansion of the digital world has propelled sentiment analysi...
research
09/26/2022

Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts

Previous work has shown that there exists a scaling law between the size...
research
05/27/2023

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

Retrieval augmentation can aid language models (LMs) in knowledge-intens...
