Can we trust the evaluation on ChatGPT?

03/22/2023
by Rachith Aiyappa, et al.

ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance on numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance across diverse problem domains remains challenging because the model is closed and continuously updated via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, using the task of stance detection as a case study, and discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.
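The abstract does not specify how contamination would be detected, but a common heuristic is to measure n-gram overlap between benchmark examples and a candidate training corpus. The following is a minimal sketch of that idea; the function names and the choice of 8-gram overlap are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: flag possible benchmark contamination via word n-gram
# overlap between an evaluation example and documents in a corpus.
# The n=8 window and the score threshold are illustrative choices.

def ngrams(text: str, n: int = 8) -> set[str]:
    """Return the set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_score(eval_example: str, corpus_docs: list[str], n: int = 8) -> float:
    """Fraction of the example's n-grams that also appear in the corpus.

    1.0 means every n-gram of the example occurs verbatim in some
    corpus document; 0.0 means no overlap at the chosen n.
    """
    eval_grams = ngrams(eval_example, n)
    if not eval_grams:
        return 0.0  # example shorter than n words: no evidence either way
    corpus_grams: set[str] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(eval_grams & corpus_grams) / len(eval_grams)
```

For a closed, continuously updated model the training corpus is unavailable, which is precisely the difficulty the paper raises: a check like this can only be run against public corpora that the model *might* have seen.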

