DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

01/26/2023
by   Eric Mitchell, et al.
0

The fluency and factual knowledge of large language models (LLMs) heightens the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g, T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.

READ FULL TEXT
research
10/11/2020

Connecting the Dots Between Fact Verification and Fake News Detection

Fact verification models have enjoyed a fast advancement in the last two...
research
05/17/2023

Smaller Language Models are Better Black-box Machine-Generated Text Detectors

With the advent of fluent generative language models that can produce co...
research
05/27/2023

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Large language models (LLMs) have notably enhanced the fluency and diver...
research
04/28/2021

Zero-Shot Detection via Vision and Language Knowledge Distillation

Zero-shot image classification has made promising progress by training t...
research
07/24/2020

IDS at SemEval-2020 Task 10: Does Pre-trained Language Model Know What to Emphasize?

We propose a novel method that enables us to determine words that deserv...
research
07/05/2023

Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

Research suggests that providing specific and timely feedback to human t...
research
06/08/2023

Can AI Moderate Online Communities?

The task of cultivating healthy communication in online communities beco...

Please sign up or login with your details

Forgot password? Click here to reset