Ghostbuster: Detecting Text Ghostwritten by Large Language Models

05/24/2023
by   Vivek Verma, et al.

We introduce Ghostbuster, a state-of-the-art system for detecting AI-generated text. Our method passes documents through a series of weaker language models, runs a structured search over possible combinations of their features, and then trains a classifier on the selected features to determine whether the target document was AI-generated. Crucially, Ghostbuster does not require access to token probabilities from the target model, making it useful for detecting text generated by black-box models or unknown model versions. Alongside our model, we release three new datasets of human-authored and AI-generated text as detection benchmarks. They cover multiple domains (student essays, creative fiction, and news) and task setups: document-level detection, author identification, and a challenge task of paragraph-level detection. Ghostbuster averages 99.1 F1 across all three datasets on document-level detection, outperforming previous approaches such as GPTZero and DetectGPT by up to 32.7 F1.
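The three stages named in the abstract (weak-model features, a structured search over feature combinations, and a classifier on the selected features) can be illustrated with a minimal sketch. The abstract does not specify the weak models, the feature operations, the search procedure, or the classifier, so everything below is an assumption made for illustration: the model names `unigram` and `trigram`, the toy probabilities, the `VECTOR_OPS` and `REDUCERS` tables, greedy forward selection as the search, and a nearest-centroid rule standing in for the classifier.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical building blocks (not taken from the paper): ways to combine two
# per-token probability vectors, and ways to reduce a vector to one number.
VECTOR_OPS = {
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "gt": lambda a, b: [float(x > y) for x, y in zip(a, b)],
}
REDUCERS = {"mean": mean, "max": max, "min": min, "var": pvariance}


def candidate_features(model_names):
    """Enumerate (reducer, op, model_a, model_b) feature descriptions."""
    pairs = [(a, b) for a, b in product(model_names, repeat=2) if a < b]
    return [(r, op, a, b) for r, op, (a, b) in product(REDUCERS, VECTOR_OPS, pairs)]


def compute(feature, doc):
    """Evaluate one feature on a document (dict: model name -> token probs)."""
    reducer, op, a, b = feature
    return REDUCERS[reducer](VECTOR_OPS[op](doc[a], doc[b]))


def centroid_accuracy(features, docs, labels):
    """Training accuracy of a nearest-centroid rule on the given features."""
    rows = [[compute(f, d) for f in features] for d in docs]
    centroids = {}
    for y in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == y]
        centroids[y] = [mean(col) for col in zip(*members)]

    def predict(row):
        return min(
            centroids,
            key=lambda y: sum((v - c) ** 2 for v, c in zip(row, centroids[y])),
        )

    return mean(float(predict(r) == l) for r, l in zip(rows, labels))


def greedy_select(candidates, docs, labels, max_features=3):
    """Forward selection: add the feature that most improves accuracy."""
    chosen, best = [], 0.0
    while len(chosen) < max_features:
        scored = [
            (centroid_accuracy(chosen + [f], docs, labels), f)
            for f in candidates
            if f not in chosen
        ]
        if not scored:
            break
        score, feat = max(scored, key=lambda s: s[0])
        if score <= best:  # stop when no candidate improves the score
            break
        chosen.append(feat)
        best = score
    return chosen, best


# Toy data (invented): token probabilities assigned by two weak models.
docs = [
    {"unigram": [0.1, 0.2, 0.1], "trigram": [0.6, 0.7, 0.8]},  # labeled AI
    {"unigram": [0.2, 0.1, 0.2], "trigram": [0.7, 0.9, 0.6]},  # labeled AI
    {"unigram": [0.1, 0.3, 0.2], "trigram": [0.2, 0.2, 0.3]},  # labeled human
    {"unigram": [0.2, 0.2, 0.1], "trigram": [0.1, 0.3, 0.2]},  # labeled human
]
labels = [1, 1, 0, 0]

candidates = candidate_features(["unigram", "trigram"])
selected, score = greedy_select(candidates, docs, labels)
print(selected, score)
```

Note that only the weak models' token probabilities enter the feature computation; nothing is queried from the model that produced the text, which mirrors the black-box property the abstract highlights. A real system would score held-out data rather than training accuracy when selecting features.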


