DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

05/27/2023
by Xianjun Yang, et al.

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLM to regenerate the remaining part. By analyzing the differences between the original and regenerated remaining parts through N-gram analysis in the black-box setting or probability divergence in the white-box setting, we can clearly illustrate significant discrepancies between machine-generated and human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human-written and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, a unique feature of explainable detection. Our method is also robust under revised-text attacks and can additionally solve the model sourcing problem. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
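
The following is a minimal sketch of the black-box side of the procedure described above: truncate the candidate text, have the candidate LLM regenerate the missing tail several times, and score how strongly the regenerations overlap with the original tail via N-grams. The regenerate_fn callback, the 0.5 truncation ratio, the n-gram sizes, the number of regenerations, and the decision threshold are illustrative assumptions, not values taken from the paper.

```python
# Sketch of black-box DNA-GPT-style detection: machine-generated text tends to
# be regenerated similarly by the same LLM, so high n-gram overlap between the
# original tail and the regenerated tails suggests machine origin.
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams for a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(original_tail, regenerated_tails, n_values=(3, 4, 5)):
    """Average n-gram overlap between the original tail and each regeneration."""
    orig_tokens = original_tail.split()
    scores = []
    for tail in regenerated_tails:
        regen_tokens = tail.split()
        per_n = []
        for n in n_values:
            a, b = ngrams(orig_tokens, n), ngrams(regen_tokens, n)
            shared = sum((a & b).values())          # n-grams present in both
            total = max(1, sum(a.values()))         # avoid division by zero
            per_n.append(shared / total)
        scores.append(sum(per_n) / len(per_n))
    return sum(scores) / max(1, len(scores))


def dna_gpt_blackbox(text, regenerate_fn, truncation_ratio=0.5, k=10, threshold=0.05):
    """Flag `text` as likely machine-generated when regenerated tails overlap
    strongly with the original tail.

    `regenerate_fn(prefix, k)` is a hypothetical callback that queries the
    candidate LLM k times with the prefix and returns k continuation strings.
    """
    tokens = text.split()
    cut = int(len(tokens) * truncation_ratio)
    prefix, original_tail = " ".join(tokens[:cut]), " ".join(tokens[cut:])
    regenerated_tails = regenerate_fn(prefix, k)
    score = overlap_score(original_tail, regenerated_tails)
    return score, score > threshold
```

In the white-box setting mentioned in the abstract, the same truncate-and-regenerate step would instead be scored with a probability divergence computed from the model's token log-probabilities, rather than surface n-gram overlap.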

