Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

by Kalpesh Krishna, et al.

To detect the deployment of large language models for malicious use cases (e.g., fake content creation or academic plagiarism), several approaches have recently been proposed for identifying AI-generated text via watermarks or statistical irregularities. How robust are these detection algorithms to paraphrases of AI-generated text? To stress test these detectors, we first train an 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, optionally leveraging surrounding text (e.g., user-written prompts) as context. DIPPER also uses scalar knobs to control the amount of lexical diversity and reordering in the paraphrases. Paraphrasing text generated by three large language models (including GPT3.5-davinci-003) with DIPPER successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops the detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%), without appreciably modifying the input semantics. To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings, while only classifying 1% of human-written sequences as AI-generated. We open-source our code, model and data for future research.
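The retrieval defense described above can be sketched in a few lines: the API provider logs an embedding of every sequence it generates, and a candidate text is flagged as AI-generated if its embedding matches any logged generation above a similarity threshold. The sketch below is a toy illustration only, assuming bag-of-words cosine similarity in place of the learned semantic retriever the paper uses over its 15M-generation database; the names `RetrievalDetector`, `embed`, and the 0.7 threshold are illustrative, not from the paper.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase unigram counts. The actual defense uses a
    # semantic encoder, which is what makes it robust to paraphrasing.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RetrievalDetector:
    def __init__(self, threshold=0.7):
        self.db = []              # embeddings of every sequence the API generated
        self.threshold = threshold

    def log_generation(self, text):
        # Called by the API provider on every generation it serves.
        self.db.append(embed(text))

    def is_ai_generated(self, candidate):
        # Flag the candidate if it matches any logged generation
        # within the similarity threshold.
        c = embed(candidate)
        return any(cosine(c, g) >= self.threshold for g in self.db)
```

Because matching is done in embedding space rather than by exact string comparison, a paraphrase that reorders or reworks a logged generation can still retrieve its source, which is why this defense resists attacks that fool per-text statistical detectors.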




