On the Reliability of Watermarks for Large Language Models

06/07/2023
by John Kirchenbauer, et al.

As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: how reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is rewritten by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when the false positive rate is set to 1e-5. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.
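The detection statistics behind these numbers follow the green-list z-test from the companion paper "A Watermark for Large Language Models" (listed below). The sketch below illustrates that test plus a windowed variant in the spirit of the span-sensitive schemes the abstract mentions. It is a minimal sketch, not the authors' released implementation: the GAMMA value, the is_green() membership stand-in, and the windowed_max_z() scan are illustrative assumptions.

```python
# Minimal sketch of green-list watermark detection, assuming the scheme of
# Kirchenbauer et al.: each token counts as "green" with probability GAMMA
# under the null (unwatermarked) hypothesis, and detection is a
# one-proportion z-test on the green-token count.
import math
import random

GAMMA = 0.25  # assumed green-list fraction of the vocabulary


def is_green(prev_token: int, token: int) -> bool:
    """Toy stand-in for the pseudorandom green-list test. The real scheme
    seeds a PRNG with the previous token to partition the vocabulary; here
    we hash the (prev, current) pair, which has the same null distribution."""
    return random.Random(hash((prev_token, token))).random() < GAMMA


def z_score(tokens: list[int]) -> float:
    """z-statistic for the green-token count over a token sequence. Under
    the null, roughly GAMMA * T of the T scored tokens should be green;
    watermarked text pushes the count far above that expectation."""
    t = len(tokens) - 1  # each token after the first is scored
    green = sum(is_green(p, c) for p, c in zip(tokens, tokens[1:]))
    return (green - GAMMA * t) / math.sqrt(GAMMA * (1 - GAMMA) * t)


def windowed_max_z(tokens: list[int], min_window: int = 100) -> float:
    """Naive O(n^2) scan over all contiguous windows, returning the maximum
    z-score. This is the idea behind span-sensitive detection: a short
    watermarked passage inside a long human-written document produces a
    high z-score in some window even when the global statistic is diluted."""
    best = float("-inf")
    for start in range(len(tokens) - min_window + 1):
        for end in range(start + min_window, len(tokens) + 1):
            best = max(best, z_score(tokens[start:end]))
    return best
```

Under the normal approximation, the 1e-5 false positive rate quoted above corresponds to a one-sided threshold of roughly z > 4.27; a document (or window) whose statistic clears that bar is flagged as watermarked.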


Related research

Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text (12/24/2022)
As text generated by large language models proliferates, it becomes vita...

A Watermark for Large Language Models (01/24/2023)
Potential harms of large language models can be mitigated by watermarkin...

RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text (10/06/2020)
In recent years, large neural networks for natural language generation (...

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection (04/11/2023)
Large language models (LLMs) have gained popularity in various fields fo...

Provable Robust Watermarking for AI-Generated Text (06/30/2023)
As AI-generated text increasingly resembles human-written content, the a...

Attacking Neural Text Detectors (02/19/2020)
Machine learning based language models have recently made significant pr...

Deepfake Text Detection: Limitations and Opportunities (10/17/2022)
Recent advances in generative models for language have enabled the creat...
