A Watermark for Large Language Models

01/24/2023
by John Kirchenbauer, et al.

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
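
The scheme described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the SHA-256 seeding of the pseudorandom green list, the vocabulary size, and the particular gamma (green-list fraction) and delta (logit bias) values are assumptions chosen for clarity, and the detector shown is the one-proportion z-test suggested by the abstract's "statistical test with interpretable p-values".

```python
import hashlib
import numpy as np

# Illustrative constants; the real values are tunable hyperparameters.
VOCAB_SIZE = 50_000
GAMMA = 0.5   # fraction of the vocabulary placed on the green list
DELTA = 2.0   # bias added to the logits of green tokens ("soft" promotion)

def green_list(prev_token: int) -> np.ndarray:
    """Pseudorandomly select the green token ids, seeded by the previous token.

    The SHA-256 hash used here as a seeding function is an assumption made
    for the sake of a self-contained example.
    """
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.permutation(VOCAB_SIZE)[: int(GAMMA * VOCAB_SIZE)]

def watermarked_sample(logits: np.ndarray, prev_token: int,
                       rng: np.random.Generator) -> int:
    """Softly promote green tokens, then sample from the adjusted distribution."""
    biased = logits.copy()
    biased[green_list(prev_token)] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

def detect_z_score(tokens: list[int]) -> float:
    """Count how many tokens fall on their green list and form a z-statistic.

    Detection only needs the token sequence and the seeding rule, not the
    language model itself.
    """
    hits = sum(t in set(green_list(prev)) for prev, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / np.sqrt(n * GAMMA * (1 - GAMMA))
```

Under the null hypothesis that the text was written without knowledge of the green lists, roughly a GAMMA fraction of tokens land on their green list by chance, so the z-score stays near zero; watermarked text drives it up. A threshold such as z > 4 corresponds to a one-sided p-value of about 3e-5, which is what makes the detection result interpretable.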


Related research

Robust Distortion-free Watermarks for Language Models (07/28/2023)
We propose a methodology for planting watermarks in text from an autoreg...

Advancing Beyond Identification: Multi-bit Watermark for Language Models (08/01/2023)
This study aims to proactively tackle misuse of large language models be...

Baselines for Identifying Watermarked Large Language Models (05/29/2023)
We consider the emerging problem of identifying the presence and use of ...

KNN-LM Does Not Improve Open-ended Text Generation (05/24/2023)
In this paper, we study the generation quality of interpolation-based re...

On the Reliability of Watermarks for Large Language Models (06/07/2023)
As LLMs become commonplace, machine-generated text has the potential to ...

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions (07/25/2023)
Chain-of-thought (CoT) prompting has been shown to empirically improve t...

LLMZip: Lossless Text Compression using Large Language Models (06/06/2023)
We provide new estimates of an asymptotic upper bound on the entropy of ...
