CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

09/19/2022
by   Xuanli He, et al.

Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. To protect the IP of text generation APIs, a recent work introduced a watermarking algorithm and used a null-hypothesis test as post-hoc ownership verification on imitation models. However, we find that such watermarks can be detected via sufficient statistics of the frequencies of candidate watermarking words. To address this drawback, in this paper we propose a novel Conditional wATERmarking framework (CATER) for protecting the IP of text generation APIs. We propose an optimization method to select watermarking rules that minimize the distortion of the overall word distribution while maximizing the change in conditional word selection. Theoretically, we prove that it is infeasible for even the savviest attacker (i.e., one who knows how CATER works) to recover the used watermarks from a large pool of potential word pairs via statistical inspection. Empirically, we observe that higher-order conditions lead to exponential growth in the number of suspicious (unused) watermarks, making our crafted watermarks more stealthy. In addition, CATER can effectively identify IP infringement under architectural mismatch and cross-domain imitation attacks, with negligible impairment to the generation quality of victim APIs. We envision our work as a milestone for stealthily protecting the IP of text generation APIs.
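To illustrate the core idea, here is a minimal, hypothetical sketch of a conditional lexical watermark. All names and word pairs are assumptions for illustration, not the authors' implementation: an unconditional watermark would substitute a word with a fixed synonym everywhere, which shifts marginal word frequencies and is detectable by frequency statistics; a conditional watermark applies the substitution only when a context-dependent condition holds, so marginal frequencies stay close to the original while conditional word selections change.

```python
import hashlib

# Illustrative watermark word pairs (assumed, not from the paper).
WATERMARK_PAIRS = {"movie": "film", "big": "large"}

def condition(prev_word: str) -> bool:
    """A toy context condition: trigger the substitution for roughly
    half of all contexts, keyed on a hash of the preceding word."""
    digest = hashlib.sha256(prev_word.encode("utf-8")).digest()
    return digest[0] % 2 == 0

def watermark(text: str) -> str:
    """Replace a candidate word with its watermark synonym only when
    the condition on the preceding word holds."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else ""
        if w in WATERMARK_PAIRS and condition(prev):
            out.append(WATERMARK_PAIRS[w])
        else:
            out.append(w)
    return " ".join(out)
```

Because the substitution fires only in hash-selected contexts, an attacker inspecting overall word frequencies sees little distortion, while the API owner, who knows the condition, can test for the shifted conditional distribution. CATER additionally optimizes which pairs and conditions to use; that optimization is not shown here.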


Related research

- 05/04/2020: Improving Adversarial Text Generation by Modeling the Distant Future
  Auto-regressive text generation models usually focus on local fluency, a...
- 12/05/2021: Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
  Nowadays, due to the breakthrough in natural language generation (NLG), ...
- 02/06/2023: Protecting Language Generation Models via Invisible Watermarking
  Language generation models have been an increasingly powerful enabler fo...
- 02/05/2019: Non-Monotonic Sequential Text Generation
  Standard sequential generation methods assume a pre-specified generation...
- 11/10/2019: Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-Encoders
  Current neural Natural Language Generation (NLG) models cannot handle em...
- 04/28/2023: Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
  While conditional generation models can now generate natural language we...
- 12/28/2018: The role of grammar in transition-probabilities of subsequent words in English text
  Sentence formation is a highly structured, history-dependent, and sample...
