
LAMP: Extracting Text from Gradients with Language Model Priors

by Dimitar I. Dimitrov, et al.

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data that successfully reconstructs original text from gradients. Our key insight is to model the prior probability of the text with an auxiliary language model, using it to guide the search towards more natural text. Concretely, LAMP introduces a discrete text transformation procedure that minimizes both the reconstruction loss and the perplexity of the text under the auxiliary language model. This procedure alternates with a continuous optimization of the reconstruction loss, which also regularizes the length of the reconstructed embeddings. Our experiments demonstrate that LAMP reconstructs the original text significantly more precisely than prior work: on average, we recover 5x more bigrams and 23% longer subsequences. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.
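The discrete transformation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `recon_loss` and `lm_nll` are hypothetical stand-ins for the true gradient-matching loss and the auxiliary language model's negative log-likelihood, and the candidate set is reduced to adjacent token swaps for brevity.

```python
def recon_loss(tokens):
    # Placeholder for the gradient-matching reconstruction loss
    # (distance between the candidate's gradients and the observed ones).
    # Here: a toy deterministic stand-in for illustration only.
    return sum(hash(t) % 97 for t in tokens) / (97 * len(tokens))

def lm_nll(tokens):
    # Placeholder for the auxiliary language model's negative
    # log-likelihood (e.g. from GPT-2). Here: a toy stand-in.
    return sum(len(t) for t in tokens) / len(tokens)

def candidates(tokens):
    # One family of discrete transformations: swap adjacent tokens.
    out = []
    for i in range(len(tokens) - 1):
        c = tokens[:]
        c[i], c[i + 1] = c[i + 1], c[i]
        out.append(c)
    return out

def discrete_step(tokens, lam=0.1):
    # Keep the candidate (including the unchanged sequence) that minimizes
    # reconstruction loss + lam * language-model score, steering the search
    # towards natural text.
    return min([tokens] + candidates(tokens),
               key=lambda c: recon_loss(c) + lam * lm_nll(c))
```

In the full attack, such discrete steps would alternate with continuous gradient-based optimization of the embeddings; the weight `lam` balancing the two objectives is an assumed hyperparameter here.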



