LAMP: Extracting Text from Gradients with Language Model Priors

02/17/2022
by Dimitar I. Dimitrov, et al.

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success has been demonstrated primarily on image data, these methods do not directly transfer to other domains, such as text. In this work, we propose LAMP, a novel attack tailored to textual data that successfully reconstructs original text from gradients. Our key insight is to model the prior probability of the text with an auxiliary language model and to use it to guide the search towards more natural text. Concretely, LAMP introduces a discrete text transformation procedure that jointly minimizes the reconstruction loss and maximizes the prior probability of the text, as provided by the auxiliary language model. This procedure is alternated with continuous optimization of the reconstruction loss, which also regularizes the length of the reconstructed embeddings. Our experiments demonstrate that LAMP reconstructs the original text significantly more precisely than prior work: on average, we recover 5x more bigrams and 23% longer subsequences. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.
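The alternating scheme described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the "observed gradient" is simulated for a trivial linear model, the auxiliary language-model prior is replaced by a toy smoothness score (the real attack uses a pretrained LM such as GPT-2), and the discrete transformations are limited to position swaps.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8  # embedding dimension (toy value)
T = 5  # sequence length in tokens (toy value)

# Hypothetical setup: for a linear model with squared loss, the weight
# gradient is proportional to the input embeddings, so we treat the
# observed gradient as the true embedding matrix itself.
x_true = rng.normal(size=(T, D))
g_observed = x_true.copy()

def rec_loss(x):
    """Gradient-matching (reconstruction) loss."""
    return float(np.sum((x - g_observed) ** 2))

def lm_score(x):
    """Toy stand-in for the auxiliary LM prior: penalizes abrupt jumps
    between consecutive token embeddings (lower = more 'natural').
    The actual attack scores candidates with a language model."""
    return float(np.sum((x[1:] - x[:-1]) ** 2))

def discrete_step(x, lam=0.1):
    """Try candidate token-level transformations (here: swapping two
    positions) and keep the best one if it improves the combined
    objective of reconstruction loss plus LM prior."""
    best, best_obj = x, rec_loss(x) + lam * lm_score(x)
    for i in range(T):
        for j in range(i + 1, T):
            cand = x.copy()
            cand[[i, j]] = cand[[j, i]]
            obj = rec_loss(cand) + lam * lm_score(cand)
            if obj < best_obj:
                best, best_obj = cand, obj
    return best

def continuous_step(x, lr=0.1, alpha=0.01):
    """One gradient-descent step on the reconstruction loss plus an
    embedding-length regularizer alpha * ||x||^2."""
    grad = 2 * (x - g_observed) + alpha * 2 * x
    return x - lr * grad

# Alternate continuous optimization with discrete transformations.
x = rng.normal(size=(T, D))  # random initialization
initial = rec_loss(x)
for step in range(200):
    x = continuous_step(x)
    if step % 10 == 0:
        x = discrete_step(x)
final = rec_loss(x)
print(final < initial)
```

The split mirrors the abstract's structure: the continuous phase fits the observed gradients in embedding space, while the periodic discrete phase applies token-level edits that a gradient step cannot reach, accepted only when they also look plausible under the prior.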


Related research

05/17/2022 - Recovering Private Text in Federated Learning of Language Models
Federated learning allows distributed users to collaboratively train a m...

10/31/2021 - Revealing and Protecting Labels in Distributed Training
Distributed learning paradigms such as federated learning often involve ...

09/12/2022 - Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis
Federated learning (FL) aims to perform privacy-preserving machine learn...

01/29/2022 - Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models
A central tenet of Federated learning (FL), which trains models without ...

04/15/2021 - See through Gradients: Image Batch Recovery via GradInversion
Training deep neural networks requires gradient estimation from data bat...

10/25/2021 - Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models
Federated learning has quickly gained popularity with its promises of in...

10/04/2022 - Data Leakage in Tabular Federated Learning
While federated learning (FL) promises to preserve privacy in distribute...
