Recovering Private Text in Federated Learning of Language Models

05/17/2022
by   Samyak Gupta, et al.
6

Federated learning allows distributed users to collaboratively train a model while keeping each user's data private. Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present a novel attack method FILM for federated learning of language models – for the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences. Different from image-recovery methods which are optimized to match gradients, we take a distinct approach that first identifies a set of words from gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. The key insight of our attack is to leverage either prior knowledge in pre-trained language models or memorization during training. Despite its simplicity, we demonstrate that FILM can work well with several large-scale datasets – it can extract single sentences with high fidelity even for large batch sizes and recover multiple sentences from the batch successfully if the attack is applied iteratively. We hope our results can motivate future work in developing stronger attacks as well as new defense methods for training language models in federated learning. Our code is publicly available at https://github.com/Princeton-SysML/FILM.

READ FULL TEXT
research
10/26/2021

CAFE: Catastrophic Data Leakage in Vertical Federated Learning

Recent studies show that private training data can be leaked through the...
research
10/12/2020

TextHide: Tackling Data Privacy in Language Understanding Tasks

An unsolved challenge in distributed or federated learning is to effecti...
research
02/17/2022

LAMP: Extracting Text from Gradients with Language Model Priors

Recent work shows that sensitive user data can be reconstructed from gra...
research
04/15/2021

See through Gradients: Image Batch Recovery via GradInversion

Training deep neural networks requires gradient estimation from data bat...
research
10/30/2022

Two Models are Better than One: Federated Learning Is Not Private For Google GBoard Next Word Prediction

In this paper we present new attacks against federated learning when use...
research
06/05/2023

Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning

Malicious server (MS) attacks have enabled the scaling of data stealing ...
research
09/12/2022

Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Federated learning (FL) aims to perform privacy-preserving machine learn...

Please sign up or login with your details

Forgot password? Click here to reset