Federated learning (FL) is a popular distributed learning approach that enables a number of devices to train a shared model in a federated fashion without transferring their local data. A central server coordinates the FL process, where each participating device communicates only the model parameters to the central server while keeping its local data private. Thus, FL becomes a natural choice for developing mobile deep learning applications, such as next-word prediction and emoji prediction.
Privacy preservation is the major motivation for proposing FL. However, recent works have demonstrated that sharing model updates or gradients also makes FL vulnerable to inference attacks, e.g., property inference attacks and model inversion attacks [5, 28, 7, 26]. Here, a property inference attack infers sensitive properties of the training data using the model updates, and a model inversion attack reconstructs the training data using model gradients. However, the essential causes of such privacy leakage have not been thoroughly investigated or explained. Some defense strategies have been presented to prevent privacy leakage, and they can be categorized into three types: differential privacy [21, 24, 9, 17, 8], secure multi-party computation [4, 19, 3, 18], and data compression. But these defensive approaches incur either significant computational overheads or non-negligible accuracy loss, because existing defenses are not specifically designed for the privacy leakage from the communicated local updates. These privacy issues seriously hinder the development and deployment of FL. There is an urgent need to unveil the essential cause of privacy leakage so that we can develop effective defenses to tackle the privacy issues of FL.
In this work, we assume the server in FL is malicious and aims to reconstruct the private training data from devices. Our key observation is: the class-wise data representations of each device's data are embedded in the shared local model updates, and such data representations can be inferred to perform model inversion attacks. Therefore, private information can be severely leaked through the model updates. In particular, we provide an analysis to reveal how data representations, e.g., in the fully connected (FC) layer, are embedded in the model updates. We then propose an algorithm to infer class-wise data representations and perform model inversion attacks. Our empirical study demonstrates that the correlation between the data representations inferred by our algorithm and the real data representations is as high as 0.99 during local training, thus proving that representation leakage is the essential cause behind existing attacks. Note that the data is often non-IID (not independently and identically distributed) across the devices in FL. We also show that the non-IID characteristic aggravates the representation leakage.
Based on our observation of the representation leakage from local updates, we design a defense strategy. Specifically, we present an algorithm to generate a perturbation added to the data representation, such that: 1) the perturbed data representations are as similar as possible to the true data representations to maintain FL performance; and 2) the data reconstructed from the perturbed data representations are as dissimilar as possible to the raw data. Importantly, we also derive a certified robustness guarantee for FL and a convergence guarantee for FedAvg, a popular FL algorithm, when applying our defense. To evaluate the effectiveness of our defense, we conduct experiments on MNIST and CIFAR10 to defend against the DLG attack and the GS attack. The results demonstrate that, without sacrificing accuracy, our proposed defense can increase the mean squared error (MSE) between the reconstructed data and the raw data under both the DLG and GS attacks by more than 160x, compared with baseline defense methods. The privacy of the FL system is thus significantly improved.
Our key contributions are summarized as follows:
To the best of our knowledge, this is the first work to explicitly reveal that the data representations embedded in the model updates are the essential cause of leaking private information from the communicated local updates in FL. In addition, we develop an algorithm to effectively reconstruct the data from the local updates.
We develop an effective defense by perturbing data representations. We also derive a certified robustness guarantee for FL and a convergence guarantee for FedAvg when applying our defense.
We empirically evaluate our defense on MNIST and CIFAR10 against DLG and GS attacks. The results show our defense can offer a significantly stronger privacy guarantee without sacrificing accuracy.
2 Related work
Privacy Leakage in Distributed Learning.
There exist several adversarial goals for inferring private information: data reconstruction, class representative inference, membership inference, and attribute inference. Data reconstruction aims to recover training samples used by participating clients; the quality of the reconstructed samples can be assessed by comparing their similarity with the original data. Recently, Zhu et al. present an algorithm named DLG that reconstructs training samples by optimizing the input to generate the same gradients for a particular client. Following up on DLG, iDLG is proposed to improve the efficiency and accuracy of DLG. Aono et al. also show that an honest-but-curious server can partially reconstruct clients' training inputs using their local updates; however, such an attack is applicable only when the batch consists of a single sample. Wang et al. present a reconstruction attack that incorporates a generative adversarial network (GAN) with a multi-task discriminator, but this method is only applicable to scenarios where data is mostly homogeneous across clients and an auxiliary dataset is available. Several approaches have been proposed to infer class features or class representatives. Hitaj et al. demonstrate that an adversarial participant in collaborative learning can utilize GANs to construct class representatives; however, this technique is evaluated only when all samples of the same class are visually similar (e.g., handwritten digits, faces, etc.). Membership inference attacks (MIA) aim to accurately determine whether a given sample has been used for training. This type of attack was first proposed by Shokri et al., and it can be applied to any type of machine learning model, even under black-box settings. Sablayrolles et al. propose an optimal strategy for MIA under the assumption that model parameters conform to certain distributions. Nasr et al. extend MIA to federated learning to quantify the privacy leakage in the distributed setting. Attribute inference attacks try to identify sensitive attributes of training data, and several techniques exist to perform this type of attack [2, 6, 5, 11]. Fredrikson et al. propose a method to reveal genomic information of patients using model outputs and other non-sensitive attributes. More recently, Melis et al. demonstrate that an adversarial client can infer attributes that hold only for a subset of the training data, based on the exchanged model updates in federated learning.
Privacy-preserving Distributed Learning.
Existing privacy-preserving distributed learning methods can be categorized into three types: differential privacy (DP), secure multi-party computation (MPC), and data compression. Pathak et al. present a distributed learning method that composes a differentially private global model by aggregating locally trained models. Shokri et al. propose a collaborative learning method in which the sparse vector technique is adopted to achieve DP. Hamm et al. design a distributed learning approach that trains a differentially private global model by transferring the knowledge of a local model ensemble. Recently, participant-level differentially private federated learning has been proposed [17, 8] by injecting Gaussian noise into local updates. However, these DP-based methods require a large number of participants in the training to converge and to realize a desirable privacy-performance tradeoff. In addition, MPC has also been applied to develop privacy-preserving machine learning in a distributed fashion. For example, Danner et al. propose a secure sum protocol using a tree topology. Another example of the MPC-based approach is SecureML, where participants distribute their private data among two non-colluding servers, which then use MPC to train a global model on the participants' encrypted joint data. Bonawitz et al. propose a secure multi-party aggregation method for FL, where participants are required to encrypt their local updates such that the central server can only recover the aggregation of the updates. However, these MPC-based approaches incur non-negligible computational overhead. Even worse, attackers can still successfully infer private information even if the adversary only observes the aggregated updates. Furthermore, Zhu et al. show that applying gradient compression and sparsification can help defend against privacy leakage from shared local updates; however, such approaches require a high compression rate to achieve a desirable defensive performance. In Section 6, we show that, given the same compression rate, our proposed method achieves better defense and inference performance than the gradient compression approach.
3 Essential Cause of Privacy Leakage in FL
Existing works [28, 27, 1, 26] demonstrate that information is leaked from the model updates communicated between the devices and the server during FL training. However, they do not provide a thorough explanation. To understand the essential cause of information leakage in FL, we analyze how privacy leaks in FL. Our key observation is that privacy leakage is essentially caused by the data representations embedded in the model updates.
3.1 Representation Leakage in FL
Problem setup. In FL, there are multiple client devices and a central server. The server coordinates the FL process, where each participating device communicates only the model parameters with the server while keeping its local data private. We assume the server is malicious and only has access to the devices' model parameters; its goal is to infer the devices' data from those parameters.
Key observation on representation leakage in FL: data representations are less entangled. For simplicity, we use the fully connected (FC) layer as an instance and analyze how the data representation is leaked in FL; such an analysis can be naturally extended to other types of layers. Specifically, we denote a FC layer as $z = Wx$, where $x$ is the input to the FC layer (i.e., the data representation learnt by previous layers), $W$ is the weight matrix, and $z$ is the output. Then, given a training batch $\mathcal{B}$, the gradient of the loss $\mathcal{L}$ with respect to $W$ is:

$$\frac{\partial \mathcal{L}}{\partial W} = \sum_{i \in \mathcal{B}} \frac{\partial \mathcal{L}_i}{\partial z_i}\, x_i^{\top}, \qquad (1)$$

where $\mathcal{L}_i$, $x_i$, and $z_i$ are the loss, the input, and the output of the FC layer corresponding to the $i$-th sample in this batch, respectively. We observe that the gradient for a particular sample $i$ is the product of a column vector $\partial \mathcal{L}_i / \partial z_i$ and a row vector $x_i^{\top}$. Suppose the training data has $C$ labels. We can split the batch $\mathcal{B}$ into $C$ sets, i.e., $\mathcal{B} = \cup_{c=1}^{C} \mathcal{B}_c$, where $\mathcal{B}_c$ denotes the data samples with the $c$-th label. Then, Eq. 1 can be rewritten as:

$$\frac{\partial \mathcal{L}}{\partial W} = \sum_{c=1}^{C} \sum_{i \in \mathcal{B}_c} \frac{\partial \mathcal{L}_i}{\partial z_i}\, x_i^{\top}, \qquad (2)$$
where each inner sum represents the gradient with respect to the data samples in $\mathcal{B}_c$. Figure 1 illustrates the gradient updates for a batch of data in a FC layer. We observe that for data coming from different classes, the corresponding data representations tend to be embedded in different rows of the gradients. If the number of classes in a batch is large, which is common in centralized training, the representations of different classes will be entangled in the gradients of the whole batch. In contrast to centralized training, the local data on a participating device in FL often covers a small number of tasks. Thus, the number of data classes within one training batch may be very small compared to that of centralized training. In this case, the entanglement of data representations from different classes is significantly reduced. Such low entanglement of data representations allows us to explicitly reconstruct the input data of each class from the gradients, because we can (almost) precisely locate the rows of data representations in the gradients.
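The rank-1 structure of Eq. 1 can be checked numerically. The sketch below (plain Python with toy values, not the paper's code) builds a per-sample FC-layer gradient as the outer product of the upstream gradient and the input representation, then recovers the representation from a single gradient row up to a scalar:

```python
# Per-sample FC gradient dL/dW = (dL/dz) x^T is a rank-1 outer product,
# so any nonzero row is the data representation x scaled by one entry of dL/dz.
def outer(col, row):
    return [[c * r for r in row] for c in col]

x = [0.5, -1.0, 2.0]        # data representation entering the FC layer (toy)
dz = [0.2, 0.0, -0.3, 0.1]  # upstream gradient dL/dz over 4 output units (toy)

grad_W = outer(dz, x)       # 4x3 gradient of the FC weight matrix

# Recover x from row 0 of the gradient, up to the unknown scale dz[0].
row = grad_W[0]
scale = row[0] / x[0]       # in a real attack the scale stays unknown
recovered = [v / scale for v in row]
```

This is why locating the right rows in the gradient (low entanglement) suffices to expose the representation: each row already contains a scaled copy of it.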
Note that the above analysis considers only a single batch during FL training. In practice, FL is often trained with multiple batches, in which case the data representations of different classes could become entangled, especially when the number of batches is large. However, in practical FL applications, the devices often have limited data, so the number of batches and the number of local training epochs on each device are both small. In this case, the data representations can still remain less entangled across classes, as seen by inspecting the gradient updates in Eq. 2.
3.2 Inferring Class-wise Data Representations
We develop an algorithm to identify the training classes and infer the data representations of each class embedded in each FC layer from the model updates. The representation inference algorithm works in a back-propagation fashion. Specifically, we first identify the classes using the gradients of the last ($L$-th) layer. We denote the gradients of the $l$-th layer as $\nabla W^{(l)}$. We notice that the $c$-th row of $\nabla W^{(L)}$ shows significantly larger magnitudes than the gradient vectors in other rows if data from the $c$-th class are involved in training. We can then infer the data representation of the $c$-th class in the last layer, because that row linearly scales it. Once the data representation of the $c$-th class in layer $l$ is inferred, we can use its element values to identify the corresponding row in the preceding layer's gradients, which embeds the data representation of the $c$-th class in that layer. In this way, we can iteratively infer the data representations of the $c$-th class in all FC layers. The inference process for one FC layer is illustrated in Figure 2, and the details of our representation inference algorithm are described in Appendix A.
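A minimal sketch of the class-identification step, assuming (as in the analysis above) that the row of the last FC layer's gradient with the largest magnitude corresponds to the class present in the batch. The function name and toy values are hypothetical:

```python
# Identify the trained class from the last FC layer's gradient: the row with
# the largest squared norm marks the class, and that row is the (scaled)
# data representation of the class (assumed selection rule for illustration).
def infer_class_and_representation(grad_last):
    norms = [sum(v * v for v in row) for row in grad_last]
    c = max(range(len(norms)), key=norms.__getitem__)
    return c, grad_last[c]

grad = [[0.00, 0.00, 0.00],    # class 0: absent from the batch
        [1.20, -0.60, 0.90],   # class 1: dominant row -> trained on
        [0.01, 0.00, -0.02]]   # class 2: only residual noise
cls, rep = infer_class_and_representation(grad)
```

Iterating this row-matching backward through the FC layers yields the per-layer representations described above.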
We conduct experiments on CIFAR10 to evaluate the effectiveness of our algorithm. We consider practical non-IID settings in FL and follow the 2-class & balanced configuration to construct non-IID datasets: there are 100 devices in total, and 10 devices are randomly sampled to participate in training in each communication round. Each device holds 2 classes of data, and each class has 20 samples.
Local training configurations can affect the quality of the inferred representations. In this experiment, we vary the number of local training epochs $E$ and the local batch size $B$. We adopt SGD as the optimizer with a fixed learning rate. The model architecture is shown in Appendix B. We also consider a baseline in the IID setting, where we set $E = 1$ and $B = 32$.
| Local Training Configurations | FC1 | FC2 | FC3 |
| --- | --- | --- | --- |
| E=1, B=32 (IID) | 0.48 | 0.31 | 0.18 |
We use the correlation coefficient $r$ between the true representation and our inferred one to quantify the effectiveness of our proposed algorithm. We calculate $r$ for each class on each participating device. We extract data representations from all the FC layers in each of 200 communication rounds between the devices and the server, and the average $r$ across all communication rounds and devices is shown in Table 2. As Table 2 presents, the correlation is as high as 0.99, indicating serious representation leakage in FL. $r$ decreases with a larger number of local epochs or a larger number of batches in one epoch, which validates our claim in Section 3.1. However, $r$ is still higher than 0.8 in almost all cases. We note that $r$ is much lower in the IID setting. This is because each device has more classes of data for training than in the non-IID setting, making the representations entangled. Our results validate that the practical non-IID setting in FL dramatically worsens representation leakage.
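For reference, the metric here is the standard Pearson correlation coefficient; the sketch below (a textbook definition, not the paper's code) shows why a perfectly scaled copy of the true representation, such as one read off a gradient row with unknown scale, attains $r = 1$:

```python
# Standard Pearson correlation between two equal-length vectors.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

true_rep = [0.5, -1.0, 2.0, 0.3]
inferred = [1.0, -2.0, 4.0, 0.6]  # a scaled copy (factor 2) of the true one
r = pearson(true_rep, inferred)
```

Because the inference recovers representations only up to a scalar, correlation (which is scale-invariant) is the natural effectiveness measure.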
3.3 Unveiling Representation Leakage
In this section, we investigate whether representation leakage is the essential cause of information leakage in FL. In particular, we conduct experiments on CIFAR10 to reconstruct the input based on the existing DLG attack. The DLG attack requires gradient information, and we consider three different portions of the gradients: the whole model gradients (WG), the gradients of the convolutional layers only (CLG), and the representations inferred using our method (Rep). The experiment settings are presented in Appendix B. As Figure 3 shows, utilizing only the gradients of the convolutional layers cannot successfully reconstruct the input data, but using the representation inferred by our method can reconstruct the input data as effectively as utilizing the whole gradients in terms of visual quality. This result validates that representation leakage is the essential cause of privacy leakage in FL.
4 Defense Design
4.1 Defense Formulation
Our aforementioned observation shows that the privacy leakage in FL mainly comes from the representation leakage (e.g., in the FC layer). In this section, we propose a defense against such privacy leakage. In particular, we propose to perturb the data representation in a single layer (e.g., a FC layer), which we call the defended layer, to satisfy the following two goals:
Goal 1: To reduce the privacy information leakage, the reconstructed input through the perturbed data representations and the raw input should be dissimilar.
Goal 2: To maintain the FL performance, the perturbed data representation and the true data representations without perturbation should be similar.
Let $r$ and $r'$ represent the clean data representation and the perturbed data representation on the defended layer, respectively. We also define $x$ and $\hat{x}$ as the raw input and the input reconstructed via the perturbed data representation. To satisfy Goal 1, we require that the distance between $x$ and $\hat{x}$, in terms of an $\ell_p$ norm, be as large as possible; to satisfy Goal 2, we require that the distance between $r$ and $r'$, in terms of an $\ell_q$ norm, be bounded. Formally, we have the following constrained objective function with respect to $r'$:

where $\epsilon$ is a predetermined threshold. Note that $\hat{x}$ depends on $r'$. Next, we design a solution to obtain $r'$ and derive the certified robustness.
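The two goals above can be written as a constrained optimization; the following is a plausible LaTeX reconstruction (the exact norms used are a modeling choice), where $r$ and $r'$ are the clean and perturbed representations, $x$ the raw input, $\hat{x}(r')$ the input reconstructed from $r'$, and $\epsilon$ the perturbation budget:

```latex
\max_{r'} \;\; \bigl\lVert \hat{x}(r') - x \bigr\rVert_{p}
\qquad \text{subject to} \qquad
\bigl\lVert r' - r \bigr\rVert_{q} \le \epsilon
```

Maximizing the $\ell_p$ reconstruction error enforces Goal 1, while the $\ell_q$ ball around $r$ enforces Goal 2.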
4.2 Defense Solution
Let $g$ be the feature extractor before the defended layer. Prior to obtaining our solution, we make the following assumption and use the inverse function theorem.
The inverse of $g$, i.e., $g^{-1}$, exists, and the inverse function theorem applies to $g$ on the relevant domain.
Then, our objective function can be reduced as follows:
Note that different choices of $p$ and $q$ lead to different defense solutions and thus different defense effects. In this work, we set $p = 2$, i.e., we aim to maximize the MSE between the reconstructed input and the raw input. Meanwhile, our choice of $q$ is motivated by two reasons: our defense then has an analytical solution, and it is communication efficient. Specifically, our solution is to find the largest elements in a candidate set and prune them. Moreover, the learnt perturbed representation is relatively sparse, which improves communication efficiency. Algorithm 1 details the solution for obtaining the perturbed representation with these choices of $p$ and $q$. Algorithm 2 details the local training process with our defense on a local device.
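A hedged sketch of the analytical solution: with a sparsity-inducing choice of $q$, the perturbation reduces to zeroing a fixed fraction of the representation's elements. The selection criterion below (largest-magnitude elements) and the toy values are illustrative assumptions, not the paper's exact rule:

```python
# Sketch: perturb a representation by zeroing the prune_ratio fraction of its
# elements (selected here by magnitude, an assumed criterion). The result is
# sparse, which also reduces the amount of data communicated.
def perturb_representation(rep, prune_ratio):
    k = int(len(rep) * prune_ratio)
    idx = sorted(range(len(rep)), key=lambda i: abs(rep[i]), reverse=True)[:k]
    out = list(rep)
    for i in idx:
        out[i] = 0.0
    return out

rep = [0.9, -0.1, 0.4, -0.7, 0.05]
perturbed = perturb_representation(rep, 0.4)  # zero 40% of the elements
```

The analytical (closed-form) nature of such a rule is what avoids any extra optimization loop during local training.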
4.3 Certified Robustness Guarantee
We define our certified robustness guarantee as the certified minimal distance (in terms of the $\ell_2$-norm) between the raw input and the reconstructed input. A larger defense bound indicates that our defense is more effective. Specifically, we have the following theorem on our defense bound:
Suppose Assumption 1 holds. Given a data input $x$, its representation $r$, and any perturbed data representation $r'$, we have:
See our proof in Appendix C. ∎
5 Convergence Guarantee
In this section, we derive the convergence guarantee of FedAvg, the most popular FL algorithm, with our proposed defense. We first describe the FedAvg algorithm with our defense and then present our theorem on the convergence guarantee.
5.1 FedAvg with Our Defense
In classical FedAvg, the objective function is defined as:

$$\min_{w} F(w) = \sum_{k=1}^{N} p_k F_k(w),$$

where $p_k$ is the weight of the $k$-th device, $p_k \ge 0$, and $\sum_{k=1}^{N} p_k = 1$. $F_k$ is the local objective of the $k$-th device.
Equation 10 is solved via iterative server-device communication as follows. Suppose the server has learnt the global model $w_t$ in the $t$-th round and randomly selects $K$ devices with replacement, according to the sampling probabilities, for the next training round. Then FedAvg is performed as follows: first, the server sends the global model $w_t$ to all devices. Then, all devices set their local model to $w_t$, and each device performs $E$ iterations of local updates. Specifically, for the $i$-th iteration, the local model on the $k$-th device applying our defense is updated as:
where $\eta_t$ is the learning rate and the data sample is uniformly chosen from the $k$-th device's local data; our defense scheme is applied during this local update. Finally, the server averages the local models of the selected devices and updates the global model as follows:
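The round structure above can be sketched in a few lines (plain Python, toy quadratic objectives; `defend` is a hypothetical hook standing in for our representation perturbation, here left as the identity):

```python
# One FedAvg round: broadcast w, run local_steps SGD steps on each device with
# the defense applied to every local gradient, then average the local models.
def fedavg_round(w, grad_fns, lr, local_steps, defend):
    updated = []
    for grad_fn in grad_fns:
        local = list(w)
        for _ in range(local_steps):
            g = defend(grad_fn(local))  # defense hook on the local update
            local = [p - lr * gi for p, gi in zip(local, g)]
        updated.append(local)
    return [sum(m[i] for m in updated) / len(updated) for i in range(len(w))]

# Two toy devices with quadratic losses: grad = w - target.
grad_fns = [lambda w: [w[0] - 1.0, w[1] - 2.0],
            lambda w: [w[0] - 3.0, w[1] - 0.0]]
w_new = fedavg_round([0.0, 0.0], grad_fns, lr=0.1,
                     local_steps=5, defend=lambda g: g)
```

Equal device weights are assumed here for brevity; weighted averaging by the sampling probabilities follows the same pattern.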
5.2 Convergence Analysis
Our convergence analysis is inspired by prior analysis of FedAvg on non-IID data. Without loss of generality, we derive the convergence guarantee when applying our defense to a single layer; our results naturally generalize to multiple layers. We denote the input representation, parameters, and output of a single (e.g., the $l$-th) layer on the $k$-th device in the $t$-th round as $x_t^k$, $W_t^k$, and $z_t^k$, respectively.
Before presenting our theoretical results, we first make Assumptions 2-5, which are standard in FedAvg convergence analysis, and an extra Assumption 6 that bounds the squared norm of the stochastic gradients with respect to the single $l$-th layer.
$F_1, \ldots, F_N$ are $L$-smooth: for all $v$ and $w$, $F_k(v) \le F_k(w) + (v - w)^{\top} \nabla F_k(w) + \frac{L}{2} \|v - w\|_2^2$.
$F_1, \ldots, F_N$ are $\mu$-strongly convex: for all $v$ and $w$, $F_k(v) \ge F_k(w) + (v - w)^{\top} \nabla F_k(w) + \frac{\mu}{2} \|v - w\|_2^2$.
Let $\xi_t^k$ be sampled from the $k$-th device's local data uniformly at random. The variance of the stochastic gradients in each device is bounded: $\mathbb{E} \|\nabla F_k(w_t^k, \xi_t^k) - \nabla F_k(w_t^k)\|^2 \le \sigma_k^2$ for $k = 1, \ldots, N$.
The expected squared norm of the stochastic gradients is uniformly bounded, i.e., $\mathbb{E} \|\nabla F_k(w_t^k, \xi_t^k)\|^2 \le G^2$ for all $k$ and $t$.
For the single $l$-th layer, the squared norm of the stochastic gradients with respect to the output of each device is bounded: $\mathbb{E} \|\nabla_{z} F_k(w_t^k, \xi_t^k)\|^2 \le G_z^2$ for all $k$ and $t$.
We define $F^*$ and $F_k^*$ as the minimum values of $F$ and $F_k$, respectively, and let $\Gamma = F^* - \sum_{k=1}^{N} p_k F_k^*$. We assume each device performs $E$ local updates and the total number of iterations is $T$. Then, we have the following convergence guarantee for FedAvg with our defense.
See our proof in Appendix D. ∎
6 Experiments
6.1 Experimental Setup
In our experiments, we evaluate our defense against two different model inversion attacks under non-IID settings. Experiments are conducted on a server with two Intel Xeon E5-2687W CPUs and four Nvidia TITAN RTX GPUs.
Attacks. We evaluate our defense method against the following two model inversion attacks in FL.
The DLG attack assumes that a malicious server aims to reconstruct devices' data using their uploaded gradients. In the DLG attack, the server optimizes the reconstructed data to minimize the Euclidean distance between the raw gradients and the gradients generated by the reconstructed data in back propagation.
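The gradient-matching idea behind DLG can be illustrated on a one-parameter toy model (an entirely hypothetical setup, not the attack's real implementation): the attacker descends on a dummy input until its gradient matches the one the victim leaked.

```python
# Toy DLG-style gradient matching. For loss 0.5*(w*x)^2 the gradient w.r.t.
# the weight w is (w*x)*x; the attacker observes only this gradient value.
def grad_of(x, w=2.0):
    return (w * x) * x

observed = grad_of(3.0)  # gradient leaked by the victim's raw input x = 3.0

# Minimize (grad(dummy) - observed)^2 by gradient descent on the dummy input,
# using a central finite difference for the outer derivative.
dummy, lr, eps = 1.0, 1e-3, 1e-5
for _ in range(2000):
    d = ((grad_of(dummy + eps) - observed) ** 2
         - (grad_of(dummy - eps) - observed) ** 2) / (2 * eps)
    dummy -= lr * d
# dummy now approximates the victim's input 3.0
```

Real DLG does the same matching over full network gradients with an automatic-differentiation optimizer rather than finite differences.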
GC prunes gradients whose magnitudes are below a threshold, such that only part of the local updates is communicated between devices and the server.
DP protects privacy with a theoretical guarantee by injecting noise into the gradients uploaded to the server. In the experiments, we separately apply Gaussian and Laplacian noise to build two DP baselines, i.e., DP-Gaussian and DP-Laplace.
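Hedged sketches of the two baseline mechanisms (pruning ratio and noise scale are illustrative values, not the experimental settings): GC zeroes small-magnitude gradient entries, and DP-Gaussian adds zero-mean Gaussian noise to every entry.

```python
import random

# Gradient compression: zero the prune_ratio fraction of smallest-magnitude
# entries; only the surviving entries would be communicated.
def gc_prune(grad, prune_ratio):
    k = int(len(grad) * prune_ratio)
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]))[:k]
    out = list(grad)
    for i in idx:
        out[i] = 0.0
    return out

# DP-Gaussian: perturb every gradient entry with zero-mean Gaussian noise.
def dp_gaussian(grad, sigma, seed=0):
    rng = random.Random(seed)
    return [g + rng.gauss(0.0, sigma) for g in grad]

g = [0.01, -0.5, 0.03, 0.8, -0.002]
pruned = gc_prune(g, 0.6)     # keeps only the 2 largest-magnitude entries
noised = dp_gaussian(g, 0.1)
```

Note the contrast with our defense: both baselines act on the whole gradient vector, whereas our method perturbs only the defended layer's representation.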
To evaluate our defense under more realistic FL settings, we use the MNIST and CIFAR10 datasets and construct non-IID datasets following standard configurations. For each dataset, the data is distributed across 100 devices. Each device holds 2 random classes of data with 100 samples per class. By default, we perform training on the CIFAR10 and MNIST non-IID datasets with 1000 and 200 communication rounds, respectively.
In training, we set the local epoch to 1 and the batch size to 32. We apply the SGD optimizer and set the learning rate to 0.01. In each communication round, 10 devices are randomly sampled to participate in the training. For model inversion attacks, the ideal case for the adversary is a single sample per batch, where the quality of the reconstructed data is highest. We evaluate our defense in this extreme case; it should perform even better in more general cases (i.e., more than one sample per batch). For the DLG attack, we conduct 300 iterations of optimization to reconstruct the raw data. For the GS attack, we use the Adam optimizer with a learning rate of 0.1 and report the reconstructed results after 120 iterations. The base model architectures for the two attacks are presented in Appendix B. For defense, the configurations of our method and the compared baselines are displayed in Table 2, where the parameter of GC stands for the pruning rate of the local models' gradients, and that of our method represents the pruning rate of the fully connected layer's gradients. For DP-Gaussian and DP-Laplace, we set the mean of the noise distribution to 0 and vary its variance.
| Defense | Pruning rate range | Pruning rate range |
| --- | --- | --- |
| GC | [1, 80] | [1, 90] |
| Ours | [1, 40] | [1, 80] |
Privacy metric (MSE): We use the mean squared error (MSE) between the reconstructed image and the raw image to quantify the effectiveness of defenses. A smaller MSE indicates more severe privacy leakage.
Utility metric (Accuracy): We use the accuracy of the global model on the testing set to measure the effectiveness of the FL algorithm (i.e., FedAvg). Lower accuracy means less practical utility.
6.2 Defense Results: Utility-Privacy Tradeoff
We compare our defense with the baselines against the two attack methods in terms of model accuracy and MSE. Ideally, we want to maintain high model accuracy while achieving high MSE. The results are shown in Figure 5.
We make two key observations. First, when achieving an MSE at which the reconstructed image is not recognizable by humans, our method shows no drop in accuracy, while the other baselines sacrifice substantial accuracy under the DLG and GS attacks.
Second, without sacrificing accuracy, our defense can achieve an MSE more than 160x that of the baseline defenses. Our defense maintains accuracy until the MSE reaches 0.8, while the baselines show a significant accuracy drop at much smaller MSEs. The reason is twofold: 1) our defense does not perturb parameters in the feature extractor (i.e., the convolutional layers), which preserves the descriptive power of the model; and 2) the representations embedded in the gradients pruned by our defense are mostly inference-irrelevant, so pruning these parameters is less harmful to the global model's performance.
To demonstrate the effectiveness of our defense perceptually, we also visualize the reconstructed images. We compare our defense with GC, the baseline defense that also utilizes pruning. To save space, we only show results under the GS attack; we observe similar behavior under the DLG attack. Figure 6 shows the reconstructed image of a random sample in CIFAR10: the image reconstructed under our defense becomes unrecognizable when pruning parameters only in the FC layer, whereas under the GC defense the reconstructed image remains recognizable even when parameters across the whole model are pruned. Note that being unrecognizable to humans is not the ultimate goal of a defense, as private information might still reside in an image that is not perceptually recognizable. Nonetheless, an MSE higher than the threshold at which the image becomes unrecognizable still serves as a meaningful indicator of privacy protection.
6.3 Convergence Results
Following the experimental setup in prior work, we use logistic regression (LR) to examine our convergence results for FedAvg with our defense. We distribute the MNIST dataset among the devices in a non-IID setting where each device contains samples of 2 digits. Here, the threshold in Eq. 4 is set to 50, the local batch size and local epoch are fixed, and the number of devices sampled in each communication round is varied.
Figure 7 shows the results of loss vs. communication rounds. We observe that LR+FedAvg with our defense converges well, which validates our theoretical analysis.
7 Conclusions and Future Work
In this work, we present our key observation that data representation leakage from gradients is the essential cause of privacy leakage in FL. We also provide an analysis explaining how the data representation is leaked. Based on this observation, we propose a defense against model inversion attacks in FL, which perturbs the data representation such that the quality of the reconstructed data is severely degraded while FL performance is maintained. In addition, we derive a certified robustness guarantee for FL and a convergence guarantee for FedAvg, the most popular FL algorithm, when applying our defense. We conduct extensive experiments to evaluate the effectiveness of our defense, and the results demonstrate that it offers a much stronger privacy guarantee than baseline defenses without sacrificing accuracy.
Our future research includes: 1) investigating the impact of various choices of the $\ell_p$- and $\ell_q$-norms on both defense and accuracy, as well as designing norms that consider structural information in the data; and 2) extending our analysis of data representation leakage to other types of layers, e.g., convolutional layers, for a more comprehensive understanding of privacy leakage in FL.
References
- (2017) Privacy-preserving deep learning: revisited and enhanced. In International Conference on Applications and Techniques in Information Security.
- Hacking smart machines with smarter ones: how to extract meaningful data from machine learning classifiers. International Journal of Security and Networks.
- (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
- (2015) Fully distributed privacy preserving mini-batch gradient descent learning. In IFIP International Conference on Distributed Applications and Interoperable Systems.
- (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security.
- (2014) Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14).
- (2020) Inverting gradients: how easy is it to break privacy in federated learning? In Advances in Neural Information Processing Systems.
- (2017) Differentially private federated learning: a client level perspective. arXiv.
- (2016) Learning privately from multiparty data. In International Conference on Machine Learning.
- (2018) Federated learning for mobile keyboard prediction. arXiv.
- (2017) Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
- (2019) Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems.
- (2009) Learning multiple layers of features from tiny images.
- (2020) LotteryFL: personalized and communication-efficient federated learning with lottery ticket hypothesis on non-IID datasets. arXiv.
- (2019) On the convergence of FedAvg on non-IID data. In International Conference on Learning Representations.
- (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics.
- (2018) Learning differentially private recurrent language models. In International Conference on Learning Representations.
- (2019) Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP).
- (2017) SecureML: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP).
- (2019) Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP).
- (2010) Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems.
- (2019) Federated learning for emoji prediction in a mobile keyboard. arXiv.
- (2019) White-box vs black-box: Bayes optimal strategies for membership inference. In International Conference on Machine Learning.
- (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security.
- (2017) Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP).
- (2019) Beyond inferring class representatives: user-level privacy leakage from federated learning. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
- (2020) iDLG: improved deep leakage from gradients. arXiv.
- (2019) Deep leakage from gradients. In Advances in Neural Information Processing Systems.
Appendix A Method of Inferring Data Representations
As discussed in Section 3.1, if we can find several rows in the local update $\Delta W$ that come from data of class $c$, which is possible because of the low entanglement of the per-class gradients across rows in FL, then we are able to infer this device's training data representation of class $c$ in this layer. As the error terms and the representations are both similar across the samples of class $c$ in one batch $\mathcal{B}$, such a row $\Delta W_i$ can be approximated as Equation 14,
$$\Delta W_i \approx \bar{g}_i \, \bar{r}_c^{\top},$$
up to a scale determined by the learning rate and batch size, where $\bar{g}_i$ and $\bar{r}_c$ denote the averages of the error term $g_i$ and the representation $r$ over the samples of class $c$ in $\mathcal{B}$, and $\bar{r}_c$ is the data representation corresponding to this device's training data of class $c$ in this layer. If we want to infer $\bar{r}_c$ from this layer's local parameter update, we need to seek out the unique elements in $\Delta W$. Here, unique elements are the elements of $\Delta W$ that are not, or are only weakly, entangled with other classes after the summation in Equation 2 is executed.
A.1 Inferring features in the last layer
Let us consider the last layer of a classification model with cross-entropy loss over a sample. Suppose $r$ is the data representation of the second-to-last layer; we have
$$\ell = -\log\big(\mathrm{softmax}(z)_{c}\big), \qquad z = W r + b,$$
where $\ell$ is the loss defined on a sample, $c$ is the sample's ground-truth label, and $z$ denotes the output of the last fully connected layer. Then the error term $g = \partial \ell / \partial z$ in this layer is
$$g_i = p_i - \mathbb{1}[i = c], \qquad p = \mathrm{softmax}(z).$$
As the entries of $p$ are probabilities, we have $0 < p_i < 1$ and $\sum_i p_i = 1$. Hence, $g$ has only one negative element, at index $c$, and the absolute value of $g_c$ is equal to the sum of the other elements' absolute values. Therefore, for the last layer, the unique element in $g$ is the "peak" element with index $c$, and this "peak" element contributes to the larger $\lVert \Delta W_c \rVert$, where $\Delta W_i$ denotes the $i$-th row of the weight update $\Delta W$.
When the malicious server receives one local model update, it computes $\lVert \Delta W_i \rVert$ for every row and picks out the rows whose norms are significantly larger. The server thereby infers the data classes on this device, because the selected rows' indexes correspond to this device's training data classes. For one training class $c$, $\bar{r}_c$ in this layer can then be approximated by the selected row $\Delta W_c$ up to a scale $s$ that is influenced by the local training steps. The algorithm for inferring data representations in the last layer is shown in Algorithm 3.
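To make the last-layer analysis concrete, the following numpy sketch (with hypothetical layer sizes, a single sample, and a single gradient step rather than a full FedAvg update) checks the "peak" property of the cross-entropy error term and recovers both the class and the representation from the peak row:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 10, 84         # hypothetical last-layer sizes
r = rng.random(feat_dim)               # representation from the second-to-last layer
W = rng.normal(size=(num_classes, feat_dim)) * 0.1
c = 3                                  # the sample's ground-truth class

z = W @ r                              # last-layer output (logits)
p = np.exp(z - z.max())
p /= p.sum()                           # softmax probabilities
g = p.copy()
g[c] -= 1.0                            # error term dL/dz = p - onehot(c)

# Only one negative entry (at index c), whose magnitude equals the sum of the rest.
assert (g < 0).sum() == 1 and g.argmin() == c
assert np.isclose(-g[c], g[g > 0].sum())

dW = np.outer(g, r)                    # weight gradient: row i is g[i] * r
row_norms = np.linalg.norm(dW, axis=1)
inferred_class = int(row_norms.argmax())        # the "peak" row reveals the class...
r_hat = dW[inferred_class] / g[inferred_class]  # ...and, up to scale, the representation
```

In this one-step toy case the recovery is exact; over multiple local steps the averaging effects described above make it approximate instead.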
A.2 Inferring features in previous layers
Generally, we need to seek out the unique elements in the weight update $\Delta W$ to infer the class-wise representation $\bar{r}_c$ in this layer. Let us assume we have already inferred the data representation of class $c$ in the layer after this one, denoted $\hat{a}$, as shown in Figure 8. Specifically, $\hat{a}$ is the result of the activation function applied to that layer's pre-activation output $z$. If we can infer this layer's representation based on access to $\hat{a}$, then, combined with the inferred last-layer data representation of class $c$, we can infer class $c$'s data representations of every linear layer in a backpropagation fashion.
Even though $\hat{a}$ is a nonlinear transformation of $z$, the two share a similar structure and sparsity because most activation functions are element-wise and monotonic. Hence we can use $\hat{a}$ to approximate $z$ when seeking the unique elements in $\Delta W$. Theoretically, the rows of $\Delta W$ correspond to the error term at $z$. Because $z$ should retain a stable structure and sparsity within one local updating round, as discussed in Section 3.1, the large entries of the error term should mostly appear at the elements of $z$ with larger magnitude. Therefore, the unique elements in $\Delta W$ should have the same indexes as the elements with larger magnitude in $\hat{a}$. Since we have access to $\hat{a}$, we can find most unique elements in $\Delta W$ by listing the elements of $\hat{a}$ with the largest magnitude. Then we can infer the representation easily by fetching and averaging the rows of this layer's weight update according to the unique elements' indexes.
Following the above algorithm, the malicious server can fetch the training data representation in a fully connected layer for each data class on one device, based on the data representation in the layer after it. Combined with the inference of all classes' training data representations in the last layer, the server is able to infer one device's training data representations for each class it owns in every fully connected layer in a backpropagation way. The inference process is shown in Algorithm 4.
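A simplified single-step sketch of this row-averaging procedure (the layer sizes are hypothetical, and the inferred next-layer representation is simulated as a noisy copy of the true error magnitudes, matching the structure-sharing assumption above):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 120, 84                  # hypothetical FC layer: d_in inputs, d_out outputs
r_prev = rng.random(d_in)              # this layer's (private) input representation
delta = rng.normal(size=d_out)         # backpropagated error at this layer's outputs
dW = np.outer(delta, r_prev)           # single-step weight update, up to the learning rate

# Stand-in for the inferred next-layer representation: it shares the magnitude
# structure of delta, so its largest entries point to the "unique" rows.
a_next = np.abs(delta) + 0.01 * rng.normal(size=d_out)
idx = np.argsort(-np.abs(a_next))[:8]  # indexes of the 8 largest-magnitude elements

rows = dW[idx]                         # each selected row is delta[j] * r_prev
signs = np.sign(rows @ rows[0])        # align row signs so averaging does not cancel
r_hat = (rows * signs[:, None]).mean(axis=0)

# Cosine similarity between the recovered and true input representation.
cos = r_hat @ r_prev / (np.linalg.norm(r_hat) * np.linalg.norm(r_prev))
```

Because every selected row is a scalar multiple of the same input representation, the sign-aligned average recovers its direction; over many local steps the recovery becomes approximate rather than exact.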
Appendix B Experiment Setup
Model for experiments in Section 3.2.
For the class-wise data representation inference experiment, we use a base model with 2 convolutional layers and 3 fully connected layers. The detailed architecture is Conv3-6, Maxpool, Conv6-16, Maxpool, FC-120, FC-84, FC-10. We set the kernel size to 5 for all convolutional layers and to 2 for all max pooling layers.
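As a sanity check on these shapes (assuming 32x32 CIFAR-10-style inputs; the helper function is ours, not part of the paper), the flattened feature count entering FC-120 can be traced through the layers:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (integer floor)."""
    return (size + 2 * pad - kernel) // stride + 1

s = 32                      # assumed 32x32 input images
s = conv_out(s, 5)          # Conv3-6, 5x5 kernel  -> 28
s = s // 2                  # 2x2 Maxpool          -> 14
s = conv_out(s, 5)          # Conv6-16, 5x5 kernel -> 10
s = s // 2                  # 2x2 Maxpool          -> 5
flat_features = 16 * s * s  # 400 inputs to FC-120
```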
Settings for experiments in Section 3.3.
For the experiments unveiling representation leakage in Section 3.3, we build a model with one convolutional layer and one fully connected layer. The detailed architecture is Conv3-12, FC-10. We set the kernel size of the convolutional layer to 5. For the attacks, we apply the L-BFGS optimizer and run 300 optimization iterations to reconstruct the raw data.
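The gradient-matching optimization can be illustrated on a toy problem (a DLG-style sketch on a hypothetical single-layer softmax classifier with a known label; the dimensions and setup are ours, not the paper's exact model):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
d, k = 16, 4                           # hypothetical input dim and class count
W = rng.normal(size=(k, d)) * 0.1      # shared model weights (known to the server)
x_true, y = rng.random(d), 2           # device's private sample and (assumed known) label

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def weight_grad(x):
    # Cross-entropy gradient of the linear layer: (softmax(Wx) - onehot(y)) x^T
    g = softmax(W @ x)
    g[y] -= 1.0
    return np.outer(g, x)

target = weight_grad(x_true)           # the gradient the server observes

def matching_loss(x):
    # Squared distance between the dummy input's gradient and the observed one.
    return ((weight_grad(x) - target) ** 2).sum()

# 300 L-BFGS iterations from a random dummy input, mirroring the setup above.
res = minimize(matching_loss, rng.random(d), method="L-BFGS-B",
               options={"maxiter": 300})
x_rec = res.x
```

On this toy problem the gradient-matching objective is nearly quadratic, so L-BFGS recovers the private input almost exactly; with deep models the same objective is nonconvex and reconstruction is correspondingly harder.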
Models for the two attacks in Section 6.
We use LeNet for the DLG attack and ConvNet for the GS attack. The architectures are shown in Table 3.
Table 3:

| LeNet          | ConvNet          |
| 5×5 Conv 3-12  | 5×5 Conv 3-32    |
| 5×5 Conv 12-12 | 5×5 Conv 32-64   |
| 5×5 Conv 12-12 | 5×5 Conv 64-64   |
| 5×5 Conv 12-12 | 5×5 Conv 64-128  |
| FC-10          | 5×5 Conv 128-128 |
|                | 5×5 Conv 128-128 |
|                | 3×3 Maxpool      |
|                | 5×5 Conv 128-128 |
|                | 5×5 Conv 128-128 |
|                | 5×5 Conv 128-128 |
|                | 3×3 Maxpool      |
Appendix C Proof of Theorem 1
Let $\lVert \cdot \rVert$ be a sub-multiplicative norm, i.e., $\lVert AB \rVert \le \lVert A \rVert \, \lVert B \rVert$.
Based on Proposition 1, we obtain the stated bound. Then, the quantity of interest in Theorem 1 is lower bounded as
Appendix D Proof of Theorem 2
Overview: Our proof is mainly inspired by the convergence analysis of FedAvg on non-IID data. Specifically, our proof has two key parts. First, we derive bounds analogous to those in Assumptions 4 and 5 after applying our defense scheme. Second, we adapt the convergence guarantee of Theorem 2 in that analysis using our new bounds.
Bounding the expected distance between the perturbed gradients under our defense and the raw gradients using Assumption 6. In FedAvg, in the $t$-th round, we denote the input representation, the parameters, and the output of the single defended $l$-th layer on the $k$-th device as $r_{k}^{t}$, $w_{k}^{t}$, and $z_{k}^{t}$, respectively. By applying our defense scheme, the input representation is perturbed into $\hat{r}_{k}^{t}$. Then, the expected distance between the perturbed gradients and the raw gradients in the $l$-th layer is bounded by:
New bounds for Assumption 4 with our defense. Note that our defense scheme is applied only to the $l$-th layer. Hence, the distance between the perturbed gradients and the raw gradients of the whole model is the same as that of the $l$-th layer. Thus,
Next, we use the norm triangle inequality to bound the variance of the stochastic gradients in each device, and we have
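As a generic illustration of this step (our own notation, not the paper's exact constants): if $g_k$ is a raw stochastic gradient on device $k$, $\hat{g}_k$ its perturbed version under the defense, and $\nabla F_k$ the full local gradient, then the triangle inequality together with $(x+y)^2 \le 2x^2 + 2y^2$ gives
$$\mathbb{E}\,\lVert \hat{g}_k - \nabla F_k \rVert^2 \;\le\; 2\,\mathbb{E}\,\lVert \hat{g}_k - g_k \rVert^2 \;+\; 2\,\mathbb{E}\,\lVert g_k - \nabla F_k \rVert^2,$$
so the variance under the defense is controlled by the defense-induced gradient distance (bounded above) plus the original variance bound.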
New bounds for Assumption 5 with our defense. The expected squared norm of stochastic gradients with our defense is as follows: