Gain without Pain: Offsetting DP-injected Noises Stealthily in Cross-device Federated Learning

01/31/2021 ∙ by Wenzhuo Yang, et al. ∙ Tsinghua University, Macquarie University, Sun Yat-sen University

Federated Learning (FL) is an emerging paradigm through which decentralized devices can collaboratively train a common model. However, a serious concern is the leakage of privacy from the gradient information exchanged between clients and the parameter server (PS) in FL. To protect gradient information, clients can adopt differential privacy (DP) to add noises that distort the original gradients before they are uploaded to the PS. Nevertheless, the model accuracy will be significantly impaired by DP noises, making DP impracticable in real systems. In this work, we propose a novel Noise Information Secretly Sharing (NISS) algorithm to alleviate the disturbance of DP noises by sharing negated noises among clients. We theoretically prove that: 1) If clients are trustworthy, DP noises can be perfectly offset on the PS; 2) Clients can easily distort negated DP noises to protect themselves in case other clients are not totally trustworthy, though at the cost of lower model accuracy. NISS is particularly applicable for FL across multiple IoT (Internet of Things) systems, in which all IoT devices need to collaboratively train a model. To verify the effectiveness and the superiority of the NISS algorithm, we conduct experiments with the MNIST and CIFAR-10 datasets. The experiment results verify our analysis and demonstrate that NISS can improve model accuracy by 21% on average and obtain better privacy protection if clients are trustworthy.




I Introduction

With the remarkable development of IoT (Internet of Things) systems, IoT devices such as mobile phones, cameras and IIoT (Industrial IoT) devices have been widely deployed in our daily life [4][10]. On one hand, IoT devices with powerful computing and communication capacity are generating more and more data. On the other hand, to provide more intelligent services, decentralized IoT devices have motivation to collaborate via federated learning (FL) so that distributed data can be fully exploited for model training [33][16].

The training process via FL can be briefly described as follows. In a typical FL system, a parameter server (PS) is deployed to aggregate gradients uploaded by clients and to distribute the aggregated results back to clients [13, 22, 14, 15]. The model training process terminates after the gradient information has been exchanged between clients and the PS for a certain number of rounds. However, it has been shown in [23, 12, 38, 34] that disclosing the gradient information can lead to the leakage of user privacy. In addition, the PS is not always trustworthy [17, 13], which may also compromise user privacy.

Recently, adopting differential privacy (DP) on each client to protect the gradient information has been extensively investigated by academia and industry [29, 8, 1, 30, 9]. DP distorts the original gradients by adding noises, which however unavoidably distorts the aggregated gradients on the PS and hence impairs the model accuracy [29]. It has been reported in [29, 38] that DP noises can significantly lower model accuracy. It implies that directly implementing DP in real systems is impracticable when high model accuracy is required [1].

Fig. 1: A case with three clients whose noises can be perfectly offset among themselves. Here $w_i$ represents the parameter for client $i$.

To alleviate the disturbance of DP noises on the aggregated gradients without compromising user privacy, we propose an algorithm to secretly offset DP noises. The idea of our work can be explained by the example shown in Fig. 1. There are three clients, and the model to be trained by these clients is represented by the parameter $w_i$ for client $i$. Each client distorts the original gradient information by adding a random number, as presented in Fig. 1(a). We suppose that the random number is generated according to the Gaussian distribution determined by the client's privacy budget. However, the noise can be offset if a client negates its noise, splits it into multiple shares, and distributes these negated shares to other clients, as presented in Fig. 1(b). Each client uploads its gradients plus all noises (i.e., its own noises and negated noise shares from other clients) to the PS, and then these noises can be perfectly offset among themselves.

Inspired by this example, we propose the Noise Information Secretly Sharing (NISS) algorithm through which clients can secretly share their noise information with each other. We theoretically prove that: 1) If clients are trustworthy, DP noises can be perfectly offset on the PS without compromising privacy protection; 2) Clients can easily distort negated noise shares received from other clients in case other clients are not totally trustworthy. We also investigate the extreme case in which the PS colludes with other clients to crack the gradient information of a particular client. In this extreme case, there is a trade-off between model accuracy and privacy protection, and model accuracy cannot be improved without compromising privacy protection. However, we would like to emphasize that NISS is particularly applicable for FL across multiple IoT systems. IoT devices within the same system can trust each other to a certain extent so that the model accuracy can be improved accordingly. Besides, devices within the same system can be connected with high-speed networks so that the communication overhead caused by NISS is acceptable.

Our main contributions are summarized as below:

  • We propose the NISS algorithm that can secretly offset noises generated by DP adopted by each client so that the disturbance on the aggregated gradients can be removed.

  • We theoretically prove that the DP noises can be perfectly offset if clients are trustworthy. Even if clients are not totally trustworthy, clients can still protect themselves by distorting the negated noise shares transmitted between clients.

  • Finally, we conduct experiments with the MNIST and CIFAR-10 datasets, and the experiment results demonstrate that the NISS algorithm can obtain better privacy protection and higher accuracy.

The remainder of this paper is organized as follows. In Section II, we introduce related work on FL, DP, and SMC. In Section III, we introduce the preliminary knowledge. In Section IV, we elaborate on the NISS algorithm. In Section V, we present the analysis of noise offsetting and security. In Section VI, we show the simulations, compare our scheme with other schemes, and discuss the experimental results. Finally, we conclude the paper in Section VII.

II Related Work

II-A Federated Learning (FL)

FL, as a recent advance of distributed machine learning, empowers participants to collaboratively train a model under the orchestration of a central parameter server, while keeping the training data decentralized [13]. It was first proposed by Google in 2016 [22]. During the training process, each participant's raw data is stored locally and will not be exchanged or transferred for training. FL has the advantage of making full use of IoT computing power while preserving user privacy.

The work in [22] first proposed FedAVG, which is one of the most widely used model averaging algorithms in FL. The work in [18] analyzed the convergence rate of FedAVG with non-IID data sample distributions. The works [13] and [17] gave a comprehensive introduction to the history, technical methods and unresolved problems in FL. The work in [9] proved that the bare FedAVG can protect the privacy of participants to some extent. However, exchanging only gradient information still carries a high risk of privacy leakage [23, 12, 38, 34]. Despite tremendous efforts contributed by prior works, there exist many issues in FL that have not been solved very well, such as inefficient communication and device variability [17, 32, 29].

II-B Differential Privacy (DP)

DP is a very effective mechanism for privacy preservation that can be applied in FL [29, 1, 9, 30]. It uses a mechanism to generate random noises that are added to query results so as to distort original values.

The most commonly used mechanism for adding noises in FL is the Gaussian mechanism. The work in [1] investigated how to apply the Gaussian mechanism in machine learning systems. Then the work in [9] studied how to use the Gaussian mechanism in FL. In [29], a FedSGD with DP algorithm is proposed for FL systems and its convergence rate is analyzed. The work in [38] introduced a novel method named DLG to measure the level of privacy preservation in FL. In FL with DP, a higher privacy budget $\epsilon$ implies a smaller variance of DP noises, and hence a lower level of privacy preservation. Model accuracy can be largely affected by DP noises [29, 38].

In the field of IoT, FL with DP has also attracted a lot of attention recently. In [6], the author surveys a wide variety of papers on privacy preserving methods that are crucial for FL in IoT. The work in [35] designed a FL system with DP leveraging the reputation mechanism to assist home appliance manufacturers to train a machine learning model based on customers’ data. The work in [36] proposed to integrate FL and DP to facilitate crowdsourcing applications to generate a machine learning model.

Basically, there is a trade-off between the extent of privacy protection and model accuracy if DP is directly incorporated into FL. Different from these works, we devise a novel algorithm through which clients can generate negatively correlated DP noises to get rid of the negative influence on model accuracy.

II-C Secure Multi-party Computing (SMC)

Other than DP, SMC is another effective way for privacy preservation in FL. In previous studies, SMC has been used in many machine learning models [2, 7, 26, 27]. At present, Secret Sharing (SS) and Homomorphic Encryption (HE) are two main ways in SMC to protect privacy in FL.

HE performs complicated computation operations on gradients. During the gradient aggregation and transmission, it is always calculated in an independent encryption space, instead of directly using the raw gradients value [11, 3]. SS is a method to generate several shares for a secret and send them to several participants. As long as most of participants are present, the secret can be recovered. In FL, participants can add masks to their gradients and share their masks as a secret to others. If the PS can receive returns from a sufficient number of participants, the masks can be eliminated. Several works based on SS in FL have been proposed in [5, 21, 31].

However, SS and HE consume too many computing resources, which prohibits their deployment in the real world [28]. In fact, our work is a combination of SS and DP, but the computation overhead of our noise sharing scheme is very low.

III Preliminaries

To facilitate the understanding of our algorithm, the list of main notations used in our work is presented in Table I.

Symbol Meaning
The number of clients
The index of clients
The index of the global training round
The number of local training rounds
The learning rate
The dimension of the parameters
The loss function
The gradient of the loss function
The number of clients in each global round
The cardinality of the dataset of a client
The aggregation weight of a client
The unit noise variance
The Gaussian noise variance of a client
Gaussian distribution
The dataset of a client
The client set of a client in round t
The global model parameters
The local model parameters of a client
The noise generated by the DP mechanism
The negated noise
A random variable used to distort the negated noise
The variance of the distorting random variable
The identity matrix
DP parameters

III-A Differential Privacy

It was previously assumed that user privacy would not be leaked if only gradient information is disclosed. However, it was shown in [23, 12, 38, 34] that private information can be reconstructed from gradient information. Therefore, it was proposed in [1] that clients can adopt DP to further disturb their gradient information by adding noises to their disclosed information. According to the prior work [8], an algorithm satisfying $(\epsilon, \delta)$-differential privacy is defined as follows.

Definition 1.

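A randomized mechanism $\mathcal{M}: \mathcal{X} \rightarrow \mathcal{R}$ with domain $\mathcal{X}$ and range $\mathcal{R}$ satisfies $(\epsilon, \delta)$-differential privacy if for any two adjacent databases $D, D' \in \mathcal{X}$ and for any subset of outputs $S \subseteq \mathcal{R}$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta. \quad (1)$$

Here, $\epsilon$ is the privacy budget, which bounds how distinguishable the outputs on the adjacent databases $D$ and $D'$ can be. $\delta$ represents the probability that the ratio of the output probabilities on the adjacent databases $D$ and $D'$ cannot be bounded by $e^{\epsilon}$ after applying the mechanism $\mathcal{M}$. Intuitively, a DP mechanism with a smaller privacy budget provides stronger privacy protection, and vice versa.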

Theorem 1.

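(Gaussian Mechanism). Let $\epsilon \in (0, 1)$ be arbitrary and $D$ denote the database. For $c^2 > 2\ln(1.25/\delta)$, the Gaussian Mechanism $\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2)$ with parameter $\sigma \ge c\,\Delta f/\epsilon$ is $(\epsilon, \delta)$-differentially private. Here, $f(D)$ represents the original output and $\Delta f$ is the sensitivity of $f$ given by $\Delta f = \max_{D, D'} \|f(D) - f(D')\|_2$.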

For detailed proof, please refer to the reference [8].

We assume that the Gaussian mechanism is adopted in our work because it is convenient to split DP noises obeying the Gaussian distribution into multiple shares [1].
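As a minimal sketch of the Gaussian mechanism, the snippet below calibrates the noise standard deviation with the classical bound $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta f/\epsilon$ from [8], which holds for $0 < \epsilon < 1$. The function names `gaussian_sigma` and `gaussian_mechanism` and all parameter values are illustrative assumptions, not part of the original work.

```python
import math
import random

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise std. dev. from the classical Gaussian-mechanism bound (0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("expects 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def gaussian_mechanism(values, epsilon, delta, sensitivity, rng):
    """Return the values with i.i.d. N(0, sigma^2) noise added to every coordinate."""
    sigma = gaussian_sigma(epsilon, delta, sensitivity)
    return [v + rng.gauss(0.0, sigma) for v in values]

rng = random.Random(0)
sigma_loose = gaussian_sigma(epsilon=0.9, delta=1e-5, sensitivity=1.0)
sigma_tight = gaussian_sigma(epsilon=0.1, delta=1e-5, sensitivity=1.0)  # smaller budget, larger noise
noisy = gaussian_mechanism([0.1, -0.2, 0.3], 0.5, 1e-5, 1.0, rng)
```

The comparison of `sigma_loose` and `sigma_tight` mirrors the statement above that a smaller privacy budget requires a larger noise variance.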


FedAVG is the most commonly used model average algorithm in FL, and thereby FedAVG is used for our study. Based on previous works [22, 30, 9, 29], we present the client based DP-FedAVG here to ease our following discussion.

Without loss of generality, we assume that there are $N$ clients. The client $i$ owns a private dataset $\mathcal{D}_i$ with cardinality $|\mathcal{D}_i|$. These clients aim to train a model with parameters represented by the vector $\mathbf{w} \in \mathbb{R}^d$. In FedAVG, clients need to exchange model parameters with the PS for multiple rounds. Each round is also called a global iteration. At the beginning of global round $t$, each participating client receives the global parameters $\mathbf{w}_t$ from the PS to conduct a number of local iterations. Then, clients return their locally updated model parameters plus DP noises to the PS. After receiving the computation results from a certain number of clients, the PS aggregates the received parameters and embarks on a new round of global iteration. The details of the DP-FedAVG algorithm are presented in Algorithm 1.

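PS executes:
Initialize $\mathbf{w}_0$;
for each round $t = 0, 1, 2, \dots$ do
       $m \leftarrow \max(C \cdot N, 1)$
       $S_t \leftarrow$ (Random set of $m$ clients)
       for each client $i \in S_t$ in parallel do
             $\mathbf{w}_{t+1}^i \leftarrow$ ClientUpdate($i$, $\mathbf{w}_t$)
       $\mathbf{w}_{t+1} \leftarrow \sum_{i \in S_t} p_i \mathbf{w}_{t+1}^i$
ClientUpdate($i$, $\mathbf{w}$):
$\mathcal{B} \leftarrow$ (split $\mathcal{D}_i$ into batches of size $B$)
for each local round from $1$ to $E$ do
       for batch $b \in \mathcal{B}$ do
             $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla \ell(\mathbf{w}; b)$
$\mathbf{n}_i \sim \mathcal{N}(0, \sigma_i^2 \mathbf{I}_d)$ (Gaussian Mechanism)
return $\mathbf{w} + \mathbf{n}_i$
Algorithm 1 DP-FedAVG Algorithm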

In Algorithm 1, $C$ is the fraction of clients that participate in each global iteration, $\mathbf{n}_i$ is the Gaussian noise and $p_i$ is the aggregation weight of client $i$, and $S_t$ is the set of clients that participate in round $t$. Usually, $p_i = |\mathcal{D}_i| / \sum_{j \in S_t} |\mathcal{D}_j|$. $\mathcal{B}$ is the set of local sample batches, $E$ is the number of local iterations to be conducted and $\eta$ is the learning rate.
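The round structure described above can be sketched on a toy problem. The snippet below is only an illustration under stated assumptions: a one-dimensional least-squares model stands in for the real network, aggregation weights are taken proportional to local dataset sizes, and the names `client_update` and `dp_fedavg_round` as well as all hyperparameter values are hypothetical.

```python
import random

def client_update(w, data, lr, local_rounds, batch_size, sigma, rng):
    """Run local mini-batch SGD on a 1-D least-squares loss, then add Gaussian noise."""
    data = list(data)                      # local copy so shuffling leaves the caller's data intact
    for _ in range(local_rounds):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the batch mean of 0.5 * (w * x - y)^2 with respect to w.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w + rng.gauss(0.0, sigma)       # Gaussian mechanism on the returned parameter

def dp_fedavg_round(w_global, datasets, fraction, lr, local_rounds, batch_size, sigma, rng):
    """One global round: sample clients, update locally, aggregate with data-size weights."""
    m = max(int(fraction * len(datasets)), 1)
    chosen = rng.sample(range(len(datasets)), m)
    total = sum(len(datasets[i]) for i in chosen)
    return sum(
        (len(datasets[i]) / total)
        * client_update(w_global, datasets[i], lr, local_rounds, batch_size, sigma, rng)
        for i in chosen
    )

rng = random.Random(0)
# Ten toy clients, each holding ten (x, y) pairs generated from y = 3x.
datasets = [[(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(10))]
            for _ in range(10)]

w_clean, w_noisy = 0.0, 0.0
for _ in range(50):
    w_clean = dp_fedavg_round(w_clean, datasets, 0.5, 0.5, 2, 5, 0.0, rng)  # sigma = 0: plain FedAVG
    w_noisy = dp_fedavg_round(w_noisy, datasets, 0.5, 0.5, 2, 5, 0.1, rng)  # sigma > 0: DP-FedAVG
```

With `sigma = 0` the sketch reduces to plain FedAVG and recovers the slope of the toy data, while the noisy run fluctuates around it, which is the accuracy penalty the rest of the paper tries to remove.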

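Let $f_i(\mathbf{w}_t, \mathcal{D}_i)$ represent the function returning the locally updated parameters with input $\mathbf{w}_t$ and $\mathcal{D}_i$. The sensitivity of $f_i$ is denoted by $\Delta f_i$. We assume that the privacy budget of client $i$ is represented by $\epsilon_i$ and $\delta_i$.

Corollary 1.

Algorithm 1 satisfies $(\epsilon_i, \delta_i)$-differential privacy for client $i$, if $\mathbf{n}_i$ is sampled from $\mathcal{N}(0, \sigma_i^2 \mathbf{I}_d)$ where $\sigma_i \ge c\,\Delta f_i / \epsilon_i$ with $c^2 > 2\ln(1.25/\delta_i)$, and $\mathbf{I}_d$ is the identity matrix.

Here $d$ is the model dimension. The proof is straightforward from Theorem 1.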

According to Algorithm 1, the aggregated parameters on the PS, including the disturbance of the DP noises, are

$$\mathbf{w}_{t+1} = \sum_{i \in S_t} p_i \mathbf{w}_{t+1}^i + \sum_{i \in S_t} p_i \mathbf{n}_i. \quad (2)$$

From the right-hand side of Eq. (2), we can see that the first term represents the aggregated parameters while the second term represents the disturbance of the DP noises. The noises are independently generated by all participating clients, and therefore the variance of the aggregated noise in each dimension is $\sum_{i \in S_t} p_i^2 \sigma_i^2$. Apparently, if the privacy budget $\epsilon_i$ is smaller, $\sigma_i$ is higher and the total variance on the server side is higher. Our approach is to make these noises negatively correlated so that the aggregated noise variance can be reduced.
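The variance argument can be checked numerically. The sketch below assumes that the aggregated noise in one dimension is the weighted sum of independent zero-mean Gaussian noises, so that its variance is the sum of squared weights times the per-client variances. The function names and the example weights and standard deviations are illustrative assumptions.

```python
import random
import statistics

def aggregated_noise_variance(weights, sigmas):
    """Theoretical variance of sum_i p_i * n_i with independent n_i ~ N(0, sigma_i^2)."""
    return sum(p * p * s * s for p, s in zip(weights, sigmas))

def simulate_aggregated_noise(weights, sigmas, trials, rng):
    """Empirical variance of the weighted noise sum over many independent trials."""
    samples = [
        sum(p * rng.gauss(0.0, s) for p, s in zip(weights, sigmas))
        for _ in range(trials)
    ]
    return statistics.pvariance(samples)

rng = random.Random(42)
weights = [0.25, 0.25, 0.25, 0.25]   # equal aggregation weights for four clients
sigmas = [1.0, 2.0, 1.0, 2.0]        # per-client noise standard deviations
theory = aggregated_noise_variance(weights, sigmas)
empirical = simulate_aggregated_noise(weights, sigmas, 20000, rng)
```

Increasing any entry of `sigmas`, which corresponds to a smaller privacy budget for that client, raises both `theory` and `empirical`.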

IV NISS Algorithm

In this section, we introduce the NISS algorithm in detail and leave the analysis of the reduced variance on the aggregation and the security analysis of NISS to the next section.

IV-A Illustrative Example

Before diving into the detailed NISS algorithm, we present a concrete example to illustrate how NISS works. According to Algorithm 1, $\mathbf{n}_i$ is sampled from $\mathcal{N}(0, \sigma_i^2 \mathbf{I}_d)$ since the dimension of $\mathbf{w}$ is $d$. It means that the noises of the $d$ dimensions are generated independently and the noise offset is conducted for each dimension independently. Thus, to simplify our discussion, we only need to consider the noise $n_i$ for a particular dimension of client $i$, and $n_i$ is sampled from $\mathcal{N}(0, \sigma_i^2)$.

According to the property of the Gaussian distribution, can be split into shares and each share is sampled from . The client can send out negated share to neighboring clients. If all clients conduct the same operation, the client is expected to receive noise shares from other clients as well, which can be denoted as . To ease our understanding, the process is illustrated in Fig.2.

(a) A client sends out negated noise shares.
(b) A client receives noise shares from other clients.
Fig. 2: The workflow of NISS for a particular client.

Then, the client adds both its own noise and the sum of negated noise shares received from other clients to its parameter before submitting it to the PS. Its own noise preserves the privacy of client $i$, while the received negated shares are used on the PS to offset the noises generated by other clients. Since negated shares are generated randomly, no other client can exactly obtain the noise information of client $i$. In addition, the parameter information is only disclosed to the PS. As long as all other clients are trustworthy, these DP noises can be offset perfectly by negated noise shares.
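The offsetting effect in this example can be reproduced with a short simulation for one scalar dimension. It is a sketch under simplifying assumptions that are not taken from the paper: the PS forms an unweighted sum, every client uses the same number of shares, and each negated share goes to a uniformly chosen other client (repeats allowed). The name `niss_round_noises` is hypothetical.

```python
import random

def niss_round_noises(num_clients, shares_per_client, unit_sigma, rng):
    """Return (own_noise, received) per client for one scalar dimension.

    own_noise[i] is the sum of client i's shares, so its variance is
    shares_per_client * unit_sigma^2; received[i] is the sum of the negated
    shares that other clients sent to client i.
    """
    own_noise = [0.0] * num_clients
    received = [0.0] * num_clients
    for i in range(num_clients):
        others = [j for j in range(num_clients) if j != i]
        for _ in range(shares_per_client):
            share = rng.gauss(0.0, unit_sigma)
            own_noise[i] += share
            received[rng.choice(others)] -= share   # a neighbour gets the negated share
    return own_noise, received

rng = random.Random(7)
own, recv = niss_round_noises(num_clients=3, shares_per_client=4, unit_sigma=1.0, rng=rng)
uploaded_noise = [o + r for o, r in zip(own, recv)]   # noise each client adds to its parameter
residual = sum(uploaded_noise)                         # noise left after an unweighted sum on the PS
```

Each entry of `uploaded_noise` is still a non-trivial random perturbation of that client's parameter, yet `residual` vanishes up to floating-point rounding because every share appears once with each sign.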

PS executes:
Initialize ;
for each round  do
       (Random set of clients)
       for each client in parallel  do
(split into batches of size )
for each local round from to  do
       for batch  do
return .  
(Gaussian mechanism)
(Client ’s setting)
(Random set of clients in this round)
for  do
       (Connect with -th client in )
       Send to
       Receive from
Algorithm 2 NISS Algorithm

IV-B Algorithm Design

We proceed to design the NISS algorithm based on the FedAVG algorithm introduced in the last section.

First of all, a tracker server is needed so that clients can send and receive negated noise shares with each other. Each client needs to contact the tracker server to fetch a list of neighbor clients before it sends out negated noise shares. The tracker server is only responsible for recording live clients in the system and returning a random list of clients as neighbors for a particular client . Obviously, the tracker server does not receive any noise information, and hence will not intrude user privacy. It can be implemented with light communication cost, similar to the deployment of the tracker server in peer-to-peer file sharing systems [25].
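A tracker of this kind needs very little state. The class below is a hypothetical in-memory sketch, with `Tracker`, `register`, `unregister` and `neighbours` being illustrative names; it only records live client identifiers and returns a random neighbour list, and it never handles noise values.

```python
import random

class Tracker:
    """Hypothetical tracker: records live clients and hands out random neighbour lists."""

    def __init__(self, seed=None):
        self._live = set()
        self._rng = random.Random(seed)

    def register(self, client_id):
        """Mark a client as live."""
        self._live.add(client_id)

    def unregister(self, client_id):
        """Remove a client that has left the system."""
        self._live.discard(client_id)

    def neighbours(self, client_id, count):
        """Return up to `count` distinct live clients, never including the requester."""
        candidates = sorted(self._live - {client_id})
        return self._rng.sample(candidates, min(count, len(candidates)))

tracker = Tracker(seed=1)
for cid in range(6):
    tracker.register(cid)
peers = tracker.neighbours(0, 3)        # three random neighbours for client 0
all_peers = tracker.neighbours(0, 10)   # request larger than the population is capped
```

Because the tracker sees only identifiers, a design along these lines is consistent with the claim that it does not receive any noise information.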

In NISS, the operation of the PS is the same as that in FedAVG. The only difference lies in the operation of each client. Based on its own privacy budget and function sensitivity, the client needs to determine so that satisfies -differentially privacy. Then, the client can determine the number of noise shares according to so that the client can generate noise shares and negated noise shares. Here is a number much smaller than and can be a common value used by all clients. is also called a unit noise.

Because clients disclose their noise information with other clients, the gradient information can be cracked to certain extent if some clients are not trustworthy and collude with the PS to intrude the privacy of a particular client. To prevent the leakage of user privacy, we propose to multiply a noise component to the received negated noise share . is also sampled from the Gaussian distribution . Due to the disturbance of , no other client and the PS can exactly crack the gradient information of the client. can be set according to the probability that other clients will collude with the PS. How to exactly set and the role of will be further analyzed in the next section.

To wrap up, the details of the NISS algorithm are presented in Algorithm 2.

V Theoretic Analysis

In this section, we conduct analysis to show how much noise variance can be reduced by NISS on the PS side and how the NISS algorithm defends against attacks. Based on our analysis, we also discuss the application of NISS in real FL systems.

V-A Analysis of Noise Offsetting

Similar to Sec.IV-A, to simplify our discussion, we only consider the noise offsetting for a particular dimension. Let and denote noise shares and negated noise shares received from other clients for client respectively. Let denote the client that receive the -th negated noise share from client .

Based on Algorithm 2, the client uploads . The aggregation conducted on the PS becomes

Here is sampled from and is sampled from . is the abbreviation of if its meaning is clear from the context. Let denote the aggregated DP noises and our study focuses on the minimization of .

Let us first analyze the variance of a particular noise share after offsetting.

Lemma 1.

The variance of a noise share plus its negated share is:


Here client receives the negated share of and is the noise imposed by client .


According to the definition of the variance, we can obtain:


The above formula holds because is sampled from and is sampled from . So we can obtain and similarly. Apparently, and are dependent according to our algorithm. ∎

Theorem 2.

After noise offsetting, the variance of the aggregated noise on the PS side is:


According to Lemma 1, because each and are dependent, we can obtain:


Since client will send and receive noise shares and negated noise shares, thus each will be added times according to Algorithm 2. By substituting by , we can obtain:


The third equality holds because . ∎

Remark: From Theorem 2, we can observe that if implying that DP noises are perfectly offset on the PS side. However, if , the value of is the same as that without any noise offsetting. The value of depends on the trustworthiness of the clients. We will further discuss how to set after the security analysis in the next subsection.
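The two regimes in the remark can be illustrated for a single noise share. The sketch below rests on an assumption that is not confirmed by the surviving text: the multiplicative distortion is modelled as a Gaussian with mean 1 and variance `distort_var`, independent of the share. Under that assumption the leftover of a share after offsetting is (1 - r) times the share, whose variance is `distort_var` times the unit noise variance.

```python
import random
import statistics

def residual_variance(distort_var, unit_sigma, trials, rng):
    """Empirical variance of s - r * s with s ~ N(0, unit_sigma^2) and r ~ N(1, distort_var).

    The mean-1 Gaussian for r is an illustrative modelling assumption.
    """
    distort_sigma = distort_var ** 0.5
    samples = []
    for _ in range(trials):
        share = rng.gauss(0.0, unit_sigma)
        r = rng.gauss(1.0, distort_sigma)
        samples.append(share - r * share)   # share uploaded by the owner plus distorted negated share
    return statistics.pvariance(samples)

rng = random.Random(3)
results = {v: residual_variance(v, 1.0, 50000, rng) for v in (0.0, 0.25, 1.0)}
```

In this model a distortion variance of 0 leaves no residual noise, while a distortion variance of 1 leaves a residual with the same variance as the original share, matching the two cases described in the remark.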

V-B Security Analysis

We conduct the security analysis by analyzing the privacy preservation for a particular client. We suppose that the target of a particular client $i$ is to satisfy $(\epsilon_i, \delta_i)$-differential privacy.

It is easy to understand that the NISS algorithm satisfies the -differentially private by setting or , if the PS and clients do not collude. What client submits to the PS is . The noise is also a Gaussian random variable with variance , and hence the NISS algorithm on client satisfies -differentially private. Meanwhile, no other client can crack the parameter information since the parameter information is only disclosed to the PS.

However, it is not guaranteed that the PS never colludes with clients. To conduct a more general analysis, we assume that a fraction of the other clients will collude with the PS. The problem is how to set so that the NISS algorithm on client can still satisfy -differential privacy.

Let represent the set of clients that client will contact. There is no prior knowledge about which client will collude with the PS. The tracker server randomly select clients for . It implies that fraction of will disclose the noise share information with the PS. We use to denote the clients who collude with the PS and to denote the clients who do not collude. Apparently, the size of and are and . Thus, the effective noise uploaded by client becomes . To ensure that -differentially private can be satisfied, it requires . It turns out that

Theorem 3.

If sampled from can make satisfy -differentially private, the NISS algorithm satisfies -differentially private as long as . Here represents the percentage of other clients that collude with the PS.

The detailed proof is presented in Appendix A.

Remark: It is worth mentioning the special case with . According to Theorem 3, and is sampled from if . In this case, . According to the central limit theorem, as long as , we have . Thus, if and , it implies that and . The variance of the aggregated noise on the PS is the same as that without any offsetting operation. In this extreme case, there exists a trade-off between model accuracy and privacy protection. One cannot improve the model accuracy without compromising privacy protection.

V-C Application of NISS in Practice

As we have discussed in the last subsection, is a vital parameter. Our analysis uses to cover all cases with different fractions of malicious clients colluding with the PS. If is close to , it will significantly impair the performance of the NISS algorithm. In practice, can be set to a small value, which can be illustrated from two perspectives.

Firstly, most FL systems are large-scale, with tens of thousands of clients. If there are more normal clients, the fraction of malicious clients that will collude with the PS will be smaller. Secondly, our analysis is based on the assumption that is randomly selected by the tracker server. In fact, clients can play a coalitional game with other clients they trust. For instance, the IoT devices of the same system can trust each other substantially. They can share negated noise information with each other by setting a small since the probability that neighboring clients collude with the PS is very low. From this example, we can also conclude that the NISS algorithm is particularly applicable for FL across multiple IoT systems. IoT devices in the same system can form a coalition so that the variance of the aggregated noise is minimized. Besides, devices within the same system can be connected with high-speed networks so that the communication overhead to transmit noise shares is insignificant.

VI Experiment

Fig. 3: . If , FedAVG. If , DP-FedAVG. For the rest, NISS with and .

In this section, we conduct experiments with MNIST and CIFAR-10 to evaluate the performance of NISS.

VI-A Experimental Setup

VI-A1 Simulation Settings

Based on [1], we use the Gaussian mechanism to add noises to local model parameters. We use the same experimental settings as in [22], [29] and [1]. The FL settings of our experiment are as follows:

  • Number of users:

  • User fraction rate:

  • Local minibatch size:

  • Learning rate:

  • Number of local round:

  • Unit noise variance:

  • DP parameters:

In addition, to implement DP-FedAVG, we use the norm clipping technique with a clipping threshold to restrict the range of each client's gradients. If the norm of a client's gradient exceeds the threshold, the gradient is clipped to the threshold. The details of the mechanism can be found in [29]. For our experiments, we set .
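Norm clipping is commonly implemented by rescaling the gradient whenever its L2 norm exceeds the threshold, as in [1]. The helper below is a minimal sketch of that common form; the name `clip_by_l2_norm` and the example threshold are illustrative assumptions, and the exact variant used in the experiments may differ.

```python
import math

def clip_by_l2_norm(grad, threshold):
    """Scale grad so that its L2 norm is at most threshold; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / threshold)
    return [g / scale for g in grad]

clipped = clip_by_l2_norm([3.0, 4.0], threshold=1.0)     # norm 5 is scaled down to norm 1
unchanged = clip_by_l2_norm([0.3, 0.4], threshold=1.0)   # norm 0.5 is below the threshold
```

Clipping bounds the sensitivity of the local update, which is what allows the Gaussian noise scale to be calibrated to the privacy budget.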

We use PyTorch [24] as our experimental environment. The experiments run on a computer with a six-core 2.6 GHz Intel i7 processor, 16 GB of RAM, and an AMD Radeon Pro 5300M GPU.

VI-A2 Training Models and Datasets

To make our experiments more comprehensive, we set up three different scenarios. Firstly, we use the public datasets MNIST and CIFAR-10 as our experimental data. The MNIST dataset of handwritten digits contains 70,000 grayscale images of the 10 digits, with 60,000 training images and 10,000 test images. The CIFAR-10 dataset consists of 60,000 $32 \times 32$ colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Secondly, we use different neural network structures which are similar to those in [22], [1] and [19]: i) a Convolutional Neural Network (CNN) with two convolution layers, a fully connected layer with 512 units and ReLU activation, and a final softmax output layer; ii) a Multilayer Perceptron (MLP) with two hidden layers of 200 units each, using ReLU activations. Thirdly, we split the dataset in IID and non-IID settings. For the IID setting, the data is shuffled and partitioned among 100 users, each receiving the same number of examples. For the non-IID setting, the data is sorted by labels and divided into different partitions. Then we distribute them to the clients so that each client receives a non-IID dataset.
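The two partitioning schemes can be sketched as follows. The shard-based non-IID split follows the common recipe from [22]; the choice of two shards per client, the function names, and the toy label list are assumptions made for illustration only.

```python
import random

def partition_iid(labels, num_clients, rng):
    """Shuffle sample indices and deal them out in equal-sized chunks."""
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    size = len(indices) // num_clients
    return [indices[c * size:(c + 1) * size] for c in range(num_clients)]

def partition_non_iid(labels, num_clients, shards_per_client, rng):
    """Sort indices by label, cut them into shards, and give each client a few shards."""
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    size = len(indices) // num_shards
    shards = [indices[s * size:(s + 1) * size] for s in range(num_shards)]
    rng.shuffle(shards)
    return [
        [i for shard in shards[c * shards_per_client:(c + 1) * shards_per_client] for i in shard]
        for c in range(num_clients)
    ]

rng = random.Random(0)
labels = [i % 10 for i in range(1000)]   # toy labels standing in for the ten classes
iid_parts = partition_iid(labels, 100, rng)
non_iid_parts = partition_non_iid(labels, 100, 2, rng)
```

In the non-IID split of this toy example every client ends up with samples from at most two classes, whereas the IID split gives each client a random mix of classes.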

VI-A3 Metrics and Baselines

We use the model accuracy on the test dataset to evaluate the accuracy performance of the NISS algorithm. Meanwhile, we implement FedAVG and DP-FedAVG algorithms as baselines in our experiments.

In addition, we use the method in [38] to evaluate the effect of NISS on privacy protection. We can evaluate the leak-defence of a model averaging algorithm by determining whether the effective information can be recovered from one picture of CIFAR-100 or not. Similar to [38], we adopt the DLG loss as the metric. The method uses randomly initialized weights and uses L-BFGS [20] to match gradients from all trainable parameters. The DLG loss is the gradient match loss for L-BFGS. The lower the DLG loss is, the more information leaks and the clearer the final recovered image will be.

VI-B Experiment Results

VI-B1 Model Accuracy

Fig. 3 shows the test accuracy of the trained models. Since we set up three different scenarios (different datasets, IID or non-IID data, and different neural networks), we conducted eight sets of experiments. Here, for clarity, we use to denote which is the variance of . Then means perfect offsetting by NISS, which is equivalent to FedAVG. means the variance of the aggregated noise on the PS side is which is the same as DP-FedAVG. Thus we use and to denote FedAVG and DP-FedAVG. From Fig. 3, we can see that, as increases, the test accuracy of the trained model decreases, because all clients are adding more noise, which causes the variance of the aggregated noise on the PS side to increase. The higher is, the larger the variance of the added noises is, and the more significantly the accuracy deteriorates. This is consistent with our analysis. When the client data is IID, our NISS algorithm can increase the test accuracy by about on MNIST and on CIFAR-10 if no client colludes with the PS, namely, under perfect offsetting. This is because CIFAR-10 images are all three-channel color pictures, and the amount of noise has a higher impact on the accuracy. When the client data is non-IID, the test accuracy on MNIST increases by and for CIFAR-10 the test accuracy is higher. In addition, note that the test accuracy on CIFAR-10 is low because the MLP model is too simple for training on CIFAR-10 and non-IID data can lower the test accuracy, as discussed in [37]. Fig. 3 also shows the trade-off between model accuracy and privacy protection. If we increase , the accuracy will decrease, and if we decrease , the accuracy will increase.

In summary, when , since the noise can be offset perfectly, the model accuracy given by NISS is very close to that of FedAVG on the whole and better than that of the DP-FedAVG algorithm if no client colludes with the PS. Even if some clients collude with the PS, by tuning , each client can protect its privacy, but the model accuracy will decrease.

Fig. 4: DLG loss for FedAVG, DP-FedAVG and NISS. The above images show the effect of finally recovering image by DLG in [38] and the original image is a "telephone".

VI-B2 Privacy Protection

In order to test the degree of privacy protection that a client obtains for its gradient information, we use the method in [38] to test the leak-defence of FedAVG, DP-FedAVG and NISS. We use the gradients from FedAVG, DP-FedAVG and NISS to run DLG. Fig. 4 shows the DLG loss and the image finally recovered. The lower the DLG loss is, the more information is leaked and the clearer the final recovered image will be. From Fig. 4, we observe that our NISS algorithm leaks almost no sensitive information, while DP-FedAVG may leak partial private information and FedAVG cannot prevent the leakage of sensitive information at all.

In summary, the above experiments demonstrate that our NISS algorithm can achieve extraordinary performance. When clients do not collude with the PS, NISS can achieve the same accuracy as FedAVG, which is better than that of DP-FedAVG, and better privacy protection due to the large scale of noise added by a single client. If some clients collude with the PS, the client can set its to protect its privacy. Our experiments also show the trade-off between model accuracy and privacy protection. If clients set a higher , the accuracy will be lower and vice versa.

VII Conclusion

In this work, we propose a novel algorithm called NISS to offset the DP noises independently generated by clients in FL systems. NISS is a method for clients to generate negatively correlated noises. Intuitively, each client splits its noise into multiple shares. Each share is negated and sent out to a neighboring client. Each client uploads its parameter plus its own noise and all negated noise shares received from its neighbors. A noise share of a particular client can thus be offset by its negated value uploaded by another client. We theoretically prove that the NISS algorithm can effectively reduce the variance of the aggregated noise on the PS so as to improve the model accuracy in FL. Experiments with the MNIST and CIFAR-10 datasets are carried out to verify our analysis and demonstrate the extraordinary performance achieved by NISS.
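The share exchange described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not taken from the algorithm specification: parameters are scalars, every client draws its shares as independent Gaussians whose sum forms its own noise, neighbors are chosen on a ring, and negated shares are forwarded without distortion. The function name `niss_round` and its arguments are hypothetical.

```python
import random


def niss_round(params, sigma, num_shares, rng):
    """Return each client's upload after one share exchange on a ring."""
    n = len(params)
    if not 1 <= num_shares < n:
        raise ValueError("need 1 <= num_shares < number of clients")
    share_std = sigma / num_shares ** 0.5    # shares sum to a noise of variance sigma^2
    own_noise = [0.0] * n
    received = [0.0] * n
    for i in range(n):
        shares = [rng.gauss(0.0, share_std) for _ in range(num_shares)]
        own_noise[i] = sum(shares)           # the client's own noise is the sum of its shares
        for k, share in enumerate(shares, start=1):
            received[(i + k) % n] -= share   # negated share goes to neighbor i + k
    # Each client uploads its parameter, its own noise and the negated shares it received.
    return [p + z + r for p, z, r in zip(params, own_noise, received)]


if __name__ == "__main__":
    rng = random.Random(1)
    params = [0.1, 0.2, 0.3, 0.4, 0.5]
    uploads = niss_round(params, sigma=1.0, num_shares=2, rng=rng)
    print(uploads, sum(uploads), sum(params))
```

Each individual upload is perturbed, while the sum of all uploads matches the sum of the clean parameters up to floating-point error, which is the perfect-offsetting case with trustworthy clients.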

Appendix A Proof of Theorem 3


We will calculate first. For and , we will discuss separately. Firstly, for , we have:


Secondly, for , note that here is no longer a random variable but a fixed number, which we can approximate using the central limit theorem. We can then calculate it as:


According to the central limit theorem, as long as , . Thus, we have:


Then, we can obtain:


To ensure that -differentially private, it requires that:


Then we have:


Hence we can obtain . ∎


  • [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318.
  • [2] R. Agrawal and R. Srikant (2000) Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 439–450.
  • [3] Y. Aono, T. Hayashi, L. Wang, S. Moriai, et al. (2017) Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 13 (5), pp. 1333–1345.
  • [4] L. Atzori, A. Iera, and G. Morabito (2010) The internet of things: a survey. Computer networks 54 (15), pp. 2787–2805.
  • [5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191.
  • [6] C. Briggs, Z. Fan, P. Andras, et al. (2020) A review of privacy-preserving federated learning for the internet-of-things.
  • [7] W. Du, Y. S. Han, and S. Chen (2004) Privacy-preserving multivariate statistical analysis: linear regression and classification. In Proceedings of the 2004 SIAM international conference on data mining, pp. 222–233.
  • [8] C. Dwork and A. Roth (2014) The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9 (3–4), pp. 211–407. ISSN 1551-305X.
  • [9] R. C. Geyer, T. Klein, and M. Nabi (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557.
  • [10] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami (2013) Internet of things (IoT): a vision, architectural elements, and future directions. Future generation computer systems 29 (7), pp. 1645–1660.
  • [11] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677.
  • [12] B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618.
  • [13] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977.
  • [14] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik (2016) Federated optimization: distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527.
  • [15] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
  • [16] J. Li, S. Chu, F. Shu, J. Wu, and D. N. K. Jayakody (2018) Contract-based small-cell caching for data disseminations in ultra-dense cellular networks. IEEE Transactions on Mobile Computing 18 (5), pp. 1042–1053.
  • [17] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60.
  • [18] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of FedAvg on non-IID data. arXiv preprint arXiv:1907.02189.
  • [19] Y. Li, Y. Zhou, A. Jolfaei, D. Yu, G. Xu, and X. Zheng (2020) Privacy-preserving federated learning framework based on chained secure multi-party computing. IEEE Internet of Things Journal.
  • [20] D. C. Liu and J. Nocedal (1989) On the limited memory BFGS method for large scale optimization. Mathematical programming 45 (1-3), pp. 503–528.
  • [21] K. Mandal, G. Gong, and C. Liu (2018) NIKE-based fast privacy-preserving high-dimensional data aggregation for mobile devices. CACR Technical Report, CACR 2018-10, University of Waterloo, Canada.
  • [22] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282.
  • [23] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov (2019) Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706.
  • [24] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch.
  • [25] S. Saroiu, P. K. Gummadi, and S. D. Gribble (2001) Measurement study of peer-to-peer file sharing systems. In Multimedia Computing and Networking 2002, Vol. 4673, pp. 156–170.
  • [26] J. Vaidya and C. Clifton (2002) Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 639–644.
  • [27] J. Vaidya, M. Kantarcıoğlu, and C. Clifton (2008) Privacy-preserving naive Bayes classification. The VLDB Journal 17 (4), pp. 879–898.
  • [28] P. Vepakomma, T. Swedish, R. Raskar, O. Gupta, and A. Dubey (2018) No peek: a survey of private distributed deep learning. arXiv preprint arXiv:1812.03288.
  • [29] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek, and H. V. Poor (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Transactions on Information Forensics and Security.
  • [30] N. Wu, F. Farokhi, D. Smith, and M. A. Kaafar (2020) The value of collaboration in convex machine learning with differential privacy. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 304–317.
  • [31] G. Xu, H. Li, S. Liu, K. Yang, and X. Lin (2019) VerifyNet: secure and verifiable federated learning. IEEE Transactions on Information Forensics and Security 15, pp. 911–926.
  • [32] Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–19.
  • [33] C. Zhang, P. Patras, and H. Haddadi (2019) Deep learning in mobile and wireless networking: a survey. IEEE Communications Surveys & Tutorials 21 (3), pp. 2224–2287.
  • [34] B. Zhao, K. R. Mopuri, and H. Bilen (2020) iDLG: improved deep leakage from gradients. arXiv preprint arXiv:2001.02610.
  • [35] Y. Zhao, J. Zhao, L. Jiang, R. Tan, D. Niyato, Z. Li, L. Lyu, and Y. Liu (2020) Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet of Things Journal.
  • [36] Y. Zhao, J. Zhao, M. Yang, T. Wang, N. Wang, L. Lyu, D. Niyato, and K. Y. Lam (2020) Local differential privacy based federated learning for internet of things. arXiv preprint arXiv:2004.08856.
  • [37] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra (2018) Federated learning with non-IID data. arXiv preprint arXiv:1806.00582.
  • [38] L. Zhu, Z. Liu, and S. Han (2019) Deep leakage from gradients. In Advances in Neural Information Processing Systems, pp. 14774–14784.