I Introduction
With the remarkable development of IoT (Internet of Things) systems, IoT devices such as mobile phones, cameras and IIoT (Industrial IoT) devices have been widely deployed in our daily life [4][10]. On the one hand, IoT devices with powerful computing and communication capabilities are generating more and more data. On the other hand, to provide more intelligent services, decentralized IoT devices are motivated to collaborate via federated learning (FL) so that distributed data can be fully exploited for model training [33][16].
The training process via FL can be briefly described as follows. In a typical FL system, a parameter server (PS) is deployed to aggregate the gradients uploaded by clients and distribute the aggregated results back to them [13, 22, 14, 15]. The model training process terminates after the gradient information has been exchanged between the clients and the PS for a certain number of rounds. However, it has been shown in [23, 12, 38, 34] that disclosing gradient information can lead to the leakage of user privacy. In addition, the PS is not always trustworthy [17, 13], which can also compromise user privacy.
Recently, adopting differential privacy (DP) on each client to protect the gradient information has been extensively investigated by academia and industry [29, 8, 1, 30, 9]. DP distorts the original gradients by adding noises, which however unavoidably distorts the aggregated gradients on the PS and hence impairs the model accuracy [29]. It has been reported in [29, 38] that DP noises can significantly lower model accuracy. This implies that directly implementing DP in real systems is impractical when high model accuracy is required [1].
To alleviate the disturbance of DP noises on the aggregated gradients without compromising user privacy, we propose an algorithm to secretly offset DP noises. The idea of our work can be explained by the example shown in Fig. 1, in which three clients collaboratively train a shared model. Each client distorts its original gradient information by adding a random number, as presented in Fig. 1(a). We suppose that the random number is generated according to a Gaussian distribution determined by the client's privacy budget. However, the noise can be offset if a client negates and splits its noise into multiple shares and distributes these negated shares to other clients, as presented in Fig. 1(b). Each client uploads its gradients plus all noises (i.e., its own noise and the negated noise shares received from other clients) to the PS, where these noises can be perfectly offset among themselves.

Inspired by this example, we propose the Noise Information Secretly Sharing (NISS) algorithm, through which clients can secretly share their noise information with each other. We theoretically prove that: 1) if clients are trustworthy, DP noises can be perfectly offset on the PS without compromising privacy protection; 2) clients can easily distort the negated noise shares received from other clients in case other clients are not totally trustworthy. We also investigate the extreme case in which the PS colludes with other clients to crack the gradient information of a particular client. In this extreme case, there is a tradeoff between model accuracy and privacy protection, and model accuracy cannot be improved without compromising privacy protection. However, we would like to emphasize that NISS is particularly applicable for FL across multiple IoT systems. IoT devices within the same system can trust each other to a certain extent so that the model accuracy can be improved accordingly. Besides, devices within the same system can be connected via high-speed networks so that the communication overhead caused by NISS is acceptable.
Our main contributions are summarized as follows:

We propose the NISS algorithm, which can secretly offset the DP noises generated by each client so that the disturbance on the aggregated gradients can be removed.

We theoretically prove that the DP noises can be perfectly offset if clients are trustworthy. Even if clients are not totally trustworthy, each client can still protect itself by distorting the negated noise shares transmitted between clients.

Finally, we conduct experiments with the MNIST and CIFAR10 datasets, and the experimental results demonstrate that the NISS algorithm achieves better privacy protection and higher model accuracy.
The remainder of this paper is organized as follows. In Section II, we introduce related work on FL, DP, and SMC. In Section III, we introduce the preliminary knowledge. In Section IV, we elaborate the NISS algorithm. In Section V, we present the analysis of noise offsetting and security. In Section VI, we describe our simulations, compare our scheme with other schemes and discuss the experimental results. Finally, we conclude the paper in Section VII.
II Related Work
II-A Federated Learning (FL)
FL, as a recent advance in distributed machine learning, empowers participants to collaboratively train a model under the orchestration of a central parameter server while keeping the training data decentralized [13]. It was first proposed by Google in 2016 [22]. During the training process, each participant's raw data is stored locally and is never exchanged or transferred for training. FL thus has the advantage of making full use of IoT computing power while preserving user privacy.

The work in [22] first proposed FedAVG, which is one of the most widely used model averaging algorithms in FL. The work in [18] analyzed the convergence rate of FedAVG with non-IID data. The works [13] and [17] provided a comprehensive introduction to the history, technical methods and unresolved problems in FL. The work in [9] proved that bare FedAVG can protect the privacy of participants to some extent. However, only exchanging gradient information still carries a high risk of privacy leakage [23, 12, 38, 34]. Despite the tremendous efforts contributed by prior works, many issues in FL have not been solved very well, such as inefficient communication and device variability [17, 32, 29].
II-B Differential Privacy (DP)
DP is a very effective mechanism for privacy preservation that can be applied in FL [29, 1, 9, 30]. It uses a mechanism to generate random noises that are added to query results so as to distort original values.
The most commonly used mechanism for adding noises in FL is the Gaussian mechanism. The work in [1] investigated how to apply the Gaussian mechanism in machine learning systems, and the work in [9] then studied how to use it in FL. In [29], a FedSGD algorithm with DP was proposed for FL systems and its convergence rate was analyzed. The work in [38] introduced a novel method named DLG to measure the level of privacy preservation in FL. In FL with DP, a higher privacy budget implies a smaller variance of DP noises, and hence a lower level of privacy preservation. Model accuracy can be largely affected by DP noises [29, 38].

In the field of IoT, FL with DP has also attracted much attention recently. In [6], the authors surveyed a wide variety of papers on privacy-preserving methods that are crucial for FL in IoT. The work in [35] designed an FL system with DP that leverages a reputation mechanism to assist home appliance manufacturers in training a machine learning model based on customers' data. The work in [36] proposed to integrate FL and DP to facilitate crowdsourcing applications in generating a machine learning model.
Basically, there is a tradeoff between the extent of privacy protection and model accuracy if DP is directly incorporated into FL. Different from these works, we devise a novel algorithm through which clients generate negatively correlated DP noises so as to remove the negative influence on model accuracy.
II-C Secure Multiparty Computing (SMC)
Other than DP, SMC is another effective approach to privacy preservation in FL. In previous studies, SMC has been used in many machine learning models [2, 7, 26, 27]. At present, Secret Sharing (SS) and Homomorphic Encryption (HE) are the two main SMC techniques for protecting privacy in FL.
HE performs computations on encrypted gradients: during gradient aggregation and transmission, all calculations are carried out in an encrypted space instead of directly on the raw gradient values [11, 3]. SS is a method that generates several shares of a secret and sends them to several participants; as long as enough participants are present, the secret can be recovered. In FL, participants can add masks to their gradients and share the masks as secrets with others. If the PS receives returns from a sufficient number of participants, the masks can be eliminated. Several SS-based FL schemes have been proposed in [5, 21, 31].
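The mask-cancellation idea behind SS-based aggregation can be sketched as follows. This is a toy illustration with pairwise masks; the function name and the pairing scheme are our own illustrative assumptions, not the actual protocol of [5]:

```python
import numpy as np

def masked_uploads(values, seed=0):
    """Toy additive-masking sketch: each pair of clients (i, j) with i < j
    agrees on a random mask m_ij; client i adds it and client j subtracts it,
    so all masks cancel in the sum computed by the PS."""
    rng = np.random.default_rng(seed)
    uploads = np.array(values, dtype=float)
    n = len(uploads)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal()          # pairwise mask shared by clients i and j
            uploads[i] += m
            uploads[j] -= m
    return uploads
```

Each individual upload looks random, yet the sum of all uploads equals the sum of the raw values, which is all the PS needs for aggregation.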
However, SS and HE consume too many computing resources, which prohibits their deployment in the real world [28]. In fact, our work is a combination of SS and DP, but the computation overhead of our noise sharing scheme is very low.
III Preliminaries
To facilitate the understanding of our algorithm, the list of main notations used in our work is presented in Table I.
Symbol  Meaning

N  The number of clients
i  The index of clients
t  The index of the global training round
E  The number of local training rounds
η  The learning rate
d  The dimension of the parameters
F  The loss function
∇F  The gradient of the function F
K  The number of clients in each global round
|D_i|  The cardinality of D_i
p_i  The aggregation weight of client i
σ_u²  The unit noise variance
σ_i²  The Gaussian noise variance of client i
N(μ, σ²)  Gaussian distribution
D_i  The dataset of client i
C_i  The client set of client i in round t
w  The global model parameters
w_i  The local model parameters of client i
n_i  The noise generated by the DP mechanism
−n_i  The negated noise
λ  A random variable used to distort negated noise shares
σ_λ²  The variance of λ
I_d  The identity matrix
(ε, δ)  DP parameters
III-A Differential Privacy
It was once assumed that user privacy would not be leaked if only gradient information is disclosed. However, it was shown in [23, 12, 38, 34] that private information can be reconstructed from gradient information. Therefore, it was proposed in [1] that clients adopt DP to further disturb their gradients by adding additional noises to the disclosed information. According to the prior work [8], an algorithm satisfying differential privacy is defined as follows.
Definition 1.

A randomized mechanism M with domain D and range R satisfies (ε, δ)-differential privacy if for any two adjacent databases d, d′ ∈ D and for any subset of outputs S ⊆ R,

(1)  Pr[M(d) ∈ S] ≤ e^ε Pr[M(d′) ∈ S] + δ.

Here, ε is the privacy budget, which is the distinguishable bound of all outputs on adjacent databases d and d′, and δ represents the probability that the ratio of the probabilities of two adjacent outputs cannot be bounded by e^ε after using the mechanism M. Intuitively, a DP mechanism with a smaller privacy budget provides stronger privacy protection, and vice versa.

Theorem 1.

(Gaussian Mechanism). Let ε ∈ (0, 1) be arbitrary and d denote the database. For c² > 2 ln(1.25/δ), the Gaussian mechanism M(d) = f(d) + N(0, σ²) with parameter σ ≥ c S_f / ε is (ε, δ)-differentially private. Here, f(d) represents the original output and S_f is the sensitivity of f given by S_f = max_{d, d′} ‖f(d) − f(d′)‖₂.

For the detailed proof, please refer to the reference [8].
We assume that the Gaussian mechanism is adopted in our work because it is convenient to split DP noises obeying the Gaussian distribution into multiple shares [1].
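As a concrete illustration, adding Gaussian-mechanism noise can be sketched as follows. This is a minimal sketch in which the function name and interface are our own; the noise scale follows the classical bound σ = √(2 ln(1.25/δ)) · S_f / ε from Theorem 1, which holds for ε ∈ (0, 1):

```python
import math
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Add Gaussian noise calibrated for (epsilon, delta)-DP.

    sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon, valid
    for epsilon in (0, 1)."""
    rng = rng or np.random.default_rng()
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    noisy = value + rng.normal(0.0, sigma, size=np.shape(value))
    return noisy, sigma
```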
III-B DP-FedAVG
FedAVG is the most commonly used model averaging algorithm in FL, and therefore it is used in our study. Based on previous works [22, 30, 9, 29], we present the client-based DP-FedAVG here to ease the following discussion.
Without loss of generality, we assume that there are N clients. Client i owns a private dataset D_i with cardinality |D_i|. These clients aim to train a model whose parameters are represented by a d-dimensional vector w. In FedAVG, clients need to exchange model parameters with the PS for multiple rounds; each round is also called a global iteration. At the beginning of global round t, each participating client receives the global parameters w from the PS and conducts a number of local iterations. Then, clients return their locally updated model parameters plus DP noises to the PS. After receiving the computation results from a certain number of clients, the PS aggregates the received parameters and embarks on a new round of global iteration. The details of the DP-FedAVG algorithm are presented in Algorithm 1.

In Algorithm 1, C is the fraction of clients that participate in each global iteration, n_i is the Gaussian noise, p_i is the aggregation weight of client i, and S^t is the set of clients that participate in round t. Usually, p_i = |D_i| / Σ_{j∈S^t} |D_j|. B is the set of local sample batches, E is the number of local iterations to be conducted and η is the learning rate.
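As a sketch of the client-side computation in a DP-FedAVG-style round, the following illustrative code performs local gradient steps with norm clipping and adds Gaussian noise before uploading. The function names, gradient interface and default values are our own assumptions, not the paper's exact Algorithm 1:

```python
import numpy as np

def dp_fedavg_client(theta_global, data, grad_fn, lr=0.01, local_rounds=5,
                     clip=1.0, sigma=1.0, rng=None):
    """One client round of a DP-FedAVG-style update (illustrative sketch).

    theta_global: global parameters received from the PS (1-D array)
    grad_fn(theta, batch): returns the gradient of the local loss
    clip: L2 clipping threshold C; sigma: Gaussian noise std."""
    rng = rng or np.random.default_rng()
    theta = theta_global.copy()
    for _ in range(local_rounds):
        g = grad_fn(theta, data)
        norm = np.linalg.norm(g)
        if norm > clip:                 # norm clipping bounds the sensitivity
            g = g * (clip / norm)
        theta -= lr * g
    noise = rng.normal(0.0, sigma, size=theta.shape)
    return theta + noise                # noisy parameters uploaded to the PS
```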
Let f(w, D_i) represent the function returning the locally updated parameters with inputs w and D_i. The sensitivity of f is denoted by S_f. We assume that the privacy budget of client i is represented by ε_i and δ_i.
Corollary 1.

Algorithm 1 satisfies (ε_i, δ_i)-differential privacy for client i if n_i is sampled from N(0, σ_i² I_d), where σ_i ≥ c S_f / ε_i with c² > 2 ln(1.25/δ_i), and I_d is the identity matrix. Here d is the model dimension. The proof is straightforward from Theorem 1.
According to Algorithm 1, the parameters aggregated on the PS, disturbed by the DP noises, are

(2)  Σ_{i∈S^t} p_i (w_i + n_i) = Σ_{i∈S^t} p_i w_i + Σ_{i∈S^t} p_i n_i.

From the right-hand side of Eq. (2), we can see that the first term represents the aggregated parameters while the second term represents the disturbance of the DP noises. The noises are independently generated by all participating clients, and therefore the variance of Σ_{i∈S^t} p_i n_i is Σ_{i∈S^t} p_i² σ_i². Apparently, if the privacy budget ε_i is smaller, σ_i² is higher and the total noise variance on the server side is higher. Our approach is to make these noises negatively correlated so that the aggregated noise variance can be reduced.
IV NISS Algorithm
In this section, we introduce the NISS algorithm in detail, and leave the analysis of the variance reduction on the aggregation and the security analysis of NISS to the next section.
IV-A Illustrative Example
Before diving into the detailed NISS algorithm, we present a concrete example to illustrate how NISS works. According to Algorithm 1, n_i is sampled from N(0, σ_i² I_d) since the dimension of w is d. It means that the noises of the d dimensions are generated independently, and the noise offsetting is conducted for each dimension independently. Thus, to simplify our discussion, we only need to consider the noise n_i of a particular dimension of client i, where n_i is sampled from N(0, σ_i²).

According to the property of the Gaussian distribution, n_i can be split into k_i shares, each sampled from N(0, σ_u²). Client i can send out each negated share to a neighboring client. If all clients conduct the same operation, client i is expected to receive negated noise shares from other clients as well. To ease understanding, the process is illustrated in Fig. 2.

Then, client i adds both its own noise and the sum of the negated noise shares received from other clients to its parameters before submitting them to the PS. The client's own noise preserves the privacy of client i, while the uploaded negated shares can be used by the PS to offset the noises generated by other clients. Since negated shares are generated randomly, no other client can exactly obtain the noise information of client i. In addition, the parameter information is only disclosed to the PS. As long as all other clients are trustworthy, these DP noises can be perfectly offset by the negated noise shares.
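The perfect-offsetting intuition above can be checked numerically. The following toy simulation is our own sketch (client and share counts are arbitrary, and random peer selection stands in for the tracker server introduced later); it shows that the aggregated noise at the PS vanishes when every negated share is faithfully uploaded:

```python
import numpy as np

def niss_round(num_clients=5, shares_per_client=4, sigma_u=0.5, seed=0):
    """Simulate one dimension of NISS noise offsetting (toy sketch).

    Each client draws unit shares from N(0, sigma_u^2); its own DP noise is
    the sum of its shares, and each negated share is handed to a random peer.
    Every client uploads its own noise plus the negated shares it received."""
    rng = np.random.default_rng(seed)
    uploads = np.zeros(num_clients)
    for i in range(num_clients):
        shares = rng.normal(0.0, sigma_u, size=shares_per_client)
        for s in shares:
            uploads[i] += s                      # part of client i's own noise
            peer = rng.choice([j for j in range(num_clients) if j != i])
            uploads[peer] -= s                   # peer holds the negated share
    return uploads.sum()                         # aggregated noise at the PS
```

Every share appears exactly once with a plus sign and once with a minus sign, so the aggregate is zero regardless of how the shares are routed.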
IV-B Algorithm Design
We proceed to design the NISS algorithm based on the FedAVG algorithm introduced in the last section.
First of all, a tracker server is needed so that clients can exchange negated noise shares with each other. Each client needs to contact the tracker server to fetch a list of neighbor clients before it sends out negated noise shares. The tracker server is only responsible for recording live clients in the system and returning a random list of clients as neighbors for a particular client. Obviously, the tracker server does not receive any noise information, and hence will not intrude on user privacy. It can be implemented with light communication cost, similar to the deployment of tracker servers in peer-to-peer file sharing systems [25].
In NISS, the operation of the PS is the same as that in FedAVG; the only difference lies in the operation of each client. Based on its own privacy budget and the function sensitivity, client i needs to determine σ_i so that n_i ~ N(0, σ_i²) satisfies (ε_i, δ_i)-differential privacy. Then, the client determines the number of noise shares k_i according to σ_i and σ_u, so that it generates k_i noise shares and k_i negated noise shares. Here σ_u² is a value much smaller than σ_i² and can be a common value used by all clients; a share sampled from N(0, σ_u²) is also called a unit noise.
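The share-generation step can be sketched as follows, under our reading that the client picks k_i so that k_i unit shares of variance σ_u² add up to a noise of the target variance σ_i² (the function name and the rounding choice are illustrative assumptions):

```python
import numpy as np

def make_shares(sigma_i, sigma_u, rng=None):
    """Split client i's DP noise into unit shares (illustrative sketch).

    A sum of k independent N(0, sigma_u^2) variables is N(0, k * sigma_u^2),
    so choosing k = round(sigma_i^2 / sigma_u^2) unit shares reproduces
    (approximately) the target noise variance sigma_i^2."""
    rng = rng or np.random.default_rng()
    k = max(1, round(sigma_i ** 2 / sigma_u ** 2))
    shares = rng.normal(0.0, sigma_u, size=k)
    return shares, -shares   # own shares, and the negated shares to send out
```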
Because clients disclose their noise information to other clients, the gradient information can be cracked to a certain extent if some clients are not trustworthy and collude with the PS to intrude on the privacy of a particular client. To prevent the leakage of user privacy, we propose to multiply a random variable λ with each received negated noise share, where λ is sampled from the Gaussian distribution N(1, σ_λ²). Due to the disturbance of λ, neither other clients nor the PS can exactly crack the gradient information of the client. σ_λ can be set according to the probability that other clients will collude with the PS. How exactly to set σ_λ and the role of λ will be further analyzed in the next section.
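The distortion step can be sketched as follows, assuming (consistently with the analysis in Section V) that the multiplicative factor is drawn from N(1, σ_λ²), so that σ_λ = 0 recovers perfect offsetting; this parametrization and the function name are our own:

```python
import numpy as np

def distort_share(negated_share, sigma_lam, rng=None):
    """Multiply a received negated noise share by lambda ~ N(1, sigma_lam^2).

    With sigma_lam = 0 the share passes through unchanged and offsetting
    stays perfect; a larger sigma_lam hides more information from a
    colluding PS at the cost of residual aggregated noise."""
    rng = rng or np.random.default_rng()
    lam = rng.normal(1.0, sigma_lam)
    return lam * negated_share
```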
To wrap up, the details of the NISS algorithm are presented in Algorithm 2.
V Theoretical Analysis
In this section, we conduct analysis to show how much noise variance can be reduced by NISS on the PS side and how the NISS algorithm defends against attacks. Based on our analysis, we also discuss the application of NISS in real FL systems.
V-A Analysis of Noise Offsetting
Similar to Sec. IV-A, to simplify our discussion, we only consider the noise offsetting for a particular dimension, and for ease of exposition we omit the aggregation weights. Let s_{i,j} (j = 1, …, k_i) denote the noise shares of client i, so that its own noise is n_i = Σ_{j=1}^{k_i} s_{i,j}, and let R_i denote the set of negated noise shares that client i receives from other clients. Let c(i, j) denote the client that receives the j-th negated noise share of client i.

Based on Algorithm 2, client i uploads w_i + n_i − Σ_{s∈R_i} λ_s s. The aggregation conducted on the PS becomes

Σ_{i∈S^t} w_i + Σ_{i∈S^t} ( n_i − Σ_{s∈R_i} λ_s s ).

Here each share s is sampled from N(0, σ_u²) and each distortion λ_s is sampled from N(1, σ_λ²); we write λ as the abbreviation of λ_s if its meaning is clear from the context. Let Z = Σ_{i∈S^t} ( n_i − Σ_{s∈R_i} λ_s s ) denote the aggregated DP noise; our study focuses on the minimization of Var(Z).
Let us first analyze the variance of a particular noise share after offsetting.
Lemma 1.

The variance of a noise share plus its distorted negated share is:

(3)  Var( s_{i,j} − λ s_{i,j} ) = σ_u² σ_λ².

Here client c(i, j) receives the negated share −s_{i,j} and λ is the distortion imposed by client c(i, j).

Proof.

According to the definition of the variance, we can obtain:

(4)  Var( s_{i,j} − λ s_{i,j} ) = E[ s_{i,j}² (1 − λ)² ] = E[ s_{i,j}² ] E[ (1 − λ)² ] = σ_u² σ_λ².

The above formula holds because s_{i,j} is sampled from N(0, σ_u²) and λ is sampled from N(1, σ_λ²), so that E[s_{i,j}²] = σ_u² and E[(1 − λ)²] = σ_λ². Apparently, s_{i,j} and λ are independent according to our algorithm, which justifies the factorization. ∎
Theorem 2.

After noise offsetting, the variance of the aggregated noise on the PS side is:

(5)  Var(Z) = Σ_{i∈S^t} k_i σ_u² σ_λ² = σ_λ² Σ_{i∈S^t} σ_i².

Remark: From Theorem 2, we can observe that Var(Z) = 0 if σ_λ = 0, implying that DP noises are perfectly offset on the PS side. However, if σ_λ = 1, the value of Var(Z) is the same as that without any noise offsetting. The value of σ_λ depends on the trust between clients. We will further discuss how to set σ_λ after the security analysis in the next subsection.
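Lemma 1 can be sanity-checked with a quick Monte Carlo estimate, under the same assumption λ ~ N(1, σ_λ²); the sample size and seed below are arbitrary:

```python
import numpy as np

def offset_variance(sigma_u, sigma_lam, trials=200_000, seed=1):
    """Monte Carlo estimate of Var(s - lambda * s) for a unit share
    s ~ N(0, sigma_u^2) and an independent distortion
    lambda ~ N(1, sigma_lam^2); Lemma 1 predicts sigma_u^2 * sigma_lam^2."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, sigma_u, trials)
    lam = rng.normal(1.0, sigma_lam, trials)
    return np.var(s - lam * s)
```

For instance, σ_u = 0.5 and σ_λ = 0.4 should give an estimate close to 0.25 · 0.16 = 0.04, while σ_λ = 0 gives exactly zero.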
V-B Security Analysis
We conduct the security analysis by analyzing the privacy preservation for a particular client i. We suppose that the target of client i is to satisfy (ε_i, δ_i)-differential privacy.

It is easy to understand that the NISS algorithm satisfies (ε_i, δ_i)-differential privacy, regardless of how σ_λ is set, if the PS and clients do not collude. What client i submits to the PS is w_i + n_i − Σ_{s∈R_i} λ_s s. The noise n_i alone is a Gaussian random variable with variance σ_i², and the received distorted shares only add further randomness; hence the NISS algorithm on client i satisfies (ε_i, δ_i)-differential privacy. Meanwhile, no other client can crack the parameter information, since the parameter information is only disclosed to the PS.

However, it is not guaranteed that the PS never colludes with clients. To conduct a more general analysis, we assume that a fraction q of the other clients collude with the PS. The problem is how to set σ_λ so that the NISS algorithm on client i can still satisfy (ε_i, δ_i)-differential privacy.

Let C_i represent the set of clients that client i will contact. There is no prior knowledge about which clients will collude with the PS, and the tracker server randomly selects the clients in C_i. This implies that a fraction q of C_i will disclose their noise share information to the PS. We use C_i^m to denote the clients in C_i who collude with the PS and C_i^n to denote those who do not; apparently, their sizes are q|C_i| and (1 − q)|C_i|, respectively. Since the PS learns the shares exchanged with colluding clients, the effective noise hiding client i's parameters is reduced accordingly. To ensure that (ε_i, δ_i)-differential privacy can still be satisfied, it requires that the variance of this effective noise is no smaller than σ_i². It turns out that
Theorem 3.

If n_i sampled from N(0, σ_i²) can make client i satisfy (ε_i, δ_i)-differential privacy, then the NISS algorithm satisfies (ε_i, δ_i)-differential privacy as long as σ_λ² ≥ 2q − 1. Here q represents the percentage of other clients that collude with the PS.

The detailed proof is presented in Appendix A.
Remark: It is worth mentioning the special case with q = 1, i.e., all other clients collude with the PS. According to Theorem 3, σ_λ² ≥ 1 in this case, so λ must be sampled from N(1, σ_λ²) with σ_λ ≥ 1. According to the central limit theorem, this result holds as long as the number of shares is sufficiently large. By Theorem 2, the variance of the aggregated noise is then Var(Z) = σ_λ² Σ_{i∈S^t} σ_i² ≥ Σ_{i∈S^t} σ_i², i.e., the variance of the aggregated noise on the PS is at least the same as that without any offsetting operation. In this extreme case, there exists a tradeoff between model accuracy and privacy protection: one cannot improve the model accuracy without compromising privacy protection.

V-C Application of NISS in Practice
As we have discussed in the last section, σ_λ is a vital parameter. Our analysis uses q to cover all cases with different fractions of malicious clients colluding with the PS. If q is close to 1, it will significantly impair the performance of the NISS algorithm. In practice, σ_λ can be set as a small value, which can be justified from two perspectives.
Firstly, most FL systems are of a large scale with tens of thousands of clients. With more normal clients, the fraction of malicious clients that collude with the PS will be smaller. Secondly, our analysis is based on the assumption that C_i is randomly selected by the tracker server. In fact, clients can play a coalitional game with other clients they trust. For instance, the IoT devices of the same system can trust each other substantially. They can share negated noise information with each other and set a small σ_λ, since the probability that neighboring clients collude with the PS is very low. From this example, we can also conclude that the NISS algorithm is particularly applicable for FL across multiple IoT systems. IoT devices in the same system can form a coalition so that the variance of the aggregated noise is minimized. Besides, devices within the same system can be connected via high-speed networks so that the communication overhead of transmitting noise shares is insignificant.
VI Experiment
In this section, we conduct experiments with MNIST and CIFAR10 to evaluate the performance of NISS.
VI-A Experimental Setup
VI-A1 Simulation Settings
Based on [1], we use the Gaussian mechanism to add noises to local model parameters. We use the same experimental settings as in [22], [29] and [1]. The FL settings of our experiment are as follows:

Number of users:

User fraction rate:

Local minibatch size:

Learning rate:

Number of local round:

Unit noise variance:

DP parameters:
In addition, to implement DP-FedAVG, we use the norm clipping technique with a clipping threshold C to restrict the range of clients' gradients. If the norm of a client's gradient exceeds C, the gradient is clipped to C. The details of this mechanism can be found in [29]. For our experiments, we set C to a fixed value.
VI-A2 Training Models and Datasets
To make our experiments more comprehensive, we set up three different scenarios. Firstly, we use the public datasets MNIST and CIFAR10 as our experimental data. The MNIST dataset of handwritten digits contains 70,000 grayscale images of the 10 digits, with 60,000 training images and 10,000 test images. The CIFAR10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. Secondly, we use different neural network structures, similar to those in [22], [1] and [19]: i) a Convolutional Neural Network (CNN) with two convolution layers, a fully connected layer with 512 units and ReLU activation, and a final softmax output layer; ii) a Multilayer Perceptron (MLP) with two hidden layers of 200 units each using ReLU activations. Thirdly, we split the datasets in IID and non-IID settings. For the IID setting, the data is shuffled and partitioned among 100 users, each receiving the same number of examples. For the non-IID setting, the data is sorted by labels and divided into different partitions, which are then distributed so that each client receives a non-IID dataset.
VI-A3 Metrics and Baselines
We use the model accuracy on the test dataset to evaluate the accuracy performance of the NISS algorithm. Meanwhile, we implement the FedAVG and DP-FedAVG algorithms as baselines in our experiments.
In addition, we use the method in [38] to measure the effect of NISS on privacy protection. We evaluate the leak-defence of a model average algorithm by determining whether effective information can be recovered from a picture of CIFAR100. Similar to [38], we adopt the DLG loss as the metric. The method starts from randomly initialized weights and uses L-BFGS [20] to match the gradients of all trainable parameters; the DLG loss is the gradient matching loss of L-BFGS. The lower the DLG loss is, the more information is leaked, and the clearer the final recovered image will be.
VI-B Experiment Results
VI-B1 Model Accuracy
Fig. 3 shows the test accuracy of the trained models. Since we set up three different scenarios (different datasets, IID or non-IID partitions and different neural networks), we conducted eight sets of experiments. For brevity and clarity, we use σ_λ to label each curve, where σ_λ² is the variance of λ. Then σ_λ = 0 means perfect offsetting by NISS, which is equivalent to FedAVG, while σ_λ = 1 means the variance of the aggregated noise on the PS side is the same as that of DP-FedAVG. Thus, we use σ_λ = 0 and σ_λ = 1 to denote FedAVG and DP-FedAVG, respectively. From Fig. 3, we can see that by increasing σ_λ, all clients effectively add more noise, causing the variance of the aggregated noise on the PS side to increase and the test accuracy to drop. The higher σ_λ is, the larger the variance of the added noises is, and the more significantly the accuracy deteriorates. This is consistent with our analysis. When the client data is IID, our NISS algorithm noticeably increases the test accuracy on both MNIST and CIFAR10 if no clients collude with the PS, namely, with perfect offsetting; the improvement is larger on CIFAR10 because CIFAR10 consists of three-channel color pictures, on which the noise has a higher impact on accuracy. When the client data is non-IID, the test accuracy also increases on MNIST, and the improvement on CIFAR10 is even higher. In addition, note that the test accuracy on CIFAR10 is low because the MLP model is too simple for training CIFAR10 and non-IID data further lowers its test accuracy, as found in [37]. Fig. 3 also shows the tradeoff between model accuracy and privacy protection: if we increase σ_λ, the accuracy decreases, and if we decrease σ_λ, the accuracy increases.
In summary, when σ_λ = 0, since the noise can be offset perfectly, the model accuracy given by NISS is very close to that of FedAVG and better than that of DP-FedAVG if no clients collude with the PS. Even if some clients collude with the PS, each client can protect its privacy by tuning σ_λ, although the model accuracy will decrease accordingly.
VI-B2 Privacy Protection
In order to test the degree of privacy protection of the clients' gradient information, we use the method in [38] to test the leak-defence of FedAVG, DP-FedAVG and NISS, running DLG on the gradients produced by each algorithm. Fig. 4 shows the resulting DLG losses and the finally recovered images. The lower the DLG loss is, the more information is leaked and the clearer the recovered image will be. From Fig. 4, we observe that our NISS algorithm leaks almost no sensitive information, while DP-FedAVG may leak partial private information and FedAVG cannot prevent the leakage of sensitive information at all.
In summary, the above experiments demonstrate that our NISS algorithm achieves extraordinary performance. When clients do not collude with the PS, NISS achieves the same accuracy as FedAVG, which is better than DP-FedAVG, as well as better privacy protection, because each individual client adds a larger amount of noise. If some clients collude with the PS, each client can set its σ_λ to protect its privacy. Our experiments also show the tradeoff between model accuracy and privacy protection: if clients set a higher σ_λ, the accuracy will be lower, and vice versa.
VII Conclusion
In this work, we propose a novel algorithm called NISS to offset the DP noises independently generated by clients in FL systems. NISS is a method for clients to generate negatively correlated noises. Intuitively, each client splits its noise into multiple shares. Each share is negated and sent out to a neighboring client. Each client uploads its parameter plus its own noise and all negated noise shares received from other neighbors. A noise share of a particular client can be potentially offset by its negated value uploaded by another client. We theoretically prove that the NISS algorithm can effectively reduce the variance of the aggregated noise on the PS so as to improve the model accuracy in FL. Experiments with MNIST and CIFAR10 datasets are carried out to verify our analysis and demonstrate the extraordinary performance achieved by NISS.
Appendix A Proof of Theorem 3

Proof.

We first calculate the variance of the effective noise uploaded by client i. We discuss the shares exchanged with the non-colluding clients C_i^n and with the colluding clients C_i^m separately. For ease of exposition, we assume that client i also receives about k_i negated noise shares in total, of which a fraction q come from colluding clients. Firstly, for the shares exchanged with non-colluding clients, neither the shares nor their distortions are disclosed to the PS, so they contribute a variance of:

(8)  (1 − q) k_i σ_u² + (1 − q) k_i (1 + σ_λ²) σ_u²,

where the first term comes from client i's own shares whose negated copies are held by non-colluding clients, and the second term comes from the distorted negated shares that client i received from non-colluding clients, since Var(λ s) = (1 + σ_λ²) σ_u².

Secondly, for the negated shares received from colluding clients, note that each such share s is disclosed to the PS and is no longer a random variable; only the residual (1 − λ)s, with variance σ_λ² s², remains unknown, which we can approximate using the central limit theorem:

(9)  E[ Σ σ_λ² s² ] = q k_i σ_λ² σ_u².

According to the central limit theorem, as long as q k_i is sufficiently large, Σ s² ≈ q k_i σ_u². Thus, we have:

(10)  Var_m = q k_i σ_λ² σ_u².

Then, we can obtain the total variance of the effective noise:

(11)  (1 − q) k_i σ_u² + (1 − q) k_i (1 + σ_λ²) σ_u² + q k_i σ_λ² σ_u².

To ensure (ε_i, δ_i)-differential privacy, it requires that this variance is no smaller than σ_i² = k_i σ_u²:

(12)  (1 − q) k_i σ_u² + (1 − q) k_i (1 + σ_λ²) σ_u² + q k_i σ_λ² σ_u² ≥ k_i σ_u².

Then we have:

(13)  σ_λ² ≥ 1 − 2(1 − q) = 2q − 1.

Hence we can obtain σ_λ² ≥ 2q − 1. ∎
References
 [1] (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §I, §IIB, §IIB, §IIIA, §IIIA, §VIA1, §VIA2.
 [2] (2000) Privacypreserving data mining. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 439–450. Cited by: §IIC.
 [3] (2017) Privacypreserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 13 (5), pp. 1333–1345. Cited by: §IIC.
 [4] (2010) The internet of things: a survey. Computer networks 54 (15), pp. 2787–2805. Cited by: §I.
 [5] (2017) Practical secure aggregation for privacypreserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. Cited by: §IIC.
 [6] (2020) A review of privacypreserving federated learning for the internetofthings. Cited by: §IIB.

[7]
(2004)
Privacypreserving multivariate statistical analysis: linear regression and classification
. In Proceedings of the 2004 SIAM international conference on data mining, pp. 222–233. Cited by: §IIC.  [8] (201408) The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9 (3–4), pp. 211–407. External Links: ISSN 1551305X, Link, Document Cited by: §I, §IIIA, §IIIA.
 [9] (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557. Cited by: §I, §IIA, §IIB, §IIB, §IIIB.
 [10] (2013) Internet of things (iot): a vision, architectural elements, and future directions. Future generation computer systems 29 (7), pp. 1645–1660. Cited by: §I.
 [11] (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677. Cited by: §IIC.
 [12] (2017) Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618. Cited by: §I, §IIA, §IIIA.
 [13] (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. Cited by: §I, §IIA, §IIA.
 [14] (2016) Federated optimization: distributed machine learning for ondevice intelligence. arXiv preprint arXiv:1610.02527. Cited by: §I.
 [15] (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §I.
 [16] (2018) Contractbased smallcell caching for data disseminations in ultradense cellular networks. IEEE Transactions on Mobile Computing 18 (5), pp. 1042–1053. Cited by: §I.
 [17] (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60. Cited by: §I, §IIA.
 [18] (2019) On the convergence of fedavg on noniid data. arXiv preprint arXiv:1907.02189. Cited by: §IIA.
 [19] (2020) Privacypreserving federated learning framework based on chained secure multiparty computing. IEEE Internet of Things Journal. Cited by: §VIA2.
 [20] (1989) On the limited memory BFGS method for large scale optimization. Mathematical programming 45 (1–3), pp. 503–528. Cited by: §VIA3.
 [21] (2018) NIKE-based fast privacy-preserving high-dimensional data aggregation for mobile devices. Technical report CACR 2018-10, University of Waterloo, Canada. Cited by: §IIC.
 [22] (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §I, §IIA, §IIA, §IIIB, §VIA1, §VIA2.
 [23] (2019) Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. Cited by: §I, §IIA, §IIIA.
 [24] (2017) Automatic differentiation in pytorch. Cited by: §VIA1.
 [25] (2001) Measurement study of peer-to-peer file sharing systems. In Multimedia Computing and Networking 2002, Vol. 4673, pp. 156–170. Cited by: §IVB.
 [26] (2002) Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 639–644. Cited by: §IIC.
 [27] (2008) Privacy-preserving naive bayes classification. The VLDB Journal 17 (4), pp. 879–898. Cited by: §IIC.
 [28] (2018) No peek: a survey of private distributed deep learning. arXiv preprint arXiv:1812.03288. Cited by: §IIC.
 [29] (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Transactions on Information Forensics and Security. Cited by: §I, §IIA, §IIB, §IIB, §IIIB, §VIA1, §VIA1.
 [30] (2020) The value of collaboration in convex machine learning with differential privacy. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 304–317. Cited by: §I, §IIB, §IIIB.
 [31] (2019) Verifynet: secure and verifiable federated learning. IEEE Transactions on Information Forensics and Security 15, pp. 911–926. Cited by: §IIC.
 [32] (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–19. Cited by: §IIA.
 [33] (2019) Deep learning in mobile and wireless networking: a survey. IEEE Communications Surveys & Tutorials 21 (3), pp. 2224–2287. Cited by: §I.
 [34] (2020) iDLG: improved deep leakage from gradients. arXiv preprint arXiv:2001.02610. Cited by: §I, §IIA, §IIIA.
 [35] (2020) Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet of Things Journal. Cited by: §IIB.
 [36] (2020) Local differential privacy based federated learning for internet of things. arXiv preprint arXiv:2004.08856. Cited by: §IIB.
 [37] (2018) Federated learning with noniid data. arXiv preprint arXiv:1806.00582. Cited by: §VIB1.
 [38] (2019) Deep leakage from gradients. In Advances in Neural Information Processing Systems, pp. 14774–14784. Cited by: §I, §I, §IIA, §IIB, §IIIA, Fig. 4, §VIA3, §VIB2.