Many attractive applications involve training models on highly sensitive data, e.g., the diagnosis of diseases from medical records or genetic sequences Alipanahi et al. (2015). To protect the privacy of the training data, various privacy-preserving approaches have been proposed in the literature Ma et al. (2018); Michie et al. (1994); Nissim et al. (2007); Samangouei et al. (2018). The federated learning framework is of particular interest since it can produce a well-trained model without touching any sensitive data directly Konečnỳ et al. (2016); McMahan et al. (2016). The original purpose of federated learning is to share the weights of a model trained on sensitive data instead of the data itself. However, some studies show that the weights can also leak privacy and allow recovery of the original sensitive data Papernot et al. (2017). To address this problem, recent works apply differential privacy to protect the private data in federated learning Bhowmick et al. (2018); Geyer et al. (2017); Nguyên et al. (2016), but most of them cannot give a practical solution for deep learning models on complex datasets due to the trade-off between privacy budget and performance.
In previous approaches, there are several apparent concerns and challenges. First, the noisy data is close to its original value with high probability, increasing the risk of information exposure. For example, if the price of a pen is 5, the perturbed value is close to or around 5, such as 5.3. Second, a large variance is introduced into the estimated average, causing poor accuracy; as a result, more communication between the cloud and the clients is required for the learning to converge. Third, the privacy budget explodes due to the high dimensionality of the weights in deep learning models. Last, to the best of our knowledge, no existing work has shown excellent deep learning performance on popular datasets such as MNIST LeCun et al. (2010), Fashion-MNIST Xiao et al. (2017) and CIFAR-10 Krizhevsky et al. (2009) with a reasonably small privacy budget.
In this paper, we propose Locally Differentially Private Federated Learning (LDP-FL), a new local differential privacy (LDP) mechanism that addresses the above issues. As shown in Fig. 1, a newly designed local data perturbation and a local splitting and shuffling of model weights are applied to a typical federated learning system.
Our main contribution is fourfold. First, we propose a new LDP mechanism and show how to apply it to federated learning. We prove our LDP mechanism is more secure than existing mechanisms, with a much reduced risk of information exposure, because it makes the perturbed data more distinct from its original value. Second, we apply splitting and shuffling to each client's gradients to mitigate the privacy degradation caused by high data dimensionality and many query iterations. Third, we prove that our solution introduces less variance into the average calculation, enabling better model training performance. Moreover, we show that the traditional mechanisms used for centralized differential privacy, in which a trusted cloud is required, can also be used for LDP, though they are less secure than our mechanism. Last, we evaluate LDP-FL on three datasets commonly used in previous works: MNIST LeCun et al. (2010), Fashion-MNIST Xiao et al. (2017) and CIFAR-10 Krizhevsky et al. (2009). The proposed mechanism achieves a small privacy budget with 0.97% accuracy loss on MNIST, 1.32% accuracy loss on Fashion-MNIST, and 1.09% accuracy loss on CIFAR-10, respectively.
Due to limited space, all proofs and additional experiments are given in the appendix.
2.1 Local Differential Privacy
Formally, the definition of ε-LDP is given as below:
Dwork (2011) A randomized mechanism M satisfies ε-LDP if, for any pair of inputs v and v′ in the domain of M, and any output y of M,
Pr[M(v) = y] ≤ e^ε · Pr[M(v′) = y],
where v and v′ are any two inputs. The privacy guarantee of mechanism M is controlled by the privacy budget Dwork (2011), denoted as ε. A smaller value of ε indicates a stronger privacy guarantee. The immunity to post-processing Dwork and Roth (2014) also holds for LDP: no algorithm can post-process a differentially private output and make it less differentially private.
2.2 Federated Learning
Federated learning McMahan et al. (2016); Konečnỳ et al. (2016) has been proposed and widely used in different approaches. The motivation is to share the model weights instead of the private data for better privacy protection. Each client, the owner of private training data, updates a model locally and sends all gradient or weight information to the cloud. The cloud aggregates this information from the clients, updates a new central model (e.g., by averaging all clients' weights), and then distributes it back to a fraction of clients for another round of model update. This process continues iteratively until a satisfying performance is achieved. Note that, to minimize communication, each client may take several mini-batch gradient descent steps during local model computation. To enhance privacy protection, differential privacy has been applied to federated learning Bhowmick et al. (2018); Geyer et al. (2017). Previous works mostly focus either on differential privacy mechanisms that require a central trusted party or on the theoretical analysis of LDP techniques. Practical solutions for deep learning data and tasks have not been well studied.
3 Our Approach
In the framework of LDP-FL, as shown in Fig. 1, the cloud is an aggregator that collects a set of weights of local client models from the local side and averages the weights after each communication round. The goal is to maximize the accuracy of both the cloud and local client models while preserving users' privacy. In our proposed algorithm, we alter and approximate each piece of local information with a randomized mechanism. This is done to totally hide a single client's contribution within the aggregation and thus within the entire decentralized learning procedure.
3.1 Overview of LDP-FL
In this section, we introduce our federated learning approach with LDP, which consists of two steps, as shown in Algorithm 1.
Cloud Update. First, the cloud initializes the weights randomly. Let n be the total number of local clients. Then, in the t-th communication round, the cloud randomly selects k clients to update their weights for local-side optimization. Unlike previous works Bhowmick et al. (2018); Erlingsson et al. (2014); Seif et al. (2020), which assume that the aggregator already knows the identities (e.g., IP addresses) of the users but not their private data, our approach assumes the clients remain anonymous to the cloud. For example, a client can leverage a changing IP address, or all clients can share the same IP address, to send the local weights back to the cloud. This assumption provides a more robust privacy bound and a more practical solution; more details are given in Section 3.2.
Local Update. Each client holds its own private dataset. In each communication round, the selected local clients first update their local models with the weights from the cloud. Next, each local model optimizes its own weights in parallel with Stochastic Gradient Descent (SGD). To provide practical privacy protection, we split and shuffle each local model's weights and send each weight through an anonymous mechanism to the cloud. In this way, we provide more reliable privacy protection and a practical solution with usable results.
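The two update steps above can be sketched as follows. This is a minimal, framework-free sketch with our own names (`cloud_round`, `local_update`); `local_update` stands in for the client-side SGD plus perturbation and anonymous upload, and weights are plain lists of floats.

```python
import random

def cloud_round(global_w, clients, k, local_update):
    """One communication round: the cloud samples k anonymous clients,
    each trains locally and reports weights; the cloud averages them."""
    selected = random.sample(clients, k)
    reports = [local_update(global_w, data) for data in selected]
    # Average each weight position over the k client reports.
    return [sum(rep[i] for rep in reports) / k for i in range(len(global_w))]
```

In LDP-FL the callable passed as `local_update` would run SGD on the client's private data and return perturbed weights; here any callable with that signature works.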
3.2 Privacy-Preserving Mechanism
Data Perturbation. Given the weights w of a model, the algorithm returns a perturbed tuple w* by randomizing each dimension of w. Let M be our mechanism. For each weight/entry w, assume w ∈ [c − r, c + r], where c is the center of w's range and r is the radius of the range (c and r depend on how we clip the weight). Our LDP mechanism changes w to w* with two intuitions. First, w* can only be one of two discrete values: one larger, the other smaller, which makes it easy to prove LDP. Then, to ensure the noise has zero mean, i.e., E[w*] = w, we set w* to the larger value with high probability if w itself is large, and to the smaller value with high probability if w itself is small. Therefore, we design the following LDP mechanism to perturb w:
w* = M(w) = c + r·(e^ε + 1)/(e^ε − 1), with probability (w − c)(e^ε − 1)/(2r(e^ε + 1)) + 1/2;
w* = M(w) = c − r·(e^ε + 1)/(e^ε − 1), otherwise, (2)
where w* is the noisy weight reported by our proposed LDP mechanism. Algorithm 2 shows the pseudo-code of this mechanism.
In previous mechanisms Abadi et al. (2016); Papernot et al. (2017), the generated noisy data is close to its original value with high probability, still revealing information, whereas our approach makes the noisy data very distinct from its original value. For example, given c and r, the perturbed value can only be c + r(e^ε + 1)/(e^ε − 1) or c − r(e^ε + 1)/(e^ε − 1), regardless of the original value.
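A minimal sketch of this perturbation (our naming), assuming the standard two-point form with outputs c ± r(e^ε + 1)/(e^ε − 1) and a reporting probability chosen to make the report unbiased:

```python
import math
import random

def perturb_weight(w: float, c: float, r: float, eps: float) -> float:
    """Report one of two extreme values so that E[report] = w.

    w must already be clipped into [c - r, c + r].
    """
    # Distance of the two possible outputs from the center c.
    a = r * (math.exp(eps) + 1) / (math.exp(eps) - 1)
    # Probability of reporting the larger value grows linearly in w,
    # chosen so the report is unbiased.
    p = (w - c) * (math.exp(eps) - 1) / (2 * r * (math.exp(eps) + 1)) + 0.5
    return c + a if random.random() < p else c - a
```

Whatever the input, the output is always one of the two values c ± a, so a single observed report reveals little about where w lies inside its range.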
Splitting & Shuffling. The shuffling mechanism includes two parts: splitting and shuffling. The primary purpose is to provide stronger privacy protection when using LDP in federated learning. Compared to traditional machine learning, deep learning needs more communication rounds and more weights for good performance, causing more privacy leakage in traditional federated learning. Splitting and shuffling therefore save a substantial amount of privacy budget during the training phase.
Splitting breaks the private connections among each local client model's weights, and shuffling breaks the privacy link of communications between the local clients and the cloud. For better shuffling, each client also samples a random time for sending each weight to the cloud in Algorithm 3. Fig. 2 illustrates the splitting and shuffling of the weights of each local client model, which contains two main steps. Each local model has the same structure but different weight values. Original federated learning sends the models' information to the cloud directly, as shown in Fig. 2 (1). In our approach, we first split the weights of each local model. Then, for each weight, we shuffle them through the client anonymity mechanism and send each weight with its id to the cloud, where the id indicates the location of the weight value in the network structure. Splitting and shuffling thus ensure that the cloud collects the right weight values to update the central model without any knowledge of the relation between each weight and the local clients.
In practice, client anonymity can be achieved via faking the source IP, VPNs, proxies, Tor, secure multi-party computation, etc. Erlingsson et al. (2019). Through splitting and shuffling, the cloud does not know which local client model the received information comes from, so privacy is better protected.
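The splitting and shuffling steps can be sketched as follows (our function names); the weight id plays the role of the id that locates a value in the network structure:

```python
import random

def split_and_shuffle(client_weights):
    """Split every client's weight vector into (weight_id, value) pairs,
    pool them across clients, and shuffle before upload."""
    pool = [(i, w) for weights in client_weights for i, w in enumerate(weights)]
    random.shuffle(pool)
    return pool

def cloud_average(pool, n_weights, n_clients):
    """Rebuild the averaged model from anonymous (weight_id, value) pairs."""
    sums = [0.0] * n_weights
    for i, w in pool:
        sums[i] += w
    return [s / n_clients for s in sums]
```

The cloud can still average by id, but it can no longer tell which client a given value came from.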
4 Privacy and Utility Analysis
In this section, we analyze the privacy and utility of our approach, compare it with other LDP mechanisms, and compare DP and LDP.
4.1 Local Differential Privacy
First, we prove the proposed noise addition mechanism satisfies LDP.
Given any single number w ∈ [c − r, c + r], where c is the center of w's range and r is the radius of the range, the proposed mechanism in Eq. 2 satisfies ε-LDP.
4.2 Client Anonymity
As claimed before, LDP-FL uses splitting and shuffling to bypass the curse of dimensionality. Since the weights are split and uploaded anonymously, the cloud is unable to link different weight values from the same client, so it cannot infer more information about a particular client. Therefore, it is sufficient to guarantee ε-LDP for each weight. Likewise, because of the anonymity, the cloud is unable to link weights from the same client across different iterations. Without splitting and shuffling, the privacy budget of LDP would grow by composition to T·d·ε, where T is the iteration number and d is the number of weights in the model. A similar discussion can be found in Erlingsson et al. (2014, 2019). Unlike our approach, those previous works shuffle ordered sequential information.
4.3 Bias and Variance
Suppose the true average for each weight is w̄ = (1/n) Σ_u w_u over the n clients. The proposed LDP mechanism in Eq. 2 induces zero bias in calculating the average weight: Algorithm 2 introduces zero bias to estimating average weights, i.e., E[ŵ] = w̄, where ŵ = (1/n) Σ_u M(w_u) is the estimated average.
Next, the proposed mechanism leads to a small variance in each reported weight w*. Let M be the proposed data perturbation mechanism. Given any number w ∈ [c − r, c + r], the variance of the mechanism is Var[M(w)] = r²((e^ε + 1)/(e^ε − 1))² − (w − c)². For the variance of the estimated average weight ŵ, we have the lower and upper bounds below. Let ŵ be the estimated average weight. The lower and upper bounds of the variance of the estimated average weight are: (r²/n)(((e^ε + 1)/(e^ε − 1))² − 1) ≤ Var[ŵ] ≤ (r²/n)((e^ε + 1)/(e^ε − 1))². The variance is large when r is large, ε is small, or n is small. By Lemma 4.3, we conclude the following theorem that shows the accuracy guarantee of the proposed LDP mechanism.
For any weight of any local model, with probability at least 1 − β, |ŵ − w̄| < O( r·√(log(1/β)) / (ε·√n) ).
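As a quick empirical check of the zero-bias claim (our script, not from the paper), we can simulate the two-point mechanism and compare the sample mean and variance of the reports against the original weight and the predicted per-report variance under the two-point form:

```python
import math
import random

def perturb(w, c, r, eps):
    # Two-point mechanism: report c + a or c - a, unbiased for w.
    a = r * (math.exp(eps) + 1) / (math.exp(eps) - 1)
    p = (w - c) * (math.exp(eps) - 1) / (2 * r * (math.exp(eps) + 1)) + 0.5
    return c + a if random.random() < p else c - a

def bias_variance_check(w=0.4, c=0.0, r=1.0, eps=1.0, n=200000):
    """Simulate n reports; return (sample mean, sample variance,
    predicted variance a^2 - (w - c)^2)."""
    random.seed(1)
    samples = [perturb(w, c, r, eps) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    a = r * (math.exp(eps) + 1) / (math.exp(eps) - 1)
    return mean, var, a * a - (w - c) ** 2
```

The sample mean should land near w, and the sample variance near the predicted value, up to Monte Carlo error.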
4.4 Comparison with Other Mechanisms
First, existing mechanisms generate noisy data that is close to its original value w.h.p., revealing the original value's confidence interval. For instance, the Laplace and Gaussian mechanisms add small noise w.h.p., and randomized-response-based mechanisms keep the data unchanged w.h.p. On the contrary, the proposed mechanism chooses one out of two extreme values as the noisy data, making it more distinct from its original value. We then compare specifically with the following popular mechanisms.
Randomized Response Mechanism. Google's Rappor Erlingsson et al. (2014) and generalized randomized response Wang et al. (2017) are for binary or categorical data only, whereas data is numeric in the federated learning scenario. A modified version of the generalized randomized response mechanism Bhowmick et al. (2018) was proposed, but it introduces asymptotically higher variance into the estimated average than ours and is only feasible when ε is very large.
ScalarDP Bhowmick et al. (2018). ScalarDP introduces a larger variance into the noisy data (Lemma 4.4 in Bhowmick et al. (2018)), whereas our variance bound is tighter (Lemma 4.3). Besides, that work assumes the attacker has little prior knowledge about user data and that ε is large.
Laplace Mechanism. If we apply the Laplace mechanism to each client's data (sensitivity 2r, noise scale 2r/ε), the variance is 8r²/ε². The variance of the estimated average over n clients is 8r²/(nε²). Even in our best case (Equation 8), the Laplace mechanism's variance is always higher than ours for any ε. We also compare with Laplace in terms of absolute data perturbation in the appendix to show our mechanism is more secure. For instance, for one setting of ε, our data perturbation is greater than Laplace's with probability 0.79 in the best case. Therefore, we claim our mechanism makes noisy data more distinct from its original value than the Laplace mechanism.
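To illustrate the variance comparison numerically, here is a small check (our code; it assumes a Laplace mechanism calibrated to sensitivity 2r, i.e., scale 2r/ε, and compares against the best-case averaged variance of the two-point mechanism):

```python
import math

def laplace_avg_var(r, eps, n):
    """Variance of the averaged Laplace reports with scale b = 2r/eps."""
    b = 2 * r / eps
    return 2 * b * b / n

def two_point_best_avg_var(r, eps, n):
    """Best case of the two-point mechanism: every weight sits at c +/- r,
    so each report's variance is a^2 - r^2."""
    a = r * (math.exp(eps) + 1) / (math.exp(eps) - 1)
    return (a * a - r * r) / n
```

Evaluating both for a sweep of ε shows the Laplace variance staying above the two-point best case.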
Gaussian Mechanism. Currently, most works on ε-LDP do not relax its definition with δ. However, the Gaussian mechanism introduces a δ Dwork et al. (2014), making it weaker than ε-LDP Duchi et al. (2014). δ is the probability of highly unlikely "bad" events that may break ε-differential privacy, usually defined relative to the size of the dataset in previous works Papernot et al. (2017); Wang et al. (2018). As a result, the Gaussian mechanism is less secure than our mechanism (which introduces no δ).
4.5 Discussion: DP vs LDP
There is a blurry line between DP and LDP. In general, they differ in definition and in whether the noise is added locally or on the server. However, they are not mutually exclusive. The definition of LDP is a special case of that of DP. Also, sometimes noise is added locally to achieve DP instead of LDP, e.g., via additive secret sharing.
We can prove that any mechanism for DP can also be used to achieve LDP.
Let M be any mechanism satisfying ε-differential privacy. Applying M to each value of a dataset containing n values satisfies ε-LDP.
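As an illustration of this theorem, a centralized-DP Laplace mechanism can be applied independently to each value to obtain LDP. The sketch below (our naming) treats the identity query, whose sensitivity is the width of the value's domain:

```python
import math
import random

def laplace_ldp(v, lo, hi, eps):
    """Apply a centralized-DP Laplace mechanism to a single value in
    [lo, hi]: the query is the identity, so the sensitivity is hi - lo."""
    scale = (hi - lo) / eps
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of Laplace(0, scale).
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return v + noise
```

Applied independently to each of the n values of a dataset, this yields ε-LDP for every value, matching the theorem.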
5 Experiment
In this section, image classification tasks and a real mobile application are used as experimental examples to evaluate the effectiveness of LDP-FL. We first examine the effect of different settings on the image benchmark datasets MNIST LeCun et al. (1998) and Fashion-MNIST Xiao et al. (2017), and then verify the performance improvement on CIFAR-10 Krizhevsky and Hinton (2009) together with the preceding datasets.
To verify the performance improvement brought by LDP-FL, we mainly evaluate the performance with respect to the number of clients. For MNIST and Fashion-MNIST, we implement a two-layer CNN for image classification. For CIFAR-10, however, the default network from the PyTorch library only achieves around 50% accuracy without data perturbation, so we re-design a small VGG Simonyan and Zisserman (2014) for the task. The training and testing data are fed into the network directly in each client, and for each client the size of the training data is the total number of training samples divided by the number of clients. In this case, a larger number of clients implies a smaller amount of training data per client. We clip each weight into a fixed range: we fix c and r for MNIST and Fashion-MNIST, while for CIFAR-10, due to the complexity of the network, we fix c and set r by the weight range of each layer. The learning rate is set to 0.03 for MNIST and 0.015 for CIFAR-10. Considering the randomness of the perturbation, we run each test ten times independently and report the average.
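The clipping step can be sketched as follows (our naming; the per-layer (c, r) rule for CIFAR-10 is our reading of the setup described above):

```python
def clip_weight(w: float, c: float, r: float) -> float:
    """Clip a single weight into the fixed range [c - r, c + r]."""
    return max(c - r, min(c + r, w))

def layer_range(weights):
    """Derive (c, r) from a layer's observed weight range: c is the
    midpoint of the range and r its half-width."""
    lo, hi = min(weights), max(weights)
    return (lo + hi) / 2, (hi - lo) / 2
```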
The proposed models are implemented in PyTorch, and all experiments are run on a local server with a single NVIDIA GTX Titan GPU. MNIST and Fashion-MNIST can be trained within an hour with 10 communication rounds, and CIFAR-10 needs about 2 hours with 15 communication rounds.
5.1 Performance Analysis and Comparison
Fig. 3 shows that LDP-FL can achieve good performance at a low privacy cost because of the new communication design and the new local data perturbation. While increasing the number of clients in training, LDP-FL performs nearly as well as noise-free federated learning. Compared with MNIST, CIFAR-10 needs more clients, which indicates that a more complex task with a larger neural network model requires more local data and more clients to perform well against the data perturbation. The privacy budget also affects the performance of the central model; details are discussed in the next section.
Some existing works address the privacy concern on similar tasks. Geyer et al. (2017) first applied DP to federated learning, but achieve only 78% accuracy. Bhowmick et al. (2018) first utilized LDP in federated learning; due to the high variance of their mechanism, it requires more than 200 communication rounds and spends a much larger privacy budget on both MNIST and CIFAR-10. Last, the most recent work Truex et al. (2020) applies Condensed Local Differential Privacy (α-CLDP) to federated learning, reaching 86.93% accuracy on the Fashion-MNIST dataset. However, α-CLDP achieves that performance with a relatively large privacy budget, which results in a weak privacy guarantee. Compared to existing works, our approach first uses far fewer communication rounds between the clients and the cloud (i.e., 10 for MNIST, 15 for Fashion-MNIST and CIFAR-10), which makes the whole solution more practical in real life. Second, we achieve 96.24% accuracy on MNIST, 86.26% on Fashion-MNIST and 61.46% on CIFAR-10, each under a small privacy budget. Overall, LDP-FL achieves better effectiveness and efficiency than relevant works.
5.2 Analysis of Privacy Budget
The privacy budget ε represents the privacy cost of the framework. To analyze the impact of the privacy budget on performance, we vary ε from 0.1 to 1 for MNIST and from 1 to 10 for CIFAR-10 in Table 1. It is clear that more complex data and tasks require more privacy cost. The main reason is that a complex task requires a sophisticated neural network with a large number of model weights, and the range of each weight is also wider in the complex task.
Fig. 3 shows that LDP-FL can maintain high accuracy over a wide range of privacy budgets on MNIST, Fashion-MNIST and CIFAR-10. At the smallest budgets tested, the accuracy decreases by 90%, 15% and 50% for MNIST, Fashion-MNIST and CIFAR-10, respectively. The accuracy stays almost unchanged until ε decreases to 0.1 for MNIST and 1 for CIFAR-10, where the performance drops to 10%, the same as a random guess on these two datasets. Fashion-MNIST lies between MNIST and CIFAR-10, achieving excellent performance once ε is moderately large. These results indicate that LDP-FL adapts to different privacy requirements and remains effective even when the privacy budget is tighter than in previous works and other mechanisms. In addition, we show the accuracy of LDP-FL as the number of clients increases; more clients can tolerate more noise, consistent with the privacy analysis in the last section.
6 Related Work
Differential privacy Dwork et al. (2014) has been widely studied in the last ten years. Recently, more works have studied how to use DP in deep learning Abadi et al. (2016); Papernot et al. (2017) or in federated learning Geyer et al. (2017); McMahan et al. (2017). To achieve stronger privacy protection than traditional differential privacy, a few recent works have started to apply LDP Duchi et al. (2013, 2014); Erlingsson et al. (2014) to federated learning Bhowmick et al. (2018); Nguyên et al. (2016); Seif et al. (2020); Truex et al. (2020).
However, existing approaches that apply LDP to federated learning cannot be used for deep learning models. Some of them Nguyên et al. (2016); Seif et al. (2020) do not support deep learning yet, focusing only on small tasks and simple datasets. Other works Bhowmick et al. (2018); Truex et al. (2020) studied LDP in federated learning but, as discussed in the experiments, hardly achieve good performance with a small privacy budget. Due to the high variance of the noise in their approaches, more communication rounds between the cloud and the clients are required, which consumes more of the privacy budget in their frameworks. Our approach focuses on overcoming the weaknesses of all previous approaches and accelerates practical solutions on complex datasets.
7 Conclusion and Future Plan
In this paper, we propose a new mechanism for LDP, show how to apply it to protect federated learning, and apply splitting and shuffling to each client's gradients to mitigate the privacy degradation caused by high data dimensionality and many query iterations. Empirical studies demonstrate that our system outperforms previous related works on the same image classification tasks. We hope our work can considerably accelerate practical applications of LDP in federated learning.
In the future, several research topics can be further explored, such as preventing client anonymization from side-channel attacks, designing a better data perturbation mechanism, and applying the proposed mechanism to natural language processing, speech recognition or graph learning. Moreover, it is of profound significance to generalize the proposed privacy-preserving techniques to other scenarios.
-  (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, pp. 308–318. Cited by: §3.2, §6.
-  (2015) Predicting the sequence specificities of dna-and rna-binding proteins by deep learning. Nature biotechnology 33 (8), pp. 831. Cited by: §1.
-  (2018) Protection against reconstruction and its applications in private federated learning. arXiv preprint arXiv:1812.00984. Cited by: §1, §2.2, §3.1, §4.4, §4.4, §5.1, §6, §6.
-  (2013) Local privacy and statistical minimax rates. In Proc. of IEEE Foundations of Computer Science (FOCS), Cited by: §6.
-  (2014) Privacy aware learning. Journal of the ACM (JACM) 61 (6), pp. 1–57. Cited by: §4.4, §6.
-  (2014) The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407. Cited by: §4.4, §6.
-  (2014) The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (3–4), pp. 211–407. Cited by: §2.1.
-  (2011) Differential privacy. Encyclopedia of Cryptography and Security, pp. 338–340. Cited by: §2.1, Definition 1.
-  (2019) Amplification by shuffling: from local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2468–2479. Cited by: §3.2, §4.2.
-  (2014) Rappor: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054–1067. Cited by: §3.1, §4.2, §4.4, §6.
-  (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557. Cited by: §1, §2.2, §5.1, §6.
-  (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §1, §2.2.
-  (2009) Learning multiple layers of features from tiny images. Cited by: §5.
-  (2009) Learning multiple layers of features from tiny images. Cited by: §1, §1.
-  (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §5.
-  (2010) MNIST handwritten digit database. Cited by: §1, §1.
-  (2018) PDLM: privacy-preserving deep learning model on cloud with multiple keys. IEEE Transactions on Services Computing. Cited by: §1.
-  (2016) Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629. Cited by: §1, §2.2.
-  (2017) Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963. Cited by: §6.
-  (1994) Machine learning, neural and statistical classification. Cited by: §1.
-  (2016) Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053. Cited by: §1, §6, §6.
-  (2007) Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp. 75–84. Cited by: §1.
-  (2017) Semi-supervised knowledge transfer for deep learning from private training data. In 5th International Conference on Learning Representations, ICLR ’17. Cited by: §1, §3.2, §6.
-  (2018) Defense-gan: protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605. Cited by: §1.
-  (2020) Wireless federated learning with local differential privacy. arXiv preprint arXiv:2002.05151. Cited by: §3.1, §6, §6.
-  (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §5.
-  (2020) LDP-fed: federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, pp. 61–66. Cited by: §5.1, §6, §6.
-  (2018) Private model compression via knowledge distillation. arXiv preprint arXiv:1811.05072. Cited by: §4.4.
-  (2017) Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17), pp. 729–745. Cited by: §4.4.
-  (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §1, §1, §5.
Appendix A Proof
We know the weight w's range is [c − r, c + r]. If the output is the larger value y = c + r(e^ε + 1)/(e^ε − 1), then for any w, w′ ∈ [c − r, c + r],
Pr[M(w) = y] / Pr[M(w′) = y] ≤ max_w Pr[M(w) = y] / min_{w′} Pr[M(w′) = y] = (e^ε/(e^ε + 1)) / (1/(e^ε + 1)) = e^ε.
If the output is the smaller value, the above still holds by symmetry. ∎
For any weight update w_u from any client u,
E[M(w_u)] = (c + r(e^ε + 1)/(e^ε − 1)) · ((w_u − c)(e^ε − 1)/(2r(e^ε + 1)) + 1/2) + (c − r(e^ε + 1)/(e^ε − 1)) · (1/2 − (w_u − c)(e^ε − 1)/(2r(e^ε + 1))) = c + (w_u − c) = w_u, so E[ŵ] = (1/n) Σ_u E[M(w_u)] = w̄.
The variance of each reported noisy weight is
Var[M(w_u)] = E[M(w_u)²] − w_u² = r²((e^ε + 1)/(e^ε − 1))² − (w_u − c)².
The variance of the estimated average weight is
Var[ŵ] = (1/n²) Σ_u Var[M(w_u)] = (1/n²) Σ_u [ r²((e^ε + 1)/(e^ε − 1))² − (w_u − c)² ].
For each client u, |M(w_u) − w_u| ≤ 2r(e^ε + 1)/(e^ε − 1) and Var[M(w_u)] ≤ r²((e^ε + 1)/(e^ε − 1))². By Bernstein's inequality,
Pr[|ŵ − w̄| ≥ λ] ≤ 2 exp( −nλ² / ( 2r²((e^ε + 1)/(e^ε − 1))² + (4/3)·λ·r(e^ε + 1)/(e^ε − 1) ) ).
In other words, there exists λ = O( r·√(log(1/β)) / (ε·√n) ) such that |ŵ − w̄| < λ holds with probability at least 1 − β. ∎
First, let’s assume suppose the dataset has only one value with domain and treat the identity function as the query function. The query function’s sensitivity is the difference between the max and min of . Then, we apply any mechanism satisfying -DP to , so for any , which also satisfy -LDP by definition. Now, if the dataset has values, we can apply the same mechanism to each value independently, and achieve -LDP because
Appendix B Comparison with Laplace Mechanism in Absolute Perturbation
By the definition of the Laplace distribution, we can derive the probability that the absolute Laplace perturbation exceeds a given threshold. In our LDP mechanism, the absolute perturbation |M(w) − w| is at least r(e^ε + 1)/(e^ε − 1) − |w − c|. Therefore, we can draw Figure 5 to compare with the Laplace mechanism. It shows that our mechanism makes the noisy data more distinct from its original value when ε is small. For instance, for one setting of ε, our data perturbation is greater than Laplace's with probability 0.79/0.44 in the best/worst case; for a smaller ε, it is greater with probability 0.86/0.61 in the best/worst case.
Appendix C Experiment
C.1 Parameter Analysis
To evaluate the performance of LDP-FL, we need to fine-tune different parameters of the system. Here we list the most critical ones: the privacy budget ε, the number of communication rounds, the number of clients, and the fraction of clients. The fraction of clients denotes the percentage of clients randomly selected to update the global model in each communication round.
The fraction of clients and the number of communication rounds are two important parameters in the proposed noisy training. Fig. 4 shows how performance changes as they vary. We evaluate the proposed model on MNIST, Fashion-MNIST and CIFAR-10. To evaluate the fraction of clients, we first fix the number of clients at 500. When the fraction is too small, it does not affect the performance on MNIST and Fashion-MNIST, but it affects CIFAR-10 significantly compared with the baseline, which does not add any noise perturbation. When the fraction is close to 1, LDP-FL achieves almost the same performance as the baseline on MNIST, Fashion-MNIST and CIFAR-10. The other important parameter is the number of communication rounds between the cloud and the clients. With more communication, we can train a better model on MNIST, Fashion-MNIST and CIFAR-10 with the proposed approach. However, due to the complexity of the data and the task, CIFAR-10 needs more rounds to approach the baseline. All details are shown in Fig. 4.