I Introduction
Deep Neural Networks (DNNs) have become one of the most popular and powerful machine learning methods for a wide range of artificial intelligence tasks. As modern DNN models grow more complicated and require more training effort, distributed learning has gained a lot of popularity, where multiple distributed participants work collaboratively on a training task [22, 15, 3]. Distributed learning can be divided into two categories: centralized and decentralized learning [17]. A centralized learning system utilizes a central server to collect and aggregate the estimates (i.e., model parameters) of participants at each iteration, while in a decentralized system, each participant exchanges estimates with its neighbors to reach consensus on the DNN model.
Distributed learning can prevent direct privacy leakage as each participant keeps its own private dataset locally during the training process. However, it still faces threats of indirect privacy leakage. Participants in distributed systems constantly exchange estimates, which may contain information about their private training data. Past works have demonstrated the feasibility and severity of model inversion attacks [10, 29, 9] and membership inference attacks [20] in the distributed learning setting.
To mitigate such privacy threats in distributed training, one promising solution is Differential Privacy (DP). DP was originally introduced to preserve the privacy of individual data records in statistical databases [5]. A number of studies have since applied DP to enhance the privacy of deep learning (DL) in different environments [1, 28, 16, 26, 12]. DL training usually adopts Stochastic Gradient Descent (SGD) to iteratively minimize the loss function and identify the optimal model parameters. DPSGD algorithms therefore inject additional noise into the gradients at each iteration. By carefully restricting the sensitivity of the SGD update and tracking the accumulated privacy loss, DPSGD can guarantee the DP of deep models.
For DP solutions, there exists a tradeoff between privacy and usability, determined by the noise scale added during training. Adding more noise strengthens the privacy guarantee, but it also decreases the model accuracy. As a result, it is critical to identify the minimal amount of noise that can provide the desired privacy protection while maintaining acceptable model performance. In this paper, we propose a novel DPSGD algorithm for decentralized learning systems that can efficiently reduce the noise scale required for the DP guarantee, compared to prior works.
I-A Related Work
DP techniques for deep learning. Existing DPSGD algorithms adopt additive noise mechanisms, adding random noise to the gradients during the training process. To improve the model usability while guaranteeing DP, these algorithms usually restrict the sensitivity of the randomized mechanisms. Abadi et al. [1] bounded the influence of training samples on gradients by clipping each gradient so that its norm stays below a given threshold. Yu et al. [26] optimized the model accuracy by decaying the noise added to the gradients over training time, since the learned models converge iteratively.
Another way to improve the model usability lies in precisely tracking the overall privacy cost of the training process. Shokri et al. [24] and Wei et al. [25] composed the additive noise mechanisms using the advanced composition theorem [6], leading to a linear increase in the privacy budget. In [1, 2, 11, 14], the moments accountant (MA) was used to reduce the added noise by keeping track of a bound on the moments of the privacy loss during the training process. Other algorithms [23, 13, 26] were designed to improve the model usability using (zero-)concentrated DP [7], based on the observation that the privacy loss of an additive noise mechanism follows a sub-Gaussian distribution.
Applying DP to distributed learning systems. For centralized learning systems, some works [24, 2, 11, 14] applied the above techniques to preserve the privacy of the training data of each participant. For decentralized learning systems, several DP algorithms [27, 28, 16, 4] were also proposed using these DP techniques. However, those decentralized solutions are restricted to the Alternating Direction Method of Multipliers (ADMM) algorithm, and cannot be used with the mainstream SGD. The only state-of-the-art DPSGD solution for decentralized systems is [16], which optimized the DP mechanism with an advanced composition theorem for tracking the accumulated privacy loss. It serves as the baseline for comparison with our solution.
I-B Contributions
We design a new DPSGD solution for decentralized learning with the following contributions.
Higher usability. Existing works all focus on restricting the sensitivity of the SGD algorithm to improve the model usability, an approach that seems to have reached its performance limit. In contrast, we propose a novel topology-aware technique, which leverages network features of decentralized systems to optimize the randomized mechanism. This can effectively reduce the noise scale and improve the model usability. In addition, we also apply the noise decay technique from the single-party training mode to the decentralized system, to further optimize the DP protection.
Broader applicability. Due to discrepancies in network bandwidth or unpredictable system faults, asynchronous decentralized systems are pervasive [18, 19]. Unfortunately, existing DP algorithms usually assume global synchronization, which makes them vulnerable to fluctuations in decentralized systems. For instance, the training task has to be suspended or slowed down under poor connections among participants.
Our solution considers both synchronous and asynchronous decentralized learning. We introduce a novel learning protocol, where an agent calculates and sends different aggregated estimates to different neighbors. This protocol maximizes the noise reduction from the topology-aware technique, while adapting naturally to the asynchronous learning mode.
Formal privacy analysis and comprehensive experimental evaluation. We formally prove that our solution can guarantee DP for all participants, and demonstrate the benefits brought by our optimization techniques. We conduct extensive experiments to show the superiority of our method over prior works under various system settings.
II Preliminaries
II-A Decentralized Systems
We consider a decentralized system whose communication topology can be represented as an undirected graph $G = (V, E)$. $V$ denotes the set of participants (or agents) in this decentralized network. $E$ represents the set of communication links among the agents, with the following two properties: (1) $(i, j) \in E$ if and only if agent $i$ can receive information from agent $j$; (2) $(j, i) \in E$ if $(i, j) \in E$.
In this decentralized learning system, the agents cooperatively train a model by optimizing the loss function with SGD and exchanging estimates with their neighbors. Let $x \in \mathbb{R}^d$ be the $d$-dimensional estimate vector of a DL model, and $f$ be the loss function. Each agent $i$ obtains a private training dataset $D_i$, consisting of independent and identically distributed (i.i.d.) data samples from a distribution $\mathcal{X}$. The agents train a shared model by solving the optimization problem $\min_x \mathbb{E}_{s \sim \mathcal{X}} f(x; s)$, where $s$ is a training data sample from $\mathcal{X}$. During the training process, agent $i$ updates its local estimate $x_i$ iteratively, and sends it to its neighbors $\mathcal{N}_i$. In the synchronous mode, agent $i$ needs to receive all the estimates from its neighbors before updating its local estimate. In the asynchronous mode, some neighbors may be unable to communicate with agent $i$ at certain iterations due to low bandwidth or system crashes, so agent $i$ can only collect the estimates from part of its neighbors. To adapt to both synchronous and asynchronous modes while maintaining the convergence rate, agent $i$ (1) first asks each neighbor whether it will participate at this iteration; (2) randomly selects a neighbor $j$ from the responding neighbors; (3) utilizes the following update rule [18, 8] to calculate the local estimate:
$$x_i^{t+1} = w\, x_i^t + (1 - w)\, x_j^t - \gamma\, \nabla f(x_i^t; s_i^t) \qquad (1)$$
where $w$ is a hyperparameter determining the weight of the local estimate; $\gamma$ is the learning rate; $\nabla f(x_i^t; s_i^t)$ is the stochastic gradient with $s_i^t \in D_i$. The gradient can also be replaced by a mini-batch of stochastic gradients [17, 18].
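To make the update concrete, here is a minimal sketch of one iteration of the rule above. The exact form of Eq. 1 was not preserved in this copy, so the mixing form below (weight $w$ on the local estimate, $1-w$ on the selected neighbor's estimate, followed by a gradient step) is an assumption based on the surrounding description, and all names are illustrative.

```python
import numpy as np

def local_update(x_i, x_j, grad, w=0.5, lr=0.05):
    """One decentralized SGD step (assumed form of Eq. 1):
    mix the local estimate x_i with the selected neighbor's
    estimate x_j, then descend along the local stochastic gradient."""
    return w * x_i + (1 - w) * x_j - lr * grad

# Toy example: descend the loss ||x||^2 while averaging with a neighbor.
x_i = np.array([1.0, -1.0])
x_j = np.array([0.0, 0.0])
grad = 2 * x_i                      # gradient of ||x||^2 at x_i
x_new = local_update(x_i, x_j, grad)
# x_new == [0.4, -0.4]: halfway to the neighbor, minus a small gradient step
```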
II-B Differential Privacy
DP is a rigorous mathematical framework for protecting the privacy of individual records in a database when aggregated information about the database is shared among untrusted parties [5]. In a decentralized system, we assume all agents are honest-but-curious. Our goal is to adopt DP to protect the training data privacy of each agent. Decentralized learning with DP is formally defined below:
Definition 1.
(Decentralized Learning with DP) For each agent $i$, a randomized mechanism $\mathcal{M}_i$ with domain $\mathcal{D}$ and range $\mathcal{R}$ satisfies $(\epsilon_i, \delta_i)$-DP if for any two adjacent datasets $D_i, D_i' \in \mathcal{D}$ and any subset of outputs $O \subseteq \mathcal{R}$, the following property holds:
$$\Pr[\mathcal{M}_i(D_i) \in O] \le e^{\epsilon_i} \Pr[\mathcal{M}_i(D_i') \in O] + \delta_i \qquad (2)$$
$\mathcal{M}_i$ is restricted by two parameters: $\epsilon_i$ and $\delta_i$. $\epsilon_i$ is the privacy budget of agent $i$, which limits the privacy loss of its training data. $\delta_i$ is a relaxation parameter that allows the privacy budget of $\mathcal{M}_i$ to exceed $\epsilon_i$ with probability $\delta_i$. A decentralized learning system is differentially private if $\mathcal{M}_i$ satisfies $(\epsilon_i, \delta_i)$-DP for every agent $i \in V$. Each agent can set its own privacy budget; alternatively, the entire system can enforce a uniform privacy budget for all agents.

To achieve differentially private decentralized learning, a common and straightforward way is to use additive noise mechanisms at each iteration [12]. Specifically, we use the Gaussian mechanism and denote $\sigma_i$ as the noise parameter of agent $i$. At each iteration, agent $i$ adds Gaussian noise $\mathcal{N}(0, \sigma_i^2 \mathbf{I})$ to the updated local estimate to guarantee differential privacy (Eq. 3). Then, it sends the noised estimate to its neighbors.
$$\tilde{x}_i^{t+1} = w\, x_i^t + (1 - w)\, \tilde{x}_j^t - \gamma\, \nabla f(x_i^t; s_i^t) + \mathcal{N}(0, \sigma_i^2 \mathbf{I}) \qquad (3)$$
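A minimal sketch of the noised release, under the same assumed update form as before: the agent computes the plain aggregation and perturbs it with Gaussian noise before sending. The function and parameter names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_update(x_i, x_j, grad, sigma, w=0.5, lr=0.05):
    """Assumed form of Eq. 3: the plain update plus Gaussian noise
    N(0, sigma^2 I), so the estimate released to neighbors is randomized."""
    clean = w * x_i + (1 - w) * x_j - lr * grad
    return clean + rng.normal(0.0, sigma, size=clean.shape)

x_pub = noisy_update(np.ones(3), np.zeros(3), np.zeros(3), sigma=0.1)
```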
III An Optimized DPSGD Algorithm
As shown in Eq. 3, the random noise added to the aggregated estimate must be large enough to satisfy the privacy requirement. However, too much noise degrades the model accuracy, so it is important to balance this tradeoff. We propose a novel DPSGD algorithm that reduces the amount of noise each agent must add, improving the usability of trained models without violating the DP requirement. Our algorithm is general-purpose and can be applied to both synchronous and asynchronous modes. It consists of two strategies and one learning protocol, as described below.
III-A Strategy 1: Topology-aware Noise Reduction
Existing DPSGD algorithms all assume that the required noise scale depends only on the agents themselves. In decentralized systems, however, the communication topology can affect the required amount of noise as well. Our proposed strategy reduces the noise scale of each agent by considering its connectivity with its neighbors. The key insight is that the estimates received from neighbors already contain noise, which contributes to the noise scale of the aggregated estimate and thus reduces the amount of noise the agent itself must add.
Fig. 1 gives an illustrative example. We consider an agent $i$ with four neighbors, two of which are also connected to each other. When agent $i$ obtains all estimates of its neighbors, suppose it picks one neighbor's estimate for aggregation with its own estimate and gradient. Since the received estimate already includes Gaussian noise, the aggregated estimate following Eq. 3 carries the corresponding random component. As a result, when generating the estimate for a neighbor that is not connected to the selected one, agent $i$ does not need to add full-scale noise: it only needs to inject enough noise that the total noise matches the required scale, which meets the DP requirement while reducing the actual amount of added noise.
It is worth noting that this reduced noise scale is not applicable when generating estimates for the selected neighbor itself, or for neighbors connected to it: the selected neighbor already knows its own noisy estimate, so its noise is no longer random to it, and a neighbor connected to it receives that noisy estimate directly. For these agents, we can pick another non-adjacent neighbor and generate a different estimate, still with a reduced noise scale.
Formally, given an agent $i$, for each of its neighbors $j$, we define $\mathcal{N}'_{i,j}$ as the set of $i$'s neighbors that are not connected to $j$ (excluding $j$ itself). This means the noisy estimate of $j$ can be used in the aggregation for all agents in $\mathcal{N}'_{i,j}$ with the reduced noise scale. Our goal is then to find a minimal set $\mathcal{S}_i \subseteq \mathcal{N}_i$, such that using the agents inside this set for aggregation covers all the neighbors of $i$. Note that there can exist a neighbor that is connected to every other neighbor of $i$; we cannot find a non-adjacent neighbor to cover it, and should exclude it from the cover. This process is described in Eq. 4. We will solve it approximately in Section III-C.
$$\mathcal{S}_i = \operatorname*{arg\,min}_{S \subseteq \mathcal{N}_i} |S| \quad \text{s.t.} \quad \bigcup_{j \in S} \mathcal{N}'_{i,j} = \bigcup_{j \in \mathcal{N}_i} \mathcal{N}'_{i,j} \qquad (4)$$
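The covering problem above is a set-cover instance, so a greedy heuristic is a natural approximation. The sketch below is our reading of the strategy (all function names and the toy topology are illustrative): compute each $\mathcal{N}'_{i,j}$ from an adjacency map, then greedily pick helpers until no uncovered neighbor remains coverable.

```python
def nonadjacent_neighbors(adj, i, j):
    """N'_{i,j}: neighbors of i that are not j and not connected to j."""
    return {k for k in adj[i] if k != j and k not in adj[j]}

def greedy_cover(adj, i):
    """Greedy approximation of the covering problem (Eq. 4): repeatedly
    pick the neighbor j whose set N'_{i,j} covers the most still-uncovered
    neighbors of i. Neighbors adjacent to every other neighbor cannot be
    covered and are returned as leftovers."""
    uncovered = set(adj[i])
    chosen = []
    while uncovered:
        best = max(sorted(adj[i]),
                   key=lambda j: len(nonadjacent_neighbors(adj, i, j) & uncovered))
        gain = nonadjacent_neighbors(adj, i, best) & uncovered
        if not gain:                 # remaining neighbors are uncoverable
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Toy topology: agent 0 has neighbors 1..4, and neighbors 1, 2 are connected.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
cover, leftover = greedy_cover(adj, 0)
```

In the toy topology, two helpers suffice to cover all four neighbors, so every neighbor of agent 0 can receive a reduced-noise estimate.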
After identifying $\mathcal{S}_i$, for each neighbor $k$: if $k$ is connected to every other neighbor of $i$ (i.e., it cannot be covered), then agent $i$ just sends the local estimate with full-scale noise to $k$. Otherwise, there exists at least one $j \in \mathcal{S}_i$ such that $k \in \mathcal{N}'_{i,j}$. The noise scale of the estimate sent from $i$ to $k$ should then satisfy Eq. 5(a) in order to guarantee the DP requirement against $k$, where the full-scale noise is given by the Gaussian mechanism. According to the additivity of the Gaussian distribution, we calculate the reduced noise parameter via Eq. 5(b). With this reduced noise scale, agent $i$ updates the estimate for agent $k$ based on Eq. 5(c).
(5a)  
(5b)  
(5c) 
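The additivity argument can be sketched in a few lines. The exact form of Eq. 5(b) was not preserved here, so the $(1-w)$ scaling of the inherited noise (matching the mixing weight in Eq. 3) is our assumption, and the function name is illustrative.

```python
import math

def reduced_sigma(sigma_full, sigma_j, w=0.5):
    """Noise std the agent still has to add when the aggregated estimate
    already carries neighbor j's Gaussian noise. By additivity of
    independent Gaussians, variances subtract; the inherited component
    has std (1 - w) * sigma_j under the assumed mixing weight w."""
    carried = (1 - w) * sigma_j
    return math.sqrt(max(sigma_full**2 - carried**2, 0.0))
```

For example, with `sigma_full = sigma_j = 1.0` and `w = 0.5`, the agent only needs to add noise with std `sqrt(0.75) ≈ 0.866` instead of `1.0`.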
III-B Strategy 2: Time-aware Noise Decay
This technique was originally proposed in [26] to optimize the DP protection of model training in one-party systems. Here we apply it to decentralized systems. The key idea is that as training proceeds, the model converges and the norm of the gradients decreases; thus the sensitivity of the Gaussian mechanism decreases, allowing us to inject less noise into the gradients. Since the training datasets are distributed across different agents, all agents in the decentralized system should reach a consensus on the noise decay schedule to tolerate the differences in the datasets.
Specifically, compared to the aggregation process in Eq. 5(c), our first modification is to clip the gradients in norm to bound their size at each training iteration. We follow the method from [1]: given a clipping threshold $C$, the gradient vector is scaled so that its norm is bounded by $C$, as shown in Eq. 6(a).
Our second modification is to dynamically reduce the noise scale over the training time. Without loss of generality, we use step decay to reduce the noise scale every few epochs. Let $\sigma_i^0$ be the initial noise parameter of agent $i$. The noise parameter of agent $i$ at the $t$-th iteration is shown in Eq. 6(b), where $k$ is the reduction factor and $\tau$ is the reduction step of the noise decay.

(6a)
(6b) 
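Both modifications can be sketched directly; the clipping rule follows [1], and the step-decay values (`C=4.0`, `k=0.9`, `period=1000`) mirror the experimental settings later in the paper, though the exact form of Eq. 6 was not preserved here.

```python
import numpy as np

def clip_gradient(g, C=4.0):
    """Norm clipping as in [1]: rescale g so its L2 norm is at most C."""
    norm = np.linalg.norm(g)
    return g if norm <= C else g * (C / norm)

def decayed_sigma(sigma0, t, k=0.9, period=1000):
    """Step decay (sketch of Eq. 6b): shrink the initial noise parameter
    sigma0 by the reduction factor k once every `period` iterations."""
    return sigma0 * k ** (t // period)
```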
III-C Learning Protocol
With the two novel strategies, we now describe the end-to-end protocol, as shown in Algorithm 1.
Synchronous training. The algorithm takes as input the initial estimate, the initial noise parameter, the learning rate, and the number of iterations $T$. Before training, agent $i$ exchanges connectivity information with its neighbors and computes the set $\mathcal{N}'_{i,j}$ for each neighbor $j$. Then, agent $i$ updates the initial estimate and sends it to its neighbors. At the $t$-th iteration, agent $i$ first computes the full-scale noise parameter using the time-aware noise decay strategy. Afterwards, agent $i$ generates estimates for its neighbors and updates its local estimate using the topology-aware noise reduction strategy.
To approximately solve Eq. 4, agent $i$ greedily selects agents from its neighbor set one by one, until all neighbors have been traversed or covered. For each selected agent $j \in \mathcal{S}_i$, agent $i$ computes the reduced-noise estimate and sends it to the neighbors in $\mathcal{N}'_{i,j}$. Then, agent $i$ randomly selects a neighbor and updates its local noised estimate. If there are still uncovered neighbors, agent $i$ sends them its local estimate with full-scale noise. After $T$ iterations, Algorithm 1 returns the final differentially private deep model.
Asynchronous training. Compared to the synchronous setting, neighbors may not participate in certain iterations in the asynchronous setting. To tolerate this, right before each iteration, agent $i$ first asks its neighbors whether they will participate in this iteration and obtains the set of responding neighbors. It then continues with the same procedure as synchronous training, using only the responding neighbors in this iteration.
[Algorithm 1: the DPSGD learning protocol for synchronous and asynchronous decentralized training]
IV Privacy Analysis
We perform a formal analysis of Algorithm 1 from the aspects of privacy and efficiency.
IV-A Proof of DP
First, we prove that Algorithm 1 is differentially private when the initial noise parameters are chosen carefully. We track the accumulated privacy loss of the training process using a modern DP technique, Rényi DP [21], which ensures a sublinear growth of the privacy loss as a function of the number of iterations.
Theorem 1.
Let the number of iterations be $T$. For any decentralized system and every agent $i$, the randomized mechanism in Algorithm 1 is $(\epsilon_i, \delta_i)$-DP if we choose
(7) 
Proof.
We prove the theorem in the synchronous mode and ignore the time-aware noise decay strategy, since it does not incur any additional privacy loss [26]. We clip the gradients in norm and assume the per-iteration privacy budget is the same at each iteration. According to the Gaussian mechanism [5], the update rule is DP at one iteration if we choose
Using the Rényi composition theorem [21], our new update rule is DP after $T$ iterations under an appropriately scaled noise parameter. Combining the above equations, we conclude that our update rule is $(\epsilon_i, \delta_i)$-DP if we choose the noise parameter such that:
(8) 
We have proven that the local estimate of agent $i$ is differentially private during the training process. Next, we prove that the estimates generated for its neighbors are also differentially private. Let $j$ be the agent selected for generating the estimate for a neighbor $k$. Since $j$ and $k$ are not directly connected, the noise of $j$'s estimate can serve as a random component that helps guarantee DP against $k$. Because all agents generate their noise independently, the noise scale for the estimate sent to $k$ should satisfy
(9) 
According to the additivity of the Gaussian distribution, the noise parameter for this estimate follows Eq. 5(b). Therefore, in Algorithm 1, the estimates generated for an agent's neighbors are also differentially private. ∎
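The Rényi-DP accounting the proof relies on can be sketched numerically. The snippet below uses the standard facts from [21] (a Gaussian mechanism with $\ell_2$ sensitivity $C$ and noise std $\sigma$ satisfies $(\alpha, \alpha C^2/(2\sigma^2))$-RDP, RDP composes additively, and converts to $(\epsilon, \delta)$-DP via $\epsilon = \epsilon_{\text{RDP}} + \log(1/\delta)/(\alpha-1)$); the specific constants in Theorem 1 were not preserved here, so this is a generic illustration rather than the paper's exact bound.

```python
import math

def gaussian_rdp_epsilon(sigma, C, T, delta, alphas=range(2, 64)):
    """Renyi-DP accounting in the style of [21]: compose the Gaussian
    mechanism over T iterations in RDP, convert to (eps, delta)-DP,
    and take the best integer order alpha."""
    return min(T * a * C**2 / (2 * sigma**2) + math.log(1 / delta) / (a - 1)
               for a in alphas)

eps_1 = gaussian_rdp_epsilon(sigma=4.0, C=1.0, T=1, delta=1e-5)
eps_100 = gaussian_rdp_epsilon(sigma=4.0, C=1.0, T=100, delta=1e-5)
```

Here `eps_100` is far smaller than `100 * eps_1`, illustrating the sublinear privacy-loss growth that the theorem exploits.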
IV-B Efficiency Analysis
We further analyze the efficiency gain introduced by noise reduction when considering the communication topology. Without loss of generality, we assume all agents share the same noise and weight parameters. Let the reduced noise parameter at iteration $t$ be given by the topology-aware noise reduction strategy; then
(10) 
Thus, compared with the full-scale noise parameter, the noise added to the estimate is reduced by a factor determined by the aggregation weight $w$. We observe that this factor decreases as $w$ decreases: when $w$ approaches 0, the noise of the estimates that an agent sends to and receives from its neighbors is significantly reduced.
V Experiments
V-A Implementation and Experimental Setup
Dataset and DNN model. We conduct the experiments on the MNIST dataset, which consists of a training set of 60k samples and a test set of 10k samples. We consider a fully connected network with one hidden layer of size 100 for image classification, and set a decaying learning rate with an initial value of 0.05. Our algorithm is general and can be applied to other DNN tasks as well; results on CIFAR10 can be found in the supplementary material.
For the implementation of the decentralized system, we consider a network consisting of 30 agents, where each pair of agents connects with a probability of 0.2 (the connection rate). The decentralized system is guaranteed to be fully connected, i.e., there exists at least one path between any two agents. The training set of each agent is i.i.d. with the same size. In the synchronous mode, all 30 agents participate in each training iteration. In the asynchronous mode, we assume 10% of the agents, chosen at random, are not involved in each iteration.
Without loss of generality, the agents have the same privacy budget (1.0) and relaxation parameter. We assume the agents reach consensus on the time-aware noise decay strategy, with a reduction factor of 0.9 and a reduction step of 1000. We clip the gradients in norm with a threshold of 4.0.
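The experimental topology above can be sketched as follows. The paper states the 30-agent, 0.2-connection-rate network is guaranteed to be fully connected but not how connectivity is enforced, so the resample-until-connected loop (and all names) are our assumptions.

```python
import random

def random_connected_topology(n=30, p=0.2, seed=1):
    """Sample an Erdos-Renyi-style graph: each pair of agents is linked
    with probability p; resample until the whole graph is connected
    (the retry loop is an assumption, not the paper's stated method)."""
    rng = random.Random(seed)
    while True:
        adj = {i: set() for i in range(n)}
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < p:
                    adj[i].add(j)
                    adj[j].add(i)
        seen, frontier = {0}, [0]        # DFS connectivity check from agent 0
        while frontier:
            u = frontier.pop()
            for v in adj[u] - seen:
                seen.add(v)
                frontier.append(v)
        if len(seen) == n:
            return adj

adj = random_connected_topology()
```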
Baselines and metrics. We consider the following decentralized learning algorithms in our experiments: (1) No Noise: decentralized SGD training without any DP protection, serving as the accuracy upper bound; (2) Li18: the state-of-the-art DPSGD solution for decentralized systems [16]; (3) Li18+MA: Li18 enhanced with the moments accountant [1] for tracking the accumulated privacy loss; (4) Proposed: our DPSGD algorithm with both optimization strategies.

It is worth noting that the first three solutions cannot be applied to the asynchronous mode directly. For fair comparisons, we modify their update rules as in Eq. 3 to follow our learning protocol for asynchronous learning. For each algorithm, we measure the testing accuracy of each agent's model at every iteration during training, and report the average accuracy.
V-B Effectiveness of the Proposed Algorithm
We evaluate and compare the performance of those DPSGD algorithms under different settings in both synchronous and asynchronous modes.
Epoch vs. accuracy. Figs. 2 and 3 illustrate the trend of the average testing accuracy during training with different values of the aggregation weight $w$. First, we observe that our proposed algorithm outperforms Li18 and Li18+MA, and is closer to the No Noise case, for different $w$ values and modes. This advantage is more obvious with a smaller $w$, as more noise is reduced. Second, Li18+MA outperforms Li18 because of its usage of MA. Unlike our solution, the usability of the models from Li18 and Li18+MA decreases as $w$ decreases, caused by the increased noise of the selected estimates. Third, model training in the synchronous mode converges slightly faster than in the asynchronous mode, since every participant can contribute to the model training to accelerate the process.
Privacy budget vs. accuracy. We consider the impact of the privacy budget on the model accuracy, as shown in Fig. 4. Our solution beats the other two DP solutions for all privacy budgets. Besides, as the privacy budget decreases, the model usability decreases, since more noise must be injected into the estimates. Meanwhile, the advantage of our solution increases, as the amount of reduced noise increases as well. This indicates that our algorithm is more effective when a small privacy budget is required.
V-C Effectiveness of Topology-aware Noise Reduction
Our DPSGD algorithm is composed of two strategies: topology-aware noise reduction (NR) and time-aware noise decay (ND). We evaluated the integration of these two strategies in the main paper. In this section, we measure the effectiveness of NR only. Figures 5 and 6 illustrate the performance comparison between NR and the other DPSGD algorithms.
We observe that NR outperforms Li18+MA in both synchronous and asynchronous modes, and the advantage is more significant when $w$ is smaller. In the synchronous mode, NR has almost the same performance as NR+ND for the first 20 epochs, as the noise reduced by the ND strategy is quite small in the first two reduction steps (the noise is not reduced at all in the first step). With more epochs, NR+ND becomes slightly better than NR alone, due to the effect of ND. In the asynchronous mode, NR has almost the same performance as NR+ND, especially when $w$ equals 0.25.
V-D Impact of Connection Rate
We set the connection rate of the decentralized network to 0.2 in the main paper; our proposed algorithm is effective with other connection rates as well. In this section, we measure and compare the performance of the DPSGD algorithms with connection rates of 0.1 and 0.4. Without loss of generality, we consider the synchronous mode and set $w$ to 0.25. Figure 7 shows the average accuracy of the agents under these two connection rates as the training epoch increases. We observe that the performance of each algorithm barely changes with the connection rate. The underlying reason may be that although the number of an agent's neighbors changes with the connection rate, the agent still selects only one estimate for its update at each iteration, so the training result remains largely unchanged. As such, our proposed solution exhibits advantages over prior works under various network connection rates.
V-E Impact of Aggregation Weight
We evaluate the performance of the algorithms with a larger (0.75) and a smaller (0.125) aggregation weight $w$ in the synchronous mode. The average accuracy of the agents is shown in Figure 8. The experimental results support the same conclusion as Section V-B: a smaller $w$ leads to a larger improvement from our proposed solution.
V-F Results on CIFAR10
We also evaluate the DPSGD algorithms on a more complicated training task over the CIFAR10 dataset. The model to be trained is a Convolutional Neural Network consisting of two max-pooling layers and three fully connected layers. The system settings and configurations are the same as for MNIST. We set $w$ and the connection rate to 0.25 and 0.2, respectively. Figure 9 illustrates the experimental results in the synchronous and asynchronous modes. We observe that our solution (Proposed) outperforms prior DPSGD algorithms and approaches the baseline (No Noise) as the training epoch increases in both modes. Li18 and Li18+MA even fail to converge in the presence of Gaussian noise. The reason is that every model parameter must be perturbed with random noise to satisfy the DP requirement; when the model becomes more complicated with more parameters, the overall divergence between the original model and the DP-protected model becomes larger, making it hard to converge. This does not happen in our solution.
VI Conclusion
In this paper, we proposed a novel DPSGD algorithm for decentralized learning systems. We introduced a topology-aware noise reduction technique, which leverages the network topology to reduce the noise scale and improve model usability while still satisfying the DP requirement. We applied the time-aware noise decay technique to decentralized systems to further optimize the model performance. We designed a learning protocol which enables the topology-aware technique and adapts to both synchronous and asynchronous learning modes. To the best of our knowledge, this is the first study to utilize network topology to optimize a DP algorithm, and to deploy DP protection in asynchronous decentralized systems. Formal analysis proves that our method guarantees the privacy requirement, and empirical evaluations indicate that our solution achieves better tradeoffs between privacy and usability under various system configurations.
References
 [1] (2016) Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §IA, §IA, §I, §IIIB, 3rd item.
 [2] (2018) Protection against reconstruction and its applications in private federated learning. arXiv preprint arXiv:1812.00984. Cited by: §IA, §IA.
 [3] (2020) A hitchhiker’s guide on distributed training of deep neural networks. Journal of Parallel and Distributed Computing 137, pp. 65–76. Cited by: §I.
 [4] (2019) Optimal differentially private admm for distributed machine learning. arXiv preprint arXiv:1901.02094. Cited by: §IA.
 [5] (2006) Our data, ourselves: privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Cited by: §I, §IIB, §IVA.
 [6] (2010) Boosting and differential privacy. In IEEE Annual Symposium on Foundations of Computer Science, pp. 51–60. Cited by: §IA.
 [7] (2016) Concentrated differential privacy. arXiv preprint arXiv:1603.01887. Cited by: §IA.
 [8] (2020) Towards Byzantine-resilient learning in decentralized systems. arXiv preprint arXiv:2002.08569. Cited by: §IIA.
 [9] (2019) Model inversion attacks against collaborative inference. In Annual Computer Security Applications Conference, pp. 148–162. Cited by: §I.
 [10] (2017) Deep models under the GAN: information leakage from collaborative deep learning. In ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618. Cited by: §I.
 [11] (2018) Efficient deep learning on multi-source private data. arXiv preprint arXiv:1807.06689. Cited by: §IA, §IA.
 [12] (2019) Evaluating differentially private machine learning in practice. In USENIX Security Symposium, pp. 1895–1912. Cited by: §I, §IIB.
 [13] (2018) Distributed learning without distress: privacypreserving empirical risk minimization. In Advances in Neural Information Processing Systems, pp. 6343–6354. Cited by: §IA.
 [14] (2019) Weighted distributed differential privacy ERM: convex and nonconvex. arXiv preprint arXiv:1910.10308. Cited by: §IA, §IA.
 [15] (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §I.
 [16] (2018) Differentially private distributed online learning. IEEE Transactions on Knowledge and Data Engineering 30 (8), pp. 1440–1453. Cited by: §IA, §I, 2nd item.
 [17] (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems, pp. 5330–5340. Cited by: §I, §IIA.
 [18] (2018) Asynchronous decentralized parallel stochastic gradient descent. In International Conference on Machine Learning, pp. 3043–3052. Cited by: §IB, §IIA.
 [19] (2019) Heterogeneity-aware asynchronous decentralized training. arXiv preprint arXiv:1909.08029. Cited by: §IB.
 [20] (2019) Exploiting unintended feature leakage in collaborative learning. In IEEE Symposium on Security and Privacy, pp. 691–706. Cited by: §I.
 [21] (2017) Rényi differential privacy. In IEEE Computer Security Foundations Symposium, pp. 263–275. Cited by: §IVA, §IVA.
 [22] (2009) Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control 54 (1), pp. 48–61. Cited by: §I.

 [23] (2017) DP-EM: differentially private expectation maximization. In Artificial Intelligence and Statistics, pp. 896–904. Cited by: §IA.
 [24] (2015) Privacy-preserving deep learning. In ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321. Cited by: §IA, §IA.
 [25] (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Transactions on Information Forensics and Security. Cited by: §IA.
 [26] (2019) Differentially private model publishing for deep learning. In IEEE Symposium on Security and Privacy, pp. 332–349. Cited by: §IA, §IA, §I, §IIIB, §IVA.
 [27] (2016) Dynamic differential privacy for ADMM-based distributed classification learning. IEEE Transactions on Information Forensics and Security 12 (1), pp. 172–187. Cited by: §IA.
 [28] (2018) Improving the privacy and accuracy of ADMM-based distributed algorithms. In International Conference on Machine Learning. Cited by: §IA, §I.
 [29] (2019) Deep leakage from gradients. In Advances in Neural Information Processing Systems, pp. 14747–14756. Cited by: §I.