1 Introduction
Recent advances in reinforcement learning (RL) have shown great success in broad domains, ranging from market strategy decisions Abe et al. (2004) and load balancing Cogill et al. (2006) to autonomous driving Shalev-Shwartz et al. (2016). Reinforcement learning is a process to obtain a good policy in a given environment without explicit supervision. Distributed reinforcement learning (DRL) is known as a practical solution to accelerate reinforcement learning in parallel
Mnih et al. (2016); Nair et al. (2015); Palmer et al. (2019); Bacchiani et al. (2019). It also gives us robust policies across different environments. A policy is regarded as robust if it performs well across various environments without overfitting a simulated environment. A policy overfitting a simulated environment does not work well in the real-world environment Rajeswaran et al. (2016). When the local environments involve private information, such as private rooms and individual properties, there are privacy issues. Since a locally learned policy is strongly affected by the individual environment, the output of the agent may release the private information unconsciously. Pan et al. (2019)
pointed out that reinforcement learning can cause privacy issues. They proposed an attack that recovers the dynamics of agents by estimating the transition dynamics from their state space, action space, reward function, and trained policy. For example, from the policy of a robot cleaner trained in an individual's room, an adversary can estimate the room layout if she can access the policy. In the distributed setting, information sent by the local agents carries serious privacy risks if we do not trust the central aggregator.
Local differential privacy (LDP) Kasiviswanathan et al. (2011); Duchi et al. (2013) gives a rigorous privacy guarantee when data providers send information to a data curator. Mechanisms ensuring LDP make outputs indistinguishable regardless of the input. In this paper, we aim to design locally differentially private algorithms for DRL, such that the information reported by local agents is indistinguishable.
To achieve DRL under LDP constraints, we develop a framework that learns a robust policy based on the information reported by the agents while preserving their local privacy (Figure 1). We call the framework Private Gradient Collection (PGC). In the framework, first, the central aggregator distributes a global model to several local agents. Second, the local agents update the model in their local private environments. Third, the agents report noisy gradients that satisfy LDP to the central aggregator. At last, the central aggregator updates the global parameters by utilizing the set of reported noisy gradients. After updating the global model, the central aggregator distributes the model to the other agents to continue learning. In this way, local agents can report their updates by submitting noisy gradients even if the local nodes do not have any deliverable data. Besides, the central aggregator easily updates the global model by just applying the collected gradients to the model, without any privacy concerns for the local agents.
As a concrete realization of the framework, we introduce an algorithm based on asynchronous advantage actor-critic (A3C). For introducing randomness that satisfies LDP, we present two mechanisms for the gradient submission.
To the best of our knowledge, this is the first work that actualizes DRL under local differential privacy. In this paper, we show that our algorithm ensures LDP guarantees by utilizing a series of techniques. In our empirical evaluations, we demonstrate how our method learns a robust policy effectively even when it is required to satisfy local differential privacy. This work enables us to obtain a robust agent that performs well across distributed private environments.
1.1 Related Works
For privacy-preserving distributed reinforcement learning, both cryptographic and differentially private approaches have been studied.
The cryptographic approaches conceal information during the learning process Zhang and Makedon (2005); Sakuma et al. (2008). However, if several agents cooperate, they may be able to estimate the information of the other agents. Our proposed method under LDP is robust against such colluding adversarial parties. Even if all remaining agents attack one agent, that agent's dynamics are indistinguishable from other candidate dynamics.
Noisy DQN Fortunato et al. (2018) is a DQN Mnih et al. (2015) variant that aims to improve learning stability by injecting Gaussian noise, but it has no means of preserving privacy. Wang and Hegde (2019) introduced differentially private Q-learning in continuous spaces. Zhu and Philip (2019) introduced a cooperative multi-agent system that chooses advice from neighboring agents in a differentially private way. Chamikara et al. (2019) proposed a private distributed learning framework that crafts perturbed data satisfying LDP at local data holders and sends it to the untrusted curator. We focus on DRL, in which agents do not hold such deliverable data but instead submit perturbed gradients.
2 Preliminaries
Before the detailed discussion, we introduce essential background notations, definitions, and related works needed to understand our proposals.
2.1 (Local) Differential Privacy
Differential privacy Dwork et al. (2006) is a rigorous privacy definition, which quantitatively evaluates the degree of privacy protection when releasing statistical aggregates. Local differential privacy (LDP) Kasiviswanathan et al. (2011); Duchi et al. (2013) gives a rigorous privacy guarantee when data providers send information to a data curator. Suppose the data providers send information to a collector via some random mechanism $\mathcal{M}$.
Definition 1
For all possible inputs $x, x'$ and for any subset of outputs $S$, a randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-local differential privacy if it holds that

$$\Pr[\mathcal{M}(x) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(x') \in S] \quad (1)$$

where $e$ is the Napier number.
The definition requires $\mathcal{M}$ to output indistinguishable values regardless of the input. An essential property of a mechanism is the (global) sensitivity of its output.
Definition 2
For all inputs $x, x'$, the sensitivity of a target function $f$ is defined as

$$\Delta = \max_{x, x'} \| f(x) - f(x') \|_1. \quad (2)$$
Sequential composition captures the intuition that releasing more outputs consumes more privacy.
Theorem 1
Let $\mathcal{M}_1, \dots, \mathcal{M}_k$ be a series of mechanisms. Assume that $\mathcal{M}_i$ satisfies $\epsilon_i$-(L)DP for each $i$, respectively. Then, the series of mechanisms satisfies $(\sum_{i} \epsilon_i)$-(L)DP.
Post-processing invariance is the property that further processing of differentially private information never harms privacy.
Theorem 2
For any deterministic or randomized function $f$ defined over the output of the mechanism $\mathcal{M}$, if $\mathcal{M}$ satisfies $\epsilon$-(L)DP, $f \circ \mathcal{M}$ also satisfies $\epsilon$-(L)DP for any input.
Owing to this property, in the local private setting, the curator is allowed to run arbitrary processing on the collected data.
Laplace mechanism. The Laplace mechanism is the well-known randomized mechanism that samples randomized values from the Laplace distribution, whose scale is calibrated to the sensitivity of the target function outputs. The mechanism samples the randomized output as

$$\mathcal{M}(x)_i = f(x)_i + \mathrm{Lap}(\Delta / \epsilon) \quad (3)$$

where $f(x)_i$ is the $i$-th element of $f(x)$.
Bit flip. Bit flip Ding et al. (2018, 2017) is a randomization technique for satisfying (L)DP. For input $x_i \in [0, m]$,

$$\mathcal{M}(x_i) = \begin{cases} 1 & \text{with probability } \frac{1}{e^{\epsilon}+1} + \frac{x_i}{m} \cdot \frac{e^{\epsilon}-1}{e^{\epsilon}+1} \\ 0 & \text{otherwise.} \end{cases} \quad (4)$$

By the bit flip, the randomized outputs retain sharp directions.
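As a concrete illustration, the bit flip can be simulated and debiased; the $[0, m]$ encoding and the debiasing formula below follow the one-bit mechanism style of the cited Ding et al. work, and the paper's exact variant may differ:

```python
import numpy as np

def bit_flip(x, m, eps, rng):
    """One-bit mechanism in the Ding et al. style: report 1 with a probability
    that interpolates between 1/(e^eps+1) and e^eps/(e^eps+1) as x goes 0..m."""
    p = 1.0 / (np.exp(eps) + 1.0) + (x / m) * (np.exp(eps) - 1.0) / (np.exp(eps) + 1.0)
    return (rng.random(np.shape(x)) < p).astype(float)

def debias(mean_bit, m, eps):
    """Unbiased estimate of the population mean from the released bits."""
    return m * (mean_bit * (np.exp(eps) + 1.0) - 1.0) / (np.exp(eps) - 1.0)

rng = np.random.default_rng(0)
m, eps = 1.0, 1.0
x = np.full(200_000, 0.3)          # many users holding the same value
bits = bit_flip(x, m, eps, rng)    # each user releases a single noisy bit
est = debias(bits.mean(), m, eps)  # aggregate estimate concentrates near 0.3
```

Each individual bit reveals almost nothing about its holder's value, yet the debiased average over many users recovers the population mean.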
Random projection Johnson (1984); Achlioptas (2001); Bingham and Mannila (2001) is a useful technique that reduces the dimensionality of a vector by a random matrix. We can use a random matrix $R \in \mathbb{R}^{k \times d}$ such that

$$R_{ij} = \begin{cases} +1/\sqrt{k} & \text{with probability } 1/2 \\ -1/\sqrt{k} & \text{with probability } 1/2 \end{cases} \quad (5)$$

where $k$ is the dimension of the mapped space. The random matrix has the useful property that the column vectors of $R$ are almost orthogonal to each other Achlioptas (2001); Bingham and Mannila (2001). Thanks to this property, we can approximately recover the original vector from the compressed vector using the transposed matrix $R^{\top}$.
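A minimal sketch of this projection-and-recovery property; the dimensions below are illustrative choices, not values from the paper:

```python
import numpy as np

def random_projection_matrix(d, k, rng):
    """Random sign matrix with near-orthogonal columns (Achlioptas-style).

    Entries are +-1/sqrt(k), so the map from R^d to R^k approximately
    preserves inner products in expectation."""
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

rng = np.random.default_rng(0)
d, k = 1000, 200
R = random_projection_matrix(d, k, rng)

x = rng.standard_normal(d)
y = R @ x            # compressed k-dimensional vector
x_rec = R.T @ y      # approximate recovery via the transpose

# Recovery preserves the direction approximately: the cosine similarity is
# well above what two random d-dimensional vectors would give (~1/sqrt(d)).
cos = x @ x_rec / (np.linalg.norm(x) * np.linalg.norm(x_rec))
```

The recovery is lossy (exact recovery from k < d numbers is impossible), but the recovered direction correlates strongly with the original, which is what gradient-based updates need.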
2.2 Distributed Reinforcement Learning
The benefits of distributed reinforcement learning (DRL) are increased learning efficiency and obtaining a robust policy. We focus on the latter while preserving the privacy of the local agents. Beyond the distributed setting, robust RL is a generalization of RL that adapts to uncertainty in the transition dynamics Morimoto and Doya (2005); Rajeswaran et al. (2016); Pinto et al. (2017). Our goal is to obtain a transferable policy that performs well across various environments.
Suppose there are a central aggregator and $N$ distributed agents. Agent $n$ moves around on the Markov decision process (MDP), which is characterized by a common state space $\mathcal{S}$, a common action space $\mathcal{A}$, a common reward function $r$, a common discounting factor $\gamma$, and local transition dynamics $T_n$. $\mathcal{S}$ contains some terminal states. Each local dynamics $T_n : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$ decides the next state after an action on a state, where $\Delta(\mathcal{S})$ is the probability simplex on $\mathcal{S}$. Local dynamics $T_n$ is parametrized with $\phi_n \in \Phi$, where $\Phi$ is a parameter set. For each round $t$, agent $n$ has state $s_t$ and takes action $a_t$. After the action, the agent gets reward $r_t$ from $r$, and the state transits to $s_{t+1}$. $r$ gives $0$ to the terminal states. The transition follows the local transition dynamics $T_n$. Agent $n$ decides its action by policy $\pi$, which is shared by all agents. The central aggregator trains the policy with the cooperation of the agents. Defining $\rho_0$ as the initial state distribution, for each agent $n$, the history $h = (s_0, a_0, r_0, s_1, a_1, r_1, \dots)$ is the random variable such that

$$\Pr[h] = \rho_0(s_0) \prod_{t \ge 0} \pi(a_t \mid s_t)\, T_n(s_{t+1} \mid s_t, a_t) \quad (6)$$
where the distribution of the history is determined by the MDP and the policy. To obtain a robust policy, which works well on any dynamics in the possible dynamics set $\Phi$, we solve an optimization problem. The objective function is

$$J(\pi) = \mathbb{E}_{\phi \sim \Phi}\, \mathbb{E}_{h}\left[ \sum_{t \ge 0} \gamma^{t} r_t \right] \quad (7)$$

where $\gamma$ is the discounting factor.
2.2.1 Asynchronous Advantage Actorcritic
Asynchronous advantage actor-critic (A3C) Mnih et al. (2016) is a DRL framework, which was originally proposed to accelerate policy training in parallel. In the distributed A3C protocol, each agent optimizes both a policy and an approximation of a state-value function. The policy is denoted as $\pi(a \mid s; \theta)$. For some state $s$ and some action $a$, $\pi(a \mid s; \theta)$ represents the confidence for action $a$ on $s$. Based on the confidence, each agent decides the next action at each time step. The state-value function $V^{\pi}$ represents the value of each state under policy $\pi$:

$$V^{\pi}(s) = \mathbb{E}\left[ \sum_{k \ge 0} \gamma^{k} r_{t+k} \,\middle|\, s_t = s \right]. \quad (8)$$

Further, $V(s; w)$ denotes the approximated state-value function. For simplicity, this paper denotes $\pi(a \mid s; \theta)$ and $V(s; w)$ as $\pi$ and $V$.
3 Locally Differentially Private Actor-Critic
3.1 Our Algorithm
We here present our algorithm for DRL under local differential privacy. We first introduce an overview and an abstract model of our algorithm. We call the abstract model Private Gradient Collection (PGC). Based on the model, we present our algorithm PGC-A3C, a method based on A3C that satisfies LDP for all local agents. We also address the privacy analysis of the proposed method and give some extensions.
3.1.1 Private Gradient Collection
Suppose the central aggregator has a model parameterized with $\theta \in \mathbb{R}^{d}$, where $d$ is the dimensionality of $\theta$, and trains $\theta$ by utilizing information reported by local agents. The central aggregator and all local agents share the parameters $\theta$, the structure of the model, and the loss function $L$. The abstract model PGC follows the four steps below:

1. The central aggregator delivers $\theta$ to local agents.
2. Each local agent initializes her parameters by $\theta$ and updates them in her local private environment.
3. The local agent reports information about her model, injecting noise to satisfy LDP.
4. The central aggregator updates $\theta$ by utilizing only the noisy information received from the local agents.
The primary question in designing the model is what information local agents should report. Our answer is the stochastic gradient. Hence, the local agent computes the loss and its gradient from local observations and rewards, then submits a noisy gradient to the central node.
In the local training process, each agent inputs the observed states to the network and obtains the next action or some information to decide the action. At the end of an episode, with the history and observed rewards, the agent evaluates the episode by the loss function $L$. After the evaluation, the agent computes the stochastic gradient of $L$ along $\theta$. Before reporting to the central aggregator, she randomizes the gradient. The randomness (e.g., additive noise) is designed to satisfy LDP via a random mechanism $\mathcal{M}$.
Definition 3
(LDP for gradient submissions) For each agent, any two gradients $g, g'$ and any subset of outputs $S$, the following inequality must hold:

$$\Pr[\mathcal{M}(g) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(g') \in S]. \quad (14)$$
With the noisy gradients, the central aggregator updates $\theta$. Then, the updated parameters are shared with all distributed agents again. To keep the problem simple, we assume that each agent submits a gradient only once. Because of post-processing invariance, the central aggregator can apply the noisy gradients to the parameters in any way. She can use any gradient method and can use a submitted gradient multiple times.
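The four steps above can be sketched as a single round-based loop. The toy quadratic "environment", agent count, learning rate, and the Laplace stand-in randomizer below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_noisy_gradient(theta, data, eps, clip=1.0):
    """Steps 2-3: update locally, then report a randomized gradient.

    The 'environment' here is a toy quadratic loss 0.5*||theta - data||^2;
    the clipped-Laplace randomizer stands in for any LDP mechanism."""
    grad = theta - data                                      # local stochastic gradient
    grad = grad / max(1.0, np.linalg.norm(grad, 1) / clip)   # bound the L1 norm
    return grad + rng.laplace(scale=2.0 * clip / eps, size=grad.shape)

theta = np.zeros(3)                     # step 1: the aggregator's global parameters
agents = rng.standard_normal((20, 3))   # each row: one agent's private "environment"
lr, eps = 0.3, 2.0
for data in agents:                     # each agent submits exactly once
    g = local_noisy_gradient(theta, data, eps)
    theta = theta - lr * g              # step 4: pure post-processing of noisy reports
```

The aggregator only ever touches randomized gradients, which is what makes the update step post-processing in the LDP sense.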
3.1.2 PGC-A3C
As a concrete realization of the PGC framework, we propose PGC-A3C, an LDP variant of A3C. Following the PGC framework, PGC-A3C employs gradient submissions with a randomized mechanism from the local agents to the central aggregator. The other procedures follow the original A3C algorithm. Algorithm 1 shows the overall procedure of PGC-A3C.
Empirical Loss Minimization.
As in vanilla A3C, based on the episode history $h$, the local agent evaluates the empirical loss:
(15) 
where
(16) 
(17) 
(18) 
The empirical loss replaces the random variables in Equation 9 with the observed history. After the evaluation, the agent computes the stochastic gradient of the empirical loss along $\theta$. From the stochastic gradient, each agent crafts a noisy gradient by a randomized mechanism to satisfy LDP, and submits it to the central aggregator.
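A hedged sketch of the standard A3C empirical loss terms referred to here (an advantage-weighted policy term, a value-regression term, and an entropy bonus); the scaling factors and exact signs are assumptions rather than the paper's Equations 16-18:

```python
import numpy as np

def a3c_losses(log_pi_a, v, returns, probs, c_v=0.5, c_ent=0.01):
    """Standard A3C-style empirical loss over one episode.

    log_pi_a: log-probabilities of the actions actually taken, shape (T,)
    v:        critic's value estimates, shape (T,)
    returns:  observed discounted returns, shape (T,)
    probs:    full action distributions per step, shape (T, A)
    c_v, c_ent: illustrative scaling factors (assumptions)."""
    adv = returns - v                                    # advantage estimate
    policy_loss = -np.sum(log_pi_a * adv)                # actor term
    value_loss = c_v * np.sum((returns - v) ** 2)        # critic term
    entropy = -np.sum(probs * np.log(probs + 1e-12))     # exploration bonus
    return policy_loss + value_loss - c_ent * entropy

# Toy usage on a three-step episode with two actions.
log_pi_a = np.log(np.array([0.9, 0.8, 0.7]))
v = np.array([0.5, 0.4, 0.3])
returns = np.array([1.0, 0.9, 0.8])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
loss = a3c_losses(log_pi_a, v, returns, probs)
```

The gradient of such a scalar loss with respect to the network parameters is exactly what each agent clips and randomizes before submission.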
Crafting Noisy Gradient.
We discuss how to craft a noisy gradient that satisfies LDP. The simplest way is to follow the Laplace mechanism, which is well known in the differential privacy literature. However, the sensitivity of a raw stochastic gradient is hard to bound, so we employ a clipping technique:

$$\bar{g} = g / \max\left(1, \|g\|_1 / C\right) \quad (19)$$

where $g$ is a stochastic gradient vector and $C$ is the clipping size. Each agent clips the gradient by the $L_1$ norm with a positive constant $C$, and then the sensitivity is bounded by $2C$. That is, any two clipped gradients satisfy

$$\|\bar{g} - \bar{g}'\|_1 \le 2C. \quad (20)$$

Based on the clipping (19) and the sensitivity bound (20), each agent generates Laplace noise such that

$$\eta_i \sim \mathrm{Lap}(2C/\epsilon) \quad (21)$$

where $\eta_i$ is the $i$-th dimensional value of $\eta$. With the noise $\eta$, each agent reports the noisy gradient $\bar{g} + \eta$ to the central aggregator. This procedure is described in Algorithm 2.
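A small numerical check of this clipping-and-noise recipe; taking the L1 norm for the clipping and a sensitivity of 2C is my reading of Equations 19-21:

```python
import numpy as np

def clip_l1(g, C):
    """Scale g so that ||g||_1 <= C (Equation 19-style clipping)."""
    return g / max(1.0, np.linalg.norm(g, ord=1) / C)

def laplace_randomize(g, C, eps, rng):
    """Clipping bounds the L1 distance between any two submissions by 2C,
    so per-coordinate Lap(2C/eps) noise yields an eps-LDP gradient report."""
    return clip_l1(g, C) + rng.laplace(scale=2.0 * C / eps, size=g.shape)

rng = np.random.default_rng(0)
C, eps = 1.0, 0.5
g1 = rng.standard_normal(8) * 3
g2 = rng.standard_normal(8) * 3

# Any two clipped gradients are within L1 distance 2C of each other,
# regardless of how large the raw gradients were.
gap = np.linalg.norm(clip_l1(g1, C) - clip_l1(g2, C), ord=1)
noisy = laplace_randomize(g1, C, eps, rng)
```

Because the bound on `gap` holds for every possible pair of episodes, the Laplace scale does not depend on any agent's private data.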
Updating Global Parameter with Buffer.
The central aggregator updates his global parameter $\theta$ by the gradients received from the local agents. To reduce the variance of the noisy gradients, we introduce a temporal storage buffer $B$. The central aggregator first stores multiple noisy gradients into the buffer $B$, and updates $\theta$ by utilizing all of them as

$$\theta \leftarrow \theta - \frac{\alpha}{|B|} \sum_{\tilde{g} \in B} \tilde{g} \quad (22)$$

where $\alpha$ is the learning rate.
The central aggregator does not utilize any information about the local agents other than the received noisy gradients. Therefore, the update process, which is post-processing of all received gradients, does not violate LDP for any local agent. After updating $\theta$, the central aggregator flushes the buffer $B$. We expect that the buffering improves learning stability, as mini-batch learning does.
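A sketch of the buffered update, assuming the aggregator averages the buffered gradients (one natural reading of Equation 22; the buffer size and noise scale below are illustrative):

```python
import numpy as np

class GradientBuffer:
    """Temporal storage for noisy gradients.

    Averaging B independent submissions divides the per-coordinate noise
    variance by B, which is the stabilization the buffering aims at."""
    def __init__(self, size):
        self.size, self.grads = size, []

    def add(self, g):
        self.grads.append(g)
        return len(self.grads) >= self.size   # full -> ready to update

    def flush_mean(self):
        mean = np.mean(self.grads, axis=0)
        self.grads = []                       # flush after the update
        return mean

rng = np.random.default_rng(0)
buf = GradientBuffer(size=16)
theta, lr = np.zeros(4), 0.1
for _ in range(16):                           # noisy gradient submissions
    if buf.add(rng.laplace(scale=2.0, size=4)):
        theta = theta - lr * buf.flush_mean() # one averaged update, then flush
```

Since both the averaging and the parameter step use only the submitted noisy gradients, the whole class is post-processing and costs no extra privacy budget.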
3.1.3 Acceleration of Learning Efficiency
Since the Laplace mechanism is simple but decreases the accuracy of the gradient significantly, the learning efficiency of the whole DRL might be decreased. We introduce an alternative randomizing technique, projected random sign (PRS), as an opportunity to increase learning efficiency. To this end, the PRS mechanism reduces dimensionality and sharpens the gradient direction while injecting randomness for LDP.
First, the PRS applies random projection to reduce dimensionality. Each agent maps the $d$-dimensional stochastic gradient vector $g$ to a $k$-dimensional vector with the random matrix $R$ as $Rg$, where $R$ follows (5).
Second, before applying randomization, each agent applies element-wise clipping denoted as follows:

$$\bar{y}_i = \max(-c, \min(c, y_i)) \quad (23)$$

where $y_i$ is the $i$-th dimensional value of $y = Rg$ and $c$ is a positive constant. This element-wise clipping bounds the sensitivity of each element by $2c$.
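A sketch of the PRS pipeline under explicit assumptions: the even per-coordinate budget split and the sign-keeping probability $e^{\epsilon'}/(e^{\epsilon'}+1)$ are illustrative choices for the randomized-sign step, not necessarily the paper's exact randomizer:

```python
import numpy as np

def prs(g, R, c, eps, rng):
    """Projected random sign (sketch): project to k dims, clip each element
    to [-c, c], then release one randomized sign of magnitude c per coordinate."""
    k = R.shape[0]
    y = np.clip(R @ g, -c, c)               # element-wise clipping, bound c
    eps_k = eps / k                         # assumed even budget split
    p_keep = np.exp(eps_k) / (np.exp(eps_k) + 1.0)
    signs = np.sign(y)
    flip = rng.random(k) >= p_keep          # flip each sign with prob 1-p_keep
    signs[flip] *= -1.0
    return signs * c                        # sharp +-c directions

rng = np.random.default_rng(0)
d, k = 100, 10
R = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)   # projection matrix (5)
g = rng.standard_normal(d)
z = prs(g, R, c=1.0, eps=5.0, rng=rng)
g_rec = R.T @ z   # aggregator recovers an approximate direction via R^T
```

Compared with per-coordinate Laplace noise in d dimensions, the output here is low-dimensional and purely directional, which is the "sharp direction" effect the text describes.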
3.2 Privacy Analysis
Lemma 1
An algorithm that follows the PGC framework satisfies LDP for all local agents.
Proof Sketch
In step 3 of the PGC framework, each agent reports a noisy gradient that ensures LDP. In step 4, the central aggregator updates $\theta$ utilizing only the noisy gradients received from the local agents. This step is independent of any other information about the local agents. Therefore, step 4 does not violate LDP due to post-processing invariance. Moving forward to the next round, the central aggregator delivers the updated parameters to the other agents in step 1. In step 2, a different agent copies the parameters as her own and updates them through her local environment. Since the learning process at a local agent is independent of all other agents, the output also does not violate LDP for all other agents.
Lemma 2
Gradient submission with the Laplace mechanism satisfies LDP.
Proof
Each agent is given a history $h$, which contains information about her local dynamics. With $h$, the agent computes the gradient $g$ and outputs $\bar{g} + \eta$, where $\bar{g}$ is the clipped gradient and $\eta$ is the Laplace noise. With the clipping (19) and the Laplace mechanism calibrated to the sensitivity bound (20), for any two gradients $g, g'$ and any output $z$, the following inequality holds:

$$\frac{\Pr[\bar{g} + \eta = z]}{\Pr[\bar{g}' + \eta' = z]} \le e^{\epsilon}.$$

Since the inequality holds regardless of $h$, it also holds for any two histories, which gives the claimed LDP guarantee.
Lemma 3
Gradient submission with the PRS mechanism satisfies LDP.
Proof
Lemma 4
Updating the parameters on the central aggregator via (22) does not violate the LDP that has been satisfied for each local agent.
Proof
The parameter update (22) utilizes only the received noisy gradients in the buffer $B$, which means it is independent of any information about the local agents except the noisy gradients. Due to post-processing invariance, the parameter update does not violate the LDP that has been satisfied at each local agent.
Theorem 3
PGC-A3C (Algorithm 1) satisfies LDP for all local agents.
3.2.1 Extending to multiple submissions
We can easily extend the algorithm to a locally multi-round algorithm, in which each agent submits a randomized stochastic gradient $k$ times. For each submission, the agent consumes a privacy budget $\epsilon$, and the whole consumed budget is $k\epsilon$ because of the sequential composition theorem (Theorem 1).
4 Experiments
We here demonstrate the effectiveness of our proposals. We evaluate learning efficiency, success ratio, and the trade-off between privacy and efficiency. Before showing empirical results, we describe the evaluation task and how PGC-A3C is implemented.
Evaluation Task. We make numerical observations on cart pole with different gravity acceleration coefficients. Suppose that each coefficient appears uniformly at random. Cart Pole Barto et al. (1983) is the classical reinforcement learning task in which an agent controls a cart with a pole to keep the pole standing. The number of time steps in which the pole remains standing is the cumulative reward. The cumulative reward is called the score in this section, and the maximum score is 200. A state consists of the cart position, cart velocity, pole angle, and pole velocity at the tip. The cart moves on a one-dimensional line. At each time step, the agent selects one of the two actions, pushing the cart left or right.
Stopping Criteria. We iterate the learning process until the central aggregator has received a predefined number of submissions from the agents. Since we assume that each agent submits only once, the number of submissions is identical to the number of agents. We assume the scores are not private information. If we need to protect the scores with LDP, we can easily add submissions in which agents send a noisy boolean representing whether the score is larger than a threshold.
Implementation. To implement the proposed algorithms, we use two shallow neural networks corresponding to $\pi$ and $V$, respectively. Each network has two layers, and the activation functions are ReLU. Given a state $s$, $\pi$ outputs the confidence in each action. Each agent takes the action with the highest confidence with high probability; otherwise, the agent takes a randomly selected action, where the exploration rate decays during training. Our implementation utilizes 9 threads for asynchronous agent processes. Empirical codes are developed with Python 3.7.4, TensorFlow 1.14.0 Abadi et al. (2015), and OpenAI Gym 0.14.0 Brockman et al. (2016).

Hyper Parameters. We set the discounting factor $\gamma$, the learning rate, and the loss scaling factors to fixed values. A clipping size is set for the Laplace mechanism, and another for the PRS. For the PRS, we set the projection dimension $k$ following Wang et al. (2019).
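A minimal sketch of the two-network setup described above; the hidden width and initialization are illustrative assumptions, not the paper's configuration (CartPole itself has 4 state dimensions and 2 actions):

```python
import numpy as np

rng = np.random.default_rng(0)
D_STATE, D_HIDDEN, N_ACTIONS = 4, 32, 2   # hidden width is an assumption

def init_mlp(d_in, d_hidden, d_out, rng):
    """Two-layer network parameters, small random initialization."""
    return {"W1": rng.standard_normal((d_in, d_hidden)) * 0.1,
            "b1": np.zeros(d_hidden),
            "W2": rng.standard_normal((d_hidden, d_out)) * 0.1,
            "b2": np.zeros(d_out)}

def forward(params, s):
    h = np.maximum(0.0, s @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

policy_net = init_mlp(D_STATE, D_HIDDEN, N_ACTIONS, rng)  # pi(a|s)
value_net = init_mlp(D_STATE, D_HIDDEN, 1, rng)           # V(s)

s = rng.standard_normal(D_STATE)
logits = forward(policy_net, s)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # confidences over the two actions
action = rng.choice(N_ACTIONS, p=probs)    # sample an action from pi
v = forward(value_net, s)[0]               # critic's state-value estimate
```

The concatenated weights and biases of both networks form the parameter vector $\theta$ whose stochastic gradient each agent clips and randomizes.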
Score. Each local agent measures a score, which is the number of time steps for which the pole keeps standing. We observe how the learning progress approaches the target score. In particular, to evaluate robustness across various environments, we measure the average score. The average score at the $t$-th submission over the last $W$ (=10) submissions is

$$\overline{score}_t = \frac{1}{W} \sum_{i = t - W + 1}^{t} score_i \quad (25)$$

where $score_i$ is the score at the $i$-th submission.
4.1 Observation of Learning Behaviors
First, we observe the learning behaviors of our proposed methods to decide the values of several hyperparameters. We compare the two mechanisms with and without buffering, with a fixed buffer size. We regard a training run as a success if the average score meets the target.
Figure 6 shows the average scores of our proposed method employing the two different mechanisms during the training process. Without buffering, the scores for all settings change drastically, while buffering makes the updates of the Laplace mechanism too small to learn. In contrast, for the PRS mechanism, buffering gives better learning stability than no buffering. Thus, buffering helps the PRS mechanism train the policy well, but it disturbs training with the Laplace mechanism.
In the remaining evaluations, we employ the Laplace mechanism without buffering and the PRS with buffering.
4.2 Learning Efficiency
We evaluate how early the algorithms achieve a target score. To measure this, we define a metric, the first success time (FST):

$$FST = \min \{ t : \overline{score}_t \ge \text{target} \} \quad (26)$$

We regard $FST$ as $\infty$ if a learning process cannot meet the success within 90,000 updates.
Table 1 shows the median of the $FST$ over the trials for each setting. A smaller median value suggests that a method learns efficiently. We measure the FST varying the privacy budget $\epsilon$. The result of the Laplace mechanism shows that the median of $FST$ decreases as $\epsilon$ increases. PGC-A3C with PRS shows better results for an intermediate range of $\epsilon$, but it does not show such improvement at the largest $\epsilon$. Therefore, PGC-A3C with the PRS has a chance to achieve higher learning efficiency than with the Laplace mechanism.
Table 1: median of $FST$ (26), for increasing $\epsilon$ from left to right:

Lap: 18377.0, 20238.5, 5714.5, 4055.0, 1769.0
PRS: 25226.5, 7549.0, 2656.5, 11217.5

Table 2: success ratio:

Lap: 0.80, 0.90, 1.00, 1.00, 1.0
PRS: 0.85, 0.95, 0.90, 0.90

Table 3: relative area under curve:

Lap: 0.673, 0.711, 0.909, 0.965, 1.0
PRS: 0.660, 0.862, 0.835, 0.771
4.3 Success Ratio
We evaluate how many times the proposed algorithms achieve the target score over the trials for each setting. That is,

$$\text{success ratio} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[ FST_i < \infty \right] \quad (27)$$

where $FST_i$ is the $FST$ of the $i$-th trial and $n$ is the number of trials.
Table 2 shows the success ratios for the various settings. The non-private A3C succeeds in all trials. With both randomized mechanisms, the algorithms tend to succeed more often for larger $\epsilon$. The algorithm using PRS gives more successes for smaller $\epsilon$, and the algorithm using the Laplace mechanism performs better for larger $\epsilon$.
Figure 9 plots the success ratio over the course of training. The horizontal axis shows the number of submissions, and the vertical axis shows the ratio of trials in which the average score exceeds the target by that update. With larger $\epsilon$, both algorithms achieve a high success ratio while consuming fewer updates. The algorithm using PRS succeeds more often in the early stage, and the algorithm with Laplace shows more successes by the end of each training process. This is due to the larger loss of gradient information by the PRS compared with the Laplace mechanism.
Table 3 shows the relative area under the curve (AUC) of Figure 9 against the AUC of the non-private A3C. An algorithm with a larger AUC is regarded as a better algorithm. The Laplace mechanism achieves a larger AUC in proportion to $\epsilon$, while PRS shows its best at an intermediate $\epsilon$.
The PRS mechanism gives us more efficient DRL under LDP at some privacy levels. The PRS mechanism may be a better choice if we require strong privacy guarantees (small $\epsilon$). Otherwise, the Laplace mechanism seems more promising.
5 Conclusion
We studied locally differentially private algorithms for distributed reinforcement learning to obtain a robust policy that performs well across distributed private environments. We proposed a general framework, PGC, and its concrete algorithm, PGC-A3C, with two randomized mechanisms for injecting randomness. Our proposed algorithm learns a robust policy based on reported noisy gradients from local agents that satisfy LDP. Without any privacy concerns for the local agents, the algorithm can update a global model to make it robust across various environments. We also demonstrated how our method learns a robust policy effectively even when it is required to satisfy local differential privacy.
References

Abadi et al. (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Abe et al. (2004) Cross channel optimized marketing by reinforcement learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pp. 767-772.
Achlioptas (2001) Database-friendly random projections. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '01, pp. 274-281.
Bacchiani et al. (2019) Microscopic traffic simulation by cooperative multi-agent deep reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '19, pp. 1547-1555.
Barto et al. (1983) Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), pp. 834-846.
Bingham and Mannila (2001) Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 245-250.
Brockman et al. (2016) OpenAI Gym. arXiv:1606.01540.
Chamikara et al. (2019) Local differential privacy for deep learning. arXiv preprint arXiv:1908.02997.
Cogill et al. (2006) An approximate dynamic programming approach to decentralized control of stochastic systems. In Control of Uncertain Systems: Modelling, Approximation, and Design, pp. 243-256.
Ding et al. (2017) Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, pp. 3571-3580.
Ding et al. (2018) Comparing population means under local differential privacy: with significance and power. In Thirty-Second AAAI Conference on Artificial Intelligence.
Duchi et al. (2013) Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 429-438.
Dwork et al. (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pp. 265-284.
Fortunato et al. (2018) Noisy networks for exploration. In International Conference on Learning Representations.
Johnson (1984) Extensions of Lipschitz mappings into a Hilbert space. Conference in Modern Analysis and Probability, pp. 189-206.
Kasiviswanathan et al. (2011) What can we learn privately? SIAM Journal on Computing, 40(3), pp. 793-826.
Mnih et al. (2015) Human-level control through deep reinforcement learning. Nature, 518(7540), pp. 529.
Mnih et al. (2016) Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, ICML '16, pp. 1928-1937.
Morimoto and Doya (2005) Robust reinforcement learning. Neural Computation, 17(2), pp. 335-359.
Nair et al. (2015) Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296.
Palmer et al. (2019) Negative update intervals in deep multi-agent reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '19, pp. 43-51.
Pan et al. (2019) How you act tells a lot: privacy-leaking attack on deep reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 368-376.
Pinto et al. (2017) Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, ICML '17, pp. 2817-2826.
Rajeswaran et al. (2016) EPOpt: learning robust neural network policies using model ensembles. arXiv preprint arXiv:1610.01283.
Sakuma et al. (2008) Privacy-preserving reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp. 864-871.
Shalev-Shwartz et al. (2016) Safe, multi-agent, reinforcement learning for autonomous driving. arXiv:1610.03295.
Wang and Hegde (2019) Privacy-preserving Q-learning with functional noise in continuous spaces. In Advances in Neural Information Processing Systems, pp. 11323-11333.
Wang et al. (2019) Collecting and analyzing multidimensional data with local differential privacy. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 638-649.
Zhang and Makedon (2005) Privacy preserving learning in negotiation. In Proceedings of the 2005 ACM Symposium on Applied Computing, SAC '05, pp. 821-825.
Zhu and Philip (2019) Applying differential privacy mechanism in artificial intelligence. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 1601-1609.