Towards Fair and Decentralized Privacy-Preserving Deep Learning with Blockchain

06/04/2019 ∙ by Lingjuan Lyu, et al. ∙ IBM ∙ Monash University ∙ The University of Melbourne ∙ Australian National University

In collaborative deep learning, current learning frameworks follow either a centralized architecture or a distributed architecture. Whilst the centralized architecture deploys a central server to train a global model over the massive amount of joint data from all parties, the distributed architecture aggregates parameter updates from participating parties' local model training via a parameter server. These two server-based architectures present security and robustness vulnerabilities such as single-point-of-failure, single-point-of-breach, privacy leakage, and lack of fairness. To address these problems, we design, implement, and evaluate a purely decentralized privacy-preserving deep learning framework, called DPPDL. DPPDL makes the first investigation into the research problem of fairness in collaborative deep learning, and simultaneously provides fairness and privacy by proposing two novel algorithms: initial benchmarking and privacy-preserving collaborative deep learning. During initial benchmarking, each party trains a local Differentially Private Generative Adversarial Network (DPGAN) and publishes the generated privacy-preserving artificial samples for other parties to label; the quality of the returned labels is used to initialize a local credibility list for the other parties. The local credibility list reflects how much one party contributes to another party, and it is used and updated during collaborative learning to ensure fairness. To protect gradient transactions during privacy-preserving collaborative deep learning, we further put forward a three-layer onion-style encryption scheme. We experimentally demonstrate, on benchmark image datasets, that accuracy, privacy and fairness in collaborative deep learning can be effectively addressed at the same time by our proposed DPPDL framework. Moreover, DPPDL provides a viable solution to detect and isolate the cheating party in the system.




I Introduction

Deep learning has become an important technology to deal with the challenging real-world problems such as image classification [goodfellow2016deep] and speech recognition [hinton2012deep]. Empirical evidence has demonstrated that deep learning models can benefit significantly from large-scale datasets [krizhevsky2012imagenet]. However, large-scale datasets are not always available for a new domain, due to the significant time and effort it takes for data collection and annotation [wang2018iterative, ma2018dimensionality]. Moreover, training complex deep networks on large-scale datasets could be computationally expensive and practically unachievable by a single party. Therefore, there is a high demand to perform deep learning in a collaborative manner among a group of parties. This trend is motivated by the fact that the data owned by a single party may be very homogeneous, resulting in an overfitted model that might deliver inaccurate results when applied to the unseen data, i.e., poor generalizability. In addition, decomposing and parallelizing computation among different parties could help reduce the demand for resources on any single party. However, collaboration can be greatly hindered by privacy and confidentiality restrictions of local parties. To overcome this problem, it is essential to develop a privacy-preserving collaborative learning framework that respects both data privacy and utility [mcgraw2013building]. Meanwhile, a central server is often required in most of the state-of-the-art learning frameworks, as presented in Fig. 1(a) and Fig. 1(b). It has been pointed out that these central server-based learning frameworks suffer from the following fundamental problems:

Party policies. Due to privacy concerns, parties may not want to cede control to an untrusted server [mcconaghy2016bigchaindb].

Single-point-of-attack. The central server may easily become an obvious target for attacks. If the central server gets compromised, the entire network is at risk of being compromised [fromknecht2014decentralized, mcconaghy2016bigchaindb].

Party join and departure. Participants cannot join or leave the network freely at any time. Every time any party joins or leaves the network for a short period of time, the process is disrupted and the server needs to deal with the network recovery. A new party may not be allowed to participate in an ongoing collaborative learning process without authentication and reconfiguration on the central server [fromknecht2014decentralized].

Lack of fairness. Existing frameworks are all built upon the assumption that all parties equally contribute. In reality, one party might contribute more high-quality data (e.g., data with more diversity and high-quality annotations), while another may contribute nothing. However, at the end of training, all parties are allowed to get the same global model.

In this paper, we demonstrate that a viable solution to address these problems is to replace the centralized server with a decentralized collaboration framework and parallelize the computation among all parties. In particular, we propose DPPDL, a Blockchain-enabled decentralized privacy-preserving deep learning framework, which incorporates Blockchain technology into privacy-preserving deep learning. The proposed DPPDL records all operations, including uploading and downloading artificial samples or gradients, as transactions. To achieve fairness, we also utilize a smart contract together with a novel credibility-based incentive mechanism. Empowered by Blockchain, DPPDL is able to benefit from all the participating parties, who merely need to exchange differentially private artificial samples and encrypted model gradients with other parties, without revealing more sensitive observation-level data. Our privacy model is stronger in the sense that it is a purely decentralized framework, where each participant does not trust any third party or other participants.

In summary, the following main contributions are made:

  • We develop a decentralized privacy-preserving deep learning framework, DPPDL, which integrates Blockchain with privacy-preserving deep learning to solve existing problems in server-based frameworks.

  • We are the first to formulate the notion of collaborative fairness, which is guaranteed in DPPDL through mutual evaluation of local credibility that considers relative contribution of each party during both the initial benchmarking and the privacy-preserving collaborative deep learning process.

  • To initialize and update local credibility, artificial samples generated by Differentially Private GAN (DPGAN) within a moderate privacy budget are randomly selected and shared among all parties.

  • For collaborative learning, in contrast with the existing schemes that leverage differential privacy at the cost of utility, we put forward a three-layer onion-style encryption scheme to guarantee accuracy, privacy, party obliviousness and system robustness.

  • The experimental results on two benchmark datasets under three realistic settings demonstrate that our proposed framework achieves high fairness, delivers comparable accuracy to both centralized and distributed deep learning frameworks, and outperforms the standalone one, thus confirming the applicability of DPPDL.

II Related Work and Preliminaries

This section reviews related work in privacy-preserving collaborative deep learning, and techniques used in this paper, including Blockchain technology, differential privacy, and homomorphic encryption.

II-A Privacy-Preserving Collaborative Deep Learning

In general, privacy-preserving collaborative deep learning can be categorized into three types: Centralized deep learning, Distributed deep learning and Decentralized deep learning.

Centralized deep learning: A centralized deep learning model forces multiple participants to pool their data into a centralized server to train a global model, as shown in Fig. 1(a). This centralized model is very effective, but it is privacy-violating since the server is entitled to see all participants’ data in the clear. As pointed out by Shokri et al. [shokri2015privacy], centralized deep learning poses serious privacy concerns: all the sensitive training data are revealed to a susceptible third party; data owners have no control over the learning objective; and the learned model is not directly available to data owners. To mitigate these privacy risks, Gilad-Bachrach et al. [gilad2016cryptonets] developed CryptoNets to run deep learning on homomorphically encrypted data. However, CryptoNets assumes that the neural network model has been trained beforehand, hence the system is mainly used to provide encrypted outputs to users. Meanwhile, CryptoNets needs to change the structure of neural networks and retrain them with special non-linear activation functions such as the square function to suit the computational needs of the leveled homomorphic encryption. This results in a potentially negative effect on the accuracy of these models. More importantly, the computational cost of CryptoNets is prohibitively large. By contrast, SecureML conducts privacy-preserving machine learning via secure multiparty computation (SMC), where data owners (clients) need to process, encrypt and/or secret-share their data among two non-colluding servers in a setup phase. SecureML allows data owners to train various models on their joint data without revealing any information beyond the outcome, but at the cost of high computational and communication overhead, thereby decreasing interest in participation [lyu2017privacy, lyu2019fog].

Fig. 1: (a): Centralized topology. (b): Distributed topology. (c): Decentralized topology (Blockchain).

Distributed deep learning: Wainwright et al. [wainwright2012privacy] introduced the concept of distributed deep learning to protect the privacy of local training data. Multiple parties collaboratively train a model by sharing local model updates with each other. However, this requires a parameter server as an intermediary, as illustrated in Fig. 1(b), and thus falls under the umbrella of server-based architecture.

Distributed learning is broadly studied in [zinkevich2010parallelized, shokri2015privacy, mcmahan2016federated]. The most related work is Distributed Selective Stochastic Gradient Descent (DSSGD) introduced by Shokri et al. [shokri2015privacy]. Instead of explicitly sharing training data, each party keeps its local neural network model private, while iteratively updating its model by integrating differentially private gradients of other parties through a parameter server (PS). Communication cost is addressed by Selective Stochastic Gradient Descent (SSGD), where only a fraction (e.g., 1%-10%) of local model gradients, namely those above a threshold or those with the biggest absolute values, are shared in each round of communication. To achieve this goal, they exploit the fact that optimization algorithms, such as Stochastic Gradient Descent (SGD), can be parallelized and executed asynchronously. Each party computes local model gradients based on local training data, then a fraction of the gradients is forwarded to an honest-but-curious PS, which is assumed to be curious in extracting individual information, but honest in operations. Participants take turns to upload and download a percentage of the most recent gradients to avoid getting stuck in local minima. In addition, the local model gradients shared with the PS are perturbed with differential privacy. However, besides all the weaknesses inherent in the server-based model, extra risks are pointed out:

Meaningless privacy: The privacy bound is given per-parameter, but the large number of parameters prevents the technique from providing a meaningful privacy guarantee. For example, Shokri et al. [shokri2015privacy] report about 92% accuracy on SVHN under a per-parameter privacy budget, while the number of uploaded gradients of a model can exceed 300,000. Naively accumulating the per-parameter budget over all uploaded gradients gives a total privacy loss per participant that exceeds several thousands, which is actually meaningless for differential privacy.

Privacy leakage: As shown in [aono2018privacy], local data information may be leaked to an honest-but-curious PS even when only a small portion of gradients is released. In particular, a PS can extract individual data with non-negligible probability for a neural network with only one neuron. Even for general neural networks with regularization, the released gradients can still reveal the label information.

Low quality services: Another assumption in the server-based framework is that all the parties are honest, neglecting the fact that some parties can take advantage by cheating the system. In reality, if a party turns out to be a cheater, or a low quality data contributor, it could easily sabotage the learning process by spoofing random samples, or violate some of the privacy requirements by inferring information about the victim party’s private data, which the attacker is not supposed to know. Hitaj et al. [hitaj2017deep] described an active inference attack, called the generative adversarial network (GAN) attack, on deep neural networks in a collaborative setting. It exploits the real-time nature of the learning process, which allows the adversarial party to train a GAN that generates prototypical samples of the targeted training set that was meant to be private; the generated samples are intended to come from the same distribution as the training data. The GAN attack makes the distributed setting even more undesirable: in centralized learning only the server can violate the privacy of participating parties, but in distributed learning any party may violate the privacy of other parties in the system, without involving the server [hitaj2017deep].
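The selective gradient sharing used by DSSGD, as described above, can be illustrated with a minimal sketch. It assumes the "biggest absolute values" selection rule, omits the differential privacy perturbation, and uses hypothetical helper names (`select_gradients`, `apply_sparse_update`) that do not come from the original work.

```python
def select_gradients(gradients, fraction):
    """Return {index: value} for the largest-magnitude fraction of gradients."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(gradients) * fraction))
    # Rank coordinates by absolute gradient value, largest first.
    ranked = sorted(range(len(gradients)),
                    key=lambda i: abs(gradients[i]), reverse=True)
    return {i: gradients[i] for i in ranked[:k]}


def apply_sparse_update(params, sparse_gradients, learning_rate):
    """Apply a sparse gradient step; coordinates that were not shared stay unchanged."""
    updated = list(params)
    for i, g in sparse_gradients.items():
        updated[i] -= learning_rate * g
    return updated


# Hypothetical local gradients of one party; only 20% of them are shared.
grads = [0.01, -0.9, 0.3, 0.0, -0.05, 0.6, 0.02, -0.2, 0.1, 0.004]
shared = select_gradients(grads, 0.2)
new_params = apply_sparse_update([0.0] * len(grads), shared, learning_rate=0.1)
```

Only the two coordinates with the largest magnitude are uploaded here, which is what keeps the communication cost low; the same sparsity is also what the risks discussed in this section build on.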

Similarly, federated learning leaves the sensitive training data on device rather than centralizing it, decoupling the ability to implement machine learning from the need to store the data in the cloud [mcmahan2016federated, bonawitz2017practical]. However, in the federated learning setting, a trusted curator (Google cloud server) collects parameters optimized by multiple devices whose data is typically not independent and identically distributed (non-IID), unbalanced and massively distributed. The resulting model is then distributed back to all devices, ultimately converging to a joint representative model. To preserve the privacy of individual model updates, Bonawitz et al. [bonawitz2017practical] proposed a secure aggregation protocol, in which the updates from individual users are securely aggregated by secure multiparty computation (SMC) to derive the weighted average of model gradients. Another, more efficient method is to use differential privacy to guarantee user-level privacy [mcmahan2018learning]. However, the trusted Google server is by default entitled to see all users’ updates in the clear, aggregate individual updates and add noise to the aggregation; this setting is thus even weaker than DSSGD when the server is untrusted.

Decentralized deep learning: Kuo et al. [Kuo2016ModelChain] first proposed a decentralized machine learning model, ModelChain, which integrates Blockchain technology with privacy-preserving machine learning by incorporating the concept of boosting, i.e., samples that are more difficult to classify are more likely to improve the model. To be more specific, the global model is initialized with the local model with the lowest error to prevent error propagation. In the follow-up epochs, the party with the highest error is chosen as the winner party to update the model, because it contains the most information to further improve the model and should thus be assigned a higher priority. The update process is repeated until the consensus global model is derived, that is, when a party wins the update bid in two consecutive epochs. However, their proposed scheme is reasonable only if all the participants are completely honest. Furthermore, all parties can get access to both the intermediate and consensus global model because each party publicly reveals its plain model.

All the above frameworks focus on how to learn a more accurate global model or multiple local models, with higher accuracy than individual standalone models, neglecting an important motivation for collaboration: fairness. Suppose a cheating party applies the following strategy: whenever it is the turn of the cheating party to update the gradients, it uploads random values with extremely small magnitude without disrupting overall learning process, then all parties including the cheating party can have access to the global model in all the existing deep learning frameworks, while other parties have no chance to detect this behaviour. Example II.1 demonstrates how the cheating party C gets the same global model using the above strategy.

Example II.1.

There are three participants A, B, and C participating in the server-based deep learning.

  • Honest party A: Has local data and a local model with some accuracy.

  • Honest party B: Has local data and a local model with some accuracy.

  • Cheating party C: Has no data and no model to start with.

Suppose that whenever it is the turn of C to update the gradients, it uploads random values with extremely small magnitude without disrupting overall learning process. Finally, all these three participants are able to derive the same or similar accurate models.
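A toy simulation can make Example II.1 concrete. The sketch below is an illustrative assumption, not the paper's experiment: a single scalar parameter, quadratic losses with hypothetical optima for A and B, and a server that applies whichever gradient is uploaded in a round.

```python
import random


def local_gradient(w, target):
    """Gradient of the quadratic loss 0.5 * (w - target) ** 2."""
    return w - target


def run_server_based_training(rounds=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    targets = {"A": 1.0, "B": 3.0}  # hypothetical optima of the honest parties' data
    order = ["A", "B", "C"]          # parties take turns to upload
    w_global = 0.0
    for t in range(rounds):
        party = order[t % 3]
        if party == "C":
            # Cheating party: tiny random values that do not disturb training.
            grad = rng.uniform(-1e-6, 1e-6)
        else:
            grad = local_gradient(w_global, targets[party])
        w_global -= lr * grad
    # Every party, including C, downloads the same global parameter.
    return {p: w_global for p in order}


models = run_server_based_training()
```

In this sketch C ends up with exactly the same parameter as A and B although it never contributed a meaningful gradient, and nothing in the server's view distinguishes its uploads from small honest updates.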

To ensure fairness, one naive solution is to publish all gradient updates in the clear so that this cheating behavior can be easily detected. However, this leads to a breach of privacy and is thus undesirable. As a result, we are motivated to establish a privacy-preserving collaborative deep learning framework based on a private Blockchain to ensure both privacy and fairness, as shown in Fig. 1(c).

| Collaborative deep learning | Centralized [gilad2016cryptonets, xie2014crypto, ohrimenko2016oblivious] | Distributed [zinkevich2010parallelized, shokri2015privacy, mcmahan2016federated, mohassel2017secureml] | Decentralized (our DPPDL) |
|---|---|---|---|
| Architecture | Centralized as in Fig. 1(a) | Distributed as in Fig. 1(b) | Decentralized as in Fig. 1(c) |
| Global model | Yes | Depends | Depends |
| Local models | No | Depends | Yes |
| Fairness | No | No | Yes |
| Cheating party detection | No | No | Yes |

TABLE I: Comparing different frameworks of collaborative deep learning.

II-B Blockchain Technology

Blockchain is a decentralized (i.e., peer-to-peer, non-intermediated) system that is maintained by all the participants in the system, called miners. The first Blockchain system is Bitcoin [nakamoto2008bitcoin], where miners create blocks by solving cryptographic puzzles through the well-known proof-of-work mechanism. In particular, a block mainly contains the hash value of the previous block in the chain, a set of transactions organized as a hash tree, a public key of the block creator, and a random nonce. The hash value of the previous block serves as a reference that specifies the position of the block, i.e., it is the block created in the system right after the one being referenced. Each transaction represents a trade in the system, and the public key uniquely identifies the miner as a way of authentication. This public key is used to claim the mining reward. To spend a coin, one needs to sign the transaction by using the signing key associated with the public key that received the coins. To create a block, a miner collects a set of valid transactions and creates its public key for this block.
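The block layout described above can be sketched in a few lines. This is a simplified illustration rather than Bitcoin's actual format: it assumes JSON-serialized headers, single SHA-256 hashing, a tiny proof-of-work difficulty, and hypothetical field names.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Root of a hash tree over the transactions (last node duplicated on odd levels)."""
    level = [sha256_hex(json.dumps(tx, sort_keys=True).encode()) for tx in transactions]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def header_hash(header):
    return sha256_hex(json.dumps(header, sort_keys=True).encode())


def mine_block(prev_hash, transactions, creator_pubkey, difficulty=3):
    """Search for a nonce whose block hash starts with `difficulty` zero hex digits."""
    header = {"prev_hash": prev_hash,
              "merkle_root": merkle_root(transactions),
              "creator": creator_pubkey,
              "nonce": 0}
    while not header_hash(header).startswith("0" * difficulty):
        header["nonce"] += 1
    return dict(header, hash=header_hash(header))


def verify_chain(blocks, difficulty=3):
    """Check the proof of work of every block and the hash link to its predecessor."""
    for i, block in enumerate(blocks):
        header = {k: v for k, v in block.items() if k != "hash"}
        if header_hash(header) != block["hash"]:
            return False
        if not block["hash"].startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != blocks[i - 1]["hash"]:
            return False
    return True


genesis = mine_block("0" * 64, [{"from": "coinbase", "to": "miner-1", "amount": 50}],
                     "miner-1-pubkey")
block1 = mine_block(genesis["hash"], [{"from": "miner-1", "to": "alice", "amount": 5}],
                    "miner-2-pubkey")
chain_ok = verify_chain([genesis, block1])
```

Because each header embeds the previous block's hash, changing any field of an earlier block invalidates both its own proof of work and every later link.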

Blockchain is well known for its transparency and robustness, namely, everyone is able to read and write in the Blockchain, and there is no single-point-of-failure as the Blockchain is maintained by all participants rather than a single or a few parties. Intuitively, the incremental characteristic of online deep learning makes it feasible for peer-to-peer networks like Blockchain. However, a reasonable approach to integrate Blockchain with privacy-preserving deep learning is yet to be devised.

II-C Differential Privacy

Differential privacy, as defined in Definition 1 [dwork2014algorithmic], trades off privacy and accuracy by perturbing the data in a way that is (i) computationally efficient, (ii) does not allow an attacker to recover the original data, and (iii) does not severely affect the utility.

Definition 1.

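For scalars $\epsilon > 0$ and $0 \le \delta < 1$, a mechanism $\mathcal{M}$ is said to preserve (approximate) $(\epsilon, \delta)$-differential privacy if for all neighbouring pairs $D, D'$ and measurable $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$

Furthermore, $\mathcal{M}$ is said to preserve (pure) $\epsilon$-differential privacy if the condition holds for $\delta = 0$.

The formal definition of differential privacy has two parameters: the privacy budget $\epsilon$ measures the incurred privacy leakage; $\delta$ bounds the probability that the privacy loss exceeds $\epsilon$. The values of $(\epsilon, \delta)$ accumulate as the algorithm repeatedly accesses the private data [abadi2016deep].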
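As a small illustration of both points, the sketch below implements the standard Laplace mechanism, which satisfies pure differential privacy for a query of bounded sensitivity, and the basic (non-tight) composition rule in which budgets simply add up. The function names and the example numbers are assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with noise of scale sensitivity / epsilon (pure epsilon-DP)."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def basic_composition(budgets):
    """Basic composition: (epsilon, delta) pairs of repeated accesses add up."""
    return (sum(eps for eps, _ in budgets), sum(delta for _, delta in budgets))


rng = random.Random(42)
# Many noisy releases of the same count; the average stays close to the truth.
noisy = [laplace_mechanism(100.0, sensitivity=1.0, epsilon=1.0, rng=rng)
         for _ in range(20000)]
mean_noisy = sum(noisy) / len(noisy)

# 100 accesses at (0.1, 1e-6) each already cost a total budget of about (10, 1e-4).
total_eps, total_delta = basic_composition([(0.1, 1e-6)] * 100)
```

Tighter accounting methods exist, but even under them the budget grows with the number of accesses, which is why per-access budgets alone say little about the overall guarantee.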

II-D Homomorphic Encryption

Most previous work on homomorphic encryption considers homomorphic operations on ciphertexts encrypted under the same key [gentry2009fully]. These schemes do not directly apply in our case: if all parties encrypted their data using the same public key, each party would be able to decrypt not only the aggregate, but also each individual’s values. By contrast, our cryptographic construction allows additive homomorphic operations over ciphertexts encrypted under different parties’ keystreams. Details are provided in Section III-D1.
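To give a feeling for additive homomorphism under different keystreams, here is a generic masking sketch. It is not the construction of Section III-D1: it assumes a well-known stream-cipher-style scheme in which each party adds its own pseudorandom keystream modulo a fixed modulus, with hypothetical names, a hash-derived keystream and a fixed-point encoding chosen for illustration.

```python
import hashlib

MODULUS = 2 ** 64
SCALE = 10 ** 6  # fixed-point scale for encoding real-valued gradients


def keystream(secret: bytes, round_id: int, length: int):
    """Derive `length` pseudorandom masks from a party's secret (illustrative only)."""
    stream = []
    for idx in range(length):
        seed = secret + round_id.to_bytes(8, "big") + idx.to_bytes(8, "big")
        stream.append(int.from_bytes(hashlib.sha256(seed).digest()[:8], "big"))
    return stream


def encode(values):
    return [round(v * SCALE) % MODULUS for v in values]


def decode(values):
    # Map residues above MODULUS / 2 back to negative numbers.
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in values]


def encrypt(encoded, stream):
    return [(v + k) % MODULUS for v, k in zip(encoded, stream)]


def add_ciphertexts(c1, c2):
    return [(a + b) % MODULUS for a, b in zip(c1, c2)]


def decrypt_sum(cipher_sum, streams):
    """Remove the sum of all keystreams; reveals only the aggregate."""
    total = [sum(ks) % MODULUS for ks in zip(*streams)]
    return [(c - k) % MODULUS for c, k in zip(cipher_sum, total)]


ks_a = keystream(b"secret-of-A", round_id=1, length=2)
ks_b = keystream(b"secret-of-B", round_id=1, length=2)
c_a = encrypt(encode([0.5, -0.25]), ks_a)
c_b = encrypt(encode([0.125, 0.75]), ks_b)
aggregate = decode(decrypt_sum(add_ciphertexts(c_a, c_b), [ks_a, ks_b]))
```

Whoever knows only the sum of the keystreams can recover the aggregated gradients but not either party's individual vector, which is the property the text above asks for.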

III DPPDL Framework

III-A Fairness Incentives

The fundamental principle behind fairness is that the party who invests more time and effort to collect high quality data should be rewarded more than the less contributive party. To preserve privacy, instead of publishing all the original data or model parameters, each participant publishes differentially private samples or encrypted gradients, and gains points as per the number of published samples or gradients. If any of the other participants needs to download the gradients, it is expected to spend some points. There is no limitation on the number of gradients that can be uploaded or downloaded, and the two numbers need not be equal. The only constraint is that a participant must have some points for downloading. A participant earns more points when uploading more samples or gradients, and these points can be used to download more gradients from others. This is the incentive for publishing more, as long as it is within the limit of privacy. Similarly, downloading more gradients consumes more points, so a participant might prefer to download as little as possible. DPPDL achieves fairness during download and upload processes as follows:

  • Download as per local credibility: Since one party might contribute differently to different parties, its credibility might differ from the view of different parties. Therefore, each party $i$ should keep a local credibility list that sorts all other parties by their local credibility in descending order and is known only to party $i$. The higher the credibility of party $j$ in party $i$’s credibility list, the more likely party $i$ will download gradients from party $j$, and consequently, the more points will be rewarded to party $j$.

  • Upload as per request and privacy level: Once a party receives a download request for gradients, the number of meaningful gradients it uploads depends on both the download request and its privacy level.
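The two rules above can be sketched as a small points ledger plus a download planner. The pricing of one point per gradient and the proportional split of the download budget are assumptions made for illustration; the text only states that higher credibility makes downloads more likely and that downloads cost points.

```python
class PointsLedger:
    """Tracks points; downloading gradients moves points from requester to uploader."""

    def __init__(self, initial_points):
        self.points = dict(initial_points)

    def settle_download(self, requester, uploader, num_gradients):
        # Assumption: one point per downloaded gradient.
        if num_gradients <= 0:
            raise ValueError("num_gradients must be positive")
        if self.points[requester] < num_gradients:
            raise ValueError("insufficient points")
        self.points[requester] -= num_gradients
        self.points[uploader] += num_gradients


def rank_by_credibility(local_credibility):
    """Local credibility list of one party, sorted in descending order."""
    return sorted(local_credibility, key=local_credibility.get, reverse=True)


def plan_downloads(local_credibility, budget):
    """Split a download budget across parties in proportion to their credibility."""
    total = sum(local_credibility.values())
    if total <= 0:
        return {party: 0 for party in local_credibility}
    return {party: int(budget * cred / total)
            for party, cred in local_credibility.items()}


# Party A's private view of the other parties (hypothetical values).
credibility_of_a = {"B": 0.5, "C": 0.25, "D": 0.25}
ranking = rank_by_credibility(credibility_of_a)
plan = plan_downloads(credibility_of_a, budget=100)

ledger = PointsLedger({"A": 120, "B": 10, "C": 10, "D": 10})
for uploader, amount in plan.items():
    ledger.settle_download("A", uploader, amount)
```

After settlement the most credible party has earned the most points, which it can in turn spend on downloads of its own.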

III-B Blockchain for DPPDL

Based on the above fairness incentives, we apply privacy-preserving online deep learning algorithm on the private Blockchain using Blockchain 2.0, which is only available to the participating parties. The advantage of using Blockchain is twofold:

  • Maintain modularity. Compared with the current server-based architecture, DPPDL inherits the peer-to-peer architecture of Blockchain, allowing each party to remain modular while interoperating with others. In addition, instead of ceding control to the central server, each party keeps full control of their own data, thus obeying the institutional policies. Moreover, Blockchain provides the native ability to automatically coordinate the join and departure of each party, further facilitating the independence and modularity of the participating parties.

  • Enhance privacy and security. DPPDL is meant to build a privacy-preserving interoperability platform. Specifically, Blockchain enhances security by avoiding single-point-of-failure.

There are two types of blocks in the Blockchain for DPPDL, namely, the init block and the operation block. An init block initializes the benchmarking of the usefulness of each party’s training data, as a set of init transactions. An init transaction contains the initial points that the transaction creator earns, its contributed DPGAN samples, and its public key that will be used for authenticating future transactions. The genesis block (i.e., the first block) of the Blockchain is an init block, which gives the initial points and local credibility to each participant according to their relative contributions, as stated in Algorithm 1. If any party joins or adds new data during an update, a new init block will also be created and added to the existing Blockchain. Meanwhile, Blockchain automatically handles party departure: when a party leaves the private Blockchain network, other parties merely need to remove it from their local lists.

An operation block contains a set of transactions defining the UPLOAD operation and/or DOWNLOAD operation. All UPLOAD and DOWNLOAD transactions are signed by their creator using the private key associated with the public key recorded in the init transaction. An UPLOAD operation commits that a data owner has uploaded local model gradients to the party who sent a download request. A DOWNLOAD operation states that a participant has committed an order requesting some local model updates from other participants. Upon receiving a DOWNLOAD transaction, Blockchain miners verify its signature, check whether the requester has enough balance to download the number of requested gradients, and record the successfully verified transactions in an operation block. Once the DOWNLOAD transaction is recorded in the Blockchain, the requested local model gradients will be encrypted and uploaded by the owner to a publicly accessible storage, and re-encrypted using the recipient’s public key defined in the DOWNLOAD transaction.

In particular, the privacy of local model gradients is protected through a three-layer onion-style encryption scheme (see Section III-D). The first layer encrypts the local model gradients through our symmetric key based homomorphic encryption (Algorithm 3), which allows other parties to learn the aggregated gradients without revealing individual gradients, i.e., party obliviousness. The second and third layers form a standard hybrid encryption process: the second layer uses a freshly generated symmetric key to re-encrypt the first layer ciphertext, and the third layer encrypts with the requester’s public key. In this way, we minimize the computational cost incurred by the asymmetric key based encryption. The commitment of the uploaded encrypted local model gradients (e.g., the hash value of the ciphertext, as presented in Fig. 3) will be included in the UPLOAD transaction.
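The layering can be illustrated with toy stand-ins for each primitive. None of the primitives below is the paper's: layer 1 is assumed to be an additive mask, layer 2 an XOR stream derived from SHA-256, and layer 3 textbook RSA over two small Mersenne primes wrapping the fresh symmetric key. All of these are insecure and serve only to show the structure and the commitment.

```python
import hashlib
import json
import secrets

MASK_MODULUS = 2 ** 64
# Toy RSA key of the requester (far too small and unpadded for real use).
P, Q = 2 ** 61 - 1, 2 ** 89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # requires Python 3.8+


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Layer-2 stand-in: XOR with a SHA-256 counter keystream (its own inverse)."""
    pad = bytearray()
    counter = 0
    while len(pad) < len(data):
        pad.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, pad))


def layer1_mask(int_gradients, keystream):
    """Layer-1 stand-in: additive masking of integer-encoded gradients."""
    return [(g + k) % MASK_MODULUS for g, k in zip(int_gradients, keystream)]


def onion_encrypt(int_gradients, keystream, recipient_public=(N, E)):
    n, e = recipient_public
    layer1 = layer1_mask(int_gradients, keystream)
    session_key = secrets.token_bytes(16)                    # fresh symmetric key
    layer2 = stream_xor(session_key, json.dumps(layer1).encode())
    wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)  # layer 3
    return {"wrapped_key": wrapped_key,
            "ciphertext": layer2.hex(),
            "commitment": hashlib.sha256(layer2).hexdigest()}


def onion_open(package, private_key=(N, D)):
    """Requester peels layers 3 and 2; the result is still layer-1 ciphertext."""
    n, d = private_key
    layer2 = bytes.fromhex(package["ciphertext"])
    if hashlib.sha256(layer2).hexdigest() != package["commitment"]:
        raise ValueError("ciphertext does not match commitment")
    session_key = pow(package["wrapped_key"], d, n).to_bytes(16, "big")
    return json.loads(stream_xor(session_key, layer2).decode())


grads = [5, MASK_MODULUS - 3, 7]   # hypothetical integer-encoded gradients
mask = [11, 22, 33]                 # hypothetical layer-1 keystream
package = onion_encrypt(grads, mask)
recovered = onion_open(package)
```

Only the short session key goes through the public key operation, which reflects the cost argument made in the paragraph above; the requester ends up with masked gradients, not with the uploader's plain values.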

In our private Blockchain, only the requester who pays can read the plaintext; others can verify that the transaction has happened, but cannot read its content. When a requester dishonestly blames a data uploader, the data uploader reveals the plaintext as evidence. In this case, the requester is forced to pay a fine out of the deposit it made when filing the dishonest claim. Once an UPLOAD transaction is recorded in the Blockchain, the points are automatically transferred from the requester to the uploader. Examples of INITIALIZE and DOWNLOAD transactions stored by the Blockchain are shown in Fig. 2 and Fig. 3, respectively. This Blockchain is maintained either in a permissionless and distributed manner, such as by utilizing Ethereum, or in a permissioned and decentralized manner, such as by utilizing IBM’s Hyperledger Fabric.
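The transaction flow described in this subsection can be sketched as a minimal in-memory ledger. Several simplifications are assumed: HMAC-SHA256 with per-party secret keys stands in for real public key signatures, one point is charged per gradient, and all field names except the DLD_request reference are hypothetical.

```python
import hashlib
import hmac
import json


def sign(body, key):
    """Stand-in signature: HMAC over the canonical JSON of the transaction body."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_tx(creator, key, **fields):
    body = dict(fields, creator=creator)
    return dict(body, signature=sign(body, key))


class MiniLedger:
    def __init__(self, balances, keys):
        self.balances = dict(balances)
        self.keys = dict(keys)
        self.pending = {}   # recorded DOWNLOAD requests awaiting an UPLOAD
        self.chain = []

    def _valid_signature(self, tx):
        body = {k: v for k, v in tx.items() if k != "signature"}
        expected = sign(body, self.keys[tx["creator"]])
        return hmac.compare_digest(expected, tx["signature"])

    def record_download(self, tx):
        """Miner check: valid signature and enough points for the request."""
        if not self._valid_signature(tx):
            return False
        if self.balances[tx["creator"]] < tx["num_gradients"]:
            return False
        self.pending[tx["request_id"]] = tx
        self.chain.append(tx)
        return True

    def record_upload(self, tx):
        """Recording the UPLOAD transfers points from requester to uploader."""
        request = self.pending.get(tx["dld_request"])
        if request is None or not self._valid_signature(tx):
            return False
        if tx["creator"] != request["uploader"]:
            return False
        del self.pending[tx["dld_request"]]
        price = request["num_gradients"]
        self.balances[request["creator"]] -= price
        self.balances[tx["creator"]] += price
        self.chain.append(tx)
        return True


keys = {"A": b"key-A", "B": b"key-B"}
ledger = MiniLedger({"A": 100, "B": 20}, keys)
download = make_tx("B", keys["B"], type="DOWNLOAD", request_id="r1",
                   uploader="A", num_gradients=15)
ok_download = ledger.record_download(download)
commitment = hashlib.sha256(b"<encrypted gradients>").hexdigest()
upload = make_tx("A", keys["A"], type="UPLOAD", dld_request="r1",
                 commitment=commitment)
ok_upload = ledger.record_upload(upload)
```

The UPLOAD transaction stores only a hash commitment of the ciphertext, so other parties can verify that the exchange happened without being able to read the gradients.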

Fig. 2: An example structure of the genesis block. It mainly contains two key components: a set of init transactions organized as leaves of a Merkle tree, and the consensus agreement reached by the participants through the underlying consensus protocol (e.g., PBFT or PoS), which is specific to the deployed Blockchain. The key in the init transaction is the signature verification key of the party that created it.
Fig. 3: An example structure of the operation block. It mainly contains three key components, namely, the hash value Prev_Block_hash of the previous block, a set of UPLOAD/DOWNLOAD transactions organized as a Merkle tree, and the consensus agreement of this block. In particular, Prev_Block_hash links the current block to the previous one, and the request in the UPLOAD transaction acts as a reference to the associated DOWNLOAD transaction. The DOWNLOAD transaction carries the public key that will be used in the last layer of our three-layer onion-style encryption scheme, a unique request ID that is referenced in the corresponding UPLOAD transaction via DLD_request, and the signature on this transaction. The three encryption operators in the figure refer to homomorphic encryption, symmetric key encryption, and public key encryption, respectively. Table II lists the symbols for better readability.
Input: number of participating parties $n$, C = {1, …, n}
Output: local credibility and points of all parties
1: Pre-train a priori models: each party $i$ trains a standalone model and a local DPGAN based on its local training data.
2: Privacy level initialization: during initialization, party $i$ randomly selects and releases artificial samples generated by its local DPGAN to any party $j$; the privacy level of party $i$ is determined autonomously from its local training data size.
3: Local credibility initialization: party $j$ labels the received artificial samples with its local model, then returns the predicted labels to party $i$. Meanwhile, party $i$ also labels its own DPGAN samples using its local model. Afterwards, party $i$ applies majority voting to all the predicted labels, then initializes the local credibility of party $j$ from the number of matches between the majority labels and party $j$’s predicted labels, and the number of DPGAN samples released by party $i$. The detailed explanation is given in Section III-C1.
4: Local credibility normalization:
if the local credibility of party $j$ is below the lower bound of the credibility threshold then
     party $i$ reports party $j$ as a low quality contributor
end if
5: Credible party set: if a majority of parties report party $j$ as low quality, Blockchain removes party $j$ from the credible party set and all parties run step 4 again.
6: Points initialization to download gradients.
Algorithm 1 Initial Benchmarking
| Symbol | Meaning |
|---|---|
| | local training data and local model of party $i$ |
| | points and gradients download budget of party $i$ |
| | local credibility and updated local credibility of party $j$ given by party $i$ |
| | number of DPGAN samples released by party $i$ |
| | number of meaningful gradients of party $i$ released to party $j$ |
| | privacy level of party $i$ |
| | gradient vector of party $i$ |
| | masked gradient vector of party $i$, obtained by filling the remaining gradients with 0 |
| | parameter of party $i$ at the previous epoch |
| | updated parameter of party $i$ at the current epoch |
| | number of participating parties |
| | lower bound of the credibility threshold |
| | credible party set with local credibility above the threshold, as agreed by the parties |
| | number of matches between majority labels and party $j$’s predicted labels |
| | party $i$’s key pair for signing and verification, respectively |
| | party $i$’s keystream used in the first layer of three-layer onion-style encryption |
| | fresh symmetric encryption key used in the second layer of three-layer onion-style encryption |
| | party $i$’s key pair for decryption and encryption in the third layer of three-layer onion-style encryption |
| | homomorphic encryption |
| | symmetric key encryption |
| | public key encryption |

TABLE II: Table of symbols.

III-C Initial Benchmarking

To ensure fairness, we develop an initial benchmarking algorithm that benchmarks the quality of each participant's local training data via mutual evaluation before collaborative learning starts. Otherwise, we cannot prevent a parasitic participant like party C, as shown in Example II.1, from gaining access to the best individual model (say ) right at initialization. Our proposed solution is as follows: each participant trains a DPGAN on its local training data to generate artificial samples, which are distributed and used for secondary analysis. These generated samples do not disclose the true sensitive image instances or the true distribution of the data, but only a limited implicit density estimate within the modest privacy budget used in DPGAN. Each participant publishes its individually generated artificial samples as per its individual privacy level, without releasing labels. All the other participants produce predictions for the received artificial samples using their pre-trained standalone models and send the predicted labels back to the party who generated these samples. Building on state-of-the-art DPGAN research, initial benchmarking has the following aims:

  • To get a priori information about individual models before collaborative learning starts. If a participant does not have a reasonable amount of training data to produce a decent model, it will perform poorly in the initial evaluation of DPGAN samples and be ranked low by other participants. Other participants will therefore be cautious in sharing gradients with such a participant.

  • To get a rough estimate of the data distribution of other participants. Two parties can mutually benefit only if their data distributions are different but have some degree of overlap. Suppose that two participants A and B have published almost identical artificial samples; this means that their training data distributions are almost identical. In this case, the updates from B are unlikely to increase the accuracy of model A and vice versa, so their models are unlikely to improve significantly by incorporating updates from each other. Therefore, during the subsequent epochs, A and B should mutually avoid downloading updates from each other. Other participants can choose to download updates from either A or B, but not both. On the other hand, suppose that two participants A and B have completely different data distributions. Here too, the updates from B are unlikely to increase the accuracy of model A and vice versa, so during the subsequent epochs A and B should also try to avoid downloading updates from each other. Furthermore, if A's data distribution is different from that of all the other participants, all these participants should try to avoid A. This automatically takes care of the scenario where an honest participant publishes some gradients while all the other honest participants report very low credibility for it. In this case, the data distribution of the publisher is completely different from that of the other participants, so it is still reasonable to reduce the credibility of the publisher, because other participants are unlikely to gain much from its published gradients anyway.

Next, we detail the steps in achieving the main objectives in Algorithm 1: local credibility initialization, and privacy level and points initialization.

III-C1 Local Credibility Initialization

For local credibility initialization, each party compares the majority vote over all the combined labels with a particular party’s predicted labels to evaluate the effect of that party. This relies on the fact that the majority vote reflects the outcome of most of the parties, while the predicted labels of party reflect only the outcome of party . For example, when party initializes its local credibility list for other parties, party broadcasts its DPGAN samples to the other parties, who label these samples using their pre-trained standalone models and send the corresponding predicted labels back to party . Party also labels its own artificial samples using its pre-trained standalone model, then combines all parties’ predicted labels into a label matrix with total columns, where each column corresponds to one party’s predicted labels. Party finally initializes the local credibility of party as follows:


where is the number of matches between the majority labels and party ’s predicted labels, and is the number of DPGAN samples released by party . Afterwards, party normalizes within [0,1]. If most of the parties report that the local credibility of one participant is lower than the threshold agreed on by 2/3 of the parties, implying a potentially low-quality contributor, that participant is banned from the local credibility lists of all parties. For all the experiments, we set the local credibility threshold as , where is the number of parties. In the update process, party is more likely to download gradients from more credible participants, and to download fewer of, or even ignore, those published by less credible participants.
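As a minimal sketch of the mutual evaluation described above, the following Python snippet computes, for one publishing party, the fraction of its released samples on which each party's predicted label matches the majority label. The function names, the tie-breaking rule (the label encountered first wins), and the toy label matrix are illustrative assumptions; the normalization step and the credibility threshold are deliberately left out.

```python
from collections import Counter


def majority_labels(label_matrix):
    """Return the majority label of each row.

    label_matrix[s][p] is the label predicted by party p for sample s,
    so each column holds one party's predictions.
    Ties are broken in favour of the label seen first (an assumption).
    """
    return [Counter(row).most_common(1)[0][0] for row in label_matrix]


def init_credibility(label_matrix):
    """Raw credibility of each party: matches with the majority / number of samples."""
    majority = majority_labels(label_matrix)
    n_samples = len(label_matrix)
    n_parties = len(label_matrix[0])
    credibility = []
    for p in range(n_parties):
        matches = sum(
            1 for s in range(n_samples) if label_matrix[s][p] == majority[s]
        )
        credibility.append(matches / n_samples)
    return credibility


# Hypothetical toy data: 5 released samples, 4 parties (one column per party).
# The last column mimics a party that guesses labels at random.
labels = [
    [3, 3, 3, 7],
    [1, 1, 1, 1],
    [4, 4, 9, 0],
    [8, 8, 8, 2],
    [5, 5, 5, 6],
]
print(init_credibility(labels))
```

In this toy run the random guesser agrees with the majority on only one of five samples, so its raw credibility is far below that of the other parties, which is the signal the low-quality report relies on.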

III-C2 Privacy Level and Points Initialization

Based on the number of artificial samples that party publishes at the beginning, the privacy level that party is comfortable with is autonomously determined and can be quantified as , where is the local training data of party . A more private party tends to release fewer samples, while a less private party is comfortable with releasing more samples. Similarly, during the collaborative learning process, a more private party would prefer to release fewer gradients. Point is initialized as follows:


where is the privacy level of party , is the number of model parameters, and is the number of parties. The gained points from initial benchmarking will be used to download gradients in the follow-up collaborative learning process, and how many gradients will be downloaded is dependent on both the local credibility and privacy level of the requested party.

Input: , , , , , , ,
Output: updated points , , parameters , and local credibility c_i^j’
1: Points update: In each epoch, party aims to download total gradients from all parties in , while party can at most provide gradients, one point is consumed/rewarded for each download and upload. Each party updates local model parameters based on the gradients of party as follows:
if  then
     for  do
           meaningful gradients are chosen from according to “largest values" criterion: sort gradients in and choose of them, starting from the largest.
     end for
end if
2: Three-layer onion-style encryption: party first masks the remaining gradients with 0, such that the shared gradients is of the length of the model parameters for the purpose of additive homomorphic encryption. Party then follows Algorithm 3 to encrypt the masked gradients with its keystream as , and re-encrypts the encrypted gradients with a fresh symmetric encryption key as , the symmetric encryption key of the second layer is encrypted in the third layer by the receiver party ’s public key as . Finally, the two-layer encrypted gradients and the encrypted fresh symmetric encryption key are sent to party ;
3: Parameter update: party uses the paired secret key to decrypt the received encrypted fresh symmetric encryption key as , then uses to decrypt two-layer encrypted gradients as , finally decrypts the sum of all the received gradients using homomorphic property and updates local parameters by integrating all its plain gradients as: , where is party ’s local parameters at previous epoch.
4: Local credibility update: party randomly selects and releases artificial samples to any party for labelling, mutual evaluation is repeated by following Step 3 of Algorithm 1 to calculate local credibility of party at current epoch as . Party updates local credibility of party by integrating historical credibility as:
where is the local credibility of party at previous epoch.
5: Local credibility normalization:
if  then
     party reports party as low quality contributor
end if
6: Credible party set: If a majority of parties report party as low quality, Blockchain removes party from the credible party set and all parties run Step 5 again.
Algorithm 2 Privacy-Preserving Collaborative Deep Learning
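The "largest values" selection in Step 1 and the zero-masking in Step 2 can be illustrated with a short Python sketch. It assumes that gradients are ranked by absolute magnitude, which is one reading of "starting from the largest"; ranking by signed value would be another. The function name `mask_top_k` is hypothetical.

```python
def mask_top_k(gradients, k):
    """Keep the k gradients with the largest magnitude and set the rest to 0.

    The output has the same length as the input, so masked vectors from
    different parties can be added coordinate by coordinate.
    Ranking by absolute value is an assumption of this sketch.
    """
    k = max(0, min(k, len(gradients)))
    ranked = sorted(
        range(len(gradients)), key=lambda i: abs(gradients[i]), reverse=True
    )
    keep = set(ranked[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradients)]


# Hypothetical gradient vector of one party; two gradients are released.
example = mask_top_k([0.1, -0.9, 0.3, 0.05, 0.7], 2)
print(example)
```

Keeping the masked vector at full length is what allows the additive homomorphic layer to sum contributions position by position.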

Differentially Private GAN (DPGAN): During initial benchmarking, although each party releases only a small number of unlabeled samples as per its individual privacy level, these samples may still implicitly disclose private information about the training data. Generating samples under differential privacy with a generative adversarial network (GAN) offers a technical solution for those who wish to share data while addressing this privacy challenge. Therefore, at each party we train a Differentially Private GAN (DPGAN) by adding tailored noise to gradients during DPGAN learning [zhang2018differentially]. In a GAN, the discriminator is the only component that accesses the private real data, so we only need to train the discriminator under differential privacy. The differential privacy guarantee of the entire GAN then follows directly, because the computations of the generator are simply post-processing of the discriminator's output. This argument rests on the post-processing property of differential privacy [dwork2014algorithmic], as stated in Lemma 1. To counter the stability and scalability issues of training DPGAN models, we apply multi-fold optimization strategies, including weight clustering, adaptive clipping, and warm starting, which significantly improve both training stability and utility [zhang2018differentially]. Unlike the PATE framework in [papernot2016semi], whose privacy loss is proportional to the amount of public test data that needs to be labeled, a differentially private generator can generate an unlimited number of samples for the intended analysis while rigorously guaranteeing -differential privacy of the training data. Without loss of generality, we exemplify DPGAN in the context of the improved WGAN framework [arjovsky2017wasserstein]. As demonstrated by Zhang et al. [zhang2018differentially], DPGAN is able to synthesize data with inception scores fairly close to those of the real data and of samples generated by regular GANs without privacy protection.

Lemma 1.

Let algorithm $\mathcal{M}$ be a randomized algorithm that is $(\epsilon, \delta)$-differentially private. Let $f$ be an arbitrary randomized mapping. Then $f \circ \mathcal{M}$ is $(\epsilon, \delta)$-differentially private.

Meanwhile, it is well known that a larger amount of training data causes less privacy loss and allows for more iterations within a moderate privacy budget [abadi2016deep]. Due to the scarcity of training data at each party, data augmentation is applied to expand the local data size of each party to 100 times its original size, which allows DPGAN to generate realistic samples within a moderate privacy budget. In particular, we augment the original data with a rotation range of 1 and width and height shift ranges of 0.01. In our experiments, we use the moments accountant described in [abadi2016deep] to track the privacy spent over the course of training. Our DPGAN is able to generate realistic samples with and , as shown in Fig. 4. Note that each party can individually train DPGAN and generate massive DPGAN samples offline without affecting collaboration.

Fig. 4: Generated samples with using DPGAN on the augmented 60000 MNIST examples of one party who owns 600 original MNIST examples.
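The idea of adding noise to the discriminator's gradients can be sketched generically as follows. This is a DP-SGD-style step in the spirit of [abadi2016deep] (per-example clipping followed by Gaussian noise), not the exact DPGAN procedure of [zhang2018differentially], which additionally uses weight clustering, adaptive clipping, and warm starting. The names `clip_norm` and `noise_multiplier` and the toy gradients are assumptions.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a gradient vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * clip_norm, applied
    independently to every coordinate of the summed gradient.
    """
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Hypothetical discriminator gradients for a batch of two examples.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
print(dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Only the discriminator update would be treated this way; by the post-processing property, the generator trained from those noisy updates inherits the same guarantee.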

III-D Privacy-Preserving Collaborative Deep Learning

Algorithm 2 summarizes the steps for our proposed privacy-preserving collaborative deep learning, including how to update points as per upload/download, how to preserve privacy of individual model updates using three-layer onion-style encryption, followed by parameter and local credibility update, and credible party set maintenance. We discuss the relevant details for parameter update and local credibility update as follows.

III-D1 Parameter Update with Homomorphic Encryption

To ensure privacy of the shared gradients and facilitate gradient aggregation during the collaborative learning process, we use additive homomorphic encryption, such that each party can only decrypt the sum of all the received encrypted gradients, but cannot access any individual one of them. Specifically, the Vernam cipher, or one-time pad (OTP), has been mathematically proved to be completely secure: it cannot be broken even given unlimited ciphertext and time. Therefore, we use the simple and provably secure OTP for additively homomorphic encryption, which allows efficient aggregation of encrypted data [castelluccia2005efficient, lyu2018ppfa]. The main idea of forming the ciphertext is to combine the keystream with the plaintext digits. Meanwhile, rather than the XOR operation typically found in stream ciphers, which is insecure under frequency analysis attacks, our encryption scheme uses modular addition (+), and is hence very efficient [castelluccia2005efficient]. The security relies on two important features: (1) the keystream changes from one message to another; and (2) all the operations are performed modulo a large integer  [castelluccia2005efficient].

The detailed procedure for homomorphic encryption is presented in Algorithm 3. In practice, if p = max(), is derived as . All computations in the remainder of this paper are modulo unless otherwise noted. However, all the original floating-point values need to be mapped to the discrete domain of integers using Scaling, Rounding, Unscaling (SRU) algorithm [lyu2018ppfa]. A pseudorandom keystream can be generated by a secure pseudo random function (PRF) by implementing a secure stream cipher, such as Trivium, keyed with each party’s keystream and a unique message ID. For encryption purpose, the secret keys are pre-computed through a trusted setup, which can be performed by a trusted dealer or through a standard SMC protocol. For example, a trusted key managing authority can generate these keystreams in each epoch of information exchange, but the generated keystreams cannot be used more than once. The trusted setup generates non-zero random shares of 0: , such that each participant obtains a keystream for . It should be noted that if Blockchain removes party from the credible party set , the trusted setup should be restarted among parties in .

1: A trusted dealer randomly generates keystreams: , such that (mod )= 0, where is a large integer.
2: Party obtains keystream .
Enc(, )
1: Represent message as integer .
2: Let be a randomly generated keystream, where .
3: Compute .
Dec(, )
1: .
1: Let , where .
2: Party uses to decrypt the aggregation of other parties as follows: .
Algorithm 3 Homomorphic Encryption Scheme

Clearly, in this setting all shares () are needed to decrypt any ciphertext, and no party is able to decrypt a ciphertext on its own. In this way, privacy is preserved without compromising accuracy when the selected gradients are shared among all the participants, as shown in [aono2018privacy]. The model parameter of party is updated as per gradients-encrypted SGD as follows:

where is the local parameters of party at previous epoch, is the masked gradient vector of party with only meaningful gradients, and correspond to encryption and decryption operations in Algorithm 3. The second equality follows the additively homomorphic property of encryption. By the additively homomorphic property of , participant can get the updated correctly after decryption, without having access to either or . Hence, DPPDL ensures party obliviousness by capturing the following security notions:

  • Each participant knows nothing but the sum in each round of communication, and cannot infer any information about other participants’ data.

  • If several participants form a coalition against the remaining participants, or if a subset of the encrypted data has been leaked, then each participant can inevitably learn the sum of the remaining participants. In this case, each participant learns no additional information about the remaining participants’ data.
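The zero-sum keystream idea behind Algorithm 3 can be illustrated with a small Python sketch. It is a simplified reading of the scheme, not the authors' implementation: the modulus `M`, the fixed-point `SCALE` used in place of the SRU mapping, the use of `secrets` instead of a dealer-keyed stream cipher such as Trivium, and all function names are assumptions. Each coordinate gets its own set of non-zero shares that sum to 0 modulo `M`, so a party can recover only the sum of the other parties' gradients.

```python
import secrets

M = 2 ** 64      # large modulus (assumed value)
SCALE = 10 ** 6  # fixed-point scaling factor (assumed value)


def encode(x):
    """Map a float to an integer modulo M (scale, then round)."""
    return round(x * SCALE) % M


def decode(v):
    """Map an integer modulo M back to a signed float (unscale)."""
    if v >= M // 2:
        v -= M
    return v / SCALE


def zero_sum_keystreams(n_parties, dim):
    """Dealer step: per coordinate, non-zero shares that sum to 0 mod M."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    keys = [[0] * dim for _ in range(n_parties)]
    for d in range(dim):
        while True:
            shares = [secrets.randbelow(M - 1) + 1 for _ in range(n_parties - 1)]
            last = (-sum(shares)) % M
            if last != 0:
                break
        for p, share in enumerate(shares + [last]):
            keys[p][d] = share
    return keys


def encrypt(gradient, key):
    """Enc: add the keystream to the encoded plaintext, modulo M."""
    return [(encode(g) + k) % M for g, k in zip(gradient, key)]


def decrypt_sum(ciphertexts, own_key):
    """Recover the sum of ALL other parties' gradients using only own_key.

    The other parties' keys sum to -own_key mod M, so adding own_key
    cancels them; a strict subset of ciphertexts would not decrypt.
    """
    dim = len(own_key)
    total = [sum(c[d] for c in ciphertexts) % M for d in range(dim)]
    return [decode((t + k) % M) for t, k in zip(total, own_key)]


# Hypothetical example: party 0 aggregates masked gradients of parties 1 and 2.
keys = zero_sum_keystreams(n_parties=3, dim=2)
c1 = encrypt([0.5, -0.25], keys[1])
c2 = encrypt([0.125, 0.75], keys[2])
print(decrypt_sum([c1, c2], keys[0]))
```

Because a keystream must never be reused, a real deployment would derive fresh keystreams for every message, as the text notes for each epoch of information exchange.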

III-D2 Three-layer Onion-style Encryption

However, as all parties need to store the different encrypted gradients that are meant for different parties on Blockchain for commitment, all the encrypted gradients are also accessible to all parties. Applying public-key encryption on top of homomorphic encryption for authentication [lyu2018ppfa] could counter this problem; however, as the released gradient vector is high-dimensional, encrypting the gradient vector with public-key encryption is expensive in both computation and communication. Therefore, we propose a three-layer onion-style encryption scheme. In more detail, the first layer protects local model gradients through homomorphic encryption, using the symmetric keystream , as presented in Algorithm 3. The second and third layers form a classic hybrid encryption, as used in OpenPGP [callas2007openpgp] for instance. In particular, in the second layer, a fresh symmetric encryption key is generated and used to re-encrypt the ciphertext of the first layer, and in the third layer this fresh symmetric key is encrypted using the receiver’s public key. In this way, the encryption of large-scale data becomes very efficient, and the receiver is authenticated as well: only the receiver who holds the secret key paired with the public key can decrypt the two-layer encrypted gradients committed on the Blockchain.

III-D3 Local Credibility Update

Instead of using the standalone models as in the local credibility initialization, during collaborative learning process, each party randomly selects and shares a subset of DPGAN samples as per individual privacy level at each epoch of training, then calculates the local credibility of other parties based on the returned labels, which are evaluated using their updated local models at current epoch. The mutual evaluation procedure follows Step 3 of Algorithm 1. Finally, local credibility of each party is updated by integrating its historical local credibility as per Step 4 of Algorithm 2. In this way, local credibility of each party can be correspondingly updated, reflecting more accurately how much one party contributes to another party in collaborative learning.

III-E Quantification of Fairness

Fairness of our framework can be quantified by the correlation coefficient between individual contribution (X axis) and model test accuracy (Y axis). The X axis represents the contribution of each party, characterized by its privacy level or the size of its training data, as a party that is less private or has more data empirically contributes more. The Y axis refers to the final test accuracy, which measures the performance of the individual model after collaboration, and is expected to be positively correlated with the X axis to deliver good fairness. Conversely, a negative coefficient implies bad fairness. Equation 3 formally quantifies fairness:


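$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} \qquad (3)$$

where $n$ is the number of parties, $\bar{x}$ and $\bar{y}$ are the sample means of $x$ (privacy level or the size of training data of different parties) and $y$ (test accuracy of individual models), and $s_x$ and $s_y$ are the corrected standard deviations. The range of fairness is within [-1,1], with higher values implying better fairness.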
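As a small illustration of the fairness measure, the following Python sketch computes the sample correlation coefficient between per-party contributions and per-party test accuracies, using corrected (n-1) standard deviations. The function name and the toy numbers are hypothetical and are not taken from the experiments.

```python
from statistics import mean, stdev


def fairness(contributions, accuracies):
    """Sample correlation between contribution (x) and test accuracy (y).

    statistics.stdev is the corrected (n - 1) standard deviation.
    Raises ZeroDivisionError if either input is constant.
    """
    n = len(contributions)
    if n != len(accuracies) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mx, my = mean(contributions), mean(accuracies)
    sx, sy = stdev(contributions), stdev(accuracies)
    cov = sum((x - mx) * (y - my) for x, y in zip(contributions, accuracies))
    return cov / ((n - 1) * sx * sy)


# Hypothetical privacy levels and test accuracies of four parties.
privacy_levels = [0.1, 0.2, 0.3, 0.4]
fair_outcome = [90.0, 91.0, 92.0, 93.0]    # accuracy grows with contribution
unfair_outcome = [93.0, 92.0, 91.0, 90.0]  # accuracy shrinks with contribution
print(fairness(privacy_levels, fair_outcome))
print(fairness(privacy_levels, unfair_outcome))
```

A value near 1 means that parties who contribute more end up with better models, while a value near -1 means the opposite.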

IV Performance Evaluation

IV-A Datasets

We run experiments on two benchmark image datasets. The first is the MNIST dataset [lecun1998gradient] for handwritten digit recognition, consisting of 60,000 training examples and 10,000 test examples. Each example is a 32x32 gray-level image, with the digit located at the center of the image. The second is the SVHN dataset [netzer2011reading] of house numbers obtained from Google’s street view images, which contains 600,000 training examples, from which we use 100,000 for training and 10,000 for testing. Each example is a 32x32 centered image with three channels (RGB). SVHN is more challenging, as most of the images are noisy and contain distractors at the sides. The sizes of the input layer of the neural networks for MNIST and SVHN are 1024 (32x32) and 3072 (32x32x3), respectively. The objective is to classify the input as one of 10 possible digits within [“0”-“9”], so the size of the output layer is 10. We normalize the training examples by subtracting the average and dividing by the standard deviation of the training examples.
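The normalization step can be sketched in a few lines of Python. The sketch assumes a single global mean and (population) standard deviation computed over all training values; whether the statistics are computed globally or per pixel is not stated in the text, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the standard deviation of the data."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical flattened pixel intensities.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```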

IV-B SGD Frameworks

We demonstrate the effectiveness of our proposed DPPDL by comparison with the following three frameworks.

Centralized SGD allows a trusted server to have access to all participants’ data in the clear, and train a global model on the combined data using standard SGD. Hence, it is a privacy-violating framework.

Standalone SGD assumes parties train standalone models on local training data without any collaboration. This framework delivers maximum privacy, but minimum utility, because each participant is susceptible to falling into local optima when training alone.

Distributed Selective SGD (DSSGD) enables parties to train independently and concurrently, and chooses a fraction of parameters to be uploaded at each iteration. DSSGD can achieve even higher accuracy than the centralized SGD because updating only a small fraction of parameters at each epoch acts as a regularization technique to avoid overfitting by preventing the neural network weights from jointly "remembering" the training data. As DSSGD with round robin parameter exchange protocol results in the highest accuracy [shokri2015privacy], we use round robin protocol for DSSGD, where participants run SSGD sequentially, each downloads a fraction of the most updated parameters from the server, runs local training, and uploads selected gradients; the next party follows in the fixed order. Gradients are uploaded according to the “largest values" criterion. It should be noted that in round robin protocol, the last party who downloads gradients always gains much more information than previous parties, which is obviously unfair.

IV-C Experimental Setup and Results

For implementation, we use two popular neural network architectures: multi-layer perceptron (MLP) and convolutional neural network (CNN), as in [shokri2015privacy]. For model training, we set the learning rate as 0.001, learning rate decay as 1e-7, and batch size as 1. In addition, to reduce the impact of different initializations and avoid non-convergence, each party is initialized with the same parameter , then local training is run on individual training data to update local model parameter . To speed up convergence, we let each party individually train for 10 epochs before collaborative learning starts. Next, we investigate three realistic IID settings as follows:

Same privacy level: in the first case, privacy level of each party is set as , where each party releases meaningful gradients during collaboration. For each participant, we randomly sample of the entire database as local training data, i.e., examples for MNIST and examples for SVHN;

Different privacy level: in the second case, privacy level of each party is randomly sampled from , and parties release meaningful gradients as per individual privacy level during collaboration. For each participant, we randomly sample of the entire database as local training data as above.

Imbalanced partition: in the third case, we simulate the case where different parties have different data size. In particular, for MNIST dataset, we randomly partition total {2400, 9000, 18000, 30000} examples among {4,15,30,50} parties respectively. Similarly, for SVHN dataset, total {4000, 15000, 30000, 50000} examples are randomly partitioned among {4,15,30,50} parties respectively. The privacy level of each party is fixed to .

Different privacy Imbalanced partition
P4 -0.68 0.30 0.89 0.84 -0.97 0.28 0.92 0.96
P15 0.20 -0.15 0.76 0.79 0.03 -0.07 0.90 0.83
P30 -0.02 0.02 0.79 0.84 0.13 0.01 0.75 0.63
P50 -0.16 -0.05 0.75 0.67 0.14 0.07 0.72 0.57
TABLE III: DSSGD and DPPDL fairness over MNIST dataset, with different architectures, different party numbers (P-) and different settings as described in Section IV-C.
Different privacy Imbalanced partition
P4 0.27 0.26 0.75 0.73 0.38 0.20 0.92 0.91
P15 0.16 0.19 0.74 0.71 -0.13 0.36 0.80 0.88
P30 -0.14 0.12 0.58 0.65 0.04 -0.27 0.61 0.78
P50 -0.25 -0.37 0.67 0.66 -0.23 0.15 0.68 0.69
TABLE IV: DSSGD and DPPDL fairness over SVHN dataset, with different architectures, different party numbers (P-) and different settings as described in Section IV-C.

Table III and Table IV list the calculated fairness of DSSGD and DPPDL over the MNIST and SVHN datasets, with different architectures, different party numbers and different settings, as detailed in Section IV-C. These results are averaged over five trials. As evidenced by the high positive fairness values, most of them above 0.5, DPPDL achieves reasonably good fairness, confirming the intuition behind fairness: a party that is less private and has more training data obtains higher accuracy. In contrast, DSSGD exhibits bad fairness, with significantly lower values than DPPDL in all cases and even negative values in some cases, manifesting the lack of fairness in DSSGD. This is because in DSSGD, all the participating parties can derive similarly good models, no matter how much each party contributes.

Fig. 5: MLP convergence over MNIST for different frameworks and different number of parties.

For accuracy comparison, we implement DPPDL using synchronous SGD protocol, and set the privacy level of each party as (), and for DSSGD using round robin protocol, upload rate is set as (). For MNIST dataset using MLP architecture, Fig. 5 demonstrates that DPPDL does not sacrifice model utility by a considerable margin compared with DSSGD without differential privacy and centralized SGD for different number of parties, and consistently delivers better accuracy than the standalone SGD. Beyond 100 epochs, we can potentially achieve slightly higher accuracy.

Framework | MLP P4 | MLP P15 | MLP P30 | MLP P50 | CNN P4 | CNN P15 | CNN P30 | CNN P50
Centralized | 91.68 | 95.17 | 96.28 | 96.85 | 96.58 | 98.19 | 98.52 | 98.58
DSSGD (round robin w/o DP) | 91.67 | 95.17 | 96.33 | 97.35 | 96.25 | 98.04 | 98.63 | 98.83
Standalone | 87.39 | 88.06 | 88.64 | 88.80 | 93.81 | 93.46 | 94.04 | 94.05
DPPDL (same privacy) | 90.13 | 94.42 | 94.88 | 95.57 | 95.93 | 97.19 | 97.62 | 98.07
DPPDL (different privacy) | 91.92 | 95.70 | 95.94 | 96.23 | 95.50 | 97.34 | 97.84 | 98.14
DPPDL (imbalanced partition) | 90.75 | 94.37 | 94.75 | 95.21 | 95.23 | 97.50 | 97.82 | 98.22
TABLE V: Maximum accuracy [%] over MNIST of varying party number settings, achieved by Centralized, Standalone, DSSGD ( without DP) and DPPDL (three settings as described in Section IV-C) frameworks using MLP and CNN architectures. P- indicates there are parties in the experiments.
Framework | MLP P4 | MLP P15 | MLP P30 | MLP P50 | CNN P4 | CNN P15 | CNN P30 | CNN P50
Centralized | 75.40 | 83.08 | 85.77 | 87.15 | 90.50 | 91.88 | 93.42 | 95.44
DSSGD (round robin w/o DP) | 78.34 | 85.49 | 87.64 | 89.21 | 91.78 | 93.03 | 95.75 | 96.19
Standalone | 57.85 | 58.77 | 57.90 | 59.18 | 80.24 | 80.74 | 81.29 | 81.60
DPPDL (same privacy) | 73.74 | 82.55 | 84.86 | 86.51 | 90.07 | 91.18 | 92.74 | 94.83
DPPDL (different privacy) | 74.16 | 82.67 | 85.25 | 86.57 | 89.91 | 91.15 | 92.59 | 95.18
DPPDL (imbalanced partition) | 74.57 | 82.95 | 85.37 | 86.34 | 89.53 | 91.03 | 93.13 | 94.89
TABLE VI: Maximum accuracy [%] over SVHN of varying party number settings, achieved by Centralized, Standalone, DSSGD ( without DP) and DPPDL (three settings as described in Section IV-C) frameworks using MLP and CNN architectures.

Table V provides the maximum accuracy on the MNIST dataset over {4,15,30,50} parties using the three baseline frameworks and our proposed DPPDL in three realistic settings. Similarly, Table VI provides accuracy results for the SVHN dataset. For both MNIST and SVHN, with both CNN and MLP architectures, standalone SGD yields the worst accuracy (minimum utility, maximum privacy). In particular, DPPDL obtains accuracy comparable to both centralized SGD and DSSGD without differential privacy, and always achieves higher test accuracy than standalone SGD. For example, as shown in Table V, for the MNIST dataset over 50 parties with the CNN model, DPPDL achieves 98.07%-98.22% test accuracy under the different settings, which is higher than the 94.05% of standalone SGD, and comparable to the 98.83% of DSSGD without differential privacy and the 98.58% of centralized SGD.

The above fairness results in Table III and Table IV, and accuracy results in Table V and Table VI demonstrate that our proposed framework DPPDL achieves reasonable fairness and strong privacy, at the expense of a tiny decrease in model utility.

V Discussion

Data Augmentation and Collaboration. To facilitate credibility initialization, we apply data augmentation to expand local data size to help DPGAN generate reliable samples within a moderate privacy budget. However, data augmentation is intended to increase the amount of training data using information inherent in local training data, and thus improve the generalizability of local model, while not helpful for generalizing to unseen data. In other words, it cannot represent global distribution, and it is also the reason why parties still need collaboration for better utility even after data augmentation. By using DPGAN, it not only preserves privacy of the original data, but also preserves privacy of the augmented data that are similar to the original data.

Fairness and Privacy. With three-layer onion-style encryption, privacy is better preserved without compromising utility. We ensure fairness in two ways: (i) during initial benchmarking, DPGAN samples are generated based on local training data to initialize the local credibility of participants through mutual evaluation using standalone models; and (ii) during the collaborative learning process, each party randomly selects and shares a subset of DPGAN samples as per its individual privacy level at each epoch of training, then updates the local credibility of other participants by evaluating the labels they return for these samples, as predicted by their local models at the current epoch. Therefore, the local credibility of each party keeps changing, reflecting the contribution of other participants more accurately and thus yielding better fairness. Differentially private training of deep models provides an alternative solution, in which gradients are released after each epoch of training, enabling each party to verify the claims of other parties and update local credibility as per the received gradients during the collaborative learning process. One obstacle is that differentially private models may significantly reduce utility for small values.

Quality Control. In our framework, we control the quality of training data through the initialization phase. Considering a party who has limited training data and local model with poor quality, or a party who might even have no data and local model at all, like the cheating party C in Example II.1, during initialization, this party may, for example, try to randomly sample from 10 classes as predicted labels for the received DPGAN samples, then release them to the corresponding party who publishes these DPGAN samples and requests labels. When the publisher receives the returned random labels from this party and detects that most of them are not aligned with the majority voting, i.e., , then this party will be reported as “low-quality contributor". If the majority report one party as "low-quality contributor", then Blockchain rules out this party from the credible party set, and all parties would terminate the collaboration with this party. In this way, such party is isolated from the beginning, and the remaining parties can continue collaboration. Even though the party might succeed in initialization somehow, the credibility of the low-quality contributor is significantly lower compared with the other honest parties. To further detect and isolate the low-quality contributor during the collaborative learning process, we repeat mutual evaluation at each epoch of collaborative learning by using samples generated at the initialization phase. Each party randomly selects and shares a subset of DPGAN samples as per individual privacy level at each epoch of collaborative learning, then updates the local credibility of other alive participants by comparing the majority labels with the received labels output by the local models of other participants at current epoch of training. Hence, the chance of the survival of the low-quality contributor is significantly reduced. 
Note that the lower bound of acceptable quality, i.e., the credibility threshold, can be agreed upon by the system according to its specific needs.

VI Conclusion and Future Work

This paper proposes DPPDL, a Blockchain-empowered decentralized privacy-preserving deep learning framework. Our enhanced system shows the following properties: (1) it resolves the issues of single-point-of-failure, single-point-of-breach, and privacy leakage in the existing server-based frameworks; (2) it makes the first investigation on the research problem of fairness in collaborative deep learning, by introducing a notion of local credibility and transaction points, which are initialized by initial benchmarking, and updated during privacy-preserving collaborative deep learning; (3) it combines Differentially Private GAN (DPGAN) and a three-layer onion-style encryption scheme to guarantee accuracy, privacy, and system robustness; (4) it provides a viable solution to detect and isolate the cheating party in the system. The experimental results demonstrate that our DPPDL framework achieves comparable accuracy to both the centralized and distributed selective SGD framework without differential privacy, and consistently delivers better results than standalone framework, confirming the applicability of our proposed framework. A number of avenues for further work are attractive. In particular, we would like to study different malicious behaviours and explore real-world applications, such as fair and private collaboration among financial or biomedical institutions. We also expect to deploy our system on hardware in the near future.


Lingjuan Lyu is supported by an IBM PhD Fellowship in AI and Blockchain. The authors would like to thank Prof. Benjamin Rubinstein, Dr. Kumar Bhaskaran, and Prof. Marimuthu Palaniswami for their insightful discussions and support on this work.