Corella
A Private Multi Server Learning Approach based on Correlated Queries
The emerging applications of machine learning on mobile devices motivate offloading the computation tasks of training a model, or deploying a trained one, to the cloud. A major challenge in this setup is guaranteeing the privacy of the client's data. Various methods have been proposed in the literature to protect privacy, including (i) adding noise to the client data, which reduces the accuracy of the result; (ii) using secure multiparty computation, which requires significant communication among the computing nodes or with the client; and (iii) relying on homomorphic encryption, which significantly increases the computation load. In this paper, we propose an alternative approach to protect the privacy of user data. The proposed scheme relies on a cluster of servers, at most T of which may collude for some integer T, each running a deep neural network. Each server is fed with the client data perturbed by a strong noise. This makes the information leakage to each server information-theoretically negligible. On the other hand, the noise terms added for different servers are correlated. This correlation among the queries allows the system to be trained such that the client can recover the final result with high accuracy, by combining the outputs of the servers with minor computation effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach.
With the expansion of machine learning (ML) applications dealing with high-dimensional datasets and models, particularly on low-resource devices (e.g., mobile units), it is inevitable to offload heavy computation and storage tasks to cloud servers. This raises a list of challenges, such as communication overhead, delay, convergence rate, and operation cost. One of the major concerns, which is becoming increasingly important, is maintaining the privacy of the datasets involved, either the training dataset or the user dataset, such that the information leaked to the cloud servers is kept under control.
Information leakage of the training dataset may occur either during the training phase or from the trained model. In training-phase privacy, a data owner who offloads the task of training to some untrusted servers is concerned about the privacy of his sampled data. In trained-model privacy, the concern is to prevent the trained model from exposing information about the training dataset. In user data privacy, on the other hand, a client wishes to employ some servers to run an already trained model on his data, while preserving the privacy of his individual data against the servers.
There are various techniques to provide privacy in machine learning scenarios, falling into three major categories: randomization and adding noise, which sacrifices accuracy; secure multiparty computation (MPC), which requires a heavy communication load or a large number of servers; and homomorphic encryption (HE), which imposes a heavy computation overhead on the system.
The main objective of this paper is to develop an alternative approach that avoids the heavy communication and computation overheads of MPC- and HE-based schemes and, at the same time, guarantees a high level of privacy without severely sacrificing accuracy. The proposed approach is based on sending noisy correlated queries to several servers, where the network is trained to eliminate the effect of the noise with minor computation load at the client.
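The core idea can be sketched in a few lines. In this toy example (our own illustration, not the paper's trained models), two servers apply the same linear map to queries built with opposite noise terms, so summing the answers cancels the noise exactly; for nonlinear networks, Corella learns this cancellation during training.

```python
import numpy as np

# N = 2 servers, T = 1: the client sends x + z to server 1 and x - z to
# server 2. If each server applies the same linear map G, the noise
# cancels exactly when the client sums the answers.
rng = np.random.default_rng(0)
d, k = 784, 10
G = rng.standard_normal((k, d))          # stand-in linear "model" per server
x = rng.standard_normal(d)               # client's private input
z = 50.0 * rng.standard_normal(d)        # strong noise, variance >> signal

a1 = G @ (x + z)                         # answer of server 1
a2 = G @ (x - z)                         # answer of server 2
y_hat = 0.5 * (a1 + a2)                  # client combines the answers

assert np.allclose(y_hat, G @ x)         # noise cancelled exactly
```

Each individual query is dominated by the noise, so a single server learns almost nothing about x, yet the combined result is noise-free.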
Here we briefly review the three major categories of techniques for providing privacy in ML applications.
Randomization and Adding Noise: Randomizing or adding noise to the client data or the ML model confuses the servers and reduces the amount of information leakage, at the cost of sacrificing the accuracy of the result.
In (Fredrikson et al., 2015; Shokri et al., 2017), it is shown that the parameters of a trained model can leak sensitive information about the training dataset. Differential-privacy-based approaches (Dwork, 2006; Dwork et al., 2006; Dwork & Roth, 2014) can be utilized to prevent this leakage. A randomized algorithm is differentially private if its output distributions for any two adjacent input datasets (two datasets that differ in only one element) are close enough. Adding noise to a deterministic function is a common method to provide differential privacy, which is also used in ML algorithms. (Dwork et al., 2014) proposes a differentially private algorithm for principal component analysis (PCA), a simple statistical algorithm in ML. For complex statistical algorithms (e.g., deep neural networks), (Abadi et al., 2016; Papernot et al., 2017) add noise to the neural network output in order to make the deep learning algorithm differentially private. However, the privacy-accuracy tradeoff in differential privacy (Alvim et al., 2011) bounds the scale of the noise added to the model output and thus limits the achievable privacy.
In distributed stochastic gradient descent (SGD), used in many distributed machine learning algorithms, including the federated learning framework (McMahan & Ramage, 2017; Bonawitz et al., 2019), the privacy of the dataset can be violated by the exchanged messages. In those scenarios, each client uploads its local gradient vector, computed with respect to its sampled data, to a central server. The gradient vectors are used by the central server to update its global model, which is shared with the clients. It is shown in (Hitaj et al., 2017) that an attacker can generate prototypical samples of the client dataset by only having access to the updated model parameters. A remedy for this leakage is to use differential privacy approaches (Shokri & Shmatikov, 2015; Abadi et al., 2016; Jayaraman et al., 2018), but the privacy-accuracy tradeoff in these approaches limits the scale of the noise and thus weakens the privacy.
K-anonymity (Samarati & Sweeney, 1998) is another privacy-preserving framework. Anonymity in a dataset means that one cannot identify a data item among the data (e.g., by removing some digits of zip codes) (Sweeney, 2002; Bayardo & Agrawal, 2005). However, the anonymity framework may not guarantee good privacy, particularly for high-dimensional data (as happened for the Netflix Prize dataset (Narayanan & Shmatikov, 2008)). As an alternative approach, the authors in (Osia et al., 2018) perturb the data by passing it through a module (function) that can be trained to protect some sensitive attributes of the data while preserving the accuracy of the learning as much as possible.
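As a concrete example of the noise-adding category, the classic Laplace mechanism (a standard differential-privacy primitive, not specific to any paper cited above) releases a query answer with noise whose scale is the query's sensitivity divided by the privacy parameter ε:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with epsilon-differential privacy via Laplace noise."""
    scale = sensitivity / epsilon      # noise scale grows as epsilon shrinks
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1) released at epsilon = 0.1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(1000, sensitivity=1.0, epsilon=0.1, rng=rng)
```

Smaller ε means stronger privacy but larger noise; this is exactly the privacy-accuracy tradeoff discussed above.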
Secure Multiparty Computation: This approach exploits the existence of a set of non-colluding servers to guarantee information-theoretic privacy for some classes of computation tasks, such as polynomial functions (Yao, 1982; Ben-Or et al., 1988; Shamir, 1979). It can be applied to execute an ML algorithm (Gascón et al., 2017; Dahl et al., 2018; Chen et al., 2019). The shortcoming of this solution is that it imposes a huge communication overhead on the network. To reduce this overhead, one solution is to judiciously approximate the nonlinear functions so that the number of interactions among the servers is reduced, even though not completely eliminated (Mohassel & Zhang, 2017). Another solution is to rely on Lagrange coding to develop an MPC scheme with no communication among the servers (So et al., 2019); that approach can also exploit the gain of parallel computation. The major disadvantage of (So et al., 2019) is that a target accuracy is achieved at the cost of employing more servers.
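For intuition on this category, here is a minimal sketch of Shamir secret sharing (Shamir, 1979), the primitive underlying many of the cited MPC schemes: a secret hidden in the constant term of a random degree-T polynomial can be reconstructed from any T+1 shares, while any T shares reveal nothing.

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is over the field GF(P)

def share(secret, n, t, rng=random):
    """Split secret into n shares; any t+1 of them suffice to reconstruct."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t)]
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(points):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(1234, n=5, t=2)
assert reconstruct(shares[:3]) == 1234   # any 3 of the 5 shares work
```

Note the collusion threshold plays the same role as T in Corella, but here every multiplication of shared values forces a round of communication among the servers, which is the overhead Corella avoids.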
Homomorphic Encryption: Homomorphic encryption (Gentry & Boneh, 2009) is another cryptographic tool that can be applied to ML applications (Graepel et al., 2012; Hesamifard et al., 2017; Li et al., 2018; Wang et al., 2018). It creates a cryptographically secure framework between the client and the servers that allows the untrusted servers to process the encrypted data directly. However, the computation overhead of HE schemes is extremely high. This disadvantage is directly reflected in the time needed to train a model or use a trained one, as reported in (Han et al., 2019; Gilad-Bachrach et al., 2016). Moreover, this framework does not guarantee information-theoretic privacy; it only holds under the assumption that the computation resources of the adversary are limited.
In this paper, we propose Corella as a privacy-preserving approach for offloading ML algorithms, based on sending correlated queries to multiple servers. These correlated queries are generated by adding strong correlated noise terms to the user data. The system is trained such that the user can recover the result by combining the (correlated) answers received from the servers, while the data remains private from each server due to the strong added noise. Each server runs a regular machine learning model (say, a deep neural network) with no computation overhead. In addition, other than uploading the data to the servers and downloading the results, there is no communication among the servers or between the servers and the user. Thus the proposed scheme provides information-theoretic privacy while keeping the communication and computation costs affordable. We apply the proposed approach to the user data privacy problem in a supervised learning setup. We consider a client with limited computation and storage resources who wishes to label his individual data. For processing, he relies on a cluster of N semi-honest servers, up to T of which may collude. This means the servers are honest in protocol compliance, but curious to learn about the client data, and an arbitrary subset of at most T out of the N servers may collude to learn it. The objective is to design an algorithm for this setup with reasonable accuracy, while the leakage of the client data to any T colluding servers is information-theoretically small.
In summary, Corella offers the following desirable features:
Thanks to the strong noise added to the user data, the information leakage to each server is negligible (i.e., the scheme is information-theoretically private).
The correlation among the queries enables the system to be trained such that the user can cancel the effect of the noise by combining the correlated answers and recover the final result with high accuracy.
There is no communication among the servers, and the computation load per server is reasonable. In short, Corella avoids the huge computation and communication overheads, or the large number of servers, required by HE and MPC schemes.
The next section states our problem setting. Section 3 proposes the Corella algorithm. Section 4 describes our experimental results, and Section 5 concludes the paper.
Notations: A capital italic bold letter denotes a random vector. A capital non-italic bold letter denotes a random matrix. A capital non-italic non-bold letter denotes a deterministic matrix. I(X; Y) denotes the mutual information between the two random vectors X and Y. For a function f, the computation cost (e.g., the number of multiplications) and the storage cost (e.g., the number of parameters) are denoted by C(f) and S(f), respectively. Logarithms are calculated in base 2. For a positive integer N, [N] denotes the set {1, …, N}.
We consider a system including a client, with limited computation and storage resources, and N servers. The client has an individual data X and wishes to label it with the aid of the servers, while keeping his data private from each server. All servers are honest, except up to T of them, which are semi-honest. This means that those servers still follow the protocol, but they are curious about the client data and may collude to gain information about it. The client sends queries to the servers and then, by combining the received answers, labels his data.
The system operates in two phases: a training phase and then a test phase. In the training phase, a dataset consisting of m samples (X^(i), Y^(i)), i = 1, …, m, is used by the client to train the model, where X^(i) is a data sample and Y^(i) its label. In addition, the client generates m i.i.d. noise samples, where each noise sample Z^(i), with N correlated components Z^(i)_1, …, Z^(i)_N, is drawn from a joint distribution P_{Z_1, …, Z_N}. The noise components are independent of the dataset samples and their labels. The data flow diagram is shown in Figure 1, where for simplicity the sample index is omitted from the variables.
The client, having access to the dataset and the noise components, uses a function Q_j to generate the query Q_j(X, Z_j) and sends it to the j-th server. In response, the j-th server applies a function g_j and generates the answer

A_j = g_j(Q_j(X, Z_j)),   (1)

for j = 1, …, N. By combining all N answers using a function h, the client computes the estimated label Ŷ = h(A_1, …, A_N), while the information leakage from the queries to any set of at most T servers must be negligible. In the training phase, the goal is to design or train the set of functions F = {Q_1, …, Q_N, g_1, …, g_N, h} and the noise distribution P_{Z_1, …, Z_N} according to the following optimization problem:

min over F, P_{Z_1, …, Z_N}:  (1/m) ∑_{i=1}^m Loss(Ŷ^(i), Y^(i))
subject to  I(X^(i); {Q_j(X^(i), Z^(i)_j), j ∈ 𝒯}) ≤ ε,  i = 1, …, m,  ∀𝒯 ⊂ [N], |𝒯| ≤ T,

where Loss(Ŷ, Y) is the loss function between Ŷ and Y, and the constraint guarantees privacy in terms of the information leakage through any set of at most T queries, for some privacy parameter ε. We also desire that the computation and storage costs at the client side (the Q_j and h functions) be low.
To deploy this model to label a new input X, the client follows the same protocol, using the designed or trained set of functions, and draws a fresh noise sample from the distribution P_{Z_1, …, Z_N}, independent of all other variables in the network.
In this section, we detail a method of implementing Corella. The approach consists of designing the noise distribution and the functions described above, as explained in the following and shown in Figure 2.
Correlated joint distribution: The following steps describe the joint noise distribution. First, a matrix C with N rows is formed such that any T × T submatrix of C is full rank. Then a random matrix W, independent of X, is formed, where each entry is chosen independently and identically from a zero-mean Gaussian distribution, d is the size of each query, and σ is a positive real number whose square denotes the variance of each entry. Then the noise components are given by

Z = C W.   (2)
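A concrete instantiation of this construction might look as follows; the Vandermonde choice of C is our own (the scheme only requires the full-rank property of its T × T submatrices):

```python
import itertools
import numpy as np

def correlated_noise(N, T, d, sigma, rng):
    """Return (C, Z): Z holds one length-d noise vector per server, Z = C @ W."""
    # C: N x T Vandermonde matrix on distinct positive nodes. Every T x T
    # submatrix of such a matrix is nonsingular, as the scheme requires.
    nodes = np.arange(1, N + 1, dtype=float)
    C = np.vander(nodes, T, increasing=True)   # row j: [1, x_j, ..., x_j^(T-1)]
    W = sigma * rng.standard_normal((T, d))    # i.i.d. N(0, sigma^2) entries
    return C, C @ W

# Sanity check of the full-rank property for N = 4 servers, T = 2 colluders.
C, Z = correlated_noise(N=4, T=2, d=8, sigma=40.0, rng=np.random.default_rng(0))
for rows in itertools.combinations(range(4), 2):
    assert abs(np.linalg.det(C[list(rows), :])) > 1e-9
```

Because any T colluding servers see C_𝒯 W with C_𝒯 invertible, their view is a full-variance Gaussian mixture of the seed noise, which is what makes the leakage bound of the next section possible.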
Function Q_j: This function, which generates the queries, consists of three blocks: (i) the sample vector X is passed through a neural network with learnable parameters, denoted here f_pre (with at most one layer); (ii) the output of the first block is normalized; (iii) the query of the j-th server is generated by adding the noise component Z_j to the normalized output. Therefore,

Q_j(X, Z_j) = Normalize(f_pre(X)) + Z_j.   (3)
It is worth noting that a large enough noise variance is sufficient to satisfy the privacy constraint of the optimization problem, independent of the choice of the learnable functions.
Function h: We form h by applying a neural network with learnable parameters, with at most one layer, denoted here f_post, to the sum of the answers received from the N servers. Therefore,

Ŷ = f_post(A_1 + A_2 + … + A_N).   (4)

In some realizations, we do not use any neural network at all, and the client simply sums the answers.
Functions g_1 to g_N: These functions are chosen as neural networks with learnable parameters.
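Putting the pieces together, the data flow of the query and combining steps for N = 2 can be sketched as follows (the weights here are random stand-ins and f_post is taken as the identity; in Corella all of them are learned jointly):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, sigma = 256, 10, 40.0

W_pre = rng.standard_normal((d, d)) / np.sqrt(d)   # f_pre: one linear layer
W_srv = rng.standard_normal((k, d)) / np.sqrt(d)   # server layer (shared here)
z = sigma * rng.standard_normal(d)                 # correlated pair (z, -z)

x = rng.standard_normal(d)                         # client's private sample
u = W_pre @ x
u = u / np.linalg.norm(u)                          # normalize, then add noise
q1, q2 = u + z, u - z                              # the two queries
y_hat = (W_srv @ q1) + (W_srv @ q2)                # client only sums answers

# Each query is overwhelmingly noise: unit-norm signal vs sigma-scale noise.
assert np.linalg.norm(u) / np.linalg.norm(z) < 0.01
```

With shared linear server weights the sum equals 2·W_srv·u exactly; with distinct nonlinear server networks, training drives the combined answer toward the correct label while each individual answer stays uninformative.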
The details of this method are presented in Algorithm 1. In the next theorem, we show that the proposed method satisfies the privacy constraint if the noise standard deviation σ satisfies condition (5), where the constant defined in (6) is determined by the set of all T × T submatrices of C and the all-ones vector of length T.
Theorem 1. Let X be a random vector sampled from some distribution, and let the noise and queries be as defined in (2) and (3). If conditions (5) and (6) are satisfied, then for every 𝒯 ⊂ [N] of size T, we have

I(X; {Q_j, j ∈ 𝒯}) ≤ ε.   (7)
Proof: Let K denote the covariance matrix of the normalized query signal; since the signal is normalized, K is bounded. In addition, for a colluding set 𝒯 of size T, let C_𝒯 denote the corresponding T × T submatrix of C, which is full rank by construction, so that we have

(8)

In addition, we define

(9)

Thus we can upper-bound I(X; {Q_j, j ∈ 𝒯}) through a chain of steps (a)-(g), where (a) follows since C_𝒯 is a full-rank matrix; (b) follows from (8), (9), and the fact that W is independent of X; (c) follows from the fact that the entries of W are mutually independent; (d) follows because W is independent of X and the jointly Gaussian distribution maximizes the entropy of a random vector with a known covariance matrix (Cover & Thomas, 1991); (e) follows because, for a symmetric positive semidefinite matrix with eigenvalues λ_1, …, λ_T, the inequality of arithmetic and geometric means bounds the product of the eigenvalues in terms of the trace; (f) follows from a standard logarithmic inequality; and (g) follows from (5), (6), (9), and substituting the value of σ². ∎
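The flavor of the bound can be seen in the scalar Gaussian analogue (a textbook fact, not the theorem's exact expression): if a unit-power signal is hidden in Gaussian noise of variance σ², the leakage is at most ½·log₂(1 + 1/σ²) bits, which vanishes as σ grows.

```python
import math

def gaussian_leakage_bits(signal_power, sigma):
    """Capacity-style upper bound on I(X; X + Z) for Gaussian noise Z, in bits.

    Gaussian input maximizes mutual information at a given power, so this
    bounds the leakage for any signal with power <= signal_power.
    """
    return 0.5 * math.log2(1.0 + signal_power / sigma**2)

# With normalized (unit-power) queries and sigma = 40, the per-dimension
# leakage is tiny, and it shrinks further as sigma grows.
eps = gaussian_leakage_bits(1.0, 40.0)
```

This is why a fixed normalization of the query signal plus a free choice of σ lets the scheme hit any target ε, independent of the learned functions.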
In Subsection 4.1, we present the implementation details of the proposed method. In Subsection 4.2, we state the privacy and accuracy results. In Subsection 4.3, we discuss the client costs (i.e., the computation and storage costs and the number of servers). In Subsection 4.4, we evaluate the proposed method for larger values of T.
[Table 1: the network structures used for each dataset and configuration; in several settings f_pre and/or f_post is the identity function. The detailed rows were lost in extraction.]
Network structure. Conv2d parameters represent the number of the input channels, the number of the output channels, the kernel size, and the stride, respectively. FC parameters represent the number of the input neurons and the number of the output neurons of a fully connected layer.
Parameters of Algorithm 1: We employ some convolutional and fully connected layers in the server networks, at most one convolutional layer in f_pre, and at most one fully connected layer in f_post. In some scenarios, f_pre and/or f_post is very simple and does not contain any neural network. The network structure is presented in detail in Table 1. In this table, the listed parameters give the number of output channels of f_pre and the number of output neurons of f_post, with the convention that an omitted entry means the corresponding function is the identity. We initialize the network parameters with Kaiming initialization (He et al., 2015). We use the cross-entropy loss between the estimated and true labels. To evaluate the accuracy of the proposed algorithm at a noise standard deviation σ, we start training the model from σ = 0 and gradually increase the noise standard deviation, with linearly increasing step sizes, up to the target σ, running one epoch of training at each step; finally, we report the accuracy at the target σ. We also gradually decrease the learning rate during training. Note that in this paper we are concerned with the privacy of the client data, not of the training samples.
Datasets: The proposed algorithm is evaluated on the MNIST (LeCun et al., 2010), FashionMNIST (Xiao et al., 2017), and Cifar10 (Krizhevsky, 2009) datasets, using their standard training and test sets. We set the training batch size to 128. The only preprocessing applied to the images is random crop and random horizontal flip on the Cifar10 training set. As shown in Table 1, we use the same model structure for all datasets, except for the number of input image channels and the padding where needed.
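The σ ramp described above can be sketched as follows (our reading of the schedule: the step sizes grow linearly, so σ rises roughly quadratically to the target over the epochs; the paper's exact constants were lost in extraction):

```python
def sigma_schedule(sigma_target, num_epochs):
    """Per-epoch noise levels: linearly growing increments that sum to target."""
    increments = range(1, num_epochs + 1)
    total = sum(increments)
    sigmas, sigma = [], 0.0
    for step in increments:
        sigma += sigma_target * step / total   # step sizes increase linearly
        sigmas.append(sigma)                   # one training epoch per level
    return sigmas

schedule = sigma_schedule(sigma_target=40.0, num_epochs=10)
# schedule starts near zero and ends at sigma_target
```

Warming up the noise this way lets the network first learn the task and then progressively learn to cancel stronger noise, rather than facing the full σ from the start.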
Experiment 1: In the first experiment, for the MNIST dataset, we pick one model with N = 2 servers and T = 1, and sweep the noise standard deviation σ. The correlated noise components are generated by the construction of (2). Figure 3(a) reports the accuracy over the whole test set versus σ. It shows that using Corella, the client achieves 90% accuracy while the privacy leakage is less than ε, thanks to the strong noise. The figure also visualizes the inputs and outputs of the servers for an MNIST sample in this experiment. Figure 3(b) shows that in the answer of each individual server (A_1 and A_2), the correct label (here 8) does not rank among the labels with the highest values. However, when we add the answers entrywise, 8 attains the highest value in the result. In particular, after applying softmax, the density on the correct label is significantly higher than on the rest. This observation confirms the privacy of the proposed Corella method.
A similar effect is shown in Figure 4 for this experiment, where we use a test sample from the MNIST dataset with label 6 as input. Each figure in the second row of plots in Figure 4 is a 2D histogram representing the joint distribution of two neurons of the output of server one, corresponding to labels 6 and 9. This shows how server one struggles to decide whether the correct label is 6 or 9. We repeat this plot for different noise standard deviations. If a point lies above the diagonal, server one wrongly prefers label 9 to label 6. The first row in Figure 4 shows the same plots for the client's combined output. As we can see, for large values of the noise standard deviation, server one chooses 6 or 9 almost equiprobably, while the client is almost always correct.
Table 2. Test accuracy (%) and the client's computation and storage costs relative to one server, for various model configurations (the model-name column and exact column headers were lost in extraction; multiple accuracy entries per row correspond to different numbers of servers):

MNIST         | 1      | 0      | 0      | 90.72  90.96  90.94
              | 1      | 3.1e5  | 5.1e5  | 95.16  95.51  95.03
              | 1      | 6.3e5  | 9.9e5  | 96.40  96.00  94.92
              | 1.0e1  | 3.1e4  | 4.0e6  | 90.53
              | 2.1e1  | 6.2e4  | 8.1e6  | 94.21
FashionMNIST  | 1      | 0      | 0      | 81.00  80.13  80.28
              | 1      | 3.1e5  | 5.1e5  | 83.32  80.95  80.84
              | 1      | 6.3e5  | 9.9e5  | 84.00
              | 1      | 1.2e4  | 1.9e4  | 83.02
              | 2.1e1  | 6.2e4  | 8.1e6  | 81.69
              | 4.1e1  | 1.2e3  | 1.6e5  | 83.35
              | 8.3e1  | 2.4e3  | 3.2e5  | 81.51
              | 4.1e1  | 1.3e3  | 6.7e5  | 83.13
Cifar10       | 1      | 0      | 0      | 35.70  36.26  35.40
              | 1      | 2.3e5  | 3.9e5  | 38.30
              | 1      | 4.7e5  | 7.6e5  | 42.47
              | 1      | 9.3e5  | 1.5e4  | 42.28
              | 2.6e1  | 6.7e3  | 7.2e5  | 50.30
              | 5.2e1  | 1.3e2  | 1.4e4  | 54.39
              | 1.0    | 2.2e2  | 2.9e4  | 58.13
              | 2.1    | 3.7e2  | 5.7e4  | 55.27
              | 1.0    | 2.2e2  | 4.3e4  | 58.16  58.31  57.91
Experiment 2: In Figure 5, we compare the accuracy of Corella for several models with a baseline in which there is only one server and noise is added to the input to protect the data. For each dataset, we plot the accuracy of a basic model and of another favorite model for σ ranging from 0 to 70. The results show that the Corella method maintains high accuracy even at strong noise levels, which make the information leakage to each server information-theoretically negligible. For example, in Figure 5(a), the client, with low postprocessing, achieves 95% accuracy, while with a single server and additive noise of the same variance the accuracy is only 13%. In general, in Corella, the accuracy decreases as the noise variance increases, but it still converges to a reasonable value. At some points, however, the accuracy increases with the noise variance; this counterintuitive observation can be explained by the fact that a certain level of added noise helps the model generalize better. The tradeoff between privacy and accuracy allows one to choose the noise level required for a target privacy and then train the model for that noise level.
Experiment 3: In Table 2, we evaluate the proposed algorithm for various models and numbers of non-colluding servers. We report the test accuracy on the three datasets and the computation and storage costs of the client relative to one of the servers, measured by the number of products and the number of parameters in a model, respectively.
The Computation Cost: Table 2 interestingly shows that increasing the client's computing load for preprocessing (f_pre) or postprocessing (f_post) does not always increase the accuracy. The best split between preprocessing and postprocessing differs across the three datasets. For example, on the MNIST and FashionMNIST datasets, increasing the computation complexity of postprocessing is more advantageous than that of preprocessing, while on the Cifar10 dataset it is the opposite.
The Storage Cost: In Table 2, the difference between the number of model parameters in the client and in each server is quite evident. The proposed Corella algorithm achieves high accuracy while storing only a limited number of parameters at the client.
The Number of Servers: Table 2 shows that increasing the number of servers (as an alternative to increasing the client's computation) does not have much impact on the accuracy, in particular when some computation is already employed at the client.
Experiment 4: In this experiment, we evaluate the proposed method for larger collusion parameters T with correspondingly more servers N. In Table 3, we report the test accuracy on the MNIST dataset for several (N, T) pairs; in each case, the noise is generated by the construction of (2), choosing a matrix C all of whose T × T submatrices are full rank.
[Table 3: test accuracy on MNIST for the models and (N, T) settings of Experiment 4; the entries were lost in extraction.]
In this paper, we introduced Corella, a distributed framework based on correlated queries that ensures user data privacy in machine learning. The network is trained such that the information leakage of the client data to each server is information-theoretically negligible, even if some of the servers collude. On the other hand, the client can, with minor effort, recover the result by combining the correlated outputs of the servers. The implementation results show that the client achieves high accuracy with low computation, communication, and storage costs.
References:
Ben-Or, M., Goldwasser, S., and Wigderson, A. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 1-10. ACM, 1988.
Gascón, A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., and Evans, D. Privacy-preserving distributed linear regression on high-dimensional data. In Proceedings on Privacy Enhancing Technologies, pp. 345-364, 2017.
Osia, S. A. et al. Deep private-feature extraction. IEEE Transactions on Knowledge and Data Engineering, 2018.