Corella: A Private Multi Server Learning Approach based on Correlated Queries

03/26/2020, by Hamidreza Ehteram et al.

The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud. One of the major challenges in this setup is to guarantee the privacy of the client's data. Various methods have been proposed in the literature to protect privacy, including (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation, which requires significant communication among the computing nodes or with the client, and (iii) relying on homomorphic encryption methods, which significantly increase the computation load. In this paper, we propose an alternative approach to protect the privacy of user data. The proposed scheme relies on a cluster of servers, each running a deep neural network, where at most T of them, for some integer T, may collude. Each server is fed with the client data perturbed by strong noise, which makes the information leakage to each server information-theoretically negligible. On the other hand, the noise terms added for different servers are correlated. This correlation among the queries allows the system to be trained such that the client can recover the final result with high accuracy, and with minor computational effort, by combining the outputs of the servers. Simulation results for various datasets demonstrate the accuracy of the proposed approach.


1 Introduction

With the expansion of machine learning (ML) applications that deal with high-dimensional datasets and models, particularly on low-resource devices (e.g., mobile units), it is inevitable to offload heavy computation and storage tasks to cloud servers. This raises a list of challenges such as communication overhead, delay, convergence rate, and operation cost. One of the major concerns, which is becoming increasingly important, is maintaining the privacy of the datasets involved, either the training dataset or the user data, such that the information leaked to the cloud servers is under control.

The information leakage of the training dataset may occur either during the training phase or from the trained model. In training phase privacy, a data owner who offloads the task of training to some untrusted servers is concerned about the privacy of its samples. In trained model privacy, the concern is to prevent the trained model from exposing information about the training dataset. In user data privacy, on the other hand, a client wishes to employ some servers to run an already trained model on its data, while preserving the privacy of its individual data against the servers.

There are various techniques to provide privacy in machine learning scenarios, which fall into three major categories: (i) randomization and adding noise, which sacrifices accuracy; (ii) Secure Multiparty Computation (MPC), which requires a heavy communication load or a large number of servers; and (iii) Homomorphic Encryption (HE), which imposes a heavy computation overhead on the system.

The main objective of this paper is to develop an alternative approach that avoids the heavy communication and computation overheads of MPC- and HE-based schemes, and at the same time guarantees a high level of privacy without severely sacrificing accuracy. The proposed approach is based on sending noisy correlated queries to several servers, where the network is trained to eliminate the effect of the noise with a minor computation load at the client.

1.1 Related Works

Here we briefly review the three major categories of techniques for providing privacy in ML applications.

Randomization and Adding Noise: Randomization and adding noise to the client data or to the ML model confuses the servers and reduces the amount of information leakage, at the cost of sacrificing the accuracy of the result.

In (Fredrikson et al., 2015; Shokri et al., 2017), it is shown that the parameters of a trained model can leak sensitive information about the training dataset. Differential privacy based approaches (Dwork, 2006; Dwork et al., 2006; Dwork & Roth, 2014) can be utilized to prevent this leakage. A randomized algorithm is differentially private if its output distributions for any two adjacent input datasets (two datasets that differ in only one element) are close enough. Adding noise to a deterministic function is a common method to provide differential privacy, which is also used in ML algorithms. (Dwork et al., 2014) proposes a differentially private algorithm for principal component analysis (PCA), a simple statistical algorithm in ML. For complex statistical algorithms (e.g., deep neural networks), (Abadi et al., 2016; Papernot et al., 2017) add noise to the neural network output in order to make the deep learning algorithm differentially private. However, the privacy-accuracy tradeoff in differential privacy (Alvim et al., 2011) bounds the scale of the noise added to the model output and thus limits the privacy that can be preserved.

In distributed stochastic gradient descent (SGD), used in many distributed machine learning algorithms, including the federated learning framework (McMahan & Ramage, 2017; Bonawitz et al., 2019), the privacy of the dataset can be violated through the exchanged messages. In those scenarios, each client uploads to a central server the local gradient vector computed on its sampled data. The gradient vectors are used by the central server to update the global model, which is then shared with the clients. It is shown in (Hitaj et al., 2017) that an attacker can generate prototypical samples of the client dataset by only having access to the updated model parameters. A remedy for this leakage is to use differential privacy approaches (Shokri & Shmatikov, 2015; Abadi et al., 2016; Jayaraman et al., 2018), but the privacy-accuracy tradeoff in these approaches limits the scale of the noise and thus the achievable privacy.

K-anonymity (Samarati & Sweeney, 1998) is another privacy-preserving framework. k-anonymity of a dataset means that any data item cannot be distinguished from at least k-1 other items (e.g., after removing some digits of the zip codes) (Sweeney, 2002; Bayardo & Agrawal, 2005). However, the k-anonymity framework may not guarantee good privacy, particularly for high-dimensional data (as happened for the Netflix Prize dataset (Narayanan & Shmatikov, 2008)).

As an alternative approach, the authors in (Osia et al., 2018) perturb the data by passing it through a module (function) that can be trained to protect some sensitive attributes of the data, while preserving the accuracy of the learning as much as possible.

Secure Multiparty Computation: This approach exploits the existence of a set of non-colluding servers to guarantee information-theoretic privacy for some classes of computation tasks, such as polynomial functions (Yao, 1982; Ben-Or et al., 1988; Shamir, 1979). It can be applied to execute an ML algorithm (Gascón et al., 2017; Dahl et al., 2018; Chen et al., 2019). The shortcoming of this solution is that it imposes a huge communication overhead on the network. To reduce this overhead, one solution is to judiciously approximate the nonlinear functions such that the number of interactions among the servers is reduced, even though not completely eliminated (Mohassel & Zhang, 2017). Another solution is to rely on Lagrange coding to develop an MPC scheme with no communication among the servers (So et al., 2019); that approach can also exploit the gain of parallel computation. The major disadvantage of (So et al., 2019) is that a target accuracy is achieved at the cost of employing more servers.

Homomorphic Encryption: Homomorphic Encryption (Gentry & Boneh, 2009) is another cryptographic tool that can be applied to ML applications (Graepel et al., 2012; Hesamifard et al., 2017; Li et al., 2018; Wang et al., 2018). It creates a cryptographically secure framework between the client and the servers that allows the untrusted servers to process the encrypted data directly. However, the computation overhead of HE schemes is extremely high. This disadvantage is directly reflected in the time needed to train a model or to use a trained one, as reported in (Han et al., 2019; Gilad-Bachrach et al., 2016). Moreover, this framework guarantees privacy only under the assumption that the computation resources of the adversary are limited; it does not provide information-theoretic privacy.

1.2 Contribution

In this paper, we propose Corella as a privacy-preserving approach for offloading ML algorithms, based on sending correlated queries to multiple servers. These correlated queries are generated by adding strong correlated noise terms to the user data. The system is trained such that the user can recover the result by combining the (correlated) answers received from the servers, while the data remains private from each server thanks to the strong added noise. Each server runs a regular machine learning model (say, a deep neural network) with no computation overhead. In addition, other than uploading the data to the servers and downloading the results, there is no communication among the servers or between the servers and the user. Thus the proposed scheme provides information-theoretic privacy while keeping the communication and computation costs affordable. We apply the proposed approach to the user data privacy problem in a supervised learning setup. We consider a client with limited computation and storage resources who wishes to label its individual data. For processing, it thus relies on a cluster of N semi-honest servers, of which up to T may collude. This means that the servers are honest in protocol compliance, but curious to learn about the client data, and an arbitrary subset of at most T out of the N servers may collude to learn it. The objective is to design an algorithm for this setup with reasonable accuracy, while the leakage of the client data to any T colluding servers is information-theoretically small.

In summary, Corella offers the following desirable features:

  1. Thanks to the strong noise added to the user data, the information leakage to each server is negligible (i.e., the scheme is information-theoretically private).

  2. The correlation among the queries enables the system to be trained such that the user can cancel the effect of the noise by combining the correlated answers and recover the final result with high accuracy.

  3. There is no communication among the servers. Moreover, the computation load per server is reasonable. In summary, Corella avoids the huge computation and communication overheads, or the large number of servers, needed in HE and MPC schemes.

The next section states our problem setting. Section 3 proposes the Corella algorithm. Section 4 describes our experimental results, and Section 5 concludes the paper.

Notations: A capital italic bold letter denotes a random vector, a capital non-italic bold letter denotes a random matrix, and a capital non-italic non-bold letter denotes a deterministic matrix. $I(\boldsymbol{X};\boldsymbol{Y})$ denotes the mutual information between the two random vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$. For a function $f$, we consider its computation cost (e.g., the number of multiplications) and its storage cost (e.g., the number of parameters). For an integer $N$, $[N]$ denotes the set $\{1, \ldots, N\}$. Logarithms are in base 2, i.e., information is measured in bits.

2 General Corella Framework

We consider a system including a client, with limited computation and storage resources, and $N$ servers. The client has an individual data sample and wishes to label it with the aid of the servers, while keeping its data private from each server. All servers are honest, except up to $T$ of them, which are semi-honest; that is, those servers still follow the protocol, but they are curious about the client data and may collude to gain information about it. The client sends queries to the servers and then, by combining the received answers, labels its data.

The system operates in two phases: a training phase and then a test phase. In the training phase, a dataset consisting of $m$ samples is used by the client to train the model, where the $i$-th sample consists of a data point $\boldsymbol{X}^{(i)}$ and its label $\boldsymbol{Y}^{(i)}$, for $i \in [m]$. In addition, the client generates i.i.d. noise samples $\boldsymbol{Z}^{(1)}, \ldots, \boldsymbol{Z}^{(m)}$, where each noise sample $\boldsymbol{Z}^{(i)} = (\boldsymbol{Z}^{(i)}_1, \ldots, \boldsymbol{Z}^{(i)}_N)$, with $N$ correlated components, is sampled from a joint distribution $P_{\boldsymbol{Z}_1 \cdots \boldsymbol{Z}_N}$. The noise components are independent of the dataset samples and their labels.

The data flow diagram is shown in Figure 1, where, for simplicity, the sample index $i$ is omitted from the variables.

The client, having access to the dataset and the noise components, uses a function $Q_j$ to generate the $j$-th query $Q_j(\boldsymbol{X}, \boldsymbol{Z}_j)$ and sends it to the $j$-th server. In response, the $j$-th server applies a function $f_j$ and generates the answer as

$\boldsymbol{A}_j = f_j\big(Q_j(\boldsymbol{X}, \boldsymbol{Z}_j)\big),$   (1)

for $j \in [N]$. By combining all the answers received from the $N$ servers using a function $G$, the client estimates the label as $\hat{\boldsymbol{Y}} = G(\boldsymbol{A}_1, \ldots, \boldsymbol{A}_N)$, while the information leakage from the set of queries to each server must be negligible.

Figure 1: The general Corella framework for Privacy Preserving ML

In the training phase, the goal is to design or train the set of functions $F = \{Q_1, \ldots, Q_N, f_1, \ldots, f_N, G\}$ and the noise distribution $P_{\boldsymbol{Z}}$ according to the following optimization problem:

$\min_{F,\, P_{\boldsymbol{Z}}} \ \frac{1}{m} \sum_{i=1}^{m} \mathrm{Loss}\big(\hat{\boldsymbol{Y}}^{(i)}, \boldsymbol{Y}^{(i)}\big)$
$\text{subject to} \quad I\big(\boldsymbol{X}^{(i)}; \{Q_j(\boldsymbol{X}^{(i)}, \boldsymbol{Z}^{(i)}_j),\ j \in \mathcal{T}\}\big) \le \varepsilon, \quad i = 1, \ldots, m, \quad \forall\, \mathcal{T} \subset [N],\ |\mathcal{T}| \le T,$

where $\mathrm{Loss}\big(\hat{\boldsymbol{Y}}^{(i)}, \boldsymbol{Y}^{(i)}\big)$ denotes the loss between the estimated label and the true label, for some loss function $\mathrm{Loss}$, and the constraint guarantees $\varepsilon$-privacy in terms of the information leakage through any set of at most $T$ queries, for some privacy parameter $\varepsilon$.

We also desire that the computation and storage costs at the client side (i.e., of the $Q_j$ and $G$ functions) be low.

To deploy this model to label a new input $\boldsymbol{X}$, the client follows the same protocol: it uses the designed or trained set of functions $F$ and chooses a fresh noise sample $\boldsymbol{Z}$, drawn from the distribution $P_{\boldsymbol{Z}}$, independent of all other variables in the network.

3 Method

In this section, we detail a method of implementing Corella. The approach consists of designing the noise distribution $P_{\boldsymbol{Z}}$ and the set of functions $F$, described in the following and shown in Figure 2.

Correlated joint distribution $P_{\boldsymbol{Z}}$: The following steps describe the joint distribution $P_{\boldsymbol{Z}}$. First, a matrix $\Gamma \in \mathbb{R}^{N \times T}$ is formed, such that every $T \times T$ submatrix of $\Gamma$ is full rank. Then a random matrix $\mathbf{W} \in \mathbb{R}^{T \times d}$, independent of $\boldsymbol{X}$, is formed, where each entry is chosen independently and identically from $\mathcal{N}(0, \sigma^2)$; here $d$ is the size of each query and $\sigma^2$, a positive real number, denotes the variance of each entry. Then, the noise matrix $\mathbf{Z}$, whose $j$-th row is the noise component $\boldsymbol{Z}_j$, is given by

$\mathbf{Z} = \Gamma \mathbf{W}.$   (2)
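As a concrete illustration, the following minimal NumPy sketch generates correlated noise of this form for $N = 2$, $T = 1$ with the choice $\Gamma = [1, -1]^\top$; the query size, the value of $\sigma$, and the variable names are illustrative assumptions rather than the paper's exact settings.

    import numpy as np

    N, T, d = 2, 1, 784       # servers, colluding servers, query size (assumed values)
    sigma = 40.0              # noise standard deviation (illustrative)

    Gamma = np.array([[1.0], [-1.0]])    # N x T matrix; every T x T submatrix is full rank
    W = sigma * np.random.randn(T, d)    # T x d matrix with i.i.d. N(0, sigma^2) entries
    Z = Gamma @ W                        # N x d noise matrix; row j is the component Z_j

    # For this particular Gamma, the two noise components are negatives of each other,
    # so they cancel when the client sums quantities that carry them with equal weight.
    assert np.allclose(Z[0] + Z[1], 0.0)

With $N > T$, the joint noise lies in a $T$-dimensional subspace across the servers, which is what the client exploits when combining the servers' answers.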

Function $Q_j$: This function, which generates the queries, consists of three blocks: (i) the sample vector $\boldsymbol{X}$ is passed through a neural network with learnable parameters, denoted by $f_Q$ (with at most one layer); (ii) the output of the first block is normalized; (iii) the query of the $j$-th server is generated by adding the noise component $\boldsymbol{Z}_j$ to the normalized output. Therefore,

$Q_j(\boldsymbol{X}, \boldsymbol{Z}_j) = \mathrm{Norm}\big(f_Q(\boldsymbol{X})\big) + \boldsymbol{Z}_j.$   (3)

It is worth noting that a large enough noise variance $\sigma^2$ is sufficient to satisfy the privacy constraint of the optimization problem in Section 2, independently of the choice of $f_Q$.

Function $G$: We form the estimate $\hat{\boldsymbol{Y}}$ by running a neural network with learnable parameters, with at most one layer, denoted by $f_G$, over the sum of the answers received from the servers. Therefore,

$\hat{\boldsymbol{Y}} = f_G\Big(\sum_{j=1}^{N} \boldsymbol{A}_j\Big).$   (4)

In some realizations, we do not use any neural network in this block at all.

Functions $f_1$ to $f_N$: These functions are chosen as neural networks with learnable parameters.
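To make the data flow concrete, the following PyTorch-style sketch wires these blocks together for $N = 2$, $T = 1$ with $\Gamma = [1, -1]^\top$; the layer sizes, the use of $L_2$ normalization, and the module names are illustrative assumptions and do not reproduce the exact architectures of Table 1.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N = 2                                     # number of servers (at most one colluding)

    q_net = nn.Identity()                     # client pre-processing Q (at most one layer)
    servers = nn.ModuleList([nn.Sequential(   # one independent network f_j per server
        nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10)) for _ in range(N)])
    g_net = nn.Identity()                     # client post-processing G (at most one layer)

    def forward(x, sigma):
        x_tilde = F.normalize(q_net(x).flatten(1), dim=1)      # eq. (3): normalized client features
        w = sigma * torch.randn_like(x_tilde)                  # fresh noise with N(0, sigma^2) entries
        queries = [x_tilde + w, x_tilde - w]                   # Z_1 = W, Z_2 = -W, i.e., Gamma = [1, -1]
        answers = [f_j(q) for f_j, q in zip(servers, queries)] # eq. (1): the servers' answers
        return g_net(sum(answers))                             # eq. (4): combine the answers at the client

Each server only ever sees its own heavily noised query, while the client's post-processing operates on the sum of the answers.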

The details of this method are presented in Algorithm 1. In the next theorem, we show that the proposed method satisfies $\varepsilon$-privacy if

(5)

where

(6)

and the set appearing in (6) consists of all $T \times T$ submatrices of $\Gamma$, while $\mathbf{1}$ denotes the all-ones vector.


Figure 2: A Method to Implement Corella
Theorem 1 ($\varepsilon$-privacy).

Let $\boldsymbol{X}$ be a random vector, sampled from some distribution, and let the queries be as defined in (2) and (3). If conditions (5) and (6) are satisfied, then for every $\mathcal{T} \subset [N]$ of size at most $T$, we have

$I\big(\boldsymbol{X}; \{Q_j(\boldsymbol{X}, \boldsymbol{Z}_j),\ j \in \mathcal{T}\}\big) \le \varepsilon.$   (7)
Proof.

Let $\mathbf{K}$ denote the covariance matrix of the normalized feature vector $\mathrm{Norm}\big(f_Q(\boldsymbol{X})\big)$. Since this vector is normalized, the trace of $\mathbf{K}$ is bounded. In addition, consider a colluding set $\mathcal{T}$ of size $T$ and the corresponding $T \times T$ submatrix of $\Gamma$. Then, we have

(8)

In addition, we define

(9)

Thus we have

where (a) follows since the corresponding submatrix of $\Gamma$ is full rank; (b) follows from (8), (9), and the fact that the noise is independent of the data; (c) follows from the fact that the entries of $\mathbf{W}$ are mutually independent; (d) follows because the noise is independent of the data and because the jointly Gaussian distribution maximizes the entropy of a random vector with a given covariance matrix (Cover & Thomas, 1991); (e) follows because, for a symmetric positive semi-definite matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, the determinant is $\prod_i \lambda_i$ and the trace is $\sum_i \lambda_i$, so the inequality of arithmetic and geometric means gives $\det(\cdot) \le (\operatorname{tr}(\cdot)/n)^n$; (f) follows from an elementary inequality; and (g) follows from (5), (6), (9), and by substituting the chosen value of $\sigma^2$.
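For reference, the two standard facts used in steps (d) and (e) can be stated as follows (a sketch in our notation, not the paper's exact display): for any random vector $\boldsymbol{V} \in \mathbb{R}^{n}$ with covariance matrix $\mathbf{K}$ and eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\mathbf{K}$,

    % Gaussian maximum-entropy bound (step (d))
    h(\boldsymbol{V}) \;\le\; \tfrac{1}{2}\log\!\big((2\pi e)^{n}\det \mathbf{K}\big),

    % AM-GM bound on the determinant (step (e))
    \det \mathbf{K} \;=\; \prod_{i=1}^{n}\lambda_i
    \;\le\; \Big(\frac{1}{n}\sum_{i=1}^{n}\lambda_i\Big)^{\!n}
    \;=\; \Big(\frac{\operatorname{tr}\mathbf{K}}{n}\Big)^{\!n}.

Combining the two bounds yields an upper bound on the differential entropy that depends only on the trace of the covariance matrix, which is the kind of step used in the chain above.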

4 Experiments

In Subsection 4.1, we present the implementation details of the proposed method. In Subsection 4.2, we present the privacy and accuracy results. In Subsection 4.3, we discuss the client costs (i.e., the computation and storage costs and the number of servers). In Subsection 4.4, we evaluate the proposed method for other values of $N$ and $T$.

4.1 Implementation Details

Configuration 1 ($Q$: identity, $G$: identity):
   $Q$: Identity function
   $f$: Conv2d (1,64,(5,5),3), ReLU, Conv2d (64,128,(3,3),1), ReLU, Flatten, FC (128*7*7,1024), ReLU, FC (1024,10)
   $G$: Identity function

Configuration 2 ($Q$: identity, $G$: fully connected):
   $Q$: Identity function
   $f$: Conv2d (1,64,(5,5),3), ReLU, Conv2d (64,128,(3,3),1), ReLU, Flatten, FC (128*7*7,1024), ReLU, FC (1024,·)
   $G$: ReLU, FC (·,10)

Configuration 3 ($Q$: convolutional, $G$: identity):
   $Q$: Conv2d (1,·,(5,5),3), ReLU
   $f$: Conv2d (·,128,(3,3),1), ReLU, Flatten, FC (128*7*7,1024), ReLU, FC (1024,10)
   $G$: Identity function

Configuration 4 ($Q$: convolutional, $G$: fully connected):
   $Q$: Conv2d (1,·,(5,5),3), ReLU
   $f$: Conv2d (·,128,(3,3),1), ReLU, Flatten, FC (128*7*7,1024), ReLU, FC (1024,·)
   $G$: ReLU, FC (·,10)

Table 1:

Network structures for the model configurations, each specified by the client pre-processing network $Q$, the server network $f$, and the client post-processing network $G$; unspecified widths, shown as "·", are model hyperparameters. Conv2d parameters represent the number of the input channels, the number of the output channels, the kernel size, and the stride, respectively. FC parameters represent the number of the input neurons and the number of the output neurons of a fully connected layer.

(a) Accuracy on the whole MNIST test set versus the noise standard deviation, for the model of Experiment 1, together with the inputs of both servers for a test sample with label 8.

(b) The two servers' outputs and their sum, with and without applying the softmax function, for the test sample with label 8.
Figure 3: The results of Experiment 1.
Figure 4: 2D histograms of the joint distribution of the output values corresponding to labels 6 and 9, at the client (first row) and at server one (second row), for a test sample with label 6 and various noise standard deviations in Experiment 1.

Figure 5: Privacy and accuracy curves for MNIST (panels (a) and (d)), Fashion-MNIST (panels (b) and (e)), and Cifar-10 (panels (c) and (f)). (The results of Experiment 2.)

Parameters of Algorithm 1: We employ some convolutional and fully connected layers in the server networks $f_j$, at most one convolutional layer in $Q$, and at most one fully connected layer in $G$. In some scenarios, $Q$ and/or $G$ is very simple and does not contain any neural network at all. The network structures are presented in detail in Table 1.

Input: training dataset, matrix $\Gamma$, noise standard deviation $\sigma$, and the output size of the $Q$ network
  function GenerateNoise
     Draw noise samples from $\mathcal{N}(0, \sigma^2)$
     Shape the noise samples into the matrix $\mathbf{W}$
     Compute the noise components $\mathbf{Z} = \Gamma \mathbf{W}$
     return $\mathbf{Z}$
  end function
  for number of training iterations do
     Forward path:
     Draw a minibatch of samples from the training dataset
     Draw noise samples using GenerateNoise
     Compute the client features $f_Q(\boldsymbol{X})$
     Normalize the client features
     Compute the servers' queries (eq. (3))
     Compute the servers' answers (eq. (1))
     Compute the sum of the answers
     Compute the client's predicted labels (eq. (4))
     Compute the loss
     Backward path:
     Update all parameters by descending their stochastic gradients
  end for
  Output: the trained functions and their updated parameters
Algorithm 1: $\varepsilon$-privacy with $N$ servers, up to $T$ colluding

In Table 1, the unspecified layer widths are model hyperparameters: the number of output channels of the $Q$ network (where $Q$ may instead be the identity function) and the size of the intermediate output fed to the $G$ network (where $G$ may instead be the identity function). We initialize the network parameters with Kaiming initialization (He et al., 2015). We use the cross-entropy loss between $\hat{\boldsymbol{Y}}$ and $\boldsymbol{Y}$ as the loss function. To evaluate the accuracy of the proposed algorithm at a target noise standard deviation, we start training the model at a small noise level and gradually increase the noise standard deviation, with linearly increasing step sizes, up to the target value; in each step we run one epoch of training, and finally we report the accuracy at the target noise level. We also gradually decrease the learning rate during training. Note that in this paper we are concerned with the privacy of the client data, not with the privacy of the training samples.
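As an informal illustration of this training procedure, the following PyTorch-style sketch shows the noise ramp-up and one SGD step per minibatch, reusing the modules and the forward function from the earlier sketch; the schedule form, the optimizer settings, and the data loader are assumptions for illustration rather than the exact experimental configuration.

    import torch

    params = list(q_net.parameters()) + list(g_net.parameters()) \
             + [p for f_j in servers for p in f_j.parameters()]
    opt = torch.optim.SGD(params, lr=0.1)            # optimizer and learning rate are assumed
    criterion = torch.nn.CrossEntropyLoss()          # cross-entropy between predicted and true labels

    target_sigma, epochs = 40.0, 20                  # illustrative target noise level and epoch count
    for epoch in range(epochs):
        sigma = target_sigma * (epoch + 1) / epochs  # gradually increase the noise (assumed linear ramp)
        for x, y in train_loader:                    # train_loader: a standard DataLoader (assumed)
            logits = forward(x, sigma)               # forward path of Algorithm 1 at this noise level
            loss = criterion(logits, y)
            opt.zero_grad()
            loss.backward()                          # backward path
            opt.step()                               # update all parameters by stochastic gradient descent

The schedule in the paper increases the noise with linearly increasing step sizes and also decays the learning rate; the uniform ramp above is only a stand-in for that behavior.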

Datasets: The proposed algorithm is evaluated on the MNIST (LeCun et al., 2010), Fashion-MNIST (Xiao et al., 2017), and Cifar-10 (Krizhevsky, 2009) datasets, using their standard training and test sets. We set the training batch size to 128. The only preprocessing applied to the images is a random crop and a random horizontal flip on the Cifar-10 training set. As shown in Table 1, we use the same model structure for all datasets, except for the number of input image channels and the padding, where needed.

4.2 Privacy and Accuracy Curve

Experiment 1: In the first experiment, for the MNIST dataset, we pick a basic model with $N = 2$ servers and $T = 1$, and vary the noise standard deviation. For the correlated noise components, we fix the matrix $\Gamma$, which determines the correlation between the two noise terms. Figure 3(a) reports the accuracy on the whole test dataset versus the noise standard deviation. It shows that, using Corella, the client achieves 90% accuracy while the privacy leakage is kept small, thanks to the strong noise. The figure also visualizes the inputs and the outputs of the servers for an MNIST test sample. Figure 3(b) shows that in the answer of each server, the correct label (here 8) does not rank among the labels with the highest values. However, when we add the two answers entrywise, label 8 obtains the highest value in the result. In particular, after applying softmax, the probability mass on the correct label is significantly higher than on the rest. This observation confirms the privacy of the proposed Corella method.

A similar effect is shown in Figure 4 for this experiment, where we use an MNIST test sample with label 6 as the input. Each plot in the second row of Figure 4 is a 2D histogram, representing the joint distribution of two neurons of the output of server one, namely the values corresponding to labels 6 and 9 (label 6 on the x-axis and label 9 on the y-axis). This shows how server one is confused about whether the correct label is 6 or 9. The plots are shown for different noise standard deviations. If a point lies above the line $y = x$, it means that server one wrongly prefers label 9 to label 6. The first row of Figure 4 shows the same plots for the client's combined output. As we can see, for large values of the noise standard deviation, server one chooses 6 or 9 almost equiprobably, while the client is almost always correct.

Dataset: MNIST
  1        0        0        90.72   90.96   90.94

  1        3.1e-5   5.1e-5   95.16   95.51   95.03
  1        6.3e-5   9.9e-5   96.40   96.00   94.92

  1.0e-1   3.1e-4   4.0e-6   90.53   -       -
  2.1e-1   6.2e-4   8.1e-6   94.21   -       -

Dataset: Fashion-MNIST
  1        0        0        81.00   80.13   80.28

  1        3.1e-5   5.1e-5   83.32   80.95   80.84
  1        6.3e-5   9.9e-5   84.00   -       -
  1        1.2e-4   1.9e-4   83.02   -       -

  2.1e-1   6.2e-4   8.1e-6   81.69   -       -
  4.1e-1   1.2e-3   1.6e-5   83.35   -       -
  8.3e-1   2.4e-3   3.2e-5   81.51   -       -

  4.1e-1   1.3e-3   6.7e-5   83.13   -       -

Dataset: Cifar-10
  1        0        0        35.70   36.26   35.40

  1        2.3e-5   3.9e-5   38.30   -       -
  1        4.7e-5   7.6e-5   42.47   -       -
  1        9.3e-5   1.5e-4   42.28   -       -

  2.6e-1   6.7e-3   7.2e-5   50.30   -       -
  5.2e-1   1.3e-2   1.4e-4   54.39   -       -
  1.0      2.2e-2   2.9e-4   58.13   -       -
  2.1      3.7e-2   5.7e-4   55.27   -       -

  1.0      2.2e-2   4.3e-4   58.16   58.31   57.91

Table 2: Test accuracy of various models for different numbers of servers (the last three columns). MNIST, Fashion-MNIST, and Cifar-10 test accuracies are evaluated for 0, 0, and 1.5, respectively. (The results of Experiment 3.)

Experiment 2: In Figure 5, we compare the accuracy of Corella for several numbers of servers with the baseline case of a single server where noise is added to the input to protect the data. For each dataset, we plot the accuracy of a basic model and of another, favorite model, for noise standard deviations from 0 to 70. The figure shows that the Corella method maintains high accuracy even for strong noise values that make the information leakage to each server information-theoretically negligible. For example, in Figure 5(a), the client, with low post-processing, achieves 95% accuracy, while with a single server and adding noise of the same variance we can achieve only 13% accuracy. In general, in Corella, the accuracy decreases as the noise variance increases, but it still converges to a reasonable value. However, at some points the accuracy increases with the noise variance; this counter-intuitive observation can be justified by the fact that adding some level of noise helps the model to generalize better. The tradeoff between privacy and accuracy allows one to choose the noise level required for a target privacy and then train the model at that noise level.

4.3 The Client Costs

Experiment 3: In Table 2, we evaluate the proposed algorithm for various models and numbers of non-colluding servers. We evaluate MNIST, Fashion-MNIST, and Cifar-10 at the settings of 0, 0, and 1.5, respectively, as listed in the caption of Table 2. We report the test accuracy for the three datasets as well as the computation and storage costs of the client relative to those of one server, measured by the number of multiplications and the number of parameters of a model, respectively.

The Computation Cost: Table 2 interestingly shows that increasing the client's computing load for preprocessing (i.e., the $Q$ network) or post-processing (i.e., the $G$ network) does not always increase the accuracy. The best split of the client's computation between $Q$ and $G$ differs across the three datasets. For example, on MNIST and Fashion-MNIST, increasing the computational complexity of the post-processing at $G$ is more advantageous than of the preprocessing at $Q$, while on Cifar-10 it is the opposite.

The Storage Cost: In Table 2, the difference between the number of model parameters at the client and at each server is quite evident. The proposed Corella algorithm attains high accuracy while storing only a limited number of parameters at the client.
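For instance, the client-side and server-side storage costs can be compared directly by counting parameters in PyTorch; the module names below refer to the illustrative modules of the earlier sketches, not to the exact models of Table 1.

    def num_params(module):
        # total number of learnable parameters, i.e., the storage cost of the module
        return sum(p.numel() for p in module.parameters())

    client_params = num_params(q_net) + num_params(g_net)   # the client stores only Q and G
    server_params = num_params(servers[0])                  # each server stores its own network f_j
    print(client_params, server_params, client_params / max(server_params, 1))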

The Number of Servers: Table 2 shows that increasing the number of servers (as an alternative to increasing the client's computation) does not have much impact on the accuracy, in particular when some computation is already employed at the client to increase the privacy.

4.4 Results for Other Values of $N$ and $T$

Experiment 4: In this experiment, we evaluate the proposed method for other values of $N$ and $T$. In Table 3, we report the test accuracy on the MNIST dataset for two models and several $(N, T)$ configurations. For each configuration, we choose a matrix $\Gamma$ satisfying the full-rank condition of Section 3, which determines the correlation among the noise components.


Table 3: Test accuracy (per model) on the MNIST dataset for various $(N, T)$ configurations. (The results of Experiment 4.)

5 Conclusion

In this paper, we introduced Corella, a distributed framework based on correlated queries that ensures user data privacy in machine learning. The network is trained such that the information leakage of the client data to each server is information-theoretically negligible, even if some of the servers collude. On the other hand, the client can, with modest effort, recover the result by combining the correlated outputs of the servers. The implementation results show that the client achieves high accuracy with low computation, communication, and storage costs.

References

  • Abadi et al. (2016) Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM, 2016.
  • Alvim et al. (2011) Alvim, M. S., Andres, M. E., Chatzikokolakis, K., Degano, P., and Palamidessi, C. Differential privacy: On the trade-off between utility and information leakage. In FAST 2011, pp. 39–54. Springer, 2011.
  • Bayardo & Agrawal (2005) Bayardo, R. J. and Agrawal, R. Data privacy through optimal k-anonymization. In IEEE International Conference on Data Engineering, pp. 217–228, 2005.
  • Ben-Or et al. (1988) Ben-Or, M., Goldwasser, S., and Wigderson, A. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 1–10. ACM, 1988.
  • Bonawitz et al. (2019) Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecny, J., Mazzocchi, S., McMahan, H. B., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. Towards federated learning at scale: System design. In Conference on Systems and Machine Learning (SysML), 2019.
  • Chen et al. (2019) Chen, V., Pastro, V., and Raykova, M. Secure computation for machine learning with spdz, 2019. arXiv:1901.00329.
  • Cover & Thomas (1991) Cover, T. M. and Thomas, J. A. Elements of Information Theory. New York: Wiley, 1991.
  • Dahl et al. (2018) Dahl, M., Mancuso, J., Dupis, Y., DeCoste, B., Giraud, M., Livingstone, I., Patriquin, J., and Uhma, G. Private machine learning in tensorflow using secure computation, 2018. arXiv:1810.08130.
  • Dwork (2006) Dwork, C. Differential privacy. In International Colloquium on Automata, Languages and Programming, pp. 1–12, 2006.
  • Dwork & Roth (2014) Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. In Foundations and Trends in Theoretical Computer Science, pp. 211–407, 2014.
  • Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265–284. Springer, 2006.
  • Dwork et al. (2014) Dwork, C., Talwar, K., Thakurta, A., and Zhang, L. Analyze gauss: Optimal bounds for privacy-preserving principal component analysis. In ACM STOC, pp. 11–20, 2014.
  • Fredrikson et al. (2015) Fredrikson, M., Jha, S., and Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In ACM CCS, pp. 1322–1333, 2015.
  • Gascón et al. (2017) Gascón, A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., and Evans, D. Privacy-preserving distributed linear regression on high-dimensional data. In Proceedings on Privacy Enhancing Technologies, pp. 345–364, 2017.
  • Gentry & Boneh (2009) Gentry, C. and Boneh, D. A fully homomorphic encryption scheme, 2009. Stanford University, Stanford.
  • Gilad-Bachrach et al. (2016) Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., and Wernsing, J. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210, 2016.
  • Graepel et al. (2012) Graepel, T., Lauter, K., and Naehrig, M. Ml confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pp. 1–21. Springer, 2012.
  • Han et al. (2019) Han, K., Hong, S., Cheon, J. H., and Park, D. Logistic regression on homomorphic encrypted data at scale. In Thirty-First Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-19), 2019.
  • He et al. (2015) He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, 2015. arXiv:1502.01852.
  • Hesamifard et al. (2017) Hesamifard, E., Takabi, H., and Ghasemi, M. Cryptodl: Deep neural networks over encrypted data, 2017. arXiv:1711.05189.
  • Hitaj et al. (2017) Hitaj, B., Ateniese, G., and Perez-Cruz, F. Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618. ACM, 2017.
  • Jayaraman et al. (2018) Jayaraman, B., Wang, L., Evans, D., and Gu, Q. Distributed learning without distress: Privacy-preserving empirical risk minimization. In advances in Neural Information Processing Systems, pp. 6346–6357, 2018.
  • Krizhevsky (2009) Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, 2009.
  • LeCun et al. (2010) LeCun, Y., Cortes, C., and Burges, C. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
  • Li et al. (2018) Li, X., Zhu, Y., Wang, J., Liu, Z., Liu, Y., and Zhang, M. On the soundness and security of privacy-preserving svm for outsourcing data classification. IEEE Trans. Dependable Secure Comput, 15(5), 2018.
  • McMahan & Ramage (2017) McMahan, H. B. and Ramage, D. Federated learning: Collaborative machine learning without centralized training data. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html, April 2017. Google AI Blog.
  • Mohassel & Zhang (2017) Mohassel, P. and Zhang, Y. Secureml: A system for scalable privacy-preserving machine learning. In 38th IEEE Symposium on Security and Privacy, pp. 19–38. IEEE, 2017.
  • Narayanan & Shmatikov (2008) Narayanan, A. and Shmatikov, V. How to break anonymity of the netflix prize dataset, 2008. arXiv:cs/0610105.
  • Osia et al. (2018) Osia, S. A., Taheri, A., Shamsabadi, A. S., Katevas, M., Haddadi, H., and Rabiee, H. R. Deep private-feature extraction. In IEEE Transactions on Knowledge and Data Engineering, 2018.
  • Papernot et al. (2017) Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. In the International Conference on Learning Representations (ICLR), 2017.
  • Samarati & Sweeney (1998) Samarati, P. and Sweeney, L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI Computer Science Laboratory, Palo Alto, CA, 1998.
  • Shamir (1979) Shamir, A. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
  • Shokri & Shmatikov (2015) Shokri, R. and Shmatikov, V. Privacy-preserving deep learning. In ACM Conference on Computer and Communications Security, pp. 1310–1321, 2015.
  • Shokri et al. (2017) Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, 2017.
  • So et al. (2019) So, J., Guler, B., Avestimehr, A. S., and Mohassel, P. Codedprivateml: A fast and privacy-preserving framework for distributed machine learning, 2019. arXiv:1902.00641.
  • Sweeney (2002) Sweeney, L. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.
  • Wang et al. (2018) Wang, Q., Du, M., Chen, X., Chen, Y., Zhou, P., Chen, X., and Huang, X. Privacy-preserving collaborative model learning: The case of word vector training. IEEE Transactions on Knowledge and Data Engineering, 30(12):2381–2393, 2018.
  • Xiao et al. (2017) Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017. arXiv:1708.07747.
  • Yao (1982) Yao, A. C. Protocols for secure computations. In IEEE Annual Symposium on Foundations of Computer Science, pp. 160–164, 1982.