1 Introduction
Modern machine learning models are breaking new ground by achieving unprecedented performance in various application domains. Training such models, however, is a daunting task. Due to the typically large volume of data and complexity of models, training is a compute and storage intensive task. Furthermore, training should often be done on sensitive data, such as healthcare records, browsing history, or financial transactions, which raises the issues of security and privacy of the dataset. This creates a challenging dilemma. On the one hand, due to its complexity, training is often desired to be outsourced to more capable computing platforms, such as the cloud. On the other hand, the training dataset is often sensitive and particular care should be taken to protect the privacy of the dataset against potential breaches in such platforms. This dilemma gives rise to the main problem that we study here: How can we offload the training task to a distributed computing platform, while maintaining the privacy of the dataset?
More specifically, we consider a scenario in which a data owner (e.g., a hospital) wishes to train a logistic regression model by offloading the large volume of data (e.g., healthcare records) and computationally intensive training tasks (e.g., gradient computations) to $N$ machines over a cloud platform, while ensuring that any collusion between $T$ out of $N$ workers does not leak information about the training dataset. We focus on the semi-honest adversary setup, where the corrupted parties follow the protocol but may leak information in an attempt to learn the training dataset. We propose CodedPrivateML for this problem, which has three salient features:

It provides strong information-theoretic privacy guarantees for both the training dataset and model parameters in the presence of colluding workers.

It enables fast training by distributing the training computation load effectively across several workers.

It leverages a new method for secret sharing the dataset and model parameters based on coding and information theory principles, which significantly reduces the communication overhead and the complexity of distributed training.
At a high level, CodedPrivateML can be described as follows. It secret shares the dataset and model parameters at each round of the training in two steps. First, it employs stochastic quantization to convert the dataset and the weight vector at each round into a finite domain. It then combines (or encodes) the quantized values with random matrices, using a novel coding technique named Lagrange coding (Yu et al., 2019), to guarantee privacy (in an information-theoretic sense) while simultaneously distributing the workload among multiple workers. The challenge, however, is that Lagrange coding only works for computations that take the form of polynomial evaluations. The gradient computation for logistic regression, on the other hand, includes nonlinearities that cannot be expressed as polynomials. CodedPrivateML handles this challenge through polynomial approximations of the nonlinear sigmoid function in the training phase.
Upon secret sharing of the encoded dataset and model parameters, each worker performs the gradient computations using the chosen polynomial approximation of the sigmoid function, and sends the result back to the master. It is useful to note that the workers perform the computations over the quantized and encoded data as if they were computing over the true dataset. That is, the structure of the computations is the same whether computing over the true dataset or over the encoded dataset.
Finally, the master collects the results from a subset of fastest workers and decodes the gradient over the finite field. It then converts the decoded gradients to the real domain, updates the weight vector, and secret shares it with the worker nodes for the next round. We note that since the computations are performed in a finite domain while the weights are updated in the real domain, the update process may lead to undesired behaviour as weights may not converge. Our system guarantees convergence through the proposed stochastic quantization technique while converting between real and finite fields.
We theoretically prove that CodedPrivateML guarantees the convergence of the model parameters, while providing information-theoretic privacy for the training dataset. Our theoretical analysis also identifies a tradeoff between privacy and parallelization. More specifically, each additional worker can be utilized either for more privacy, by protecting against a larger number of collusions, or for more parallelization, by reducing the computation load at each worker. We characterize this tradeoff for CodedPrivateML.
Furthermore, we empirically demonstrate the impact of CodedPrivateML by comparing it with state-of-the-art cryptographic approaches based on secure multi-party computing (MPC) (Yao, 1982; Ben-Or et al., 1988), which can also be applied to enable privacy-preserving machine learning tasks (e.g., see (Nikolaenko et al., 2013; Gascón et al., 2017; Mohassel & Zhang, 2017; Lindell & Pinkas, 2000; Dahl et al., 2018; Chen et al., 2019)). In particular, we envision a master who secret shares its data and model parameters among multiple workers who collectively perform the gradient computation using a multi-round MPC protocol. Given our focus on information-theoretic privacy, the most relevant MPC-based scheme for empirical comparison is the BGW-style (Ben-Or et al., 1988) approach based on Shamir’s secret sharing (Shamir, 1979). While several more recent works design MPC-based private learning solutions with information-theoretic security (Wagh et al., 2018; Mohassel & Rindal, 2018), their constructions are limited to three or four parties.
We run extensive experiments over the Amazon EC2 cloud to empirically demonstrate the performance of CodedPrivateML. We train a logistic regression model for image classification over the MNIST dataset (LeCun et al., 2010), while the computation workload is distributed across multiple machines over the cloud. We demonstrate that CodedPrivateML can provide substantial speedup in training time compared with MPC-based schemes, while guaranteeing the same level of accuracy. The primary disadvantage of the MPC-based schemes is their reliance on extensive communication and coordination between the workers for distributed private computing, together with the fact that they do not benefit from parallelization among the workers, as the whole computation is repeated by all players who take part in MPC. They do, however, guarantee a higher privacy threshold (i.e., they tolerate a larger number of colluding workers) compared with CodedPrivateML.
Other related works. Apart from MPC-based approaches to this problem, one can consider two other solutions. One is based on Homomorphic Encryption (HE) (Gentry & Boneh, 2009), which allows computation to be performed over encrypted data, and has been used to enable privacy-preserving machine learning solutions (Gilad-Bachrach et al., 2016; Hesamifard et al., 2017; Graepel et al., 2012; Yuan & Yu, 2014; Li et al., 2017; Kim et al., 2018; Wang et al., 2018; Han et al., 2019). The privacy guarantees of HE are based on computational assumptions, whereas our system provides strong information-theoretic security. Moreover, HE requires computations to be performed over encrypted data, which slows down training by many orders of magnitude. For example, for image classification on the MNIST dataset, HE takes hours to learn a logistic regression model (Han et al., 2019). In contrast, in CodedPrivateML there is no slowdown in performing the coded computations, which allows for a faster implementation. As a tradeoff, HE allows collusion between a larger number of workers, whereas in CodedPrivateML this number is determined by other system parameters such as the number of workers and the computation load assigned to each worker.
Another possible solution is based on differential privacy (DP), which is a release mechanism that preserves the privacy of personally identifiable information, in that the removal of any single element from the dataset does not change the computation outcomes significantly (Dwork et al., 2006). In the context of machine learning, DP is mainly used for training when the model parameters are to be released for public use, to ensure that the individual data points from the dataset cannot be identified from the released model (Chaudhuri & Monteleoni, 2009; Shokri & Shmatikov, 2015; Abadi et al., 2016; Pathak et al., 2010; McMahan et al., 2018; Rajkumar & Agarwal, 2012; Jayaraman et al., 2018). The main difference between these approaches and our work is that we can guarantee strong information-theoretic privacy that leaks no information about the dataset, and preserve the accuracy of the model throughout the training. We note, however, that it is in principle possible to compose the techniques of CodedPrivateML with differential privacy to obtain the best of both worlds if the intention is to publicly release the final model, but we leave this as future work.
2 Problem Setting
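We study the problem of training a logistic regression model. The training dataset is represented by a matrix $\mathbf{X} \in \mathbb{R}^{m \times d}$ consisting of $m$ data points with $d$ features and a label vector $\mathbf{y} \in \{0,1\}^{m}$. Row $i$ of $\mathbf{X}$ is denoted by $\mathbf{x}_i$.
The model parameters (weights) $\mathbf{w} \in \mathbb{R}^{d}$ are obtained by minimizing the cross entropy function,
$$C(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^{m} \big( -y_i \log \hat{y}_i - (1 - y_i) \log (1 - \hat{y}_i) \big), \tag{1}$$
where $\hat{y}_i = g(\mathbf{x}_i \cdot \mathbf{w}) \in (0,1)$ is the estimated probability of label $i$ being equal to $1$ and $g(\cdot)$ is the sigmoid function
$$g(z) = \frac{1}{1 + e^{-z}}. \tag{2}$$
The problem in (1) can be solved via gradient descent, through an iterative process that updates the model parameters in the opposite direction of the gradient. The gradient for (1) is given by $\nabla C(\mathbf{w}) = \frac{1}{m} \mathbf{X}^{\top} \big( g(\mathbf{X}\mathbf{w}) - \mathbf{y} \big)$. Accordingly, model parameters are updated as,
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m} \mathbf{X}^{\top} \big( g(\mathbf{X}\mathbf{w}^{(t)}) - \mathbf{y} \big), \tag{3}$$
where $\mathbf{w}^{(t)}$ holds the estimated parameters from iteration $t$, $\eta$ is the learning rate, and function $g(\cdot)$ operates elementwise over the vector given by $\mathbf{X}\mathbf{w}^{(t)}$.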
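As a plain, non-private reference point, the gradient-descent update in (3) can be sketched as follows. This is a minimal illustration that assumes the standard cross-entropy gradient $\frac{1}{m}\mathbf{X}^{\top}(g(\mathbf{X}\mathbf{w}) - \mathbf{y})$; the function names, the toy data, and the learning rate are hypothetical and are not taken from our implementation.

```python
import math

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(X, y, w, eta):
    """One gradient-descent update: w <- w - (eta / m) * X^T (g(Xw) - y)."""
    m, d = len(X), len(w)
    # Residuals g(x_i . w) - y_i for every data point.
    residuals = [sigmoid(sum(xi[k] * w[k] for k in range(d))) - yi
                 for xi, yi in zip(X, y)]
    # Gradient component k is (1/m) * sum_i residual_i * x_{i,k}.
    grad = [sum(residuals[i] * X[i][k] for i in range(m)) / m for k in range(d)]
    return [w[k] - eta * grad[k] for k in range(d)]

def cross_entropy(X, y, w):
    """Average cross-entropy cost C(w)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(a * b for a, b in zip(xi, w)))
        total += -yi * math.log(p) - (1 - yi) * math.log(1 - p)
    return total / len(X)

# Toy data (hypothetical): the second feature is a constant bias term.
X = [[0.5, 1.0], [1.5, 1.0], [-1.0, 1.0], [-2.0, 1.0]]
y = [1, 1, 0, 0]
w = [0.0, 0.0]
initial_cost = cross_entropy(X, y, w)
for _ in range(50):
    w = gradient_step(X, y, w, eta=0.5)
final_cost = cross_entropy(X, y, w)
```

With zero initial weights every prediction is $0.5$, so the initial cost equals $\log 2$; the cost then decreases over the iterations. These per-iteration gradient computations are the operations that the master offloads to the workers.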
As shown in Figure 1, we consider a master-worker distributed computing architecture, where the master offloads the computationally intensive operations to $N$ workers. These operations correspond to gradient computations in (3). In doing so, the master wishes to protect the privacy of the dataset against any potential collusions between up to $T$ workers, where $T$ is the privacy parameter of the system.
At the beginning of the training, dataset $\mathbf{X}$ is shared in a privacy-preserving manner among the workers. To do so, $\mathbf{X}$ is first partitioned into $K$ submatrices $\mathbf{X} = [\mathbf{X}_1^{\top} \cdots \mathbf{X}_K^{\top}]^{\top}$, for some $K \in \mathbb{N}$. Parameter $K$ is related to the computation load at each worker (i.e., what fraction of the dataset is processed at each worker), as well as the number of workers the master has to wait for, to reconstruct the gradient at each step. The master then creates $N$ encoded submatrices, denoted by $\widetilde{\mathbf{X}}_1, \ldots, \widetilde{\mathbf{X}}_N$, by combining the $K$ parts of the dataset together with some random matrices to preserve privacy, and sends $\widetilde{\mathbf{X}}_i$ to worker $i$, where $i \in [N]$. This process needs to be performed only once for the dataset $\mathbf{X}$.
At each iteration of the training, the master also needs to send to worker $i$ the current estimate of the model parameters (i.e., $\mathbf{w}^{(t)}$ in (3)). However, it has recently been shown that the intermediate model parameters can also leak substantial information about the dataset (Melis et al., 2019). The master therefore also needs to prevent the leakage of these intermediate parameters. To that end, the master creates an encoded matrix $\widetilde{\mathbf{W}}_i^{(t)}$ to secret share the current estimate of model parameters with worker $i \in [N]$. This coding strategy should also be private against any $T$ colluding workers.
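More specifically, the coding strategy that is used for secret sharing the dataset (i.e., creating $\widetilde{\mathbf{X}}_i$’s) and model parameters (i.e., creating $\widetilde{\mathbf{W}}_i^{(t)}$’s) should be such that any subset of $T$ colluding workers cannot learn any information, in the strong information-theoretic sense, about the training dataset $\mathbf{X}$. Formally, for every subset of workers $\mathcal{T} \subseteq [N]$ of size at most $T$, we should have,
$$I\big( \mathbf{X} ; \widetilde{\mathbf{X}}_{\mathcal{T}}, \{ \widetilde{\mathbf{W}}_{\mathcal{T}}^{(t)} \}_{t \in [J]} \big) = 0, \tag{4}$$
where $I$ denotes the mutual information, $J$ is the number of iterations, and $\widetilde{\mathbf{X}}_{\mathcal{T}}, \{ \widetilde{\mathbf{W}}_{\mathcal{T}}^{(t)} \}_{t \in [J]}$ is the collection of the coded matrices and coded parameter estimations stored at workers in $\mathcal{T}$. We refer to a protocol that guarantees privacy against $T$ colluding workers as a $T$-private protocol.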
At each iteration, worker $i$ performs its computation locally using $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}_i^{(t)}$ and sends the result back to the master. After receiving the results from a sufficient number of workers, the master decodes the aggregated computation, reconstructs the gradients, and updates the model parameters using (3). In doing so, the master needs to wait only for the fastest workers. We define the recovery threshold of the protocol as the minimum number of workers the master needs to wait for. The relations between the recovery threshold and parameters $N$, $K$, and $T$ will be detailed in our theoretical analysis.
Remark 1.
Although our presentation is based on logistic regression, CodedPrivateML can also be applied to linear regression with minor modifications.
3 The Proposed CodedPrivateML Strategy
The CodedPrivateML strategy consists of four main phases, which are first described at a high level below, and then in detail in the rest of this section.
Phase 1: Quantization. In order to guarantee information-theoretic privacy, one has to mask the dataset and the weight vector in a finite field $\mathbb{F}_p$ using uniformly random matrices, so that the added randomness can make each data point appear equally likely. In contrast, the dataset and weight vectors for the training task are defined in the domain of real numbers. We address this by employing a stochastic quantization technique to convert the parameters from the real domain to the finite domain and vice versa. Accordingly, in the first phase of our system, the master quantizes the dataset and weights from the real domain to the domain of integers, and then embeds them in a field $\mathbb{F}_p$ of integers modulo a prime $p$. The quantized version of the dataset $\mathbf{X}$ is given by $\overline{\mathbf{X}}$. The quantization of the weight vector $\mathbf{w}^{(t)}$, on the other hand, is represented by a matrix $\overline{\mathbf{W}}^{(t)}$, where each column holds an independent stochastic quantization of $\mathbf{w}^{(t)}$. This structure will be important in ensuring the convergence of the model. Parameter $p$ is selected to be sufficiently large to avoid wrap-around in computations. Its value depends on the bit width of the machine as well as the number of additive and multiplicative operations. The specific choice used in our implementation is detailed in our experiments.
Phase 2: Encoding and Secret Sharing. In the second phase, the master partitions the quantized dataset $\overline{\mathbf{X}}$ into $K$ submatrices and encodes them using the recently proposed Lagrange coding technique (Yu et al., 2019), which we will describe in detail in Section 3.2. It then sends to worker $i \in [N]$ a coded submatrix $\widetilde{\mathbf{X}}_i$. As we will illustrate later, this encoding ensures that the coded matrices do not leak any information about the true dataset, even if $T$ workers collude. In addition, the master has to ensure the weight estimations sent to the workers at each iteration do not leak information about the dataset. This is because the weights updated via (3) carry information about the whole training set, and sending them directly to the workers may breach privacy. In order to prevent this, at iteration $t$, the master also quantizes the current weight vector $\mathbf{w}^{(t)}$ to the finite field and encodes it, again using Lagrange coding.
Phase 3: Polynomial Approximation and Local Computations. In the third phase, each worker performs the computations using its local storage and sends the result back to the master. We note that the workers perform the computations over the encoded data as if they were computing over the true dataset. That is, the structure of the computations is the same whether computing over the true dataset or over the encoded dataset. A major challenge is that Lagrange coding is designed for distributed polynomial computations. However, the computations in the training phase are not polynomials due to the sigmoid function. We overcome this by approximating the sigmoid with a polynomial of a selected degree $r$. This allows us to represent the gradient computations in terms of polynomials that can be computed locally by each worker.
Phase 4: Decoding and Model Update. The master collects the results from a subset of the fastest workers and decodes the gradient over the finite field. Finally, the master converts the decoded gradients to the real domain, updates the weight vector, and secret shares it with the workers for the next round.
We next provide the details of each phase. The overall algorithm of CodedPrivateML, and each of its four phases, are also presented in Appendix A.1 of supplementary materials.
3.1 Quantization
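We consider an elementwise lossy quantization scheme for the dataset and weights. For quantizing the dataset $\mathbf{X}$, we use a simple deterministic rounding technique:
$$Round(x) = \begin{cases} \lfloor x \rfloor & \text{if } x - \lfloor x \rfloor < 0.5 \\ \lfloor x \rfloor + 1 & \text{otherwise} \end{cases} \tag{5}$$
where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. We define the quantized dataset as
$$\overline{\mathbf{X}} = \phi\big( Round( 2^{l_x} \cdot \mathbf{X} ) \big) \tag{6}$$
where the rounding function from (5) is applied elementwise to the elements of matrix $\mathbf{X}$ and $l_x$ is an integer parameter that controls the quantization loss. Function $\phi : \mathbb{Z} \to \mathbb{F}_p$ is a mapping defined to represent a negative integer in the finite field by using two’s complement representation,
$$\phi(x) = \begin{cases} x & \text{if } x \geq 0 \\ p + x & \text{if } x < 0 \end{cases} \tag{7}$$
Note that the domain of (6) is $\big[ -\frac{p-1}{2}, \frac{p-1}{2} \big]$. To avoid a wrap-around which may lead to an overflow error, prime $p$ should be large enough, i.e., $p \geq 2^{l_x + 1} \max\{ |X_{i,j}| \} + 1$.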
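The dataset quantization described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the scale-round-embed form of (5)-(7), with scaling by $2^{l_x}$, and the prime `P`, the helper names, and the toy matrix are hypothetical choices rather than values from our experiments.

```python
import math

P = 2**31 - 1  # hypothetical prime field size, chosen only for illustration

def round_half_up(x):
    """Deterministic rounding: floor(x) if the fractional part is < 0.5, else floor(x) + 1."""
    f = math.floor(x)
    return f if x - f < 0.5 else f + 1

def phi(x, p=P):
    """Embed a signed integer in F_p: non-negative x maps to x, negative x maps to p + x."""
    return x if x >= 0 else p + x

def phi_inv(x, p=P):
    """Map a field element back to a signed integer in [-(p-1)/2, (p-1)/2]."""
    return x if x <= (p - 1) // 2 else x - p

def quantize_dataset(X, lx, p=P):
    """Scale by 2^lx, round elementwise, and embed the result in F_p."""
    return [[phi(round_half_up((2 ** lx) * v), p) for v in row] for row in X]

X = [[0.30, -1.26], [2.5, -0.125]]
Xq = quantize_dataset(X, lx=2)
# Converting back shows the quantization loss introduced by rounding.
recovered = [[phi_inv(v) / 2 ** 2 for v in row] for row in Xq]
```

Here the entry $-1.26$ is scaled to $-5.04$, rounded to $-5$, and stored as $p - 5$; mapping back yields $-1.25$. A larger scaling exponent shrinks this rounding error but pushes values closer to the wrap-around boundary.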
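At each iteration, the master also quantizes the weight vector $\mathbf{w}^{(t)}$ from the real domain to the finite field. This proves to be a challenging task, as it should be performed in a way that ensures the convergence of the model. Our solution to this is a quantization technique inspired by (Zhang et al., 2017, 2016). Initially, we define a stochastic quantization function:
$$Q_j(x; l_w) = \phi\big( Round_{stoc}( 2^{l_w} \cdot x ) \big) \tag{8}$$
where $l_w$ is an integer parameter to control the quantization loss. $Round_{stoc} : \mathbb{R} \to \mathbb{Z}$ is a stochastic rounding function:
$$Round_{stoc}(x) = \begin{cases} \lfloor x \rfloor & \text{with probability } 1 - (x - \lfloor x \rfloor) \\ \lfloor x \rfloor + 1 & \text{with probability } x - \lfloor x \rfloor \end{cases}$$
The probability of rounding $x$ to $\lfloor x \rfloor$ is proportional to the proximity of $x$ to $\lfloor x \rfloor$ so that stochastic rounding is unbiased (i.e., $\mathbb{E}[Round_{stoc}(x)] = x$).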
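The unbiasedness of stochastic rounding can be checked empirically with a short sketch. The code below assumes the usual definition (round up with probability equal to the fractional part, down otherwise); the function names, the seed, and the small prime used in the example are hypothetical.

```python
import math
import random

def round_stoc(x, rng):
    """Round x up with probability x - floor(x), and down otherwise."""
    f = math.floor(x)
    return f + 1 if rng.random() < x - f else f

def quantize_weight(w, lw, p, rng):
    """Stochastically quantize a real weight: scale by 2^lw, round, embed in F_p."""
    q = round_stoc((2 ** lw) * w, rng)
    return q if q >= 0 else p + q

rng = random.Random(0)
x = 2.3
samples = [round_stoc(x, rng) for _ in range(100_000)]
# The sample mean should be close to x because the rounding is unbiased.
empirical_mean = sum(samples) / len(samples)
```

Every sample is either 2 or 3, and the sample mean concentrates around 2.3. Deterministic rounding would instead always return 2 and carry a systematic error of 0.3.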
For quantizing the weight vector $\mathbf{w}^{(t)}$, the master creates $r$ independent quantized vectors:
$$\overline{\mathbf{w}}^{(t),j} = Q_j( \mathbf{w}^{(t)} ; l_w ) \in \mathbb{F}_p^{d} \quad \text{for } j \in [r] \tag{9}$$
where the quantization function (8) is applied elementwise to the vector $\mathbf{w}^{(t)}$ and each $Q_j(\cdot\,;\cdot)$ denotes an independent realization of (8). The number of quantized vectors $r$ is equal to the degree of the polynomial approximation for the sigmoid function, which we will describe later in Section 3.3. The intuition behind creating $r$ independent quantizations is to ensure that the gradient computations performed using the quantized weights are unbiased estimators of the true gradients. As detailed in Section 4, this property is fundamental for the convergence analysis of our model. The specific values of parameters $l_x$ and $l_w$ provide a tradeoff between the rounding error and overflow error. In particular, a larger value reduces the rounding error while increasing the chance of an overflow. We denote the quantization of the weight vector $\mathbf{w}^{(t)}$ as
$$\overline{\mathbf{W}}^{(t)} = [ \overline{\mathbf{w}}^{(t),1} \; \cdots \; \overline{\mathbf{w}}^{(t),r} ] \tag{10}$$
by arranging the quantized vectors from (9) in matrix form.
3.2 Encoding and Secret Sharing
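The master first partitions the quantized dataset $\overline{\mathbf{X}}$ into $K$ submatrices $\overline{\mathbf{X}} = [\overline{\mathbf{X}}_1^{\top} \cdots \overline{\mathbf{X}}_K^{\top}]^{\top}$, where $\overline{\mathbf{X}}_i \in \mathbb{F}_p^{\frac{m}{K} \times d}$ for $i \in [K]$. It also selects $K+T$ distinct elements $\beta_1, \ldots, \beta_{K+T}$ from $\mathbb{F}_p$. It then employs Lagrange coding (Yu et al., 2019) to encode the dataset. More specifically, it finds a polynomial $u : \mathbb{F}_p \to \mathbb{F}_p^{\frac{m}{K} \times d}$ of degree at most $K+T-1$ such that $u(\beta_i) = \overline{\mathbf{X}}_i$ for $i \in [K]$, and $u(\beta_i) = \mathbf{Z}_i$ for $i \in \{K+1, \ldots, K+T\}$, where $\mathbf{Z}_i$’s are chosen uniformly at random from $\mathbb{F}_p^{\frac{m}{K} \times d}$ (the role of $\mathbf{Z}_i$’s is to mask the dataset and provide privacy against up to $T$ colluding workers). This is accomplished by letting $u$ be the respective Lagrange interpolation polynomial
$$u(z) = \sum_{j \in [K]} \overline{\mathbf{X}}_j \prod_{k \in [K+T] \setminus \{j\}} \frac{z - \beta_k}{\beta_j - \beta_k} + \sum_{j=K+1}^{K+T} \mathbf{Z}_j \prod_{k \in [K+T] \setminus \{j\}} \frac{z - \beta_k}{\beta_j - \beta_k}. \tag{11}$$
The master then selects $N$ distinct elements $\alpha_1, \ldots, \alpha_N$ from $\mathbb{F}_p$ such that $\{\alpha_i\}_{i \in [N]} \cap \{\beta_j\}_{j \in [K]} = \emptyset$, and encodes the dataset by letting $\widetilde{\mathbf{X}}_i = u(\alpha_i)$ for $i \in [N]$. By defining an encoding matrix $\mathbf{U} = [\mathbf{u}_1 \cdots \mathbf{u}_N] \in \mathbb{F}_p^{(K+T) \times N}$ whose $(i,j)$th element is given by $u_{ij} = \prod_{\ell \in [K+T] \setminus \{i\}} \frac{\alpha_j - \beta_\ell}{\beta_i - \beta_\ell}$, one can also represent the encoding of the dataset as,
$$\widetilde{\mathbf{X}}_i = u(\alpha_i) = ( \overline{\mathbf{X}}_1, \ldots, \overline{\mathbf{X}}_K, \mathbf{Z}_{K+1}, \ldots, \mathbf{Z}_{K+T} ) \cdot \mathbf{u}_i. \tag{12}$$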
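At iteration $t$, the quantized weights $\overline{\mathbf{W}}^{(t)}$ are also encoded using a Lagrange interpolation polynomial,
$$v(z) = \sum_{j \in [K]} \overline{\mathbf{W}}^{(t)} \prod_{k \in [K+T] \setminus \{j\}} \frac{z - \beta_k}{\beta_j - \beta_k} + \sum_{j=K+1}^{K+T} \mathbf{V}_j \prod_{k \in [K+T] \setminus \{j\}} \frac{z - \beta_k}{\beta_j - \beta_k}, \tag{13}$$
where $\mathbf{V}_j$ for $j \in \{K+1, \ldots, K+T\}$ are chosen uniformly at random from $\mathbb{F}_p^{d \times r}$. The coefficients $\beta_1, \ldots, \beta_{K+T}$ are the same as the ones in (11). We note that the polynomial in (13) has the property $v(\beta_i) = \overline{\mathbf{W}}^{(t)}$ for $i \in [K]$.
The master then encodes the quantized weight vector by using the same evaluation points $\alpha_1, \ldots, \alpha_N$. Accordingly, the weight vector is encoded as
$$\widetilde{\mathbf{W}}_i^{(t)} = v(\alpha_i) = ( \overline{\mathbf{W}}^{(t)}, \ldots, \overline{\mathbf{W}}^{(t)}, \mathbf{V}_{K+1}, \ldots, \mathbf{V}_{K+T} ) \cdot \mathbf{u}_i \tag{14}$$
for $i \in [N]$, using the encoding matrix $\mathbf{U}$ from (12). The polynomials $u(z)$ and $v(z)$ both have degree $K+T-1$.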
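The encoding step can be illustrated with a scalar toy version of Lagrange coding. The sketch below is a simplification under stated assumptions: each data block is a single field element rather than a submatrix, the interpolation points are simply $1, \ldots, K+T$ and the evaluation points the next $N$ integers, and the prime `P` and all function names are hypothetical.

```python
import random

P = 2**31 - 1  # hypothetical prime, chosen only for illustration

def lagrange_eval(points, values, z, p=P):
    """Evaluate at z the polynomial of degree < len(points) through (points[j], values[j]) over F_p."""
    total = 0
    for j, (bj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for k, bk in enumerate(points):
            if k != j:
                num = num * (z - bk) % p
                den = den * (bj - bk) % p
        # Division in F_p is multiplication by the modular inverse.
        total = (total + vj * num * pow(den, -1, p)) % p
    return total

def lagrange_encode(blocks, T, N, rng, p=P):
    """Encode K data blocks (scalars here) plus T uniformly random masks into N shares."""
    K = len(blocks)
    betas = list(range(1, K + T + 1))                # K + T distinct interpolation points
    alphas = list(range(K + T + 1, K + T + N + 1))   # N evaluation points, disjoint from the betas
    masks = [rng.randrange(p) for _ in range(T)]
    values = list(blocks) + masks                    # u(beta_j) is block j, then the random masks
    shares = [lagrange_eval(betas, values, a, p) for a in alphas]
    return betas, alphas, shares

rng = random.Random(0)
blocks = [11, 22, 33]  # K = 3 quantized blocks, reduced to scalars
betas, alphas, shares = lagrange_encode(blocks, T=2, N=8, rng=rng)

# Any K + T shares determine u(z); evaluating it at beta_1..beta_K returns the data.
subset = [1, 3, 4, 6, 7]
decoded = [lagrange_eval([alphas[i] for i in subset], [shares[i] for i in subset], b)
           for b in betas[:len(blocks)]]
```

Each worker receives one entry of `shares`. Because $T$ of the interpolation values are uniformly random, any $T$ shares on their own are statistically independent of the data blocks, while $K+T$ shares suffice to interpolate the polynomial and recover the blocks.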
3.3 Polynomial Approximation and Local Computation
Upon receiving the encoded (and quantized) dataset and weights, workers should proceed with gradient computations. However, a major challenge is that Lagrange coding is originally designed for polynomial computations, while the gradient computations that the workers need to do are not polynomials due to the sigmoid function. Our solution is to use a polynomial approximation of the sigmoid function,
$$\hat{g}(z) = \sum_{i=0}^{r} c_i z^{i}, \tag{15}$$
where $r$ and $c_i$ denote the degree and coefficients of the polynomial, respectively. The coefficients are obtained by fitting the sigmoid function via least squares estimation.
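A least-squares fit of this kind can be sketched for the degree-one case using the closed-form solution of simple linear regression. The fitting interval, the number of sample points, and the function names below are assumptions made for illustration; they are not the settings used in our experiments.

```python
import math

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_degree_one(lo, hi, num_points=1001):
    """Least-squares line c0 + c1 * z through samples of the sigmoid on [lo, hi]."""
    zs = [lo + (hi - lo) * k / (num_points - 1) for k in range(num_points)]
    gs = [sigmoid(z) for z in zs]
    z_mean = sum(zs) / num_points
    g_mean = sum(gs) / num_points
    # Closed-form simple linear regression: slope = cov(z, g) / var(z).
    cov = sum((z - z_mean) * (g - g_mean) for z, g in zip(zs, gs))
    var = sum((z - z_mean) ** 2 for z in zs)
    c1 = cov / var
    c0 = g_mean - c1 * z_mean
    return c0, c1

# The interval [-5, 5] is an assumed fitting range.
c0, c1 = fit_degree_one(-5.0, 5.0)
```

On a symmetric interval the intercept is $0.5$, since the sigmoid minus $0.5$ is an odd function, and the slope is smaller than the sigmoid's derivative at the origin ($0.25$) because the fit also has to track the flat tails.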
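Using this polynomial approximation we can rewrite (3) as,
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m} \overline{\mathbf{X}}^{\top} \big( \hat{g}( \overline{\mathbf{X}} \mathbf{w}^{(t)} ) - \mathbf{y} \big), \tag{16}$$
where $\overline{\mathbf{X}}$ is the quantized version of $\mathbf{X}$, and $\hat{g}(\cdot)$ operates elementwise over the vector $\overline{\mathbf{X}} \mathbf{w}^{(t)}$.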
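Another challenge is to ensure the convergence of weights. As we detail in Section 4, this requires the gradient estimations obtained from the polynomial approximation with quantized weights to be unbiased. We achieve this by utilizing a computation technique from (Zhang et al., 2016) together with the quantized weights formed in Section 3.1. Specifically, given a degree $r$ polynomial from (15) and $r$ independent quantizations from (10), we define a function,
$$\bar{g}( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} ) := \sum_{i=0}^{r} c_i \prod_{j \leq i} ( \overline{\mathbf{X}} \, \overline{\mathbf{w}}^{(t),j} ), \tag{17}$$
where the product $\prod_{j \leq i}$ operates elementwise over the vectors $\overline{\mathbf{X}} \, \overline{\mathbf{w}}^{(t),j}$ for $j \leq i$. Lastly, we note that (17) is an unbiased estimator of $\hat{g}( \overline{\mathbf{X}} \mathbf{w}^{(t)} )$,
$$\mathbb{E}\big[ \bar{g}( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} ) \big] = \sum_{i=0}^{r} c_i ( \overline{\mathbf{X}} \mathbf{w}^{(t)} )^{i} = \hat{g}( \overline{\mathbf{X}} \mathbf{w}^{(t)} ), \tag{18}$$
where the power $(\cdot)^{i}$ acts elementwise over the vector $\overline{\mathbf{X}} \mathbf{w}^{(t)}$, and the result follows from the independence of quantizations.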
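The role of independence can be verified exactly on a scalar example. The sketch below enumerates the outcomes of stochastic rounding with rational arithmetic and compares the expectation of a product of independent roundings with the expectation of a power of a single reused rounding. It is a toy check under the usual stochastic-rounding definition; the function names and the test value are hypothetical.

```python
from fractions import Fraction
from itertools import product
import math

def round_stoc_distribution(x):
    """Return the two outcomes of stochastically rounding x, with their probabilities."""
    f = math.floor(x)
    frac = x - f
    return [(f, 1 - frac), (f + 1, frac)]

def expected_power_independent(x, i):
    """E[Q_1(x) * ... * Q_i(x)] for i independent stochastic roundings of x."""
    dist = round_stoc_distribution(x)
    total = Fraction(0)
    for combo in product(dist, repeat=i):
        value, prob = Fraction(1), Fraction(1)
        for v, pr in combo:
            value *= v
            prob *= pr
        total += value * prob
    return total

def expected_power_reused(x, i):
    """E[Q(x)^i] when a single stochastic rounding is reused i times."""
    return sum(Fraction(v) ** i * pr for v, pr in round_stoc_distribution(x))

x = Fraction(5, 2)  # 2.5 rounds to 2 or 3 with probability 1/2 each
independent = expected_power_independent(x, 2)
reused = expected_power_reused(x, 2)
```

For $x = 2.5$ the product of two independent roundings has expectation $25/4 = x^2$, whereas squaring a single rounding gives $13/2$. This is why each power of the polynomial is computed from a separate quantization of the weights.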
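The computations are then performed at each worker locally. In particular, at each iteration, worker $i \in [N]$ locally computes $f : \mathbb{F}_p^{\frac{m}{K} \times d} \times \mathbb{F}_p^{d \times r} \to \mathbb{F}_p^{d}$:
$$f( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}_i^{(t)} ) = \widetilde{\mathbf{X}}_i^{\top} \, \bar{g}( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}_i^{(t)} ) \tag{20}$$
using $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}_i^{(t)}$ and sends the result back to the master. This computation is a polynomial function evaluation in finite field arithmetic and the degree of $f$ is $2r+1$.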
3.4 Decoding and Model Update
After receiving the evaluation results in (20) from a sufficient number of workers, the master decodes $\{ f( \overline{\mathbf{X}}_k, \overline{\mathbf{W}}^{(t)} ) \}_{k \in [K]}$ over the finite field. The minimum number of workers the master needs to wait for is termed the recovery threshold of the system and is equal to $(2r+1)(K+T-1)+1$, as we demonstrate in Section 4.
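We now proceed to the details of decoding. By construction of the Lagrange polynomials in (11) and (13), one can define a univariate polynomial $h(z) = f( u(z), v(z) )$ such that
$$h(\beta_i) = f( u(\beta_i), v(\beta_i) ) = f( \overline{\mathbf{X}}_i, \overline{\mathbf{W}}^{(t)} ) = \overline{\mathbf{X}}_i^{\top} \, \bar{g}( \overline{\mathbf{X}}_i, \overline{\mathbf{W}}^{(t)} ) \tag{21}$$
for $i \in [K]$. On the other hand, from (20), the computation result from worker $i$ equals
$$h(\alpha_i) = f( u(\alpha_i), v(\alpha_i) ) = f( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}_i^{(t)} ) = \widetilde{\mathbf{X}}_i^{\top} \, \bar{g}( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}_i^{(t)} ). \tag{22}$$
The main intuition behind the decoding process is to use the computations from (22) as evaluation points $h(\alpha_i)$ to interpolate the polynomial $h(z)$. Specifically, the master can obtain all coefficients of $h(z)$ from $(2r+1)(K+T-1)+1$ evaluation results as long as $N \geq (2r+1)(K+T-1)+1$. After $h(z)$ is recovered, the master can recover (21) by computing $h(\beta_i)$ for $i \in [K]$ and evaluating
$$\sum_{i=1}^{K} f( \overline{\mathbf{X}}_i, \overline{\mathbf{W}}^{(t)} ) = \sum_{i=1}^{K} \overline{\mathbf{X}}_i^{\top} \, \bar{g}( \overline{\mathbf{X}}_i, \overline{\mathbf{W}}^{(t)} ) = \overline{\mathbf{X}}^{\top} \, \bar{g}( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} ). \tag{23}$$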
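The decoding step can be illustrated end to end on a scalar toy problem. In the sketch below the worker computation is replaced by the stand-in $f(x) = x^3$, a degree-3 polynomial (matching $2r+1$ for $r = 1$), each data block is a single field element, and the prime `P`, the point choices, and all names are hypothetical. It assumes that the composed polynomial has degree $\deg(f) \cdot (K+T-1)$, so that one more evaluation than this degree suffices for interpolation.

```python
import random

P = 2**31 - 1  # hypothetical prime field size

def lagrange_eval(points, values, z, p=P):
    """Evaluate at z the polynomial of degree < len(points) through (points[j], values[j]) over F_p."""
    total = 0
    for j, (bj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for k, bk in enumerate(points):
            if k != j:
                num = num * (z - bk) % p
                den = den * (bj - bk) % p
        total = (total + vj * num * pow(den, -1, p)) % p
    return total

def f(x, p=P):
    """Stand-in for the worker computation: a degree-3 polynomial over F_p."""
    return pow(x, 3, p)

K, T, N = 2, 1, 9
deg_f = 3
threshold = deg_f * (K + T - 1) + 1  # evaluations needed to interpolate h(z) = f(u(z))

rng = random.Random(1)
blocks = [5, 7]
betas = list(range(1, K + T + 1))
alphas = list(range(K + T + 1, K + T + N + 1))
values = blocks + [rng.randrange(P) for _ in range(T)]

# Worker i holds the share u(alpha_i) and returns f(u(alpha_i)) = h(alpha_i).
shares = [lagrange_eval(betas, values, a) for a in alphas]
results = [f(s) for s in shares]

# The master waits only for `threshold` workers (here an arbitrary subset).
fastest = rng.sample(range(N), threshold)
pts = [alphas[i] for i in fastest]
vals = [results[i] for i in fastest]

# Interpolate h, read off h(beta_i) = f(block_i) for i in [K], then aggregate.
decoded = [lagrange_eval(pts, vals, b) for b in betas[:K]]
aggregate = sum(decoded) % P
```

With $K = 2$ and $T = 1$ the master needs 7 of the 9 results, and the two slowest workers can be ignored. The decoded values are $5^3$ and $7^3$, and their sum plays the role of the aggregated gradient term.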
4 Convergence and Privacy Guarantees
Consider the cost function (1) that we aim to minimize in logistic regression when dataset $\mathbf{X}$ is replaced with the quantized dataset $\overline{\mathbf{X}}$ using (6). Also denote $\mathbf{w}^{*}$ as the optimal weight vector that minimizes (1) when $\hat{y}_i = \hat{g}( \overline{\mathbf{x}}_i \cdot \mathbf{w} )$, where $\overline{\mathbf{x}}_i$ is row $i$ of $\overline{\mathbf{X}}$. In this section we prove that CodedPrivateML guarantees convergence to the optimal model parameters (i.e., $\mathbf{w}^{*}$) while maintaining the privacy of the dataset against colluding workers.
Recall that the model update at the master node in CodedPrivateML follows (19), which is
(26) 
We first state a lemma, which is proved in Appendix A.2 in supplementary materials.
Lemma 1.
Let denote the gradient computation using the quantized weights in CodedPrivateML. Then we have


(Unbiasedness) Vector is an asymptotically unbiased estimator of the true gradient. , and as where is the degree of polynomial in (15) and expectation is taken with respect to the quantization errors,
We also need the following basic lemma, which is proved in Appendix A.3 of supplementary materials.
Lemma 2.
We now state our main theorem for CodedPrivateML.
Theorem 1.
Consider the training of a logistic regression model in a distributed system with workers using CodedPrivateML with dataset , initial weight vector , and constant step size (where is defined in Lemma 2). Then, CodedPrivateML guarantees,


(Convergence) in iterations, where is given in Lemma 1,

(Privacy) remains informationtheoretically private against any colluding workers, i.e., , , ,
as long as we have , where is the degree of the polynomial approximation in (15).
Remark 2.
Theorem 1 reveals an important tradeoff between privacy and parallelization in CodedPrivateML. The parameter $K$ reflects the amount of parallelization in CodedPrivateML, since the computation load at each worker node is proportional to $1/K$-th of the dataset. The parameter $T$ reflects the privacy threshold in CodedPrivateML. Theorem 1 shows that, in a cluster with $N$ workers, we can achieve any $K$ and $T$ as long as $N \geq (2r+1)(K+T-1)+1$. This condition further implies that, as the number of workers $N$ increases, the parallelization ($K$) and privacy threshold ($T$) of CodedPrivateML can also increase linearly, leading to a scalable solution.
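The tradeoff in Remark 2 can be made concrete with a small calculation. The sketch below assumes that the recovery threshold has the form $(2r+1)(K+T-1)+1$, which is what the degree counting in the proof gives; the function names and the example values $N = 40$ and $r = 1$ are illustrative assumptions.

```python
def recovery_threshold(r, K, T):
    """Assumed form of the recovery threshold: (2r + 1)(K + T - 1) + 1."""
    return (2 * r + 1) * (K + T - 1) + 1

def max_parallelization(N, r, T):
    """Largest K such that N workers meet the recovery threshold at privacy level T."""
    K = (N - 1) // (2 * r + 1) - T + 1
    return max(K, 0)

def tradeoff_curve(N, r):
    """All (K, T) pairs on the boundary of the feasible region, for T = 1, 2, ..."""
    pairs = []
    T = 1
    while max_parallelization(N, r, T) >= 1:
        pairs.append((max_parallelization(N, r, T), T))
        T += 1
    return pairs

# Example values (assumed): 40 workers and a degree-one sigmoid approximation.
curve = tradeoff_curve(N=40, r=1)
```

For these example values the curve runs from $(K, T) = (13, 1)$, where all resources go to parallelization, through the balanced point $(7, 7)$, to $(1, 13)$. Each unit of additional privacy costs one unit of parallelization, which is the linear tradeoff described above.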
Remark 3.
Theorem 1 also applies to the simpler linear regression problem. The proof follows the same steps.
Proof.
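(Convergence) First, we show that the master can decode $\overline{\mathbf{X}}^{\top} \bar{g}( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} )$ over the finite field as long as $N \geq (2r+1)(K+T-1)+1$. As described in Sections 3.3 and 3.4, given the polynomial approximation of the sigmoid function in (15), the degree of $h(z)$ in (21) is at most $(2r+1)(K+T-1)$. The decoding process uses the computations from workers as evaluation points $h(\alpha_i)$ to interpolate the polynomial $h(z)$. The master can obtain all coefficients of $h(z)$ as long as it collects at least $\deg(h(z)) + 1 \leq (2r+1)(K+T-1)+1$ evaluation results $h(\alpha_i)$. After $h(z)$ is recovered, the master can decode the sub-gradients $\overline{\mathbf{X}}_i^{\top} \bar{g}( \overline{\mathbf{X}}_i, \overline{\mathbf{W}}^{(t)} )$ by computing $h(\beta_i)$ for $i \in [K]$. Hence, the recovery threshold required to decode $\overline{\mathbf{X}}^{\top} \bar{g}( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} )$ is $(2r+1)(K+T-1)+1$.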
Next, we consider the update equation in CodedPrivateML (see (26)) and prove its convergence to . From the Lipschitz continuity of stated in Lemma 2, we have
where is the inner product. By taking the expectation with respect to the quantization noise on both sides,
(28)  
(29)  
(30)  
where (28) follows from , (29) from the convexity of , and (30) holds since and from Lemma 1 with assuming the arbitrarily large . Summing the above equations for , we have
Finally, since is convex, we observe that
which completes the proof of convergence.
(Privacy) Proof of privacy is deferred to Appendix A.4 in the supplementary materials.
5 Experiments
We now experimentally demonstrate the impact of CodedPrivateML, and make comparisons with existing cryptographic approaches to the problem. Our focus is on training a logistic regression model for image classification, while the computation load is distributed to multiple machines on the Amazon EC2 Cloud Platform.
Setup. We train the logistic regression model from (1) for binary image classification on the MNIST dataset (LeCun et al., 2010) to experimentally examine two things: the accuracy of CodedPrivateML and the performance gain in terms of training time. To obtain a larger dataset, we duplicate the MNIST dataset. Experiments with additional dataset sizes are provided in Appendix A.6 of supplementary material.
We implement CodedPrivateML using the MPI4Py (Dalcín et al., 2005) message passing interface on Python. Computations are performed in a distributed manner on Amazon EC2 clusters using m3.xlarge machine instances.
We then compare CodedPrivateML with the MPC-based approach when applied to our problem. In particular, we implement a BGW-style construction (Ben-Or et al., 1988) based on Shamir’s secret sharing scheme (Shamir, 1979), where we secret share the dataset among workers who proceed with a multi-round protocol to compute the gradient. We further incorporate the quantization and approximation techniques introduced here, as BGW-style protocols are also bound to arithmetic operations over a finite field. See Appendix A.5 of supplementary materials for additional detail.
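Table 1: Breakdown of the total run time.

| Protocol | Encode time (s) | Comm. time (s) | Comp. time (s) | Total run time (s) |
|---|---|---|---|---|
| MPC approach | 845.55 | 49.51 | 3457.99 | 4304.60 |
| CodedPrivateML (Case 1) | 50.97 | 3.01 | 66.95 | 126.20 |
| CodedPrivateML (Case 2) | 90.65 | 6.45 | 110.97 | 222.50 |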
CodedPrivateML parameters. There are several system parameters in CodedPrivateML that should be set. Given that we have a bit implementation, we select the field size to be , which is the largest prime with bits to avoid the overflow on intermediate multiplication. We then optimize the quantization parameters, in (6) and in (9), by taking into account the tradeoff between the rounding and overflow error. In particular, we choose and . We also need to set the parameter , the degree of the polynomial for approximating the sigmoid function. We consider both and and as we show later empirically observe that a degree one approximation provides very good accuracy. We finally need to select (privacy threshold) and (amount of parallelization) in CodedPrivateML. As stated in Theorem 1, these parameters should satisfy . Given our choice of , we consider two cases:


Case 1 (maximum parallelization). All resources to parallelization by setting and ,

Case 2 (equal parallelization and privacy). The resources are split equally by setting ,
Training time. In the first set of experiments, we measure the training time while increasing the number of workers gradually. The results are demonstrated in Figure 2. We make the following observations. ^{2}^{2}2For , all schemes have almost same performance because they use same system parameters, .


CodedPrivateML provides substantial speedup over the MPC approach in both Cases 1 and 2. The breakdown of the total run time for one scenario is shown in Table 1; in that scenario the total run time drops from 4304.60 s for the MPC approach to 126.20 s in Case 1 and 222.50 s in Case 2, i.e., roughly a 34-fold and a 19-fold reduction. One can note that CodedPrivateML provides significant improvement in all three categories: dataset encoding and secret sharing, communication time between the workers and the master, and computation time. One reason for this is that, in MPC-based schemes, the size of the secret shared dataset at each worker is the same as the original dataset, while in CodedPrivateML it is $1/K$-th of the dataset. This provides a large parallelization gain for CodedPrivateML. The other reason is the communication complexity of MPC-based schemes. We provide the results for more scenarios in Appendix A.6 of supplementary material.

We note that the total run time of CodedPrivateML decreases as the number of workers increases. This is again due to the parallelization gain of CodedPrivateML (i.e., increasing $K$ while $N$ increases). This parallelization gain is not achievable in the MPC-based scheme, since the whole computation has to be repeated by all players who take part in MPC. We should, however, point out that the MPC-based scheme can attain a higher privacy threshold than CodedPrivateML (Case 2).
Accuracy. We also examine the accuracy and convergence of CodedPrivateML in the experiments. Figure 3 illustrates the test accuracy of the binary classification problem between digits 3 and 7. After 25 iterations, CodedPrivateML with a degree-one polynomial approximation attains almost the same level of accuracy as conventional logistic regression, while being privacy preserving. Our experiments also show that CodedPrivateML converges at a rate comparable to that of conventional logistic regression. Those results are provided in Appendix A.6 in the supplementary materials.
References
 Abadi et al. (2016) Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318, 2016.

BenOr et al. (1988)
BenOr, M., Goldwasser, S., and Wigderson, A.
Completeness theorems for noncryptographic faulttolerant
distributed computation.
In
Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing
, pp. 1–10. ACM, 1988.  Brinkhuis & Tikhomirov (2011) Brinkhuis, J. and Tikhomirov, V. Optimization: Insights and Applications. Princeton Series in Applied Mathematics. Princeton University Press, 2011.
 Chaudhuri & Monteleoni (2009) Chaudhuri, K. and Monteleoni, C. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, pp. 289–296, 2009.
 Chen et al. (2019) Chen, V., Pastro, V., and Raykova, M. Secure computation for machine learning with SPDZ. arXiv:1901.00329, 2019.
 Cover & Thomas (2012) Cover, T. M. and Thomas, J. A. Elements of information theory. John Wiley & Sons, 2012.
 Dahl et al. (2018) Dahl, M., Mancuso, J., Dupis, Y., Decoste, B., Giraud, M., Livingstone, I., Patriquin, J., and Uhma, G. Private machine learning in TensorFlow using secure computation. arXiv:1810.08130, 2018.
 Dalcín et al. (2005) Dalcín, L., Paz, R., and Storti, M. MPI for Python. Journal of Parallel and Distributed Computing, 65(9):1108–1115, 2005.
 Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Springer, 2006.

 Gascón et al. (2017) Gascón, A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., and Evans, D. Privacy-preserving distributed linear regression on high-dimensional data. Proceedings on Privacy Enhancing Technologies, 2017(4):345–364, 2017.
 Gentry & Boneh (2009) Gentry, C. and Boneh, D. A fully homomorphic encryption scheme, volume 20. Stanford University, Stanford, 2009.

 Gilad-Bachrach et al. (2016) Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., and Wernsing, J. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210, 2016.
 Graepel et al. (2012) Graepel, T., Lauter, K., and Naehrig, M. ML confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pp. 1–21. Springer, 2012.

 Han et al. (2019) Han, K., Hong, S., Cheon, J. H., and Park, D. Logistic regression on homomorphic encrypted data at scale. Thirty-First Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-19), Available online: https://daejunpark.github.io/iaai19.pdf, 2019.
 Hesamifard et al. (2017) Hesamifard, E., Takabi, H., and Ghasemi, M. CryptoDL: Deep neural networks over encrypted data. arXiv:1711.05189, 2017.
 Jayaraman et al. (2018) Jayaraman, B., Wang, L., Evans, D., and Gu, Q. Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems, pp. 6346–6357, 2018.
 Kim et al. (2018) Kim, A., Song, Y., Kim, M., Lee, K., and Cheon, J. H. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics, 11(4):23–55, Oct 2018.
 LeCun et al. (2010) LeCun, Y., Cortes, C., and Burges, C. MNIST handwritten digit database. [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
 Li et al. (2017) Li, P., Li, J., Huang, Z., Gao, C.-Z., Chen, W.-B., and Chen, K. Privacy-preserving outsourced classification in cloud computing. Cluster Computing, pp. 1–10, 2017.
 Lindell & Pinkas (2000) Lindell, Y. and Pinkas, B. Privacy preserving data mining. In Annual International Cryptology Conference, pp. 36–54. Springer, 2000.
 McMahan et al. (2018) McMahan, H. B., Ramage, D., Talwar, K., and Zhang, L. Learning differentially private recurrent language models. In International Conference on Learning Representations, 2018.
 Melis et al. (2019) Melis, L., Song, C., Cristofaro, E. D., and Shmatikov, V. Exploiting unintended feature leakage in collaborative learning. arXiv:1805.04049, 2019.
 Mohassel & Rindal (2018) Mohassel, P. and Rindal, P. ABY3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 35–52, 2018.
 Mohassel & Zhang (2017) Mohassel, P. and Zhang, Y. SecureML: A system for scalable privacy-preserving machine learning. In 38th IEEE Symposium on Security and Privacy, pp. 19–38. IEEE, 2017.

 Nikolaenko et al. (2013) Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., and Taft, N. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy, pp. 334–348. IEEE, 2013.
 Pathak et al. (2010) Pathak, M., Rane, S., and Raj, B. Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, pp. 1876–1884, 2010.
 Rajkumar & Agarwal (2012) Rajkumar, A. and Agarwal, S. A differentially private stochastic gradient descent algorithm for multiparty classification. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS’12), volume 22 of Proceedings of Machine Learning Research, pp. 933–941, La Palma, Canary Islands, Apr 2012.
 Shamir (1979) Shamir, A. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
 Shokri & Shmatikov (2015) Shokri, R. and Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321, 2015.
 Wagh et al. (2018) Wagh, S., Gupta, D., and Chandran, N. Securenn: Efficient and private neural network training. Cryptology ePrint Archive, Report 2018/442, 2018. https://eprint.iacr.org/2018/442.
 Wang et al. (2018) Wang, Q., Du, M., Chen, X., Chen, Y., Zhou, P., Chen, X., and Huang, X. Privacy-preserving collaborative model learning: The case of word vector training. IEEE Transactions on Knowledge and Data Engineering, 30(12):2381–2393, Dec 2018.
 Yao (1982) Yao, A. C. Protocols for secure computations. In IEEE Annual Symposium on Foundations of Computer Science, pp. 160–164, 1982.
 Yu et al. (2019) Yu, Q., Raviv, N., Kalan, S. M. M., Soltanolkotabi, M., and Avestimehr, A. S. Lagrange coded computing: Optimal design for resiliency, security and privacy. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
 Yuan & Yu (2014) Yuan, J. and Yu, S. Privacy preserving backpropagation neural network learning made practical with cloud computing. IEEE Transactions on Parallel and Distributed Systems, 25(1):212–221, 2014.
 Zhang et al. (2016) Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. The ZipML framework for training models with end-to-end low precision: The cans, the cannots, and a little bit of deep learning. arXiv:1611.05402, 2016.
 Zhang et al. (2017) Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In Proceedings of the 34th International Conference on Machine Learning, pp. 4035–4043, Sydney, Australia, Aug 2017.
Appendix A Supplementary Materials
A.1 Algorithms
The overall procedure of the CodedPrivateML protocol is given in Algorithm 1. Procedures for the individual phases are shown in Algorithms 2–5 for Sections 3.1–3.4, respectively.