CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning

02/02/2019 · Jinhyun So, et al.

How can we train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via experiments over Amazon EC2, we demonstrate that CodedPrivateML can provide an order of magnitude speedup (up to ∼34×) over state-of-the-art cryptographic approaches.


1 Introduction

Modern machine learning models are breaking new ground by achieving unprecedented performance in various application domains. Training such models, however, is a daunting task: due to the typically large volume of data and the complexity of the models, training is compute- and storage-intensive. Furthermore, training must often be performed on sensitive data, such as healthcare records, browsing histories, or financial transactions, which raises issues regarding the security and privacy of the dataset. This creates a challenging dilemma. On the one hand, due to its complexity, training is often desired to be outsourced to more capable computing platforms, such as the cloud. On the other hand, the training dataset is often sensitive, and particular care should be taken to protect its privacy against potential breaches on such platforms. This dilemma gives rise to the main problem that we study here: How can we offload the training task to a distributed computing platform while maintaining the privacy of the dataset?

More specifically, we consider a scenario in which a data-owner (e.g., a hospital) wishes to train a logistic regression model by offloading the large volume of data (e.g., healthcare records) and the computationally-intensive training tasks (e.g., gradient computations) to $N$ machines over a cloud platform, while ensuring that any collusions between up to $T$ out of the $N$ workers do not leak information about the training dataset. We focus on the semi-honest adversary setup, where the corrupted parties follow the protocol but may leak information in an attempt to learn the training dataset.

We propose CodedPrivateML for this problem, which has three salient features:

  1. It provides strong information-theoretic privacy guarantees for both the training dataset and the model parameters in the presence of up to $T$ colluding workers.

  2. It enables fast training by distributing the training computation load effectively across several workers.

  3. It leverages a new method for secret sharing the dataset and model parameters based on coding and information theory principles, which significantly reduces the communication overhead and the complexity of distributed training.

At a high level, CodedPrivateML can be described as follows. It secret shares the dataset and model parameters at each round of the training in two steps. First, it employs stochastic quantization to convert the dataset and the weight vector at each round into a finite domain. It then combines (or encodes) the quantized values with random matrices, using a novel coding technique named Lagrange coding (Yu et al., 2019), to guarantee privacy (in an information-theoretic sense) while simultaneously distributing the workload among multiple workers. The challenge, however, is that Lagrange coding only works for computations that are in the form of polynomial evaluations. The gradient computation for logistic regression, on the other hand, includes non-linearities that cannot be expressed as polynomials. CodedPrivateML handles this challenge through polynomial approximation of the non-linear sigmoid function in the training phase.

Upon secret sharing of the encoded dataset and model parameters, each worker performs the gradient computations using the chosen polynomial approximation of the sigmoid function, and sends the result back to the master. It is useful to note that the workers perform the computations over the quantized and encoded data as if they were computing over the true dataset. That is, the structure of the computation is the same whether it is carried out over the true dataset or over the encoded dataset.

Finally, the master collects the results from a subset of fastest workers and decodes the gradient over the finite field. It then converts the decoded gradients to the real domain, updates the weight vector, and secret shares it with the worker nodes for the next round. We note that since the computations are performed in a finite domain while the weights are updated in the real domain, the update process may lead to undesired behaviour as weights may not converge. Our system guarantees convergence through the proposed stochastic quantization technique while converting between real and finite fields.

We theoretically prove that CodedPrivateML guarantees the convergence of the model parameters, while providing information-theoretic privacy for the training dataset. Our theoretical analysis also identifies a trade-off between privacy and parallelization. More specifically, each additional worker can be utilized either for more privacy, by protecting against a larger number of collusions $T$, or for more parallelization, by reducing the computation load at each worker. We characterize this trade-off for CodedPrivateML.

Furthermore, we empirically demonstrate the impact of CodedPrivateML by comparing it with state-of-the-art cryptographic approaches based on secure multi-party computation (MPC) (Yao, 1982; Ben-Or et al., 1988), which can also be applied to enable privacy-preserving machine learning tasks (e.g., see (Nikolaenko et al., 2013; Gascón et al., 2017; Mohassel & Zhang, 2017; Lindell & Pinkas, 2000; Dahl et al., 2018; Chen et al., 2019)). In particular, we envision a master who secret shares its data and model parameters among multiple workers who collectively perform the gradient computation using a multi-round MPC protocol. Given our focus on information-theoretic privacy, the most relevant MPC-based scheme for empirical comparison is the BGW-style approach (Ben-Or et al., 1988) based on Shamir's secret sharing (Shamir, 1979). While several more recent works design MPC-based private learning solutions with information-theoretic security (Wagh et al., 2018; Mohassel & Rindal, 2018), their constructions are limited to three or four parties.

We run extensive experiments over the Amazon EC2 cloud to empirically demonstrate the performance of CodedPrivateML. We train a logistic regression model for image classification over the MNIST dataset (LeCun et al., 2010), while the computation workload is distributed across up to $N$ machines over the cloud. We demonstrate that CodedPrivateML can provide substantial speedup in training time (up to $\sim 34\times$) compared with MPC-based schemes, while guaranteeing the same level of accuracy. The primary disadvantage of MPC-based schemes is their reliance on extensive communication and coordination between the workers for distributed private computing, as well as their inability to benefit from parallelization among the workers, since the whole computation is repeated by all players who take part in the MPC. They do, however, guarantee a higher privacy threshold (i.e., a larger $T$) compared with CodedPrivateML.

Other related works. Apart from MPC-based schemes, one can consider two other solutions to this problem. One is based on Homomorphic Encryption (HE) (Gentry & Boneh, 2009), which allows computation to be performed over encrypted data and has been used to enable privacy-preserving machine learning solutions (Gilad-Bachrach et al., 2016; Hesamifard et al., 2017; Graepel et al., 2012; Yuan & Yu, 2014; Li et al., 2017; Kim et al., 2018; Wang et al., 2018; Han et al., 2019). The privacy guarantees of HE are based on computational assumptions, whereas our system provides strong information-theoretic security. Moreover, HE requires computations to be performed over encrypted data, which leads to many orders of magnitude slowdown in training. For example, for image classification on the MNIST dataset, HE takes hours to learn a logistic regression model (Han et al., 2019). In contrast, in CodedPrivateML there is no slowdown in performing the coded computations, which allows for a faster implementation. As a trade-off, HE allows collusion between a larger number of workers, whereas in CodedPrivateML this number is determined by other system parameters, such as the number of workers $N$ and the computation load assigned to each worker.

Another possible solution is based on differential privacy (DP), a release mechanism that preserves the privacy of personally identifiable information, in the sense that the removal of any single element from the dataset does not change the computation outcomes significantly (Dwork et al., 2006). In the context of machine learning, DP is mainly used for training when the model parameters are to be released for public use, to ensure that the individual data points from the dataset cannot be identified from the released model (Chaudhuri & Monteleoni, 2009; Shokri & Shmatikov, 2015; Abadi et al., 2016; Pathak et al., 2010; McMahan et al., 2018; Rajkumar & Agarwal, 2012; Jayaraman et al., 2018). The main difference between these approaches and our work is that we guarantee strong information-theoretic privacy that leaks no information about the dataset, and preserve the accuracy of the model throughout the training. We note, however, that it is in principle possible to compose the techniques of CodedPrivateML with differential privacy to obtain the best of both worlds, if the intention is to publicly release the final model; we leave this as future work.

2 Problem Setting

We study the problem of training a logistic regression model. The training dataset is represented by a matrix $\mathbf{X} \in \mathbb{R}^{m \times d}$ consisting of $m$ data points with $d$ features, and a label vector $\mathbf{y} \in \{0,1\}^m$. Row $i$ of $\mathbf{X}$ is denoted by $\mathbf{x}_i$.

The model parameters (weights) $\mathbf{w} \in \mathbb{R}^d$ are obtained by minimizing the cross-entropy function,

$$C(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^{m} \Big( -y_i \log \hat{y}_i - (1 - y_i) \log (1 - \hat{y}_i) \Big), \qquad (1)$$

where $\hat{y}_i = g(\mathbf{x}_i \cdot \mathbf{w}) \in (0,1)$ is the estimated probability of label $i$ being equal to $1$ and $g(\cdot)$ is the sigmoid function

$$g(z) = \frac{1}{1 + e^{-z}}. \qquad (2)$$

The problem in (1) can be solved via gradient descent, through an iterative process that updates the model parameters in the opposite direction of the gradient. The gradient for (1) is given by $\nabla C(\mathbf{w}) = \frac{1}{m} \mathbf{X}^\top \big( g(\mathbf{X}\mathbf{w}) - \mathbf{y} \big)$. Accordingly, the model parameters are updated as

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m}\, \mathbf{X}^\top \big( g(\mathbf{X}\mathbf{w}^{(t)}) - \mathbf{y} \big), \qquad (3)$$

where $\mathbf{w}^{(t)}$ holds the estimated parameters from iteration $t$, $\eta$ is the learning rate, and the function $g(\cdot)$ operates element-wise over the vector $\mathbf{X}\mathbf{w}^{(t)}$.
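For concreteness, the update rule (3) in the clear (i.e., without any of the privacy machinery introduced later) can be sketched in a few lines; this is an illustrative baseline, and all names in it are ours rather than part of the protocol.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_plaintext(X, y, eta=0.1, iterations=50):
    """Baseline gradient descent for (1)-(3), with no privacy guarantees."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(iterations):
        grad = X.T @ (sigmoid(X @ w) - y) / m   # gradient of (1)
        w = w - eta * grad                      # update rule (3)
    return w
```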

As shown in Figure 1, we consider a master-worker distributed computing architecture, where the master offloads the computationally-intensive operations to $N$ workers. These operations correspond to the gradient computations in (3). In doing so, the master wishes to protect the privacy of the dataset against any potential collusions between up to $T$ workers, where $T$ is the privacy parameter of the system.

Figure 1: The distributed training setup consisting of a master and $N$ worker nodes. The master shares with each worker a coded version of the dataset (denoted by the $\widetilde{\mathbf{X}}_i$'s) and of the current estimate of the model parameters (denoted by the $\widetilde{\mathbf{W}}^{(t)}_i$'s) to guarantee the information-theoretic privacy of the dataset against any $T$ colluding workers. Workers perform computations locally over the coded data and send the results back to the master.

At the beginning of training, the dataset $\mathbf{X}$ is shared in a privacy-preserving manner among the $N$ workers. To do so, $\mathbf{X}$ is first partitioned into $K$ submatrices $\mathbf{X}_1, \ldots, \mathbf{X}_K$, for some $K \in \mathbb{N}$. The parameter $K$ is related to the computation load at each worker (i.e., what fraction of the dataset is processed at each worker), as well as to the number of workers the master has to wait for to reconstruct the gradient at each step. The master then creates $N$ encoded submatrices, denoted by $\widetilde{\mathbf{X}}_1, \ldots, \widetilde{\mathbf{X}}_N$, by combining the parts of the dataset together with some random matrices to preserve privacy, and sends $\widetilde{\mathbf{X}}_i$ to worker $i \in [N]$. This process needs to be performed only once for the dataset $\mathbf{X}$.

At each iteration of training, the master also needs to send the workers the current estimate of the model parameters (i.e., $\mathbf{w}^{(t)}$ in (3)). However, it has recently been shown that the intermediate model parameters can also leak substantial information about the dataset (Melis et al., 2019), so the master needs to prevent the leakage of these intermediate parameters as well. To that end, the master creates an encoded matrix $\widetilde{\mathbf{W}}^{(t)}_i$ to secret share the current estimate of the model parameters with worker $i$. This coding strategy should also be private against any $T$ colluding workers.

More specifically, the coding strategy used for secret sharing the dataset (i.e., creating the $\widetilde{\mathbf{X}}_i$'s) and the model parameters (i.e., creating the $\widetilde{\mathbf{W}}^{(t)}_i$'s) should be such that any subset of $T$ colluding workers cannot learn any information, in the strong information-theoretic sense, about the training dataset $\mathbf{X}$. Formally, for every subset of workers $\mathcal{T} \subseteq [N]$ of size at most $T$, we should have

$$I\Big( \mathbf{X};\ \widetilde{\mathbf{X}}_{\mathcal{T}},\ \widetilde{\mathbf{W}}^{(0)}_{\mathcal{T}}, \ldots, \widetilde{\mathbf{W}}^{(J-1)}_{\mathcal{T}} \Big) = 0, \qquad (4)$$

where $I(\cdot\,;\cdot)$ denotes the mutual information, $J$ is the number of iterations, and $\widetilde{\mathbf{X}}_{\mathcal{T}}$ and $\widetilde{\mathbf{W}}^{(t)}_{\mathcal{T}}$ denote the collections of the coded matrices and coded parameter estimates stored at the workers in $\mathcal{T}$. We refer to a protocol that guarantees privacy against $T$ colluding workers as a $T$-private protocol.

At each iteration, worker $i$ performs its computation locally using $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}^{(t)}_i$ and sends the result back to the master. After receiving the results from a sufficient number of workers, the master recovers the sub-gradients, reconstructs the gradient, and updates the model parameters using (3). In doing so, the master needs to wait only for the fastest workers. We define the recovery threshold of the protocol as the minimum number of workers the master needs to wait for; a simplified skeleton of this master-worker interaction is sketched below. The relation between the recovery threshold and the parameters $N$, $K$, and $T$ will be detailed in our theoretical analysis.
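Since our implementation (Section 5) is built on MPI4Py, the interaction above can be sketched as follows. This is a simplified skeleton under our own naming: the callables encode_w, decode, and compute stand in for the encoding, decoding, and local-computation procedures detailed in Section 3, and straggler handling across iterations is omitted.

```python
from mpi4py import MPI

def run_master(comm, X_tilde_parts, iterations, recovery_threshold, encode_w, decode):
    """Master: one-time secret sharing of coded data, then per-iteration rounds."""
    n_workers = comm.Get_size() - 1
    for i in range(1, n_workers + 1):              # Phase 2: send X_tilde_i once
        comm.send(X_tilde_parts[i - 1], dest=i, tag=0)
    for t in range(iterations):
        for i in range(1, n_workers + 1):          # share encoded weights for round t
            comm.send(encode_w(t, i), dest=i, tag=1)
        results = [comm.recv(source=MPI.ANY_SOURCE, tag=2)
                   for _ in range(recovery_threshold)]   # wait only for fastest workers
        decode(results)                                  # Phase 4: decode and update

def run_worker(comm, iterations, compute):
    """Worker: operates on coded shares only; never observes the raw dataset."""
    X_tilde = comm.recv(source=0, tag=0)
    for _ in range(iterations):
        W_tilde = comm.recv(source=0, tag=1)
        comm.send(compute(X_tilde, W_tilde), dest=0, tag=2)

# Rank 0 acts as the master and ranks 1..N as workers; wiring is omitted here.
```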

Remark 1.

Although our presentation is based on logistic regression, CodedPrivateML can also be applied to linear regression with minor modifications.

3 The Proposed CodedPrivateML Strategy

The CodedPrivateML strategy consists of four main phases, first described at a high level below and then in detail in the rest of this section.

Phase 1: Quantization. In order to guarantee information-theoretic privacy, one has to mask the dataset and the weight vector in a finite field using uniformly random matrices, so that the added randomness can make each data point appear equally likely. In contrast, the dataset and weight vectors for the training task are defined in the domain of real numbers. We address this by employing a stochastic quantization technique to convert the parameters from the real domain to the finite domain and vice versa. Accordingly, in the first phase of our system, the master quantizes the dataset and weights from the real domain to the domain of integers, and then embeds them in a field $\mathbb{F}_p$ of integers modulo a prime $p$. The quantized version of the dataset is denoted by $\overline{\mathbf{X}}$. The quantization of the weight vector $\mathbf{w}^{(t)}$, on the other hand, is represented by a matrix $\overline{\mathbf{W}}^{(t)}$, where each column holds an independent stochastic quantization of $\mathbf{w}^{(t)}$. This structure will be important in ensuring the convergence of the model. The parameter $p$ is selected to be sufficiently large to avoid wrap-around in computations. Its value depends on the bitwidth of the machine as well as on the number of additive and multiplicative operations; in our implementation we select the largest prime that avoids overflow on intermediate computations, as detailed in our experiments.

Phase 2: Encoding and Secret Sharing. In the second phase, the master partitions the quantized dataset into $K$ submatrices and encodes them using the recently proposed Lagrange coding technique (Yu et al., 2019), which we describe in detail in Section 3.2. It then sends to worker $i$ a coded submatrix $\widetilde{\mathbf{X}}_i$. As we illustrate later, this encoding ensures that the coded matrices leak no information about the true dataset, even if $T$ workers collude. In addition, the master has to ensure that the weight estimates sent to the workers at each iteration do not leak information about the dataset. This is because the weights updated via (3) carry information about the whole training set, and sending them directly to the workers may breach privacy. In order to prevent this, at iteration $t$ the master also quantizes the current weight vector $\mathbf{w}^{(t)}$ to the finite field and encodes it, again using Lagrange coding.

Phase 3: Polynomial Approximation and Local Computations. In the third phase, each worker performs its computation using its local storage and sends the result back to the master. We note that the workers perform the computations over the encoded data as if they were computing over the true dataset; that is, the structure of the computation is the same in both cases. A major challenge is that Lagrange coding is designed for distributed polynomial computations, whereas the computations in the training phase are not polynomials due to the sigmoid function. We overcome this by approximating the sigmoid with a polynomial of a selected degree $r$. This allows us to represent the gradient computations in terms of polynomials that can be computed locally by each worker.

Phase 4: Decoding and Model Update. The master collects the results from a subset of fastest workers and decodes the gradient over the finite field. Finally, master converts the decoded gradients to the real domain, updates the weight vector, and secret shares it with workers for the next round.

We next provide the details of each phase. The overall algorithm of CodedPrivateML, and each of its four phases, are also presented in Appendix A.1 of supplementary materials.

3.1 Quantization

We consider an element-wise lossy quantization scheme for the dataset and the weights. For quantizing the dataset $\mathbf{X}$, we use a simple deterministic rounding technique:

$$\text{Round}(x) = \begin{cases} \lfloor x \rfloor & \text{if } x - \lfloor x \rfloor < 0.5 \\ \lfloor x \rfloor + 1 & \text{otherwise,} \end{cases} \qquad (5)$$

where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. We define the quantized dataset as

$$\overline{\mathbf{X}} \triangleq \phi\big( \text{Round}(2^{l_x} \cdot \mathbf{X}) \big), \qquad (6)$$

where the rounding function from (5) is applied element-wise to the elements of the matrix $\mathbf{X}$ and $l_x$ is an integer parameter that controls the quantization loss. The function $\phi: \mathbb{Z} \to \mathbb{F}_p$ is a mapping defined to represent a negative integer in the finite field by using a two's complement representation,

$$\phi(x) = \begin{cases} x & \text{if } x \ge 0 \\ p + x & \text{if } x < 0. \end{cases} \qquad (7)$$

Note that the domain of (6) is a bounded range of signed integers determined by $l_x$ and the dataset. To avoid a wrap-around, which may lead to an overflow error, the prime $p$ should be large enough to cover this range.
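A minimal sketch of this dataset quantization, assuming an illustrative field size and the notation above ($l_x$ for the scaling parameter, $\phi$ for the embedding (7)), might look as follows.

```python
import numpy as np

P = 2**31 - 1        # illustrative prime field size; the paper selects p per bitwidth

def phi(x, p=P):
    """Two's-complement style embedding of signed integers into F_p, eq. (7)."""
    x = np.asarray(x, dtype=np.int64)
    return np.where(x >= 0, x, p + x)

def quantize_dataset(X, lx, p=P):
    """Element-wise deterministic quantization of the dataset, eq. (5)-(6)."""
    rounded = np.floor(2.0**lx * X + 0.5).astype(np.int64)   # Round() of eq. (5)
    return phi(rounded, p)
```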

At each iteration, the master also quantizes the weight vector from the real domain to the finite field. This proves to be a challenging task, as it should be performed in a way that ensures the convergence of the model. Our solution is a quantization technique inspired by (Zhang et al., 2017, 2016). Initially, we define a stochastic quantization function:

$$Q(x; l_w) \triangleq \phi\big( \text{Round}_{stoc}(2^{l_w} \cdot x) \big), \qquad (8)$$

where $l_w$ is an integer parameter to control the quantization loss, and $\text{Round}_{stoc}: \mathbb{R} \to \mathbb{Z}$ is a stochastic rounding function:

$$\text{Round}_{stoc}(x) = \begin{cases} \lfloor x \rfloor & \text{with probability } 1 - (x - \lfloor x \rfloor) \\ \lfloor x \rfloor + 1 & \text{with probability } x - \lfloor x \rfloor. \end{cases}$$

The probability of rounding $x$ to $\lfloor x \rfloor$ is proportional to the proximity of $x$ to $\lfloor x \rfloor$, so that stochastic rounding is unbiased (i.e., $\mathbb{E}[\text{Round}_{stoc}(x)] = x$).

For quantizing the weight vector $\mathbf{w}^{(t)}$, the master creates $r$ independent quantized vectors:

$$\overline{\mathbf{w}}^{(t)}_j \triangleq Q_j\big(\mathbf{w}^{(t)}; l_w\big), \quad j \in [r], \qquad (9)$$

where the quantization function (8) is applied element-wise to the vector $\mathbf{w}^{(t)}$ and each $Q_j$ denotes an independent realization of (8). The number $r$ of quantized vectors is equal to the degree of the polynomial approximation of the sigmoid function, which we describe in Section 3.3. The intuition behind creating $r$ independent quantizations is to ensure that the gradient computations performed using the quantized weights are unbiased estimators of the true gradients. As detailed in Section 4, this property is fundamental for the convergence analysis of our model. The specific values of the parameters $l_x$ and $l_w$ provide a trade-off between the rounding error and the overflow error. In particular, a larger value reduces the rounding error while increasing the chance of an overflow. We denote the quantization of the weight vector as

$$\overline{\mathbf{W}}^{(t)} \triangleq \big[ \overline{\mathbf{w}}^{(t)}_1, \ldots, \overline{\mathbf{w}}^{(t)}_r \big], \qquad (10)$$

by arranging the quantized vectors from (9) in matrix form.
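The stochastic quantization (8)-(10) admits a similarly short sketch; the $r$ independent draws become the columns of $\overline{\mathbf{W}}^{(t)}$, and the unbiasedness of the rounding is what the convergence analysis later relies on. Names and defaults here are illustrative.

```python
import numpy as np

def stochastic_round(x, rng):
    """Round x up with probability (x - floor(x)); unbiased: E[round(x)] = x."""
    low = np.floor(x)
    return (low + (rng.random(x.shape) < (x - low))).astype(np.int64)

def quantize_weights(w, lw, r, p, rng=None):
    """r independent stochastic quantizations of w as columns, eq. (8)-(10)."""
    rng = rng or np.random.default_rng()
    cols = [stochastic_round(2.0**lw * w, rng) for _ in range(r)]
    W = np.stack(cols, axis=1)                    # shape (d, r), one column per draw
    return np.where(W >= 0, W, p + W)             # embed negatives via phi, eq. (7)
```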

3.2 Encoding and Secret Sharing

The master first partitions the quantized dataset $\overline{\mathbf{X}}$ into $K$ submatrices $\overline{\mathbf{X}}_1, \ldots, \overline{\mathbf{X}}_K$, where $\overline{\mathbf{X}}_k \in \mathbb{F}_p^{\frac{m}{K} \times d}$ for $k \in [K]$. It also selects $K + T$ distinct elements $\beta_1, \ldots, \beta_{K+T}$ from $\mathbb{F}_p$. It then employs Lagrange coding (Yu et al., 2019) to encode the dataset. More specifically, it finds a polynomial $u: \mathbb{F}_p \to \mathbb{F}_p^{\frac{m}{K} \times d}$ of degree at most $K + T - 1$ such that $u(\beta_k) = \overline{\mathbf{X}}_k$ for $k \in [K]$, and $u(\beta_k) = \mathbf{Z}_k$ for $k \in \{K+1, \ldots, K+T\}$, where the $\mathbf{Z}_k$'s are chosen uniformly at random from $\mathbb{F}_p^{\frac{m}{K} \times d}$ (the role of the $\mathbf{Z}_k$'s is to mask the dataset and provide privacy against up to $T$ colluding workers). This is accomplished by letting $u$ be the respective Lagrange interpolation polynomial

$$u(z) \triangleq \sum_{k=1}^{K+T} u(\beta_k) \prod_{l \in [K+T] \setminus \{k\}} \frac{z - \beta_l}{\beta_k - \beta_l}. \qquad (11)$$

The master then selects $N$ distinct elements $\alpha_1, \ldots, \alpha_N$ from $\mathbb{F}_p$ such that $\{\alpha_i\}_{i \in [N]} \cap \{\beta_k\}_{k \in [K]} = \emptyset$, and encodes the dataset by letting $\widetilde{\mathbf{X}}_i = u(\alpha_i)$ for $i \in [N]$. By defining an encoding matrix $\mathbf{U} \in \mathbb{F}_p^{(K+T) \times N}$ whose $(k,i)$-th element is given by $u_{ki} = \prod_{l \in [K+T] \setminus \{k\}} \frac{\alpha_i - \beta_l}{\beta_k - \beta_l}$, one can also represent the encoding of the dataset as

$$\big( \widetilde{\mathbf{X}}_1, \ldots, \widetilde{\mathbf{X}}_N \big) = \big( \overline{\mathbf{X}}_1, \ldots, \overline{\mathbf{X}}_K, \mathbf{Z}_{K+1}, \ldots, \mathbf{Z}_{K+T} \big) \cdot \mathbf{U}. \qquad (12)$$

At iteration $t$, the quantized weights $\overline{\mathbf{W}}^{(t)}$ are also encoded using a Lagrange interpolation polynomial,

$$v(z) \triangleq \sum_{k=1}^{K+T} v(\beta_k) \prod_{l \in [K+T] \setminus \{k\}} \frac{z - \beta_l}{\beta_k - \beta_l}, \qquad (13)$$

where $v(\beta_k) = \overline{\mathbf{W}}^{(t)}$ for $k \in [K]$ and $v(\beta_k) = \mathbf{V}_k$ for $k \in \{K+1, \ldots, K+T\}$, with the $\mathbf{V}_k$'s chosen uniformly at random from $\mathbb{F}_p^{d \times r}$. The interpolation points $\{\beta_k\}$ are the same as the ones in (11). We note that the polynomial in (13) has the property $v(\beta_k) = \overline{\mathbf{W}}^{(t)}$ for $k \in [K]$.

The master then encodes the quantized weights by using the same evaluation points $\{\alpha_i\}_{i \in [N]}$. Accordingly, the weight matrix is encoded as

$$\widetilde{\mathbf{W}}^{(t)}_i = v(\alpha_i) \qquad (14)$$

for $i \in [N]$, which can equivalently be expressed through the encoding matrix $\mathbf{U}$ from (12). The degrees of the polynomials $u(z)$ and $v(z)$ are both $K + T - 1$.
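To make the encoding concrete, the following sketch builds the encoding matrix $\mathbf{U}$ of (12) and applies it to the $K$ partitions padded with $T$ random masks. It assumes $p$ is prime (so field inverses can be computed via Fermat's little theorem); the function names are ours.

```python
import numpy as np

def encoding_matrix(alphas, betas, p):
    """U[k][i] = prod_{l != k} (alpha_i - beta_l)/(beta_k - beta_l) over F_p, eq. (12)."""
    U = [[1] * len(alphas) for _ in betas]
    for k, b_k in enumerate(betas):
        for i, a_i in enumerate(alphas):
            for l, b_l in enumerate(betas):
                if l != k:
                    U[k][i] = U[k][i] * ((a_i - b_l) % p) % p
                    U[k][i] = U[k][i] * pow((b_k - b_l) % p, p - 2, p) % p  # inverse
    return U

def encode_dataset(X_parts, T, alphas, betas, p, rng):
    """Stack the K quantized partitions with T uniform masks and apply U, eq. (12)."""
    masks = [rng.integers(0, p, size=X_parts[0].shape, dtype=np.int64) for _ in range(T)]
    blocks = list(X_parts) + masks                 # K data blocks + T random masks
    U = encoding_matrix(alphas, betas, p)
    return [sum(blocks[k].astype(object) * U[k][i] for k in range(len(blocks))) % p
            for i in range(len(alphas))]           # one coded share per worker
```

The same encoding matrix can be reused for the per-iteration weight encoding (14), since (13) uses the same interpolation and evaluation points.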

3.3 Polynomial Approximation and Local Computation

Upon receiving the encoded (and quantized) dataset and weights, the workers should proceed with the gradient computations. However, a major challenge is that Lagrange coding is originally designed for polynomial computations, while the gradient computations that the workers need to perform are not polynomials due to the sigmoid function. Our solution is to use a polynomial approximation of the sigmoid function,

$$\hat{g}(z) = \sum_{j=0}^{r} c_j z^j, \qquad (15)$$

where $r$ and $c_j$ denote the degree and coefficients of the polynomial, respectively. The coefficients are obtained by fitting the sigmoid function via least squares estimation.
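A sketch of this least-squares fit is given below; the fitting interval and sample count are arbitrary choices for illustration, not values prescribed by the scheme.

```python
import numpy as np

def fit_sigmoid_poly(r, z_min=-5.0, z_max=5.0, samples=1000):
    """Least-squares degree-r polynomial fit of the sigmoid, eq. (15).
    Returns coefficients c_0, ..., c_r."""
    z = np.linspace(z_min, z_max, samples)
    g = 1.0 / (1.0 + np.exp(-z))
    # Vandermonde system: minimize ||A c - g||_2 with A[:, j] = z**j
    A = np.vander(z, r + 1, increasing=True)
    c, *_ = np.linalg.lstsq(A, g, rcond=None)
    return c
```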

Using this polynomial approximation, we can rewrite the update (3) as

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m}\, \overline{\mathbf{X}}^\top \big( \hat{g}\big(\overline{\mathbf{X}} \cdot \overline{\mathbf{w}}^{(t)}\big) - \mathbf{y} \big), \qquad (16)$$

where $\overline{\mathbf{w}}^{(t)}$ is the quantized version of $\mathbf{w}^{(t)}$, and $\hat{g}(\cdot)$ operates element-wise over the vector $\overline{\mathbf{X}} \cdot \overline{\mathbf{w}}^{(t)}$.

Another challenge is to ensure the convergence of the weights. As we detail in Section 4, this necessitates that the gradient estimations be unbiased when using the polynomial approximation with quantized weights. We solve this by utilizing the computation technique from a lemma in (Zhang et al., 2016), using the quantized weights formed in Section 3.1. Specifically, given the degree-$r$ polynomial from (15) and the $r$ independent quantizations from (10), we define the function

$$\bar{g}\big( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} \big) \triangleq \sum_{j=0}^{r} c_j \prod_{l=1}^{j} \big( \overline{\mathbf{X}} \cdot \overline{\mathbf{w}}^{(t)}_l \big), \qquad (17)$$

where the product operates element-wise over the vectors $\overline{\mathbf{X}} \cdot \overline{\mathbf{w}}^{(t)}_l$ for $l \in [j]$. Lastly, we note that (17) is an unbiased estimator of the polynomial approximation evaluated at the true weights,

$$\mathbb{E}\big[ \bar{g}\big( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} \big) \big] = \hat{g}\big( \overline{\mathbf{X}} \cdot \mathbf{w}^{(t)} \big), \qquad (18)$$

where $\hat{g}(\cdot)$ acts element-wise over the vector $\overline{\mathbf{X}} \cdot \mathbf{w}^{(t)}$, and the result follows from the independence of the quantizations: for each $j$, the expectation factors as $\mathbb{E}\big[ \prod_{l=1}^{j} \overline{\mathbf{X}} \cdot \overline{\mathbf{w}}^{(t)}_l \big] = \prod_{l=1}^{j} \overline{\mathbf{X}} \cdot \mathbb{E}\big[ \overline{\mathbf{w}}^{(t)}_l \big]$.

Using (17), we rewrite the update equation (16) in terms of the quantized weights,

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m}\, \overline{\mathbf{X}}^\top \big( \bar{g}\big( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} \big) - \mathbf{y} \big). \qquad (19)$$

The computations are then performed at each worker locally. In particular, at each iteration, worker $i$ locally computes

$$f\big( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i \big) \triangleq \widetilde{\mathbf{X}}_i^\top\, \bar{g}\big( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i \big) \qquad (20)$$

using $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}^{(t)}_i$, and sends the result back to the master. This computation is a polynomial evaluation in finite field arithmetic, and the degree of $f$ is $2r + 1$.
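A sketch of the worker's evaluation of (20) follows. It treats the field-embedded polynomial coefficients as given (the exact coefficient quantization is an implementation detail not spelled out here) and uses Python integers to avoid overflow in the field arithmetic.

```python
import numpy as np

def worker_computation(X_tilde, W_tilde, c_bar, p):
    """Evaluate f = X_tilde^T g_bar(X_tilde, W_tilde) over F_p, eq. (20).
    W_tilde holds one column per independent weight quantization, as in (17)."""
    X = X_tilde.astype(object)            # Python ints: no overflow in F_p arithmetic
    W = W_tilde.astype(object)
    n_rows = X.shape[0]
    acc = np.full(n_rows, int(c_bar[0]), dtype=object)   # c_0 term of (17)
    running = np.ones(n_rows, dtype=object)
    for j in range(1, len(c_bar)):
        running = running * X.dot(W[:, j - 1]) % p       # prod_{l<=j} (X w_l), element-wise
        acc = (acc + int(c_bar[j]) * running) % p
    return X.T.dot(acc) % p                              # X^T g_bar(.), all mod p
```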

3.4 Decoding and Model Update

After receiving the evaluation results in (20) from a sufficient number of workers, the master decodes over the finite field. The minimum number of workers the master needs to wait for is termed the recovery threshold of the system, and it equals $(2r+1)(K+T-1)+1$, as we demonstrate in Section 4.

We now proceed to the details of decoding. By construction of the Lagrange polynomials in (11) and (13), one can define a univariate polynomial $h(z) \triangleq f(u(z), v(z))$ such that

$$h(\beta_k) = f\big( \overline{\mathbf{X}}_k, \overline{\mathbf{W}}^{(t)} \big) = \overline{\mathbf{X}}_k^\top\, \bar{g}\big( \overline{\mathbf{X}}_k, \overline{\mathbf{W}}^{(t)} \big) \qquad (21)$$

for $k \in [K]$. On the other hand, from (20), the computation result from worker $i$ equals

$$f\big( \widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i \big) = f\big( u(\alpha_i), v(\alpha_i) \big) = h(\alpha_i). \qquad (22)$$

The main intuition behind the decoding process is to use the computations from (22) as evaluation points to interpolate the polynomial $h(z)$. Specifically, the master can obtain all coefficients of $h(z)$ from $\deg(h) + 1$ evaluation results. After $h(z)$ is recovered, the master can recover (21) by computing $h(\beta_k)$ for $k \in [K]$ and evaluating

$$\sum_{k=1}^{K} h(\beta_k) = \sum_{k=1}^{K} \overline{\mathbf{X}}_k^\top\, \bar{g}\big( \overline{\mathbf{X}}_k, \overline{\mathbf{W}}^{(t)} \big). \qquad (23)$$

Lastly, the master converts (23) from the finite field to the real domain and updates the weights according to (19). This conversion is attained by the function

$$Q_p^{-1}(\overline{x}; l) \triangleq 2^{-l} \cdot \phi^{-1}(\overline{x}), \qquad (24)$$

where $l$ accounts for the aggregate scaling introduced by the quantization parameters $l_x$ and $l_w$ through the degree-$r$ polynomial evaluation, and $\phi^{-1}: \mathbb{F}_p \to \mathbb{Z}$ is defined as follows,

$$\phi^{-1}(\overline{x}) = \begin{cases} \overline{x} & \text{if } 0 \le \overline{x} < \frac{p-1}{2} \\ \overline{x} - p & \text{if } \frac{p-1}{2} \le \overline{x} < p. \end{cases} \qquad (25)$$
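Decoding is thus plain Lagrange interpolation over $\mathbb{F}_p$ followed by the field-to-real conversion (24)-(25). A sketch, assuming at least $\deg(h)+1$ worker results are available, prime $p$ (for Fermat inverses), and results given as vectors of field elements:

```python
import numpy as np

def interpolate_at(alphas, results, target, p):
    """Lagrange interpolation over F_p: evaluate h(target) from (alpha_i, h(alpha_i))."""
    total = 0
    for i, a_i in enumerate(alphas):
        coeff = 1
        for l, a_l in enumerate(alphas):
            if l != i:
                coeff = coeff * ((target - a_l) % p) % p
                coeff = coeff * pow((a_i - a_l) % p, p - 2, p) % p   # Fermat inverse
        total = (total + results[i] * coeff) % p
    return total

def decode_gradient(alphas, results, betas_data, p):
    """Recover h(beta_k) for the K data points and sum them, eq. (21)-(23)."""
    return sum(interpolate_at(alphas, results, b, p) for b in betas_data) % p

def to_real(x_bar, l, p):
    """Field-to-real conversion, eq. (24)-(25): undo phi, then rescale by 2^{-l}."""
    x = np.asarray(x_bar, dtype=object)
    signed = np.where(x < (p - 1) // 2, x, x - p)        # phi^{-1}, eq. (25)
    return signed.astype(np.float64) / 2.0**l
```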

4 Convergence and Privacy Guarantees

Consider the cost function (1) that we aim to minimize in logistic regression, with the dataset $\mathbf{X}$ replaced by the quantized dataset $\overline{\mathbf{X}}$ from (6). Also denote by $\mathbf{w}^*$ the optimal weight vector that minimizes (1) when $\hat{y}_i = g(\overline{\mathbf{x}}_i \cdot \mathbf{w})$, where $\overline{\mathbf{x}}_i$ is row $i$ of $\overline{\mathbf{X}}$. In this section we prove that CodedPrivateML guarantees convergence to the optimal model parameters (i.e., $\mathbf{w}^*$) while maintaining the privacy of the dataset against up to $T$ colluding workers.

Recall that the model update at the master node in CodedPrivateML follows (19), which is

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \frac{\eta}{m}\, \overline{\mathbf{X}}^\top \big( \bar{g}\big( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} \big) - \mathbf{y} \big). \qquad (26)$$

We first state a lemma, which is proved in Appendix A.2 in supplementary materials.

Lemma 1.

Let $\mathbf{p}^{(t)} \triangleq \frac{1}{m}\, \overline{\mathbf{X}}^\top \big( \bar{g}\big( \overline{\mathbf{X}}, \overline{\mathbf{W}}^{(t)} \big) - \mathbf{y} \big)$ denote the gradient computation using the quantized weights in CodedPrivateML. Then we have:

  • (Unbiasedness) The vector $\mathbf{p}^{(t)}$ is an asymptotically unbiased estimator of the true gradient: $\mathbb{E}[\mathbf{p}^{(t)}] \to \nabla C(\mathbf{w}^{(t)})$ as $r \to \infty$, where $r$ is the degree of the polynomial in (15) and the expectation is taken with respect to the quantization errors.

  • (Variance bound) $\mathbb{E}\big[ \|\mathbf{p}^{(t)} - \mathbb{E}[\mathbf{p}^{(t)}]\|_2^2 \big] \le \sigma^2$, for a constant $\sigma^2$ that depends on the quantization parameters and on $\|\overline{\mathbf{X}}\|_F$, where $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the $\ell_2$-norm and Frobenius norm, respectively.

We also need the following basic lemma, which is proved in Appendix A.3 of supplementary materials.

Lemma 2.

The gradient of the cost function (1) with the quantized dataset (as defined in (6)) is $L$-Lipschitz for a constant $L > 0$ determined by the quantized dataset; i.e., for all $\mathbf{w}, \mathbf{w}' \in \mathbb{R}^d$ we have

$$\big\| \nabla C(\mathbf{w}) - \nabla C(\mathbf{w}') \big\| \le L\, \big\| \mathbf{w} - \mathbf{w}' \big\|. \qquad (27)$$

We now state our main theorem for CodedPrivateML.

Theorem 1.

Consider the training of a logistic regression model in a distributed system with $N$ workers using CodedPrivateML, with quantized dataset $\overline{\mathbf{X}}$, initial weight vector $\mathbf{w}^{(0)}$, and constant step size $\eta \le 1/L$ (where $L$ is defined in Lemma 2). Then, CodedPrivateML guarantees:

  • (Convergence) $\mathbb{E}\Big[ C\Big( \frac{1}{J} \sum_{t=1}^{J} \mathbf{w}^{(t)} \Big) \Big] - C(\mathbf{w}^*) \le \frac{\|\mathbf{w}^{(0)} - \mathbf{w}^*\|^2}{2 \eta J} + \eta \sigma^2$ in $J$ iterations, where $\sigma$ is given in Lemma 1;

  • (Privacy) $\mathbf{X}$ remains information-theoretically private against any $T$ colluding workers, i.e., $I\big( \mathbf{X};\ \widetilde{\mathbf{X}}_{\mathcal{T}}, \widetilde{\mathbf{W}}^{(0)}_{\mathcal{T}}, \ldots, \widetilde{\mathbf{W}}^{(J-1)}_{\mathcal{T}} \big) = 0$ for all $\mathcal{T} \subseteq [N]$ with $|\mathcal{T}| \le T$;

as long as we have $N \ge (2r+1)(K+T-1)+1$, where $r$ is the degree of the polynomial approximation in (15).

Remark 2.

Theorem 1 reveals an important trade-off between privacy and parallelization in CodedPrivateML. The parameter $K$ reflects the amount of parallelization, since the computation load at each worker node is proportional to a $1/K$-th fraction of the dataset. The parameter $T$ reflects the privacy threshold. Theorem 1 shows that, in a cluster with $N$ workers, we can achieve any $K$ and $T$ as long as $(2r+1)(K+T-1)+1 \le N$. This condition further implies that, as the number of workers $N$ increases, the parallelization ($K$) and privacy threshold ($T$) of CodedPrivateML can also increase linearly, leading to a scalable solution.
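This trade-off is easy to explore numerically. The helper below (illustrative, with arbitrary example values of $N = 40$ and $r = 1$ chosen by us) enumerates the $(K, T)$ pairs permitted by the condition of Theorem 1.

```python
def feasible_parameters(N, r):
    """All (K, T) with (2r+1)(K + T - 1) + 1 <= N: valid parallelization/privacy splits."""
    budget = (N - 1) // (2 * r + 1) + 1      # maximum value of K + T
    return [(K, T) for K in range(1, budget)
                   for T in range(1, budget) if K + T <= budget]

# Example: with N = 40 workers and a degree r = 1 approximation, K + T can be as
# large as 14 -- e.g. (K, T) = (13, 1) or (7, 7).
print(feasible_parameters(40, 1)[:5])
```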

Remark 3.

Theorem 1 also applies to the simpler linear regression problem. The proof follows the same steps.

Proof.

(Convergence) First, we show that the master can decode $h(z)$ over the finite field as long as $N \ge (2r+1)(K+T-1)+1$. As described in Sections 3.3 and 3.4, given the degree-$r$ polynomial approximation of the sigmoid function in (15), the degree of $h(z)$ in (21) is at most $(2r+1)(K+T-1)$. The decoding process uses the computations from the workers as evaluation points to interpolate the polynomial $h(z)$. The master can obtain all coefficients of $h(z)$ as long as it collects at least $\deg(h) + 1 = (2r+1)(K+T-1)+1$ evaluation results. After $h(z)$ is recovered, the master can decode the sub-gradients by computing $h(\beta_k)$ for $k \in [K]$. Hence, the recovery threshold to decode the gradient is given by $(2r+1)(K+T-1)+1$.

Next, we consider the update equation of CodedPrivateML in (26) and prove its convergence to $\mathbf{w}^*$. From the $L$-Lipschitz continuity of $\nabla C$ stated in Lemma 2, we have

$$C(\mathbf{w}^{(t+1)}) \le C(\mathbf{w}^{(t)}) + \big\langle \nabla C(\mathbf{w}^{(t)}),\ \mathbf{w}^{(t+1)} - \mathbf{w}^{(t)} \big\rangle + \frac{L}{2} \big\| \mathbf{w}^{(t+1)} - \mathbf{w}^{(t)} \big\|^2,$$

where $\langle \cdot, \cdot \rangle$ is the inner product. By taking the expectation with respect to the quantization noise on both sides and using $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta\, \mathbf{p}^{(t)}$,

$$\mathbb{E}\big[ C(\mathbf{w}^{(t+1)}) \big] \le C(\mathbf{w}^{(t)}) - \frac{\eta}{2} \big\| \nabla C(\mathbf{w}^{(t)}) \big\|^2 + \frac{\eta}{2}\, \mathbb{E}\big\| \mathbf{p}^{(t)} - \nabla C(\mathbf{w}^{(t)}) \big\|^2 \qquad (28)$$
$$\le C(\mathbf{w}^*) + \big\langle \nabla C(\mathbf{w}^{(t)}),\ \mathbf{w}^{(t)} - \mathbf{w}^* \big\rangle - \frac{\eta}{2} \big\| \nabla C(\mathbf{w}^{(t)}) \big\|^2 + \frac{\eta}{2}\, \mathbb{E}\big\| \mathbf{p}^{(t)} - \nabla C(\mathbf{w}^{(t)}) \big\|^2 \qquad (29)$$
$$\le C(\mathbf{w}^*) + \frac{1}{2\eta} \Big( \big\| \mathbf{w}^{(t)} - \mathbf{w}^* \big\|^2 - \mathbb{E}\big\| \mathbf{w}^{(t+1)} - \mathbf{w}^* \big\|^2 \Big) + \eta \sigma^2, \qquad (30)$$

where (28) follows from $\eta \le 1/L$, (29) from the convexity of $C$, and (30) holds since $\mathbb{E}[\mathbf{p}^{(t)}] = \nabla C(\mathbf{w}^{(t)})$ and $\mathbb{E}\| \mathbf{p}^{(t)} - \nabla C(\mathbf{w}^{(t)}) \|^2 \le \sigma^2$ from Lemma 1, assuming an arbitrarily large $r$. Summing the above inequalities for $t = 0, \ldots, J-1$, the middle terms telescope and we have

$$\sum_{t=0}^{J-1} \Big( \mathbb{E}\big[ C(\mathbf{w}^{(t+1)}) \big] - C(\mathbf{w}^*) \Big) \le \frac{\| \mathbf{w}^{(0)} - \mathbf{w}^* \|^2}{2\eta} + J \eta \sigma^2.$$

Finally, since $C$ is convex, we observe from Jensen's inequality that

$$\mathbb{E}\Big[ C\Big( \frac{1}{J} \sum_{t=1}^{J} \mathbf{w}^{(t)} \Big) \Big] - C(\mathbf{w}^*) \le \frac{1}{J} \sum_{t=0}^{J-1} \Big( \mathbb{E}\big[ C(\mathbf{w}^{(t+1)}) \big] - C(\mathbf{w}^*) \Big) \le \frac{\| \mathbf{w}^{(0)} - \mathbf{w}^* \|^2}{2\eta J} + \eta \sigma^2,$$

which completes the proof of convergence.

(Privacy) The proof of $T$-privacy is deferred to Appendix A.4 in the supplementary materials.

5 Experiments

We now experimentally demonstrate the impact of CodedPrivateML, and make comparisons with existing cryptographic approaches to the problem. Our focus is on training a logistic regression model for image classification, while the computation load is distributed to multiple machines on the Amazon EC2 Cloud Platform.

Setup. We train the logistic regression model from (1) for binary image classification on the MNIST dataset (LeCun et al., 2010) to experimentally examine two things: the accuracy of CodedPrivateML and the performance gain in terms of training time. To obtain a larger dataset, we duplicate the MNIST training set. Experiments with additional dataset sizes are provided in Appendix A.6 of the supplementary material.

We implement CodedPrivateML using the MPI4Py (Dalcín et al., 2005) message passing interface on Python. Computations are performed in a distributed manner on Amazon EC2 clusters using m3.xlarge machine instances.

We then compare CodedPrivateML with the MPC-based approach when applied to our problem. In particular, we implement a BGW-style construction (Ben-Or et al., 1988) based on Shamir's secret sharing scheme (Shamir, 1979), where we secret share the dataset among the $N$ workers, who proceed with a multi-round protocol to compute the gradient. We further incorporate the quantization and approximation techniques introduced here, as BGW-style protocols are also bound to arithmetic operations over a finite field. See Appendix A.5 of the supplementary materials for additional detail.

Figure 2: Performance gain of CodedPrivateML over the MPC-based scheme. The plot shows the total training time to reach a fixed accuracy (25 iterations) for different numbers of workers $N$ on the Amazon EC2 Cloud Platform.
Protocol                  | Encode time (s) | Comm. time (s) | Comp. time (s) | Total run time (s)
MPC approach              | 845.55          | 49.51          | 3457.99        | 4304.60
CodedPrivateML (Case 1)   | 50.97           | 3.01           | 66.95          | 126.20
CodedPrivateML (Case 2)   | 90.65           | 6.45           | 110.97         | 222.50

Table 1: Breakdown of the total run time for one representative number of workers.

CodedPrivateML parameters. There are several system parameters in CodedPrivateML that should be set. Given the bitwidth of our implementation, we select the field size $p$ to be the largest prime that avoids overflow in the intermediate multiplications. We then optimize the quantization parameters $l_x$ in (6) and $l_w$ in (9) by taking into account the trade-off between the rounding and overflow errors. We also need to set the parameter $r$, the degree of the polynomial approximating the sigmoid function. We consider small degrees, including $r = 1$, and as we show later, empirically observe that a degree-one approximation provides very good accuracy. We finally need to select $T$ (the privacy threshold) and $K$ (the amount of parallelization) in CodedPrivateML. As stated in Theorem 1, these parameters should satisfy $N \ge (2r+1)(K+T-1)+1$. Given our choice of $r$, we consider two cases:

  • Case 1 (maximum parallelization). All resources go to parallelization, by setting $T = 1$ and choosing $K$ as large as the condition of Theorem 1 allows.

  • Case 2 (equal parallelization and privacy). The resources are split equally between parallelization and privacy, by setting $K = T$.

Training time. In the first set of experiments, we measure the training time while increasing the number of workers $N$ gradually. The results are demonstrated in Figure 2. We make the following observations. (For small $N$, all schemes have almost the same performance, because they use the same system parameters.)

  • CodedPrivateML provides substantial speedup over the MPC approach; given the run times in Table 1, this amounts to up to $34.1\times$ and $19.3\times$ speedup in Cases 1 and 2, respectively. The breakdown of the total run time for one scenario is shown in Table 1. One can note that CodedPrivateML provides significant improvement in all three categories: dataset encoding and secret sharing, communication time between the workers and the master, and computation time. One reason for this is that, in MPC-based schemes, the size of the secret-shared dataset at each worker is the same as that of the original dataset, while in CodedPrivateML it is a $1/K$-th fraction of the dataset. This provides a large parallelization gain for CodedPrivateML. The other reason is the communication complexity of MPC-based schemes. We provide the results for more scenarios in Appendix A.6 of the supplementary material.

  • We note that the total run time of CodedPrivateML decreases as the number of workers increases. This is again due to the parallelization gain of CodedPrivateML (i.e., $K$ increases as $N$ increases). This parallelization gain is not achievable in the MPC-based scheme, since the whole computation has to be repeated by all players who take part in the MPC. We should however point out that the MPC-based scheme can attain a higher privacy threshold ($T = \lfloor (N-1)/2 \rfloor$ for the semi-honest BGW protocol), while the privacy threshold of CodedPrivateML grows linearly but more slowly in $N$ (Case 2).

Accuracy. We also examine the accuracy and convergence of CodedPrivateML in the experiments. Figure 3 illustrates the test accuracy for the binary classification problem between digits 3 and 7. With 25 iterations, CodedPrivateML with a degree-one polynomial approximation attains almost the same accuracy as conventional logistic regression. This result shows that CodedPrivateML guarantees almost the same level of accuracy while being privacy-preserving. Our experiments also show that CodedPrivateML converges at a rate comparable to that of conventional logistic regression. Those results are provided in Appendix A.6 of the supplementary materials.

Figure 3: Comparison of the accuracy of CodedPrivateML (demonstrated for Case 2) vs. conventional logistic regression that uses the sigmoid function without quantization. Accuracy is measured on the MNIST dataset restructured for the binary classification problem between digits 3 and 7.

References

  • Abadi et al. (2016) Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318, 2016.
  • Ben-Or et al. (1988) Ben-Or, M., Goldwasser, S., and Wigderson, A. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 1–10. ACM, 1988.
  • Brinkhuis & Tikhomirov (2011) Brinkhuis, J. and Tikhomirov, V. Optimization: Insights and Applications. Princeton Series in Applied Mathematics. Princeton University Press, 2011.
  • Chaudhuri & Monteleoni (2009) Chaudhuri, K. and Monteleoni, C. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, pp. 289–296, 2009.
  • Chen et al. (2019) Chen, V., Pastro, V., and Raykova, M. Secure computation for machine learning with SPDZ. arXiv:1901.00329, 2019.
  • Cover & Thomas (2012) Cover, T. M. and Thomas, J. A. Elements of information theory. John Wiley & Sons, 2012.
  • Dahl et al. (2018) Dahl, M., Mancuso, J., Dupis, Y., Decoste, B., Giraud, M., Livingstone, I., Patriquin, J., and Uhma, G. Private machine learning in TensorFlow using secure computation. arXiv:1810.08130, 2018.
  • Dalcín et al. (2005) Dalcín, L., Paz, R., and Storti, M. MPI for Python. Journal of Parallel and Distributed Computing, 65(9):1108–1115, 2005.
  • Dwork et al. (2006) Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Springer, 2006.
  • Gascón et al. (2017) Gascón, A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., and Evans, D. Privacy-preserving distributed linear regression on high-dimensional data. Proceedings on Privacy Enhancing Technologies, 2017(4):345–364, 2017.
  • Gentry & Boneh (2009) Gentry, C. and Boneh, D. A fully homomorphic encryption scheme, volume 20. Stanford University, Stanford, 2009.
  • Gilad-Bachrach et al. (2016) Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M., and Wernsing, J. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210, 2016.
  • Graepel et al. (2012) Graepel, T., Lauter, K., and Naehrig, M. ML confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology, pp. 1–21. Springer, 2012.
  • Han et al. (2019) Han, K., Hong, S., Cheon, J. H., and Park, D. Logistic regression on homomorphic encrypted data at scale. Thirty-First Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-19), available online: https://daejunpark.github.io/iaai19.pdf, 2019.
  • Hesamifard et al. (2017) Hesamifard, E., Takabi, H., and Ghasemi, M. CryptoDL: Deep neural networks over encrypted data. arXiv:1711.05189, 2017.
  • Jayaraman et al. (2018) Jayaraman, B., Wang, L., Evans, D., and Gu, Q. Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems, pp. 6346–6357, 2018.
  • Kim et al. (2018) Kim, A., Song, Y., Kim, M., Lee, K., and Cheon, J. H. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics, 11(4):23–55, Oct 2018.
  • LeCun et al. (2010) LeCun, Y., Cortes, C., and Burges, C. MNIST handwritten digit database. [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.
  • Li et al. (2017) Li, P., Li, J., Huang, Z., Gao, C.-Z., Chen, W.-B., and Chen, K. Privacy-preserving outsourced classification in cloud computing. Cluster Computing, pp. 1–10, 2017.
  • Lindell & Pinkas (2000) Lindell, Y. and Pinkas, B. Privacy preserving data mining. In Annual International Cryptology Conference, pp. 36–54. Springer, 2000.
  • McMahan et al. (2018) McMahan, H. B., Ramage, D., Talwar, K., and Zhang, L. Learning differentially private recurrent language models. In International Conference on Learning Representations, 2018.
  • Melis et al. (2019) Melis, L., Song, C., Cristofaro, E. D., and Shmatikov, V. Exploiting unintended feature leakage in collaborative learning. arXiv:1805.04049, 2019.
  • Mohassel & Rindal (2018) Mohassel, P. and Rindal, P. ABY 3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 35–52, 2018.
  • Mohassel & Zhang (2017) Mohassel, P. and Zhang, Y. SecureML: A system for scalable privacy-preserving machine learning. In 38th IEEE Symposium on Security and Privacy, pp. 19–38. IEEE, 2017.
  • Nikolaenko et al. (2013) Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., and Taft, N. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy, pp. 334–348. IEEE, 2013.
  • Pathak et al. (2010) Pathak, M., Rane, S., and Raj, B. Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, pp. 1876–1884, 2010.
  • Rajkumar & Agarwal (2012) Rajkumar, A. and Agarwal, S. A differentially private stochastic gradient descent algorithm for multiparty classification. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS'12), volume 22 of Proceedings of Machine Learning Research, pp. 933–941, La Palma, Canary Islands, Apr 2012.
  • Shamir (1979) Shamir, A. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
  • Shokri & Shmatikov (2015) Shokri, R. and Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321, 2015.
  • Wagh et al. (2018) Wagh, S., Gupta, D., and Chandran, N. SecureNN: Efficient and private neural network training. Cryptology ePrint Archive, Report 2018/442, 2018. https://eprint.iacr.org/2018/442.
  • Wang et al. (2018) Wang, Q., Du, M., Chen, X., Chen, Y., Zhou, P., Chen, X., and Huang, X. Privacy-preserving collaborative model learning: The case of word vector training. IEEE Transactions on Knowledge and Data Engineering, 30(12):2381–2393, Dec 2018.
  • Yao (1982) Yao, A. C. Protocols for secure computations. In IEEE Annual Symposium on Foundations of Computer Science, pp. 160–164, 1982.
  • Yu et al. (2019) Yu, Q., Raviv, N., Kalan, S. M. M., Soltanolkotabi, M., and Avestimehr, A. S. Lagrange coded computing: Optimal design for resiliency, security and privacy. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
  • Yuan & Yu (2014) Yuan, J. and Yu, S. Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Transactions on Parallel and Distributed Systems, 25(1):212–221, 2014.
  • Zhang et al. (2016) Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. The ZipML framework for training models with end-to-end low precision: The cans, the cannots, and a little bit of deep learning. arXiv:1611.05402, 2016.
  • Zhang et al. (2017) Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In Proceedings of the 34th International Conference on Machine Learning, pp. 4035–4043, Sydney, Australia, Aug 2017.

Appendix A Supplementary Materials

A.1 Algorithms

The overall procedure of the CodedPrivateML protocol is given in Algorithm 1. Procedures for the individual phases are shown in Algorithms 2–5, corresponding to Sections 3.1–3.4, respectively.

Input: Dataset $\mathbf{X}$ and labels $\mathbf{y}$
Output: Model parameters (weights) $\mathbf{w}^{(J)}$
1:  (Master) Compute the quantized dataset $\overline{\mathbf{X}}$ using (6).
2:  (Master) Form the encoded matrices $\widetilde{\mathbf{X}}_1, \ldots, \widetilde{\mathbf{X}}_N$ in (12).
3:  (Master) Send $\widetilde{\mathbf{X}}_i$ to worker $i \in [N]$.
4:  (Master) Initialize the weights $\mathbf{w}^{(0)}$.
5:  for iteration $t = 0, \ldots, J-1$ do
6:     (Master) Find the quantized weights $\overline{\mathbf{W}}^{(t)}$ from (10).
7:     (Master) Encode $\overline{\mathbf{W}}^{(t)}$ into $\widetilde{\mathbf{W}}^{(t)}_1, \ldots, \widetilde{\mathbf{W}}^{(t)}_N$ using (14).
8:     (Master) Send $\widetilde{\mathbf{W}}^{(t)}_i$ to worker $i \in [N]$.
9:     (Worker $i$) Compute $f(\widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i)$ from (20) and send the result back to the master.
10:     if Master received results from $(2r+1)(K+T-1)+1$ workers then
11:        (Master) Decode $\{h(\beta_k)\}_{k \in [K]}$ via polynomial interpolation from the received results.
12:     end if
13:     (Master) Compute the gradient in (23) and convert it from the finite field to the real domain using (24).
14:     (Master) Update the weight vector via (19).
15:  end for
16:  return $\mathbf{w}^{(J)}$
Algorithm 1 CodedPrivateML
Input: Dataset $\mathbf{X}$ and weights $\mathbf{w}^{(t)}$
Output: Quantized dataset $\overline{\mathbf{X}}$ and weights $\overline{\mathbf{W}}^{(t)}$
1:  (Master) Compute the quantized dataset $\overline{\mathbf{X}}$ from (6), using the function $\text{Round}(\cdot)$ from (5) and $\phi(\cdot)$ from (7).
2:  (Master) Compute $r$ independent stochastic quantizations of the vector $\mathbf{w}^{(t)}$ as given in (9), by applying the quantization function (8) element-wise over the vector.
3:  (Master) Construct the quantized weight matrix $\overline{\mathbf{W}}^{(t)}$ in (10), using the quantized vectors $\overline{\mathbf{w}}^{(t)}_j$ for $j \in [r]$.
4:  return $\overline{\mathbf{X}}$ and $\overline{\mathbf{W}}^{(t)}$
Algorithm 2 Quantization
Input: Quantized dataset $\overline{\mathbf{X}}$ and weights $\overline{\mathbf{W}}^{(t)}$
Output: Encoded dataset $\widetilde{\mathbf{X}}_i$ and weights $\widetilde{\mathbf{W}}^{(t)}_i$ for $i \in [N]$
1:  (Master) Partition the quantized dataset $\overline{\mathbf{X}}$ into $K$ submatrices $\overline{\mathbf{X}}_1, \ldots, \overline{\mathbf{X}}_K$.
2:  (Master) Construct the encoded matrices $\widetilde{\mathbf{X}}_i$ for $i \in [N]$ as in (12), using the Lagrange polynomial from (11).
3:  (Master) Construct the encoded weights $\widetilde{\mathbf{W}}^{(t)}_i$ for $i \in [N]$ as in (14), using the Lagrange polynomial from (13).
4:  (Master) Send $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}^{(t)}_i$ to worker $i$, where $i \in [N]$.
Algorithm 3 Encoding and Secret Sharing
Input: Encoded dataset $\widetilde{\mathbf{X}}_i$ and weights $\widetilde{\mathbf{W}}^{(t)}_i$ for $i \in [N]$
Output: Computation results $f(\widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i)$ for $i \in [N]$
1:  (Master) Find the polynomial approximation coefficients $c_0, \ldots, c_r$ from (15), by fitting the sigmoid function to a degree-$r$ polynomial via least squares.
2:  (Master) Send the coefficients to all workers.
3:  (Worker $i$) Locally compute the function $f(\widetilde{\mathbf{X}}_i, \widetilde{\mathbf{W}}^{(t)}_i)$ using $\widetilde{\mathbf{X}}_i$ and $\widetilde{\mathbf{W}}^{(t)}_i$ as given in (20), and send the result back to the master.
Algorithm 4 Polynomial Approximation and Local Computations
Input: Computation results