Cryptotree: fast and accurate predictions on encrypted structured data

06/15/2020 ∙ by Daniel Huynh, et al.

Applying machine learning algorithms to private data, such as financial or medical data, while preserving confidentiality, is a difficult task. Homomorphic Encryption (HE) is acknowledged for its ability to allow computation on encrypted data, where both the input and output are encrypted, which therefore enables secure inference on private data. Nonetheless, because of the constraints of HE, such as its inability to evaluate non-polynomial functions or to perform arbitrary matrix multiplications efficiently, only inference of linear models seems usable in practice in the HE paradigm so far. In this paper, we propose Cryptotree, a framework that enables the use of Random Forests (RF), a much more powerful learning procedure than linear regression, in the context of HE. To this aim, we first convert a regular RF to a Neural RF, then adapt it to fit the HE scheme CKKS, which allows HE operations on real values. Through SIMD operations, we achieve fast inference on encrypted data, with prediction results better than those of the original RF.


1 Introduction

Context. Many areas of Machine Learning have thrived in recent years; nonetheless, some domains such as the financial or medical sectors have developed at a slower pace, as they often handle very sensitive data. While some Machine Learning services could greatly benefit society, letting sensitive information transit from the client to the server could lead to leaks: the server could be malicious, or be compromised by a third party, and private data could be exposed. Several solutions have emerged to address this, such as Secure Multi-Party Computation [20], Trusted Execution Environments [14], and Homomorphic Encryption (HE). Each one has its own pros and cons, and throughout this paper we will focus on a solution using the HE paradigm.

HE is an encryption scheme which allows data owners to encrypt their data and let a third party perform computations on it, without the third party knowing what the underlying data is. The result of the computations on encrypted data can then be sent back to the data owner, who is the only one able to decrypt the encrypted result.

More formally, a ring homomorphism h between two rings (R, +, ×) and (R', ⊕, ⊗) satisfies the two properties h(a + b) = h(a) ⊕ h(b) and h(a × b) = h(a) ⊗ h(b). This means that if we have an encryption homomorphism E, a decryption homomorphism D such that D(E(x)) = x, and a function f which is a composition of additions and multiplications, then we can have the following scenario: the user encrypts her data x using E, and sends E(x) to an untrusted third party. The third party then performs the computation f on the encrypted data. Because E is a homomorphism, evaluating f on E(x) yields E(f(x)). The third party sends this result back to the user. Finally the user decrypts the output, obtaining D(E(f(x))) = f(x), without ever exposing her data directly to the untrusted third party.

HE schemes first started with [4], as RSA provided a homomorphic scheme, but only with homomorphic multiplication. Since then several HE schemes appeared [10], [6], [16], [15], [8], and while they allowed more and more operations, they remained too computationally heavy for practical use. Recently, more practical schemes have emerged, such as BGV [21] and BFV [7] for integer arithmetic, and CKKS [18] for arithmetic on complex numbers. In this paper we will use CKKS, a leveled homomorphic encryption scheme, which means that additions and multiplications are possible, but only a limited number of multiplications can be performed. This is due to the fact that noise is injected into the plaintext to hide it, and this noise grows during computation; above a certain threshold the message can no longer be decrypted correctly.
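As a small illustration of a partially homomorphic scheme, textbook RSA (without padding) is multiplicatively homomorphic. The sketch below uses a toy key chosen purely for illustration:

# Textbook RSA (no padding) is multiplicatively homomorphic:
# Enc(m1) * Enc(m2) mod n == Enc((m1 * m2) mod n)
n, e = 3233, 17                      # toy public key (p = 61, q = 53), illustration only
enc = lambda m: pow(m, e, n)

m1, m2 = 12, 7
assert (enc(m1) * enc(m2)) % n == enc((m1 * m2) % n)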

Because of those constraints, the evaluation of a Deep Neural Network (DNN) of arbitrary depth proves difficult: the number of multiplications is limited, matrix multiplication works efficiently only in special cases, and non-polynomial functions such as ReLU or sigmoid are difficult to approximate.

However, if we focus on the case of structured data, it is possible to find efficient and expressive models compatible with CKKS. [9] gave a very efficient way to model RFs as a DNN with two hidden layers and one output layer, called a Neural Random Forest (NRF). By leveraging this efficient representation of RFs and by tuning it, we show in this paper that one can adapt arbitrary RFs into Homomorphic Random Forests (HRF), which perform fast inference and achieve performance similar, if not superior, to the original RF. Thus we show that efficient and expressive models can be used on encrypted data, paving the way for more privacy-friendly Machine Learning.

Our contribution. In this paper we show how Random Forests can be modeled efficiently by first converting them to Neural Random Forests, and then implementing their Homomorphic Random Forest counterparts under the CKKS scheme. We show that HRF can leverage the SIMD nature of CKKS to compute the predictions of all homomorphic trees at the same time, resulting in an efficient and expressive model. We then evaluate HRF on a real dataset, the Adult Income dataset, and show that it performs as well as its neural counterpart and outperforms the original RF.

We provide a Python implementation of Homomorphic Random Forests in the Cryptotree library, which relies on the TenSEAL library to interact with the C++ SEAL framework developed by Microsoft for HE schemes. Cryptotree provides a high-level API so that users without a background in HE can first convert their Random Forest models to Neural Random Forests, fine-tune them, then convert these to Homomorphic Random Forests.

2 Preliminaries

2.1 The CKKS scheme

Figure 1: Overview of CKKS. A message is encoded into a plaintext, encrypted into a ciphertext, computed on homomorphically, then decrypted back to a plaintext and decoded into a message.

We restate the CKKS scheme [18] here. As it is not the main topic of this paper, we will cover it briefly, underline its shortcomings, and see how to overcome them when implementing Homomorphic Random Forests. For the interested reader, a more detailed introduction to CKKS is provided in the Appendix.

Figure 1 provides a high-level view of CKKS. Let N be a power of two, M = 2N, and Φ_M(X) = X^N + 1 the M-th cyclotomic polynomial of degree N. For efficiency and security reasons, we will work with the ring of integers Z[X]/(X^N + 1) of the M-th cyclotomic field. We can see that a message m ∈ C^{N/2} is first encoded into a plaintext polynomial, then encrypted using a public key. Once the message is encrypted into a ciphertext, which is a pair of polynomials, CKKS provides several operations that can be performed on it, such as addition, multiplication and rotation. While addition is pretty straightforward, multiplication has the particularity of greatly increasing the noise kept in the ciphertext; therefore, to manage it, only a limited number of multiplications are allowed. Rotations are permutations on the slots of a given ciphertext. If we denote by f a function which is a composition of such homomorphic operations, then decrypting f applied to the ciphertext with the secret key yields a plaintext close to f applied to the plaintext. Therefore once we decode it, we get (approximately) f(m).
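As an illustration, here is a minimal sketch of this encode / encrypt / compute / decrypt workflow using the TenSEAL library mentioned later in the paper. The parameters below are arbitrary choices made for the example, not the ones used by Cryptotree:

import tenseal as ts

# CKKS context: polynomial degree and moduli sizes are illustrative choices only
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()                 # needed for rotations

x = ts.ckks_vector(context, [1.0, 2.0, 3.0])   # encode + encrypt
y = ts.ckks_vector(context, [4.0, 5.0, 6.0])

z = x * y + x                                  # homomorphic slot-wise multiply and add
print(z.decrypt())                             # ~ [5.0, 12.0, 21.0], up to CKKS noise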

Shortcomings.

While this introduction to CKKS was rather short, there are a few points to take into consideration when applying CKKS to Machine Learning. All inputs are represented as a vector of N/2 slots. If the actual dimension d of the input is smaller than N/2, it will be padded with zeros. This can be inefficient if operations are done on vectors of dimension d much smaller than N/2.

Moreover, only polynomial functions can be computed using the operations allowed in CKKS. While it is possible to approximate non-polynomial functions, for instance using Chebyshev polynomials, only low-degree polynomials can be evaluated, as CKKS is a leveled scheme and therefore allows only a limited number of multiplications. Moreover, a polynomial interpolation is viable only on a predefined domain, usually [-1, 1] for Chebyshev interpolation.
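For instance, a low-degree Chebyshev approximation of the sigmoid on [-1, 1] can be computed offline with numpy. This is a sketch; the degree and interval are illustrative choices:

import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

xs = np.linspace(-1, 1, 1000)
sigmoid = 1.0 / (1.0 + np.exp(-xs))

# Degree-3 polynomial: cheap enough to evaluate within a leveled scheme
approx = Chebyshev.fit(xs, sigmoid, deg=3)
print(np.max(np.abs(approx(xs) - sigmoid)))    # max approximation error on [-1, 1]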

Besides, additions and multiplications are done element-wise on the encrypted inputs. This means that it becomes non-trivial to perform simple linear operations such as summing all the coordinates of a vector, or computing a matrix multiplication between a plaintext matrix and a ciphertext vector.

We see that the last two points, combined with the leveled aspect of CKKS, where only a limited number of multiplications can be performed, make the evaluation of arbitrary DNNs difficult, as we would want to perform any kind of matrix multiplication and use non-polynomial activation functions.

Linear operations. Nonetheless, [11] provides a way to do square matrix multiplication by using the diagonals of the matrix. Let n denote the number of slots available, A an n × n plaintext matrix and x a ciphertext vector of n slots; then we have

A · x = Σ_{i=0}^{n-1} diag_i(A) ⊙ rot(x, i),

where · denotes the matrix multiplication, ⊙ denotes the coefficient-wise vector multiplication, diag_i(A) is the i-th (generalized) diagonal of A, defined by diag_i(A)_j = A_{j, (j+i) mod n}, and rot(x, i) is the rotation of x by shifting i slots to the left.
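This identity can be checked in plain numpy (a plaintext simulation, not HE code):

import numpy as np

n = 4
A = np.random.rand(n, n)
x = np.random.rand(n)

def diag(A, i):
    # i-th generalized diagonal: diag_i(A)[j] = A[j, (j + i) % n]
    n = A.shape[0]
    return np.array([A[j, (j + i) % n] for j in range(n)])

def rot(v, i):
    # rotation by i slots to the left
    return np.roll(v, -i)

y = sum(diag(A, i) * rot(x, i) for i in range(n))
assert np.allclose(y, A @ x)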

In practice in CKKS, when doing a matrix multiplication with a matrix of size d × d where d < n, the input vector will be padded with zeros to size n, and the same will happen to the diagonals. Therefore when doing the rotations, the first elements will be sent to the end, zeros will come up in their place, and the result will be wrong. For instance, if we have d = 2, n = 4 and x = (x_1, x_2, 0, 0), a rotation of one slot gives (x_2, 0, 0, x_1), so multiplying the rotated input vector with the padded diagonal does not give the correct result. A solution to this is to first replicate the first d coordinates of x, yielding (x_1, ..., x_d, x_1, ..., x_d, 0, ..., 0). Then each rotation from 0 to d - 1 will output the correct first d slots to be multiplied with the padded diagonal, as long as 2d ≤ n. For instance, on vectors of size d = 2 padded to n = 4, the replicated vector (x_1, x_2, x_1, x_2) rotated by one slot gives (x_2, x_1, x_2, x_1), whose first two slots are correct.
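The replication trick can also be simulated in plain numpy (illustrative sizes d = 3 and n = 8, assuming 2d ≤ n):

import numpy as np

d, n = 3, 8                                    # matrix size d, total slot count n (2d <= n)
A = np.random.rand(d, d)
x = np.random.rand(d)

# Replicate the first d coordinates, then pad with zeros up to n slots
x_packed = np.concatenate([x, x, np.zeros(n - 2 * d)])

def padded_diag(A, i):
    dg = np.array([A[j, (j + i) % d] for j in range(d)])
    return np.concatenate([dg, np.zeros(n - d)])

y = sum(padded_diag(A, i) * np.roll(x_packed, -i) for i in range(d))
assert np.allclose(y[:d], A @ x)               # the first d slots hold A @ x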

We generalize this to do k matrix multiplications at the same time on k different inputs, using only additions, multiplications and rotations (computations are detailed in the Appendix):

Data: matrices A^(1), ..., A^(k) of size d × d, a packed ciphertext ct containing the inputs x^(1), ..., x^(k)
Result: a packed ciphertext containing A^(1) x^(1), ..., A^(k) x^(k)
result ← 0;
for i = 0 to d - 1 do
    for j = 1 to k do
        diag^(j) ← diag_i(A^(j));   // First we extract the diagonal
        diag^(j) ← pad(diag^(j));   // Then we pad it
    end for
    D_i ← concat(diag^(1), ..., diag^(k));   // We concatenate them
    result ← result + D_i ⊙ rot(ct, i);
end for
return result
Algorithm 1: PackedMatrixMultiplication
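To make the algorithm concrete, here is a plaintext numpy simulation of PackedMatrixMultiplication. The slot layout and per-input block size are assumptions made for the sketch; rotations and slot-wise products stand in for their homomorphic counterparts:

import numpy as np

def packed_matrix_multiplication(As, xs, n):
    """Simulate Algorithm 1: k matrix-vector products of size d x d in one pass."""
    k, d = len(As), As[0].shape[0]
    s = n // k                                   # slots reserved per input
    assert s >= 2 * d, "each block must hold the replicated input"

    # Pack the k inputs, each replicated twice inside its block of s slots
    ct = np.concatenate([np.concatenate([x, x, np.zeros(s - 2 * d)]) for x in xs])

    result = np.zeros(n)
    for i in range(d):
        # Extract, pad and concatenate the i-th diagonal of every matrix
        D_i = np.concatenate([
            np.concatenate([np.array([A[j, (j + i) % d] for j in range(d)]),
                            np.zeros(s - d)])
            for A in As])
        result += D_i * np.roll(ct, -i)          # rotate left by i, multiply slot-wise
    return result

# Usage: the first d slots of block j contain A^(j) @ x^(j)
k, d, n = 4, 3, 32
As = [np.random.rand(d, d) for _ in range(k)]
xs = [np.random.rand(d) for _ in range(k)]
out = packed_matrix_multiplication(As, xs, n)
s = n // k
for j in range(k):
    assert np.allclose(out[j * s: j * s + d], As[j] @ xs[j])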

This algorithm is quite interesting, as we can perform k matrix multiplications of size d × d at the constant cost of d ciphertext multiplications, additions and rotations, as long as the k padded inputs fit in the n available slots. This particularly fits the context of RFs, as d would be the depth of the trees, which is low because we use shallow trees, and k is high because we have many trees.

Sum reduction of a vector of size n can also be performed in a similar fashion, using rotations and additions. Therefore, by combining element-wise multiplication and sum reduction, we can implement the dot product in HE (more explanations can be found in the Appendix):

Data: ciphertexts of two vectors x and y of n slots, with n a power of two
Result: a ciphertext whose slots all contain the dot product ⟨x, y⟩
steps ← log2(n);   // We compute the number of steps
z ← x ⊙ y;
for i = 1 to steps do
    rotated ← rot(z, 2^(i-1));   // We rotate and add the previous result
    z ← z + rotated;
end for
return z
Algorithm 2: DotProduct
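Again, the rotate-and-add reduction can be simulated in plain numpy (a sketch assuming the vector length is a power of two):

import numpy as np

def dot_product(x, y):
    """Simulate Algorithm 2: slot-wise product followed by log2(n) rotate-and-add steps."""
    n = len(x)                          # assumed to be a power of two
    steps = int(np.log2(n))             # we compute the number of steps
    z = x * y                           # slot-wise product
    for i in range(steps):
        z = z + np.roll(z, -(2 ** i))   # rotate left by 2^i and add the previous result
    return z                            # every slot now contains <x, y>

x, y = np.random.rand(8), np.random.rand(8)
assert np.allclose(dot_product(x, y)[0], x @ y)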

2.2 Neural Random Forests

Figure 2: An example of a regression tree (top) and the corresponding neural network (bottom), with an input layer and an output layer, from [9].

In this section we will see how RFs can be modeled using DNNs. This will serve as a basis for us, as Neural Random Forests have a nice structure which will enable us to implement them efficiently in CKKS. Using trees to initialize DNNs has already been explored in the past, for instance in [17], [1] or [13]. We will focus on the recent work of [9], which models RFs efficiently using a DNN with two hidden layers and one output layer.

We can see in Figure 2 how a DNN can simulate a decision tree. Imagine that an observation belongs to leaf 4. Based on the decision tree, this means that it started at the root (node 0), then went left, then went right of node 1, and finally went left of node 3. The idea of Neural Random Forests is to first perform all comparisons at the same time in the first layer, then determine which leaf the observation belongs to. Each neuron of the second layer represents a leaf, and only one neuron of the second layer will be activated: the one representing the leaf where the observation lies. This is done by using the comparisons computed by the first layer to determine exactly where the observation is. Finally, once the observation has been located, the output layer simply outputs the mean of this leaf for regression, or the class distribution of this leaf for classification.

We will now see more formally how Neural Random Forests are modeled. Let X = [0, 1]^d be our normalized input space, and Y the output space: a subset of R for regression, or the probability simplex for classification. We assume we are given a dataset D_n = ((x_1, y_1), ..., (x_n, y_n)), with (x_i, y_i) ∈ X × Y. Let T be a binary decision tree. We denote by L the number of leaves, i.e. terminal nodes, in the tree. Notice that if a binary decision tree has L leaves, it necessarily has L - 1 internal nodes, which represent the comparisons.

Let x ∈ X be an observation, and H = {H_1, ..., H_{L-1}} be the collection of hyperplanes used in the construction of T. For k ∈ {1, ..., L-1} we have H_k = {x ∈ X : h_k(x) = 0}, with h_k(x) = x_{i_k} - t_k, where i_k is the index of the variable used in comparison k, and t_k is the threshold value of comparison k. Therefore the first linear layer simply applies the L - 1 comparisons h_k, followed by a non-linearity τ which sends the result to +1 if the variable was above the threshold and to -1 otherwise. Thus the outputs of the first layer are:

u_k(x) = τ(h_k(x)) = τ(x_{i_k} - t_k),  k = 1, ..., L - 1,   (1)

where τ(z) = +1 if z ≥ 0 and τ(z) = -1 otherwise.
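As an illustrative sketch (not the Cryptotree API), these first-layer comparisons can be read directly from a fitted scikit-learn tree; the smooth tanh with a large slope used below as the non-linearity is an assumption made for the example:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, for illustration only
X_train = np.random.rand(200, 5)
y_train = np.random.rand(200)

tree = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train).tree_
internal = np.where(tree.children_left != -1)[0]   # internal (comparison) nodes
i_k = tree.feature[internal]                       # index of the variable of comparison k
t_k = tree.threshold[internal]                     # threshold of comparison k

def first_layer(x, gamma=100.0):
    # u_k(x) = tanh(gamma * (x[i_k] - t_k)), close to +1 / -1 depending on the comparison
    return np.tanh(gamma * (x[i_k] - t_k))

print(first_layer(X_train[0]))                     # one +/-1-like value per internal node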