nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data

10/23/2018 ∙ by Fabian Boemer, et al. ∙ Intel

Homomorphic encryption (HE)—the ability to perform computations on encrypted data—is an attractive remedy to increasing concerns about data privacy in the field of machine learning. However, building models that operate on ciphertext is currently labor-intensive and requires simultaneous expertise in deep learning, cryptography, and software engineering. Deep learning frameworks, together with recent advances in graph compilers, have greatly accelerated the training and deployment of deep learning models to a variety of computing platforms. Here, we introduce nGraph-HE, an extension of the nGraph deep learning compiler, which allows data scientists to deploy trained models with popular frameworks like TensorFlow, MXNet and PyTorch directly, while simply treating HE as another hardware target. This combination of frameworks and graph compilers greatly simplifies the development of privacy-preserving machine learning systems, provides a clean abstraction barrier between deep learning and HE, allows HE libraries to exploit HE-specific graph optimizations, and comes at a low cost in runtime overhead versus native HE operations.


1 Introduction

1.1 Background and significance

One of the key challenges in deploying machine learning (ML) at scale is how to help data owners learn from their data while protecting their privacy. This issue has become more pressing with the advent of regulations such as the General Data Protection Regulation Voigt & Von dem Bussche (2017). It might seem as though “privacy-preserving machine learning” would be a self-contradiction: ML wants data, while privacy hides data. One promising solution to this problem is known as homomorphic encryption (HE). Using HE, one can perform computation on encrypted data without ever decrypting it. Data owners can encrypt their data, send it to a data processor that has no access to the encryption key, and receive the answer to their query in encrypted form, which only the data owner can unlock.

The idea of HE dates back to 1978 Rivest et al. (1978), and theoretical breakthroughs in 2009 Gentry & Boneh (2009) made the idea real, though highly impractical. Further algorithmic breakthroughs have occurred since then, in tandem with the development of cryptosystems that are currently believed to be resistant to quantum computers, to yield HE schemes that map naturally onto vector addition and multiplication—the core of deep learning (DL) and other ML workloads. Recent work has shown the feasibility of evaluating convolutional neural networks using lattice-based HE cryptosystems Gilad-Bachrach et al. (2016), and the development of HE systems and data encoding schemes optimized for deep learning operations is an active research area Liu et al. (2017); Juvekar et al. (2018); Sanyal et al. (2018).

One of the biggest accelerators in the field of deep learning has been the development of software frameworks, like TensorFlow Abadi et al. (2016), MXNet Chen et al. (2015) and PyTorch Paszke et al. (2017), that allow users to describe deep learning networks and operations at a high level while hiding details of their software and hardware implementation. A key challenge for building large-scale, privacy-preserving ML systems using HE has been the lack of such a framework; as a result, data scientists face the formidable task of becoming experts in deep learning, cryptography, and software engineering.

In this work, we leverage recent advances in graph compilers to overcome this challenge. Specifically, we present nGraph-HE, an HE backend to the nGraph compiler Cyphers et al. (2018) that allows data scientists to train networks on hardware of their choice, then easily deploy these models to HE cryptosystems that operate on encrypted data.

Note that in this paper we do not address specific changes either to HE or DL models per se in order to improve their performance. Instead, spurred by recent advances in both fields, we provide a clean abstraction barrier between DL and HE, as shown in Figure 1, so that each community can improve its own technologies as independently as possible with minimal changes to high-level code.

Figure 1: Overview of the nGraph-HE software stack. nGraph-HE currently supports SEAL and HEAAN, and it can be further extended to support more cryptosystems.

1.2 Contributions

  1. We describe a software framework for efficiently combining deep learning and homomorphic encryption. To our knowledge, we present the first use of a deep learning graph compiler to accelerate the development and deployment of privacy-preserving machine learning models.

  2. We demonstrate how this framework facilitates easy switching among multiple HE cryptosystems and describe the steps necessary for adding new HE schemes as they emerge.

  3. We provide examples of HE-specific optimizations for DL models that are enabled by automated reasoning over a computation graph.

  4. We demonstrate the framework on GEMM operations and on a convolutional neural network (CryptoNets) for varying levels of security using Python and TensorFlow, and verify that the runtime overhead imposed by the additional software layers is small (less than 0.5% of total runtime; see Section 3.2) compared to implementing these operations in C++ using HE libraries directly.

1.3 Homomorphic encryption and deep learning

What does it mean for a cryptosystem to be homomorphic? For a detailed review of homomorphic computing, we refer the reader to Acar et al. (2018). Informally, an encryption function E and its decryption function D are homomorphic with respect to a class of functions F if for any function f in F, we can construct a function g such that f(x) = D(g(E(x))) for some set of inputs x that we care about. (We are omitting the public and private keys that would also be arguments for the encryption and decryption functions.) That is, for certain cryptosystems and target functions, it is possible to map a desired computation (the function f) on plaintext into a specific computation on ciphertext (the function g) whose result, when decrypted, matches the desired plaintext result. Figure 2 shows how this property enables a user, Alice, to perform inference on private data using a remote, untrusted computer. The remote machine receives a ciphertext from Alice (with no decryption key), executes a function on the ciphertext, then returns the result to Alice. Alice then decrypts the result to reveal the plaintext for f(x). At no point does the remote machine gain access to Alice’s unencrypted data.

Figure 2: Simple model of secure inference via homomorphic encryption.
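To make the homomorphic property concrete, here is a toy example (not part of nGraph-HE) using textbook RSA, which is partially homomorphic with respect to multiplication. The parameters are tiny and entirely insecure; they are chosen only to make the arithmetic visible.

    # Toy illustration of f(x) = D(g(E(x))) using textbook RSA, which is
    # multiplicatively homomorphic. Insecure parameters, for exposition only.
    p, q = 61, 53
    n = p * q                    # public modulus
    phi = (p - 1) * (q - 1)
    e = 17                       # public exponent
    d = pow(e, -1, phi)          # private exponent (Python 3.8+)

    def E(x):                    # encrypt
        return pow(x, e, n)

    def D(c):                    # decrypt
        return pow(c, d, n)

    x1, x2 = 7, 3
    g = E(x1) * E(x2) % n        # computation on ciphertexts only
    assert D(g) == x1 * x2       # decryption recovers f(x1, x2) = x1 * x2

Here multiplying two ciphertexts plays the role of g, and the party performing the multiplication never sees x1, x2, or the private key.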

1.4 Challenges of crypto-secure deep learning

Mathematically, HE schemes are typically subject to several limitations:

Supported functions. Some HE schemes support only a single algebraic operation, such as addition or multiplication; these are known as “partially homomorphic” schemes (PHE). Other schemes, called “fully homomorphic” (FHE), support two, such as addition and multiplication. Note that composing addition and multiplication suffices to construct polynomial functions, and hence polynomial approximations to non-polynomial functions like sigmoid or ReLU. (Going further, by building gates out of addition and multiplication over GF(2), one can in theory implement any boolean circuit, and hence any computable function.)
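As an illustration of the polynomial-approximation workaround, the following sketch fits a low-degree polynomial to the sigmoid function with NumPy. The degree and fitting interval are arbitrary illustrative choices, not values used in this work (CryptoNets, for example, simply uses the square function as its activation).

    # A minimal sketch: approximate a non-polynomial activation (sigmoid)
    # with a polynomial, since FHE only evaluates additions and multiplies.
    import numpy as np

    xs = np.linspace(-5, 5, 1000)
    ys = 1.0 / (1.0 + np.exp(-xs))
    coeffs = np.polyfit(xs, ys, deg=3)    # least-squares cubic fit

    x = 0.8
    approx = np.polyval(coeffs, x)        # evaluable under HE
    exact = 1.0 / (1.0 + np.exp(-x))
    print(f"sigmoid({x}) ~= {approx:.4f} (exact {exact:.4f})")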

Computational depth. HE schemes derived from Gentry’s original lattice-based system Gentry & Boneh (2009) rely on noise to hide plaintext. This encryption noise tends to accumulate with each homomorphic operation, and decryption becomes impossible if this noise exceeds a threshold. One common solution to this problem is to constrain the depth of the computation and set encryption parameters accordingly. Other solutions involve noise management techniques such as bootstrapping, which, depending on the HE scheme, may incur significant computational costs, but which can extend the computational depth indefinitely.
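Because encryption parameters are set from the depth of the computation, a compiler with access to the whole graph can compute the multiplicative depth mechanically. The following sketch does so for a toy graph representation; nGraph's actual IR classes differ.

    # Sketch: multiplicative depth of a computation graph, the quantity
    # that drives HE parameter selection. The node format is hypothetical.
    def mult_depth(node):
        if not node["inputs"]:                 # input or constant tensor
            return 0
        d = max(mult_depth(i) for i in node["inputs"])
        return d + 1 if node["op"] == "multiply" else d

    # (x * x) * x + x has multiplicative depth 2:
    x = {"op": "input", "inputs": []}
    sq = {"op": "multiply", "inputs": [x, x]}
    cube = {"op": "multiply", "inputs": [sq, x]}
    out = {"op": "add", "inputs": [cube, x]}
    assert mult_depth(out) == 2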

Number fields. Most HE schemes operate over integers Halevi & Shoup (2014); Laine (2018), while others use booleans Chillotti et al. (2016) or real numbers Cheon et al. (2017). One particular challenge in the case of integer-based schemes is scaling the magnitude of numbers by factors less than 1.

Computational and memory load. The cryptographic computations required to implement HE typically consume several orders of magnitude more CPU time and memory compared to their plaintext counterparts. These costs have long been the principal critique of homomorphic encryption. A detailed response to these critiques is out of the scope of this paper, but we note that there have been dramatic improvements in this area—for example, the runtime for homomorphic inference on the seminal CryptoNets MNIST example network has been reduced from 297.5 sec Gilad-Bachrach et al. (2016) to 1.28 sec Liu et al. (2017) in one year (although the latter uses a hybrid scheme; see Section 4).

From a software engineering point of view, there is additional complexity: there are multiple libraries for HE, based on multiple HE schemes, and with a variety of APIs (with some notable attempts to provide uniformity Rohloff (2018)). This diversity makes it difficult for developers to evaluate the tradeoffs of different schemes in the context of their specific applications.

1.5 The power of graph compilers

Deep learning frameworks, such as TensorFlow, MXNet and PyTorch, have greatly accelerated the development of deep learning models, allowing data scientists to express ML operations in high-level terms that can easily be ported from one platform to another (e.g., from a laptop to a cloud-based server). Graph compilers, such as nGraph, have recently been developed to attack the challenge of optimizing framework performance on multiple hardware targets. Compiling high-level framework code into an intermediate representation (IR)—a computation graph—removes the need to generate optimized code for each (framework, hardware target) pair. Instead, to support a new hardware target for all frameworks, one only needs to develop optimized code for each operation in the computation graph. In addition, the graph can be an advantageous representation for reasoning about higher-level optimizations, such as fusing operations, vectorization, etc.

These advantages apply directly to expressing DL computations using HE. By treating HE schemes and the operations they support as instructions on a virtual machine Halevi & Shoup (2014), we can bring HE computations to a large set of DL frameworks while providing a clean separation between DL and HE technologies. Moreover, as in the case of deep learning, the graph representation facilitates various HE-specific optimizations, such as reducing computation depth and identifying opportunities for parallelism.
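The following sketch illustrates this separation in miniature: a graph of generic ops is executed against a backend that supplies one kernel per op, so an HE backend can be swapped in without touching the graph. All names here are illustrative, not nGraph's real API.

    # Sketch: each backend is a table mapping IR ops to kernels, so HE
    # becomes "just another target". Illustrative, not the nGraph API.
    def run(graph, backend, env):
        # graph: topologically sorted list of (output, op, input names)
        for out, op, args in graph:
            env[out] = backend[op](*(env[a] for a in args))
        return env[graph[-1][0]]               # value of the final op

    graph = [("t", "multiply", ("x", "w")),    # t = x * w
             ("y", "add", ("t", "b"))]         # y = t + b

    cpu = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}
    print(run(graph, cpu, {"x": 2.0, "w": 3.0, "b": 1.0}))   # 7.0

    # An HE backend would supply the same two keys, e.g. wrappers around
    # SEAL's Evaluator or HEAAN's Scheme, without changing the graph.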

2 Implementation

We first describe the API adopted by nGraph-HE, as well as the mapping onto two currently supported cryptosystems: HEAAN Cheon et al. (2017) and SEAL Laine (2018). One difficulty in providing a unified framework for HE has been the variety of APIs and supported operations for various HE schemes. Following Archer et al. (2017), our API has two components: (1) a storage model, which contains the static parameters of the encryption scheme, and (2) an assembly language, which describes the functions implemented by the encryption scheme. We then discuss optimization techniques used in nGraph-HE. These techniques include HE-specific optimizations that exploit the capabilities of the underlying cryptosystems, as well as parallelization methods to reduce execution time. Lastly, we discuss debugging techniques and how to support additional cryptosystems.

2.1 Storage model

The storage model consists of both the cryptographic context and the payload representation. The cryptographic context stores the parameters of the cryptosystem, which consist of:

  • polynomial degree: the degree of the RLWE polynomial. (The HE schemes we considered, and most known FHE schemes, are based on the ring learning with errors (RLWE) problem Brakerski & Vaikuntanathan (2011), which uses polynomials whose coefficients are members of a finite field.)

  • plaintext modulus: the modulus of the RLWE polynomial coefficients.

  • security parameter: the security level in bits of the cryptosystem.

The payload representation consists of the parameters necessary for encryption and decryption. Specifically:

  • payload: the data to be encrypted, usually an integer or floating-point number.

  • encoding: the translation of the payload into a polynomial.

  • plaintext: the payload encoded as a polynomial.

  • encryption: the translation of a plaintext to the encrypted data.

  • ciphertext: the encrypted data.

  • decryption: the translation of a ciphertext to a plaintext.

  • decoding: the translation of a plaintext to a payload.

Figure 3: Relation between payload terms.

Figure 3 shows the relation between the terms in the payload representation. This overall abstraction strictly generalizes the standard (encryption, decryption) abstraction, since the encoding and decoding functions can be identity mappings. This allows us to incorporate schemes that explicitly include encoding and decoding (e.g., SEAL), as well as those where the encoding stage is built into the encryption function (e.g., HEAAN).

The storage model can also include the public key or private key required for encryption and decryption, where appropriate.
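A minimal sketch of this abstraction follows, assuming a toy scheme whose encoding is the identity and whose "encryption" is a stub; real schemes would call into SEAL or HEAAN here, and the parameter values shown are illustrative.

    # Sketch of the storage model: a cryptographic context plus the
    # payload -> plaintext -> ciphertext pipeline. Toy stubs throughout.
    from dataclasses import dataclass

    @dataclass
    class CryptoContext:
        poly_degree: int       # degree of the RLWE polynomial
        plain_modulus: int     # modulus of the polynomial coefficients
        security_bits: int     # security level in bits

    class ToyScheme:
        def __init__(self, ctx):
            self.ctx = ctx
        def encode(self, payload):     # payload -> plaintext
            return payload             # identity encoding fits schemes like HEAAN
        def decode(self, plaintext):   # plaintext -> payload
            return plaintext
        def encrypt(self, plaintext):  # plaintext -> ciphertext (stub)
            return ("ct", plaintext)
        def decrypt(self, ct):         # ciphertext -> plaintext (stub)
            return ct[1]

    scheme = ToyScheme(CryptoContext(poly_degree=8192,
                                     plain_modulus=1 << 30,
                                     security_bits=128))
    ct = scheme.encrypt(scheme.encode(3.14))
    assert scheme.decode(scheme.decrypt(ct)) == 3.14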

2.2 Assembly language

We describe the assembly language of nGraph-HE in terms of nGraph operations. There are five low-level operations, which are typically supported by the APIs of HE cryptosystems:

  • add: operates on two ciphertexts, two plaintexts, or a ciphertext and a plaintext.

  • subtract: operates on two ciphertexts, two plaintexts, or a ciphertext and a plaintext.

  • multiply: operates on two ciphertexts, two plaintexts, or a ciphertext and a plaintext.

  • square: operates on a ciphertext or a plaintext. For some HE cryptosystems, such as SEAL and HEAAN, square is an optimized multiply with the same operand. Depending on the cryptosystem, this optimization may be optional.

  • negate: operates on a ciphertext or a plaintext.

Based on these low-level operations, nGraph-HE provides efficient parallelized implementations for the following compound operations: sum, product, dot, convolution, and avg_pool (see the sketch after the list below). Optionally, developers can override these default implementations with cryptosystem-specific optimizations.

In addition, nGraph-HE also provides implementations for the following tensor manipulation operations: broadcast, concat, slice, replace_slice, reshape, reverse, and pad.
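To illustrate how a compound op decomposes into the low-level operations, here is a sketch of dot written against a generic backend exposing only add and multiply; any cryptosystem implementing those two calls inherits it unchanged. The backend interface shown is illustrative.

    # Sketch: a compound op (dot) built purely from low-level ops, so it
    # works for any backend and inherits backend-level optimizations.
    def dot(backend, xs, ys):
        # inner product of two equal-length vectors of (possibly encrypted) values
        acc = backend["multiply"](xs[0], ys[0])
        for a, b in zip(xs[1:], ys[1:]):
            acc = backend["add"](acc, backend["multiply"](a, b))
        return acc

    plain = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}
    assert dot(plain, [1, 2, 3], [4, 5, 6]) == 32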

2.3 Optimizations

Special plaintext value bypass. Operations between a ciphertext and a plaintext may arise when either the model or the data is encrypted, but not both. When performing such operations, nGraph-HE detects special values in the plaintext and, when possible, bypasses the corresponding HE operations. These optimizations include:

  • ciphertext+plaintext(0): bypass HE operations and return the original ciphertext

  • ciphertext*plaintext(0): bypass HE operations and return a freshly-encrypted zero ciphertext, thereby resetting the noise budget

  • ciphertext*plaintext(1): bypass HE operations and return the original ciphertext

  • ciphertext*plaintext(-1): return the negation of the ciphertext, avoiding an expensive multiply operation

Bypassing HE operations not only reduces or resets encryption noise accumulation but also reduces runtime. One benefit of using a graph compiler is that higher-level compound ops, such as dot and convolution, automatically inherit the benefits of these optimizations. For instance, in a binarized neural network with binary convolution kernels, applying a convolution op will not invoke any calls to multiply, since every multiplication by ±1 is replaced by an identity or a negation. We demonstrate some of these runtime benefits quantitatively in Section 3.1.
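A sketch of the bypass logic for multiply follows, with stand-in operations in place of real HE library calls; the helper names are hypothetical.

    # Sketch: special plaintext value bypass for ciphertext * plaintext.
    # The `he` table stands in for a real HE backend.
    def multiply_cipher_plain(ct, pt, he):
        if pt == 0:
            return he["encrypt_zero"]()       # fresh ciphertext; noise reset
        if pt == 1:
            return ct                         # no HE op at all
        if pt == -1:
            return he["negate"](ct)           # cheap negate, no multiply
        return he["multiply_plain"](ct, pt)   # general (expensive) path

    toy = {"encrypt_zero": lambda: 0.0,
           "negate": lambda c: -c,
           "multiply_plain": lambda c, p: c * p}
    assert multiply_cipher_plain(5.0, -1, toy) == -5.0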

SIMD packing. Some HE schemes (including SEAL and HEAAN) support “SIMD” operations Smart & Vercauteren (2011). In simple terms, a vector of payload values can be encoded and encrypted as a single ciphertext, and an operation on this ciphertext is equivalent to applying the same operation to each value in the vector individually. nGraph-HE utilizes this SIMD capability across the mini-batch dimension. As shown in Section 3.2, running models with different mini-batch sizes (within the maximum allowed size) gives nearly identical runtimes. This significantly increases the inference throughput of our system.

OpenMP parallelization. nGraph-HE makes extensive use of OpenMP for parallelization. It is used in data encryption and decryption, unary and binary element-wise operations, GEMM operations, convolution, and pooling. Unlike SIMD packing, OpenMP parallelization is applied to the non-mini-batch dimensions. For instance, to encrypt a batch of 1024 images of dimension 28×28, nGraph-HE encrypts the values at the first pixel location across all 1024 images as one ciphertext with SIMD packing, and does so for all 784 pixel locations in parallel with OpenMP, resulting in 784 ciphertexts in total. OpenMP parallelization significantly reduces the inference latency of our system.
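The packing layout can be sketched with NumPy, with encryption itself elided: the batch axis becomes the SIMD slot axis, and the per-pixel encryptions are independent and thus parallelizable.

    # Sketch: 1024 flattened 28x28 images become 784 ciphertexts, one per
    # pixel position, each packing 1024 SIMD slots. Encryption elided.
    import numpy as np

    batch = np.random.rand(1024, 28 * 28)   # (batch, pixels)
    slots = batch.T                         # (pixels, batch)
    assert slots.shape == (784, 1024)
    # Each of the 784 rows would be encrypted as one SIMD-packed ciphertext;
    # the 784 encryptions run in parallel (OpenMP in nGraph-HE).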

2.4 Tracking noise accumulation

One bottleneck in designing HE deep learning models is tracking the accumulation of encryption noise. While some schemes, such as SEAL, track noise accumulation for each ciphertext, other schemes, such as HEAAN, do not. One way to track noise budget exhaustion in HEAAN is to verify that the result of each ciphertext operation matches the result of the corresponding plaintext operation. This is, of course, only possible on privacy-insensitive training data. We implemented a debugging interface in nGraph-HE that enables checking the correctness of each ciphertext operation in this manner. This debugging interface also helps trace erroneous output values due to modulus overflow in HE schemes.
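A sketch of such a checked operation, assuming a toy scheme object with multiply, decrypt, and decode methods (the real interface is scheme-specific):

    # Sketch: shadow every ciphertext op with the same op on plaintext and
    # compare after decryption. Only valid on privacy-insensitive data.
    def checked_multiply(ct_a, ct_b, pt_a, pt_b, he, tol=1e-3):
        ct_out = he.multiply(ct_a, ct_b)     # encrypted computation
        expected = pt_a * pt_b               # shadow plaintext computation
        actual = he.decode(he.decrypt(ct_out))
        if abs(actual - expected) > tol:
            raise RuntimeError("multiply mismatch: noise budget likely "
                               "exhausted or modulus overflow")
        return ct_out

    class ToyHE:                             # stand-in; real schemes add noise
        def multiply(self, a, b): return a * b
        def decrypt(self, c): return c
        def decode(self, p): return p

    checked_multiply(2.0, 3.0, 2.0, 3.0, ToyHE())   # passes silently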

2.5 Adding a new cryptosystem

Currently, nGraph-HE supports two cryptosystems: SEAL and HEAAN. To support another cryptosystem, one simply needs to implement the storage model and the low-level operations in the assembly language calls described above. Most HE cryptosystems already include similar APIs, so the implementation is usually straightforward. For example, the add operation maps to Scheme::add or Scheme::addConst in HEAAN, and to Evaluator::add or Evaluator::add_plain in SEAL. As shown in Section 2.2, nGraph-HE provides abstractions for higher-level compound ops such as dot and convolution. They do not need to be re-implemented when adding a new cryptosystem.
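Sketched in Python (the actual nGraph-HE backend interface is C++), the surface a new cryptosystem must implement amounts to the storage model plus the five low-level ops; the class below is illustrative.

    # Sketch: what a new cryptosystem adapter must supply. The comments
    # name the underlying library calls mentioned above.
    class HEBackend:
        def add(self, a, b): ...       # HEAAN: Scheme::add / addConst
                                       # SEAL:  Evaluator::add / add_plain
        def subtract(self, a, b): ...
        def multiply(self, a, b): ...
        def square(self, a): ...       # optional optimized multiply(a, a)
        def negate(self, a): ...
    # Compound ops (dot, convolution, ...) are built once on top of this
    # interface and need not be re-implemented per scheme.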

3 Experimental results

We tested nGraph-HE on a dual-socket Intel Xeon Gold 6138T 2.0 GHz system with 64 GB of RAM running Ubuntu 16.04. We used HEAAN as the HE library for these measurements, although we have also tested nGraph-HE with the SEAL library. Note that our goal is not to provide comprehensive benchmarking of specific HE libraries or implementations of deep learning using HE. Rather, we report two main findings. First, we illustrate the dependence of matrix-matrix multiplication runtime on dimension and security level, and show an example of leveraging our compiler framework to implement HE-specific optimizations. Second, we demonstrate the ability to implement a convolutional neural network (the CryptoNets network Gilad-Bachrach et al. (2016)) using a popular deep learning framework (TensorFlow), and verify that the additional software layers through nGraph-HE to the underlying HE library impose minimal overhead. We also use the CryptoNets example to demonstrate another HE-specific optimization, namely, SIMD packing (Section 2.3).

3.1 GEMM operations

We first tested nGraph-HE on general matrix-matrix multiplication (GEMM) operations, since these form the backbone of deep learning workloads.

Figure 4: Runtime on GEMM operations as a function of matrix size, security level, and sparsity.

Figure 4 shows the runtime of computing AB + C, where A, B, and C are matrices of random floating-point numbers, and where A is a ciphertext while B and C are plaintexts. (This corresponds to a use case where, for example, A contains a user’s data while B and C correspond to a model residing on a remote inference server.) To demonstrate two different levels of cryptographic security, we used two settings of the polynomial modulus, yielding approximately 80-bit and 256-bit security, with plaintext modulus and precision set to 155 bits and 30 bits Cheon et al. (2017), respectively, in both cases.

Figure 5: Runtimes on the pre-compiled CryptoNets network with SIMD packing for different batch sizes and security levels. The runtime is composed of input encryption, model execution, and output decryption. Blue and red lines correspond to the approximately 80-bit and 256-bit security settings, respectively.

To illustrate the power of enabling HE using graph compilers, we added an optimization to the (plaintext, ciphertext) multiplication op that short-circuits when the plaintext multiplicand is 0 and returns an encrypted value of 0 instead of carrying out the homomorphic multiplication (Section 2.3). We then measured the runtime savings by setting 50% and 80% of the plaintext matrix entries to 0 at random. These results correspond to the sparsity curves in Figure 4. Because multiplication in most HE schemes is expensive relative to addition, the runtime gain is significant; the larger point, however, is that by providing HE in the context of a graph compiler, we enable developers to provide HE-specific optimizations to the back-end while data scientists continue to use the deep learning framework of their choice, treating HE as just another (virtual) hardware target.

3.2 Neural networks

Next, to demonstrate running a complete neural network on encrypted data, we implemented the original CryptoNets network in Python with TensorFlow. (As in Gilad-Bachrach et al. (2016), for inference we squashed layers 3-6 into one linear layer.) One possible concern with adding software abstractions is the runtime overhead. To measure this, we timed the network in two ways. First, we executed the TensorFlow code with nGraph-HE as the backend. This incurs the overhead of TensorFlow, the nGraph-TensorFlow bridge, and nGraph IR compilation. Next, we timed the execution of a C++ application that loads the (pre-compiled) serialized network as nGraph IR and executes it directly.

Table 1 shows the runtimes of these experiments, running at the two security levels described in Section 3.1. Note that the differences in times between the second and third columns (0.60 and 1.05 sec), which capture the overhead of graph compilation and bridging nGraph to TensorFlow, are less than the standard errors of the runtimes over 10 experiments, and represent less than 0.5% of overall runtime.

Security level    TF + nGraph-HE    Direct nGraph-HE
~80-bit           136.36 ± 1.11     135.76 ± 0.96
~256-bit          323.70 ± 2.47     322.65 ± 2.28
Table 1: Runtimes on the CryptoNets network with and without the overhead of TensorFlow integration and graph compilation, at the two security levels of Section 3.1. Times are in seconds, reported as mean ± standard error over 10 experiments.

Another benefit of using a graph compiler with HE is that the computation graphs provide opportunities to identify parallelism that can be exploited by some HE schemes, for example, the ability to perform “SIMD” operations on vectors of payload values (Section 2.3). We implemented this capability and demonstrated it on the CryptoNets network. Figure 5 shows the CryptoNets inference runtime using batch sizes of 1 to 4096 at two levels of security for each batch size. We picked a maximum batch size of 4096 because this is the largest batch that HEAAN can pack into a single ciphertext under our parameter settings. Note that while runtimes for preparing inputs and decrypting outputs increase slightly with input size, the model execution runtimes for larger batch sizes are nearly identical to those for a batch size of 1. Batching increases throughput significantly: for example, in the ~80-bit security case, using a batch size of 4096 leads to an amortized runtime of 0.034 sec per image, compared to 136 sec for a batch size of 1 (see Table 1).

4 Extensions and future work

An additional benefit of using graph compilers in the context of HE is the ability to extract the computational (especially multiplicative) depth of the computation, since this is needed to set the security parameters of the HE scheme, such as the polynomial modulus. A useful extension of this work, therefore, would be to enable automatic selection of HE parameters at compile time as a function of the desired security level. Another area for future work is to incorporate recent optimizations for matrix operations in HE Juvekar et al. (2018). Finally, we would like to extend this framework so that it can also include hybrid schemes that combine HE with secure multi-party computation (MPC); such hybrids have been shown Juvekar et al. (2018) to deliver much faster performance at the expense of higher communication costs. The optimal decomposition of a deep learning workload into HE and MPC stages could be determined at compile time and would be greatly facilitated by access to the underlying computation graph.

References