1 Introduction
Fully-Homomorphic Encryption (FHE) allows arbitrary computations on encrypted data without requiring the decryption key. Thus, FHE enables interesting privacy-preserving capabilities, such as offloading secure storage and secure computation to untrusted cloud providers. Recent advances in FHE theory [13, 12] along with improved implementations have pushed FHE into the realm of practicality. For instance, when optimized appropriately, we can perform encrypted fixed-point multiplications within a few microseconds, which matches the speed of 8086 processors that jump-started the computing revolution. Future cryptographic innovations will further reduce the performance gap between encrypted and unencrypted computations.
Despite the availability of multiple open-source implementations [38, 26], programming FHE applications remains hard and requires cryptographic expertise, making it inaccessible to most programmers today. Furthermore, different FHE schemes provide subtly different functionalities and require manually setting various parameters that control correctness, performance, and security. We expect the programming complexity to only increase as future FHE schemes become more capable and performant. For instance, the recently invented CKKS scheme [13] supports fixed-point arithmetic operations by representing real numbers as integers with a fixed scaling factor, but requires the programmer to perform rescaling operations so that scaling factors and the cryptographic noise do not grow exponentially due to multiplications. Moreover, the so-called RNS-variant of the CKKS scheme [12] provides efficient implementations that can use machine-sized integer operations as opposed to multi-precision libraries, but imposes further restrictions on the circuits that can be evaluated on encrypted data.
To improve the developer friendliness of FHE, this paper proposes a new general-purpose language for FHE computation called Encrypted Vector Arithmetic (EVA). EVA is also designed to be an intermediate representation that is a backend for other domain-specific compilers. At its core, EVA supports arithmetic on fixed-width vectors and scalars. The vector instructions naturally match the encrypted SIMD (or batching) capabilities of FHE schemes today. EVA includes an optimizing compiler that hides all the complexities of the target FHE scheme, such as encryption parameters and noise. It ensures that the generated FHE program is correct, performant, and secure. In particular, it eliminates all runtime errors that are common when programming FHE libraries manually.
EVA implements FHE-specific optimizations, such as optimally inserting operations like rescaling and modulus switching. EVA automatically reuses the memory used for encrypted messages, thereby reducing the memory consumed. We have built a compiler incorporating all these optimizations to generate efficient programs that run using the Microsoft SEAL [38] FHE library, which implements the RNS-variant of the CKKS scheme. We have also built an EVA executor that transparently parallelizes the generated program efficiently, allowing programs to scale well.
To demonstrate the usability of EVA, we have built a Python frontend for EVA. Using this frontend, we have implemented several applications in EVA with very few lines of code and much less complexity than in SEAL directly. We have implemented some statistical machine learning applications in EVA. Another application computes the length of a path in 3-dimensional space, which can be used in secure fitness mobile applications. We have also implemented two image processing applications, Sobel filter detection and Harris corner detection, in EVA. We believe Harris corner detection is one of the most complex programs to be homomorphically evaluated.
In addition, we have built a domain-specific compiler on top of EVA for deep neural network (DNN) inference. This compiler takes programs written in a higher-level language as input and generates EVA programs using a runtime of operations on higher-level constructs like tensors and images. In particular, our DNN compiler subsumes the recently proposed domain-specific compiler called CHET [17]. Our DNN compiler uses the same tensor kernels as CHET, except that it generates EVA programs instead of generating SEAL programs. Nevertheless, the optimizing compiler in EVA is able to outperform CHET in DNN inference on average.
In summary, EVA is a general-purpose language and an intermediate representation that improves the programmability of FHE applications by guaranteeing correctness and security, while outperforming current methods.
The rest of this paper is organized as follows. Section 2 gives the background on fully-homomorphic encryption. Section 3 presents the EVA language. Section 4 gives an overview of the EVA compiler. We then describe transformations and analysis in the compiler in Sections 5 and 6, respectively. Section 7 briefly describes the domain-specific compilers we built on top of EVA. Our evaluation is presented in Section 8. Finally, related work and conclusions are presented in Sections 9 and 10.
2 Background and Motivation
In this section, we describe homomorphic encryption (Section 2.1) and the challenges in using it (Section 2.2). We also describe an implementation of homomorphic encryption (Section 2.3). Finally, we present the threat model assumed in this paper (Section 2.4).
2.1 Fully-Homomorphic Encryption
An FHE scheme includes four stages: key generation, encryption, evaluation, and decryption. Most of the efficient FHE schemes, for example, BGV [6], BFV [18], and CKKS [13], are constructed on the Ring Learning with Errors (RLWE) problem [32]. At the time of key generation, a polynomial ring of degree $N$ with integer coefficients modulo $Q$ must be chosen to represent ciphertexts and public keys according to the security standard [1]. We call $Q$ the ciphertext modulus. A message or a vector of messages is encoded to a polynomial, and subsequently encrypted with a public key or a secret key to form a ciphertext consisting of two polynomials of degree up to $N-1$. Encryption also adds to a ciphertext a small random error that is later removed in decryption.
Evaluation primarily includes four operations: addition of ciphertexts, addition of a ciphertext and a plaintext, multiplication of ciphertexts, and multiplication of a ciphertext and a plaintext. Decrypting (with a secret key) and decoding reveals the message, as if the computation was performed on unencrypted data.
2.2 Challenges in Using FHE
Programmers using FHE face significant challenges that must be overcome for correct, efficient, and secure computation. We discuss those challenges here to motivate our work.
Depth of Computation:
Computations on ciphertexts increase the initially small error in them linearly in the number of homomorphic additions and exponentially in the multiplicative depth of the evaluation circuit. When the errors get too large, ciphertexts become corrupted and cannot be decrypted, even with the correct secret key. Thus, to support efficient homomorphic evaluation of a circuit, one must optimize the circuit for a lower depth. Furthermore, the multiplicative depth of the circuit also determines how large the polynomial degree $N$ and the ciphertext modulus $Q$ must be, to ensure correct decryption while staying secure.
Relinearization:
After each multiplication of ciphertexts, the resulting ciphertext consists of three polynomials, while freshly encrypted ciphertexts consist of only two polynomials. To prevent ciphertext sizes from growing indefinitely, an operation called relinearization is performed to reduce the number of polynomials in a ciphertext back to two. Relinearization is costly, and its optimal placement is an NP-hard problem [10].
CKKS and Approximate Fixed-Point:
The CKKS scheme introduced an additional challenge by only providing approximate results (but much higher performance in return). There are two main sources of error in CKKS: (i) error from the encoding of values to polynomials being lossy, and (ii) the noise added in every homomorphic operation being mixed with the message. To counter this, CKKS adopts a fixed-point representation, which, coupled with high enough scaling factors, allows these errors to be hidden.
CKKS further features an operation called rescaling that scales down the fixed-point representation of a ciphertext. Consider a ciphertext $x$ that contains the encoding of a value $m$ multiplied by the scale $s$ (say, a relatively low scale for CKKS). Its second power $x^2$ encodes $m^2$ multiplied by the scale $s^2$. Further powers would rapidly overflow modest values of the modulus $Q$, requiring impractically large encryption parameters to be selected. Rescaling the second power by $s$ will truncate the fixed-point representation to encode the value at a scale of $s$.
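The scale arithmetic behind rescaling can be illustrated with plain integers. The following is a minimal sketch, not CKKS itself: nothing is encrypted, the helper names are hypothetical, and the value 0.25 and the scale 2^10 are assumed purely for illustration.

```python
# Plain-integer model of fixed-point scaling; no encryption is involved.
# The value 0.25 and the scale 2**10 are assumed, illustrative numbers.

def encode(value, scale):
    """Represent a real number as an integer at the given scale."""
    return round(value * scale)

def decode(encoded, scale):
    """Recover the (approximate) real number from its encoding."""
    return encoded / scale

def multiply(a, scale_a, b, scale_b):
    """Multiplying two encodings also multiplies their scales."""
    return a * b, scale_a * scale_b

def rescale(encoded, scale, divisor):
    """Divide encoding and scale by the divisor, truncating low-order bits."""
    return encoded // divisor, scale // divisor

scale = 2 ** 10
x = encode(0.25, scale)                            # 256
x_sq, scale_sq = multiply(x, scale, x, scale)      # scale is now 2**20
x_rs, scale_rs = rescale(x_sq, scale_sq, 2 ** 10)  # scale is back to 2**10
print(decode(x_sq, scale_sq), decode(x_rs, scale_rs))
```

Both decoded values agree here because the truncated bits are all zero; with less convenient inputs the rescaled encoding is only an approximation, which is why the scale must stay large enough.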
Rescaling has a secondary effect of also dividing the ciphertext’s modulus by the same divisor as the ciphertext itself. This means that there is a limited “budget” for rescaling built into the initial value of the ciphertext modulus $Q$. The combined effect for CKKS is that $\log Q$ can grow linearly with the multiplicative depth of the circuit. It is common to talk about the level of a ciphertext as how much of $Q$ is left for rescaling.
A further complication arises from the ciphertext after rescaling being encrypted under fundamentally different encryption parameters. To apply any binary homomorphic operation, two ciphertexts must be at the same level, i.e., have the same modulus $Q$. Furthermore, addition and subtraction require ciphertexts to be encoded at the same scale due to the properties of fixed-point arithmetic. CKKS also supports a modulus switching operation to bring down the level of a ciphertext without scaling the message. In our experience, inserting the appropriate rescaling and modulus switching operations to match levels and scales is a significantly difficult process even for experts in homomorphic encryption.
In the most efficient implementations of CKKS (so-called RNS-variants [11]), the truncation is actually performed by dividing the encrypted values by prime factors of the ciphertext modulus $Q$. Furthermore, there is a fixed order to these prime factors, which means that from a given level (i.e., how many prime factors are left in $Q$) there is only one valid divisor available for rescaling. This complicates selecting points to rescale, as doing so too early might make the fixed-point representation so small that the approximation errors destroy the message.
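To make the notion of levels concrete, the sketch below tracks only the metadata of a ciphertext: the base-2 logarithm of its scale and the bit sizes of the prime factors that remain. The class `CipherMeta`, the convention that the last entry is consumed first, and all bit sizes are assumptions made for illustration; they are not taken from any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CipherMeta:
    """Hypothetical bookkeeping record; holds no encrypted data."""
    log_scale: int     # log2 of the fixed-point scale
    prime_bits: tuple  # log2 sizes of the prime factors still in the modulus

    @property
    def level(self):
        return len(self.prime_bits)

def rescale(ct):
    """Divide by the next prime in the fixed order (assumed: the last entry)."""
    if ct.level == 0:
        raise ValueError("no prime factors left to rescale by")
    divisor_bits = ct.prime_bits[-1]
    if ct.log_scale - divisor_bits <= 0:
        raise ValueError("rescaling now would wipe out the fixed-point precision")
    return CipherMeta(ct.log_scale - divisor_bits, ct.prime_bits[:-1])

fresh = CipherMeta(log_scale=60, prime_bits=(40, 30, 30))
once = rescale(fresh)  # the only valid divisor at this level is the 30-bit prime
print(once.level, once.log_scale)
```

Calling `rescale(once)` raises, because the only divisor available at that level would reduce the scale to nothing; this is the "rescaling too early" hazard described above.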
Encryption Parameters:
In CKKS, all of the concerns about scaling factors, rescaling, and managing levels are intricately linked with selecting encryption parameters. Thus, a typical workflow when developing FHE applications involves a lot of trial-and-error, and repeatedly tweaking the parameters to achieve both correctness (accuracy) and performance. While some FHE libraries warn the user if the selected encryption parameters are insecure, not all of them do, so a developer may need to keep in mind security-related limitations, which typically means upper-bounding the ciphertext modulus $Q$ for a given polynomial degree $N$.
2.3 Microsoft SEAL
Microsoft SEAL [38] is a software library that implements the RNS variant of the CKKS scheme. In SEAL, the modulus $Q$ is a product of several prime factors of bit sizes up to 60 bits, and rescaling of ciphertexts is always done by dividing away these prime factors. The developer must choose these prime factors and order them correctly to achieve the desired rescaling behavior. SEAL automatically validates encryption parameters for correctness and security.
2.4 Threat Model
We assume a semi-honest threat model, as is typical for homomorphic encryption. This means that the party performing the computation (i.e., the server) is curious about the encrypted data but is guaranteed to run the desired operations faithfully. This model matches, for example, the scenario where the server is trusted, but a malicious party has read access to the server’s internal state and/or communication between the server and the client.
3 EVA Language
The EVA framework uses a single language as its input format, intermediate representation, and executable format. Input programs use a subset of the language that omits details specific to homomorphic encryption, such as when to rescale. In this section, we describe this input format and its semantics, while Section 4 presents an overview of the compilation to an executable EVA program.
Table 1 lists the types that values in EVA programs may have. The vector types Cipher and Vector have a fixed power-of-two size for each input program. The power-of-two requirement comes from the target encryption schemes.
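We introduce some notation for talking about types and values in EVA. For Vector, a literal value with elements $a_1, \dots, a_n$ is written $[a_1, a_2, \dots, a_n]$ or as a comprehension $[a_i \text{ for } i = 1 \dots n]$. For the $i$th element of a Vector $a$, we write $a_i$. The concatenation of two Vector values $a$ and $b$ is written $a \mathbin{\|} b$. For the product type (i.e., tuple) of two EVA types $A$ and $B$, we write $A \times B$, and write tuple literals as $(a, b)$ where $a \in A$ and $b \in B$.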
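Table 1: Types of values in EVA.

| Type | Description |
|---|---|
| Cipher | An encrypted list of fixed-point values. |
| Vector | A list of 64-bit floating point values. |
| Scalar | A 64-bit floating point value. |
| Integer | A 32-bit signed integer. |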
Programs in EVA are Directed Acyclic Graphs (DAGs), where each node represents a value available during execution. Nodes with one or more incoming edges are called instructions; each instruction computes a new value as a function of its parameter nodes, i.e., the parent nodes connected to it. For the $i$th parameter of an instruction $n$ we write $n.\mathrm{parm}_i$, and the whole list of parameter nodes is $n.\mathrm{parms}$. Each instruction $n$ has an opcode $n.\mathrm{op}$, which specifies the operation to be performed at the node. Note that the incoming edges are ordered, as they correspond to the list of arguments. Table 2 lists all the opcodes available in EVA. The first group are opcodes that frontends may generate, while the second group lists opcodes that are inserted by the compiler.
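Table 2: Instruction opcodes in EVA.

| Opcode | Signature | Description | Format restrictions |
|---|---|---|---|
| Negate | Cipher → Cipher | Negate each element of the argument. | |
| Add | Cipher × (Vector \| Cipher) → Cipher | Add arguments elementwise. | |
| Sub | Cipher × (Vector \| Cipher) → Cipher | Subtract right argument from left argument elementwise. | |
| Multiply | Cipher × (Vector \| Cipher) → Cipher | Multiply arguments elementwise. | |
| RotateLeft | Cipher × Integer → Cipher | Rotate elements to the left by given number of indices. | |
| RotateRight | Cipher × Integer → Cipher | Rotate elements to the right by given number of indices. | |
| Relinearize | Cipher → Cipher | Apply relinearization (see Section 2). | Not in input |
| ModSwitch | Cipher → Cipher | Switch to the next modulus in the modulus chain (see Section 2). | Not in input |
| Rescale | Cipher × Scalar → Cipher | Rescale the ciphertext with the given divisor (see Section 2). | Not in input |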
A node with no incoming edges is called a constant if its value is available at compile time and an input if its value is only available at run time. For a constant $n$, we write $n.\mathrm{value}$ to denote the value. Inputs may be of any type, while constants can be any type except Cipher. This difference is due to the fact that the Cipher type is not fully defined before key generation time, and thus cannot have any values at compile time. The type of a node $n$ is accessible as $n.\mathrm{type}$.
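A program $P$ is a tuple $(M, \mathit{Insts}, \mathit{Consts}, \mathit{Inputs}, \mathit{Outputs})$, where $M$ is the length of all vector types in $P$; $\mathit{Insts}$, $\mathit{Consts}$, and $\mathit{Inputs}$ are lists of all instruction, constant, and input nodes, respectively; and $\mathit{Outputs}$ identifies a list of nodes as outputs of the program (i.e., $\mathit{Outputs} \subseteq \mathit{Insts} \cup \mathit{Consts} \cup \mathit{Inputs}$).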
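As a minimal sketch of how such a program could be held in memory, the following uses two small Python classes. The names `Node`, `Program`, `parms`, and `vec_size` are hypothetical and are not the EVA frontend API; the example graph (a square followed by an addition) is likewise assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    """One DAG node: an input, a constant, or an instruction."""
    op: str                                            # "Input", "Constant", or an opcode
    parms: List["Node"] = field(default_factory=list)  # ordered parent nodes
    value: Optional[list] = None                       # set only for constants

@dataclass
class Program:
    vec_size: int        # common length of all vector types
    nodes: List[Node]    # every node of the graph
    outputs: List[Node]  # nodes whose values are returned

    def __post_init__(self):
        # The vector size must be a power of two.
        if self.vec_size < 1 or self.vec_size & (self.vec_size - 1):
            raise ValueError("vec_size must be a power of two")

    def instructions(self):
        return [n for n in self.nodes if n.parms]

    def inputs(self):
        return [n for n in self.nodes if n.op == "Input"]

    def constants(self):
        return [n for n in self.nodes if n.op == "Constant"]

# Assumed example: one encrypted input x and the computation x*x + x.
x = Node("Input")
square = Node("Multiply", [x, x])
total = Node("Add", [square, x])
program = Program(vec_size=8, nodes=[x, square, total], outputs=[total])
print(len(program.instructions()), len(program.inputs()))
```

Keeping the parameter list ordered matters for non-commutative opcodes such as Sub and the rotations.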
Next, we define execution semantics for EVA. Consider a dummy encryption scheme that instead of encrypting values just stores them as plaintext values. In other words, the encryption and decryption are the identity function. This scheme makes homomorphic computation very easy, as every plaintext operation is its own homomorphic counterpart. Given a map $I$ from input nodes to values, let $E_I(n)$ be the function that recursively computes the value for node $n$ using plaintext semantics, using the stored value of $n$ for constants and $I(n)$ for inputs. Now for a program $P$, we further define its reference semantic as a function $P_{\mathrm{ref}}$, which given a value for each input node maps each output node $n_o$ in $P$ to its resulting value:
$$P_{\mathrm{ref}}(I) = \{\, n_o \mapsto E_I(n_o) \mid n_o \text{ is an output node of } P \,\}$$
These execution semantics hold for any encryption scheme, except that output is also encrypted.
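The reference semantics can be sketched as a small recursive evaluator over plaintext lists, which corresponds to the dummy encryption scheme above. The tuple-based node encoding and the function names are assumptions chosen to keep the sketch short; they are not part of EVA.

```python
def rotate_left(v, k):
    """Rotate list v to the left by k positions (negative k rotates right)."""
    k %= len(v)
    return v[k:] + v[:k]

PLAIN_OPS = {
    "Negate": lambda a: [-x for x in a],
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "Multiply": lambda a, b: [x * y for x, y in zip(a, b)],
    "RotateLeft": lambda a, k: rotate_left(a, k),
    "RotateRight": lambda a, k: rotate_left(a, -k),
}

def evaluate(node, inputs, cache):
    """node is ("input", name), ("const", value), or (opcode, parent, ...)."""
    key = id(node)
    if key not in cache:
        kind = node[0]
        if kind == "input":
            cache[key] = inputs[node[1]]
        elif kind == "const":
            cache[key] = node[1]
        else:
            args = [evaluate(p, inputs, cache) for p in node[1:]]
            cache[key] = PLAIN_OPS[kind](*args)
    return cache[key]

def reference_semantics(outputs, inputs):
    """Map every named output node to its plaintext value."""
    cache = {}  # shared so that each node is computed only once
    return {name: evaluate(n, inputs, cache) for name, n in outputs.items()}

x = ("input", "x")
y = ("Add", ("Multiply", x, x), x)     # x*x + x
r = ("RotateLeft", x, ("const", 1))
result = reference_semantics({"y": y, "r": r}, {"x": [1, 2, 3, 4]})
print(result)
```

Such an evaluator is useful as an oracle: the decrypted output of an encrypted run can be compared against it, up to the approximation error of the scheme.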
4 Overview of EVA Compiler
In this section, we briefly describe how to use the EVA compiler (Section 4.1). We then describe the constraints on the code generated by the EVA compiler (Section 4.2). Finally, we give an overview of the execution flow of the compiler (Section 4.3).
4.1 Using the Compiler
The EVA compiler takes a program in the EVA language as input. Along with the program, it needs the fixed-point scales or precisions for each input in the program and the desired fixed-point scales or precisions for each output in the program. The compiler then generates a program in the EVA language as output. In addition, it generates a vector of bit sizes that must be used to generate the encryption parameters as well as a set of rotation steps that must be used to generate the rotation keys. The encryption parameters and the rotation keys thus generated are required to execute the generated EVA program.
While the input and the output programs are in the EVA language, the set of instructions allowed in the input and the output are distinct, as listed in Table 2. The Relinearize, Rescale, and ModSwitch instructions require understanding the intricate details of the FHE scheme. Hence, they are omitted from the input program. Note that we can make these instructions optional in the input and the compiler can handle it if they are present, but for the sake of exposition, we assume that the input does not have these instructions.
The input scales and the desired output scales affect the encryption parameters, and consequently, the performance and accuracy of the generated program. Choosing the right values for these is a trade-off between performance and accuracy (while providing the same security). Larger values lead to larger encryption parameters and a more accurate but slower generated program, whereas smaller values lead to smaller encryption parameters and a less accurate but faster generated program. Profiling techniques like those used in prior work [17] can be used to select the appropriate values.
4.2 Motivation and Constraints
The EVA compiler can be generalized to support any batched FHE scheme. Nevertheless, in the rest of this paper, we present the EVA compiler specifically for the RNS variant of the CKKS scheme [12]. We use the SEAL [38] implementation of this scheme as an example throughout the paper. Targeting EVA for the CKKS scheme [13] or the HEAAN library [26] would be straightforward.
There is a one-to-one mapping between instructions in the EVA language (Table 2) and instructions in the RNS-CKKS scheme, save Sum (more on that later). However, the input program cannot be directly executed. Firstly, encryption parameters are required to ensure that the program would be accurate. EVA can simply determine the bit sizes that are required to generate the parameters. However, this is insufficient to execute the program correctly because some instructions in the RNS-CKKS scheme have restrictions on their inputs. If these restrictions are not met, the instructions would just throw an exception at runtime.
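Each ciphertext in RNS-CKKS has a coefficient modulus (vector of primes) and a fixed-point scale associated with it. The following constraints apply for the binary instructions involving two ciphertexts in the RNS-CKKS scheme, where $n.\mathrm{parm}_i$ denotes the $i$th parent of instruction $n$:
$$
\begin{aligned}
n.\mathrm{op} \in \{\text{Add}, \text{Multiply}\} &\implies n.\mathrm{parm}_1.\mathrm{modulus} = n.\mathrm{parm}_2.\mathrm{modulus} \\
n.\mathrm{op} = \text{Add} &\implies n.\mathrm{parm}_1.\mathrm{scale} = n.\mathrm{parm}_2.\mathrm{scale}
\end{aligned}
\tag{1}
$$
Equation 1 shows the constraints on the input ciphertexts for certain instructions in the RNS-CKKS scheme; the other instructions do not have any constraints. In the rest of this paper, whenever we mention Add regarding constraints, it includes both Add and Sub.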
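The two constraints can be checked mechanically once each ciphertext carries its coefficient modulus and scale as metadata. The sketch below assumes a hypothetical `Cipher` record with a tuple of modulus bit sizes and a base-2 logarithm of the scale; it only illustrates the check and is not the validation pass of the compiler.

```python
from collections import namedtuple

# Hypothetical metadata record: modulus as a tuple of bit sizes, scale as log2.
Cipher = namedtuple("Cipher", ["modulus", "log_scale"])

def violations(op, left, right):
    """Return the constraints a binary ciphertext instruction would violate."""
    problems = []
    if op in ("Add", "Sub", "Multiply") and left.modulus != right.modulus:
        problems.append("coefficient moduli differ")
    if op in ("Add", "Sub") and left.log_scale != right.log_scale:
        problems.append("scales differ")
    return problems

a = Cipher(modulus=(30, 30), log_scale=60)
b = Cipher(modulus=(30, 30), log_scale=30)
c = Cipher(modulus=(30,), log_scale=30)
print(violations("Add", a, b), violations("Multiply", a, b), violations("Add", b, c))
```

Multiply tolerates mismatched scales, which is why the same pair of operands passes for Multiply and fails for Add.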
We will use the following example in the rest of this section to illustrate the complications that arise due to the constraints on instructions. Consider a ciphertext $x$ and the computation $x^2 + x$. Let $x$ have a scale $s$ and the desired output scale be $s_o$. After Multiply, $x^2$ gets a scale of $s^2$. Consequently, Add is now trying to add ciphertexts with different scales, which would yield an exception.
One way to enforce that scales of the two operands of Add match is to multiply the operand with the lower scale and a constant 1 with the appropriate scale so that the product of the two scales yields the higher scale. In the example, if the computation is transformed to $x^2 + x \cdot 1$ such that the constant $1$ has the scale $s$, then both operands of Add will have the same scale. Even though this is a feasible strategy, this may be insufficient for generating efficient code.
Without the use of Rescale instructions, the scales and the noise of the ciphertexts would grow exponentially with the multiplicative depth of the program and consequently, the product of the coefficient modulus required for the input would grow exponentially. Instead, using Rescale instructions ensures that they would only grow linearly with the multiplicative depth of the program. In the example, the output of $x^2 + x \cdot 1$ has a scale of $s^2$. This requires the coefficient modulus of $x$ to accommodate the scale $s^2$ in addition to a last element for the desired output scale $s_o$ (see Footnote 1). Instead, if the output of $x^2$ is rescaled by $s$, then $x^2$ would have a scale of $s$ and can be added to $x$ directly (without adding another multiplication). Thus, the output of $x^2 + x$ has a scale of $s$. This requires the coefficient modulus of $x$ to be at least $\{s, s_o\}$; the first element would be consumed by the rescale.
Footnote 1: In SEAL, each element of the coefficient modulus is a prime close to a power of 2. The EVA compiler (and the rest of this paper) assumes each element is the corresponding power of 2 instead. To resolve this discrepancy, when a Rescale instruction divides the scale by the prime, the scale is adjusted (by the EVA executor) as if it was divided by the power of 2 instead.
Insertion of Rescale instructions may lead to violating the constraints of other instructions. In the transformed program with $\{s, s_o\}$ as the coefficient modulus of $x$, the two operands of Add have coefficient moduli of $\{s_o\}$ and $\{s, s_o\}$ because Rescale would have consumed $s$ in the first operand. This violates the constraint that the coefficient modulus of Add operands must match. To resolve this, we can insert ModSwitch before the second operand, which just consumes $s$ without changing the scale (unlike Rescale). Thus, the transformed program would be correct and efficient.
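The interplay of Rescale and ModSwitch in this kind of example can be traced with a few lines of bookkeeping. In the sketch below a ciphertext is modelled only by the base-2 logarithm of its scale and a tuple of modulus element sizes; the helper names, the choice of a square-plus-input computation, and the 30-bit sizes are all assumptions made for illustration.

```python
class ConstraintError(Exception):
    """Raised where an FHE library would throw at run time."""

# A ciphertext is modelled as (log2 scale, tuple of log2 modulus elements).
def multiply(a, b):
    if a[1] != b[1]:
        raise ConstraintError("Multiply: coefficient moduli differ")
    return (a[0] + b[0], a[1])

def add(a, b):
    if a[1] != b[1]:
        raise ConstraintError("Add: coefficient moduli differ")
    if a[0] != b[0]:
        raise ConstraintError("Add: scales differ")
    return a

def rescale(a):
    """Consume the first modulus element and divide the scale by it."""
    return (a[0] - a[1][0], a[1][1:])

def mod_switch(a):
    """Consume the first modulus element without changing the scale."""
    return (a[0], a[1][1:])

x = (30, (30, 30))                 # assumed: scale 2^30, modulus {2^30, 2^30}
x_sq = rescale(multiply(x, x))     # scale 2^60 is rescaled back to 2^30
result = add(x_sq, mod_switch(x))  # both operands now agree
print(result)
```

Skipping either inserted instruction makes `add` raise: without `rescale` the scales differ, and without `mod_switch` the coefficient moduli differ.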
For the sake of exposition, we omitted a few constraints in Equation 1. Firstly, another constraint on Multiply is that all ciphertext operands of Multiply must only have 2 polynomials. A Multiply of 2 ciphertexts results in 3 polynomials. Relinearize of this ciphertext yields a ciphertext with 2 polynomials. This must be done before the next Multiply. Secondly, the scalar operand for a Rescale must be .
To summarize, FHE schemes (or libraries) are tedious for a programmer to reason about, due to all their cryptographic constraints. Programmers find it even more tricky to satisfy the constraints in a way that optimizes performance. The EVA compiler hides such cryptographic details from the programmer while optimizing the program.
4.3 Execution Flow of the Compiler
As mentioned in Section 3, the in-memory internal representation of the EVA compiler is an Abstract Semantic Graph, also known as a Term Graph, of the input program. In the rest of this paper, we will use the term graph to denote an Abstract Semantic Graph. In this in-memory representation, each node can access both its parents and its children, and for each output, a distinct leaf node is added as a child. It is straightforward to construct the graph from the EVA program and vice versa, so we omit the details. We use the terms program and graph interchangeably in the rest of the paper.
Algorithm 1 presents the execution flow of the compiler. There are four main steps, namely transformation, validation, parameter selection, and rotation selection. The transformation step takes the input program and modifies it to satisfy the constraints of all instructions, while optimizing it. In the next step, the transformed program is validated to ensure that no constraints are violated. If any constraints are violated, then the compiler throws an exception. By doing this, the compiler ensures that executing the output program will never lead to a runtime exception in the FHE library. Finally, for the validated output program, the compiler selects the bit sizes and the rotation steps that must be used to determine the encryption parameters and the rotation keys, respectively, before executing the output program. The transformation step involves rewriting the graph, which is described in detail in Section 5. The other steps only involve traversal of the graph (without changing it), which is described in Section 6.
5 Transformations in EVA Compiler
In this section, we describe the key graph transformations in the EVA compiler. We first describe a general graph rewriting framework (Section 5.1). Then, we describe three graph transformation passes (Sections 5.2 and 5.3).
5.1 Graph Rewriting Framework
A graph transformation can be captured succinctly using graph rewriting rules (or term rewriting rules). These rules specify the transformation of a subgraph (or an expression) and the graph transformation consists of transforming all applicable subgraphs (or expressions) in the graph (or program). In other words, the rewriting rules specify local operations on a graph, and the graph transformation itself is composed of applying these local operations wherever needed. The order in which these local operations are applied may impact the correctness or efficiency of the transformation.
The nodes in the graph have read-only properties like the opcode and number of parents. In a graph transformation, some state or data may be stored on each node in the graph and the rewriting rules may read and update the state. Moreover, the rewriting rules may be conditional on the state and properties of the nodes in the subgraph. Depending on the conditions, the rewriting rules may require (or prefer) to be applied in a particular order. Consider two orders: (1) forward pass from roots to leaves of the graph, or (2) backward pass from leaves to roots of the graph. In a forward pass, state (or data) flows from parents to children. Similarly, in a backward pass, state (or data) flows from children to parents. In general, multiple forward or backward passes may be needed to apply the rewriting rules until quiescence (no change), but a single forward or backward pass might suffice.
EVA includes a graph rewriting framework for arbitrary rewriting rules for a subgraph that consists of a node along with its parents or children. Thus, EVA supports rewriting rules that specify how to transform a node and its neighbors. EVA supports both forward and backward passes. In the forward pass, a node is scheduled for rewriting only after all its parents have already been rewritten (note that the rewriting operation may not do any modifications if its condition does not hold). Similarly in backward pass, a node is scheduled for rewriting only after all its children have already been rewritten. In all graph transformations in EVA, a single forward or backward pass is sufficient.
In summary, a graph transformation consists of (1) the state on each node, (2) whether it is a forward pass or a backward pass, and (3) the graph rewriting rules. The graph rewriting rules consist of (1) the conditions on a subgraph consisting of a node and its neighbors, (2) the updates to the state, and (3) the transformation of the subgraph.
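The scheduling part of such a framework can be sketched as a topological traversal that hands each node to a rule together with the per-node state. The function names and the dictionary-based graph below are assumptions for illustration, and the example rule only updates state (it computes multiplicative depth) rather than rewriting the graph.

```python
def forward_order(parents):
    """Return nodes so that every node appears after all of its parents."""
    children = {n: [] for n in parents}
    pending = {n: len(set(ps)) for n, ps in parents.items()}
    for n, ps in parents.items():
        for p in set(ps):
            children[p].append(n)
    ready = [n for n, count in pending.items() if count == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for c in children[n]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    return order

def run_pass(parents, rule, state, backward=False):
    """Apply rule(node, parents, state) once per node in pass order."""
    order = forward_order(parents)
    if backward:
        order.reverse()  # children are then visited before their parents
    for n in order:
        rule(n, parents, state)
    return state

# Assumed example graph: sq = x*x, cube = sq*x, out = cube + x.
ops = {"x": "Input", "sq": "Multiply", "cube": "Multiply", "out": "Add"}
parents = {"x": [], "sq": ["x", "x"], "cube": ["sq", "x"], "out": ["cube", "x"]}

def depth_rule(node, parents, depth):
    """State-only rule: multiplicative depth flows from parents to children."""
    base = max((depth[p] for p in parents[node]), default=0)
    depth[node] = base + (1 if ops[node] == "Multiply" else 0)

depth = run_pass(parents, depth_rule, {})
print(depth)
```

A rule that rewrites the graph would additionally need to update the parent and child links of the nodes it touches; the traversal order itself stays the same.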
5.2 Relinearize Insertion Pass
Each ciphertext is represented as 2 or more polynomials. When two ciphertexts each with 2 polynomials are multiplied, the result is a ciphertext with 3 polynomials. SEAL does not support multiplication of a 3-polynomial ciphertext with another ciphertext, plaintext, or scalar. The Relinearize instruction reduces a ciphertext from 3 polynomials to 2 polynomials. Thus, EVA must insert this instruction after a Multiply of two ciphertext nodes and before another Multiply.
The relinearization insertion pass requires no state on any node and can be implemented using either a forward pass or a backward pass. The rewriting rule is applied for a node $n$ only if it is a Multiply operation and if both its parents (or operands) have Cipher type. The transformation in the rule inserts a Relinearize node $n_l$ between the node $n$ and its children. In other words, the new children of $n$ will be only $n_l$ and the children of $n_l$ will be the old children of $n$.
This pass eagerly inserts Relinearize instructions soon after the appropriate Multiply. There is a variant of the pass that lazily inserts Relinearize instructions before the appropriate Multiply. We implemented this variant, but omit its description for simplicity.
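The eager variant of the rule can be sketched on a small doubly linked graph. The `Node` class, its `cipher` flag, and the example computation below are hypothetical and chosen only to exercise the rule; this is not the EVA implementation.

```python
class Node:
    """Hypothetical graph node with links to parents and children."""
    def __init__(self, op, parents=(), cipher=True):
        self.op = op
        self.parents = list(parents)
        self.children = []
        self.cipher = cipher  # False for plaintext constants
        for p in self.parents:
            p.children.append(self)

def insert_relinearize(nodes):
    """nodes must be listed so that parents come before their children."""
    result = []
    for n in nodes:
        result.append(n)
        if n.op == "Multiply" and all(p.cipher for p in n.parents):
            old_children = n.children
            n.children = []
            relin = Node("Relinearize", [n])  # becomes the only child of n
            relin.children = old_children
            for c in old_children:
                c.parents = [relin if p is n else p for p in c.parents]
            result.append(relin)
    return result

# Assumed example: ((x*x) * x) * c, where c is a plaintext constant.
x = Node("Input")
c = Node("Constant", cipher=False)
sq = Node("Multiply", [x, x])
cube = Node("Multiply", [sq, x])
scaled = Node("Multiply", [cube, c])
rewritten = insert_relinearize([x, c, sq, cube, scaled])
print([n.op for n in rewritten])
```

The last Multiply has a plaintext operand, so the rule does not fire for it and no Relinearize is added after it.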
5.3 Rescale and ModSwitch Insertion Passes
Goal:
The Rescale and ModSwitch nodes (or instructions) must be inserted such that they satisfy the constraints in Equation 1. There are two problems: (1) scales of parents (or operands) of Add must match, and (2) coefficient moduli of parents of Add and Multiply must match. As described in Section 4.2, it is easy to resolve the first issue by adding a multiplication of one of the parents with the appropriate scale. We take this simple approach for matching scales and omit the details due to lack of space. The main problem is in resolving the second issue. The goal of the Rescale and ModSwitch insertion passes is to insert them such that the coefficient moduli of the parents of any Add and Multiply node are equal.
While satisfying the constraints is sufficient for correctness, different choices lead to different coefficient modulus $Q$ (a product of $r$ primes), and consequently, different polynomial modulus $N$ for the roots (or inputs) to the graph (or program). Larger values of $N$ and $r$ increase the cost of every FHE operation and the memory of every ciphertext. $N$ is a non-decreasing function of $Q$ (i.e., if $Q$ grows, $N$ either remains the same or grows as well). Minimizing both $Q$ and $r$ is a hard problem to solve. However, reducing $Q$ is only impactful if it reduces $N$, which is unlikely as the threshold of $Q$, for which $N$ increases, grows exponentially. Therefore, the goal of EVA is to get the optimal $r$, which may or may not yield the optimal $N$.
Constrained-Optimization Problem:
The only nodes that modify the coefficient modulus are Rescale and ModSwitch nodes; that is, they are the only ones whose output ciphertext has a different coefficient modulus than that of their input ciphertext(s). Therefore, the coefficient modulus of the output of a node depends only on the Rescale and ModSwitch nodes in the path from the root to that node. To illustrate their relation, we define the term rescale chain.
Definition 1
Given a directed acyclic graph G = (V, E):
For $n_1, n_2 \in V$, $n_1$ is a parent of $n_2$ if $(n_1, n_2) \in E$.
A node $r$ is a root if $r \in V$ and $\nexists\, n \in V$ such that $n$ is a parent of $r$.
Definition 2
Given a directed acyclic graph G = (V, E):
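A path to a node $n \in V$ is a sequence of nodes $p_0, \dots, p_l$ such that $p_0$ is a root, $p_l = n$, and $\forall\, 0 \le i < l$, $p_i \in V$ and $p_i$ is a parent of $p_{i+1}$.
A path $p_0, \dots, p_l$ to a node $n \in V$ is said to be simple if $\forall\, 0 < i < l$, $p_i.\mathrm{op} \neq \text{Rescale}$ and $p_i.\mathrm{op} \neq \text{ModSwitch}$.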
Definition 3
Given a directed acyclic graph G = (V, E):
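A rescale path to a node $n \in V$ is a sequence of nodes $p_0, \dots, p_l$ such that ($\forall\, 0 \le i \le l$, $p_i.\mathrm{op} = \text{Rescale}$ or $p_i.\mathrm{op} = \text{ModSwitch}$), $\exists$ a simple path from a root to $p_0$, $\exists$ a simple path from $p_l$ to $n$, ($\forall\, 0 \le i < l$, $\exists$ a simple path from $p_i$ to $p_{i+1}$), and ($n.\mathrm{op} \neq \text{Rescale}$ and $n.\mathrm{op} \neq \text{ModSwitch}$).
A rescale chain of a node $n \in V$ is a vector $v$ such that $\exists$ a rescale path $p_0, \dots, p_l$ to $n$ and ($\forall\, 0 \le i \le l$, $p_i.\mathrm{op} = \text{ModSwitch} \implies v_i = \infty$) and ($\forall\, 0 \le i \le l$, $p_i.\mathrm{op} = \text{Rescale} \implies v_i$ is the rescale value of $p_i$). Note that $\infty$ is used here to distinguish ModSwitch from Rescale in the rescale chain.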
A rescale chain of a node is conforming if rescale chain of , or or .
Note that all the roots in the graph have the same coefficient modulus. Therefore, for nodes $n_1$ and $n_2$, the coefficient modulus of the output of $n_1$ is equal to that of $n_2$ if and only if there exist conforming rescale chains for $n_1$ and $n_2$, and the conforming rescale chain of $n_1$ is equal to that of $n_2$. Thus, Rescale and ModSwitch insertion passes aim to solve two problems simultaneously:

Constraints: Ensure the conforming rescale chains of the parents of any Multiply or Add node are equal.

Optimization: Minimize the length of the rescale chain of every node.
Outline:
In general, the constraints problem can be solved in two steps:

Insert Rescale in a pass (to reduce exponential growth of scale and noise).

Insert ModSwitch in another pass so that the constraints are satisfied.
The challenge is in solving this problem in this way, while yielding the desired optimization.
Always Rescale Insertion:
A naive approach of inserting Rescale is to insert it after every Multiply. We call this approach as always rescale. This may lead to a larger coefficient modulus (both in the number of elements and their product). For example, consider ciphertexts and with a scale of and respectively, and say the computation is . If Rescale is inserted after each multiplication, then and have a scale of and respectively. To Add them, the scales must match and this can be resolved easily (like in Section 4.2). In addition, their coefficient modulus must match, and consequently, their rescale chains must match. This can be achieved by inserting ModSwitch nodes appropriately (before and after ) so that the rescale chains for both are . For deeper circuits, this would require multiple passes and lead to much longer conforming rescale chains than the multiplicative depth of the graph (i.e., maximum number of Multiply nodes in any path).
Insight:
Consider that all the roots in the graph have the same scale . Then, in the always rescale approach, the only difference between the rescale chains of a node would be their length and not the values in it. A conforming rescale chain for can be obtained by adding ModSwitch nodes in the smaller chain(s). Thus, the length of the conforming rescale chain of a node would not be greater than the multiplicative depth of . This is possible because all Rescale nodes rescale by the same value . The first key insight of EVA is that it is sufficient to use the same rescale value for all Rescale nodes to give a tight bound on the length of the conforming rescale chain.
The multiplicative depth of a node is not necessarily the minimum length of its conforming rescale chain. For example, for the computation , one can rescale soon after or after depending on the scale of and the allowable rescale values in the FHE schemes. The second key insight of EVA is that the length of the conforming rescale chain is optimal (or minimal) if the largest allowed rescale value (which is in SEAL) is used in all Rescale nodes.
Waterline Rescale Insertion:
Based on our insights, the value to rescale by is fixed to the maximum allowed value, which is denoted by . That does not address the question of when to insert Rescale nodes. It is correct to insert a Rescale node only if the resulting scale (after rescale) is above a threshold or waterline. As different roots could have different scales, we choose the waterline to be the maximum of all their values and denote this by . Consider a Multiply node whose scale after multiplication is . Then, a Rescale is inserted between and its children only if . We call this approach waterline rescale. It is optimal in minimizing .
The Rescale insertion transformation is a forward pass that maintains a scale on every node . The scale is updated as it would be during encrypted execution; Multiply multiplies the scale, Rescale divides the scale, and the rest copy the scale from their parents. The pass includes the above rewriting rule. It also includes another rewriting rule for Add to insert a Multiply after one of its operands only if their scales do not match.
This pass eagerly inserts Rescale instructions soon after the appropriate Multiply. There is a variant of the pass that lazily inserts Rescale instructions before the appropriate Multiply. We implemented this variant, but omit its description for simplicity.
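The forward pass described above can be sketched on a toy graph as follows. This is a minimal illustration, not EVA's implementation: the `Node` class, the `waterline_rescale` function, and the use of log2 scales (bits) are assumptions made for the example, the 60-bit default rescale value is only a placeholder, and "above the waterline" is read here as "greater than or equal to". The Add rewriting rule for mismatched scales is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                      # "input", "multiply", "add", "rescale"
    parents: list = field(default_factory=list)
    scale_bits: int = 0                          # log2 of the scale on this node

def waterline_rescale(nodes, max_rescale_bits=60):
    """Forward pass over `nodes`, which must be in topological order."""
    # The waterline is the maximum scale among the roots (inputs).
    waterline = max(n.scale_bits for n in nodes if n.op == "input")
    redirect = {}   # id(Multiply node) -> Rescale node its children should read
    result = []
    for n in nodes:
        n.parents = [redirect.get(id(p), p) for p in n.parents]
        if n.op == "multiply":
            n.scale_bits = sum(p.scale_bits for p in n.parents)
        elif n.op != "input":
            n.scale_bits = n.parents[0].scale_bits
        result.append(n)
        # Rescale only if the scale after rescaling stays at or above the waterline.
        if n.op == "multiply" and n.scale_bits - max_rescale_bits >= waterline:
            r = Node("rescale", [n], n.scale_bits - max_rescale_bits)
            redirect[id(n)] = r
            result.append(r)
    return result

x = Node("input", scale_bits=30)
x2 = Node("multiply", [x, x])      # scale 2^60: rescaling would drop below 2^30
x4 = Node("multiply", [x2, x2])    # scale 2^120: rescaled down to 2^60
rewritten = waterline_rescale([x, x2, x4])
print([(n.op, n.scale_bits) for n in rewritten])
```

In this example the always rescale approach would insert two Rescale nodes, while the waterline rule inserts only one, after the second Multiply.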
ModSwitch Insertion:
After the waterline rescale insertion pass, a naive way to insert ModSwitch is to determine whether the conforming rescale chains of the parents of an Add or a Multiply node match and, if they do not, to insert the appropriate number of ModSwitch nodes between one of the parents and the node. This is sufficient to enforce the constraints. However, as it inserts ModSwitch just before it is needed, the parents might be using a higher coefficient modulus than required, leading to inefficient computation. Thus, we call this lazy insertion.
If ModSwitch is inserted earlier in the graph, then all its children will be faster due to a smaller coefficient modulus. We call inserting it at the earliest feasible edge in the graph eager insertion. This graph transformation is a backward pass. It maintains a reverse level on each node , which denotes the number of Rescale or ModSwitch nodes in all paths from to leaves in the graph; Rescale and ModSwitch nodes increment , while the rest copy from their children. If children’s do not match, then appropriate ModSwitch nodes are inserted between and the applicable children so that of all children of match. This ensures that all rescale chains in the transpose of the graph are conforming, which implies that all rescale chains in the graph are conforming.
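The backward pass can be sketched as below. This is an illustrative toy under stated assumptions: the `Node` class, the attribute name `rlevel`, and the `negate` operation (standing in for any instruction that does not change the level) are hypothetical, and only levels are tracked, not scales.

```python
class Node:
    def __init__(self, op):
        self.op = op
        self.children = []   # nodes that consume this node's value
        self.rlevel = 0      # Rescale/ModSwitch count on every path to a leaf

def eager_modswitch(nodes_topological):
    """Backward pass: every node is visited after all of its children."""
    for n in reversed(nodes_topological):
        bump = 1 if n.op in ("rescale", "modswitch") else 0
        if not n.children:
            n.rlevel = bump
            continue
        target = max(c.rlevel for c in n.children)
        fixed = []
        for c in n.children:
            # Pad the shorter paths with ModSwitch nodes right below n.
            while c.rlevel < target:
                m = Node("modswitch")
                m.children = [c]
                m.rlevel = c.rlevel + 1
                c = m
            fixed.append(c)
        n.children = fixed
        n.rlevel = target + bump

# add = rescale(x * x) + negate(x)
x, mul, r, neg, add = (Node(op) for op in
                       ("input", "multiply", "rescale", "negate", "add"))
x.children = [mul, neg]
mul.children = [r]
r.children = [add]
neg.children = [add]
eager_modswitch([x, mul, r, neg, add])
print([c.op for c in x.children])
```

Lazy insertion would place the ModSwitch on the edge from `negate` to `add`; the backward pass places it directly below `x`, so `negate` already runs on the smaller coefficient modulus.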
6 Analysis in EVA Compiler
In this section, we describe a general graph traversal framework (Section 6.1) and briefly describe a few analysis passes (Section 6.2). The graph traversal framework is also used to implement an executor for the generated EVA program but we omit its description due to lack of space.
6.1 Graph Traversal Framework
EVA’s graph traversal framework allows either a forward traversal or a backward traversal of the graph. In the forward traversal pass, a node is visited only after all its parents are visited. Similarly, in the backward traversal pass, a node is visited only after all its children are visited. Graph traversals do not modify the structure of the graph, unlike graph rewriting. Nonetheless, a state on each node can be maintained during the traversal. A single pass is sufficient to perform forward or backward dataflow analysis of the graph because the graph is acyclic. Execution of the graph is a forward traversal of the graph, so it uses the same framework. A graph traversal pass can be succinctly captured by: (1) the state on each node (and its initial state), (2) whether it is a forward or backward pass, and (3) the state update for each EVA node (or instruction) based on its parents (or children) in the forward (or backward) pass (similar to dataflow equations).
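A minimal sketch of such a framework is shown below, assuming a graph given as a list of (parent, child) edges; the `traverse` function and the two example passes are hypothetical names for illustration and are not taken from the EVA code base.

```python
from collections import deque

def traverse(nodes, edges, update, forward=True):
    """Visit each node once; `update(node, states)` receives the states of the
    node's parents (forward pass) or children (backward pass)."""
    deps = {n: [] for n in nodes}    # nodes that must be visited first
    users = {n: [] for n in nodes}   # nodes that wait on this node
    for parent, child in edges:
        src, dst = (parent, child) if forward else (child, parent)
        deps[dst].append(src)
        users[src].append(dst)
    pending = {n: len(deps[n]) for n in nodes}
    ready = deque(n for n in nodes if pending[n] == 0)
    state = {}
    while ready:
        n = ready.popleft()
        state[n] = update(n, [state[d] for d in deps[n]])
        for u in users[n]:
            pending[u] -= 1
            if pending[u] == 0:
                ready.append(u)
    return state

# a = ((x * y) * x) + y
ops = {"x": "input", "y": "input", "m1": "multiply", "m2": "multiply", "a": "add"}
edges = [("x", "m1"), ("y", "m1"), ("m1", "m2"), ("x", "m2"), ("m2", "a"), ("y", "a")]

# Forward pass: multiplicative depth of every node.
depth = traverse(list(ops), edges,
                 lambda n, s: max(s, default=0) + (ops[n] == "multiply"))
# Backward pass: longest distance from every node to a leaf.
height = traverse(list(ops), edges,
                  lambda n, s: 1 + max(s, default=-1), forward=False)
print(depth["a"], height["x"])
```

Each pass is fully described by its state, its direction, and its update function, which mirrors the three ingredients listed above.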
Parallel Implementation:
A node is said to be ready or active if all its parents (or children) in forward (or backward) pass have already been visited. These active nodes can be scheduled to execute in parallel as each active node only updates its own state (i.e., there are no conflicts). We implement such a parallel graph traversal in EVA using the Galois [35, 19] parallel library.
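The ready-node scheduling idea can be illustrated with Python's standard thread pool. This is only a stand-in for the Galois-based implementation, whose details are not given here; the `parallel_forward` function and the `parents` dictionary layout are assumptions for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def parallel_forward(parents, update, workers=4):
    """parents: dict mapping each node of a DAG to the list of its parents."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    pending = {n: len(ps) for n, ps in parents.items()}
    state = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        def submit(n):
            # All parents of n are finished, so their states can be read safely.
            return pool.submit(update, n, [state[p] for p in parents[n]])
        running = {submit(n): n for n in parents if pending[n] == 0}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                n = running.pop(fut)
                state[n] = fut.result()      # each task produces only its own state
                for c in children[n]:
                    pending[c] -= 1
                    if pending[c] == 0:      # c has just become active
                        running[submit(c)] = c
    return state

ops = {"x": "input", "y": "input", "m1": "multiply", "m2": "multiply", "a": "add"}
parents = {"x": [], "y": [], "m1": ["x", "y"], "m2": ["m1", "x"], "a": ["m2", "y"]}

def depth_update(node, parent_states):
    return max(parent_states, default=0) + (ops[node] == "multiply")

result = parallel_forward(parents, depth_update)
print(result)
```

Because a node is submitted only once all of its parents have produced their state, no two running tasks touch the same state, which is the conflict-freedom property the paragraph above relies on.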
6.2 Analysis Passes
Validation Passes:
We implement two passes to validate that the constraints in Section 4.2 are satisfied. Both are forward passes. In the first pass, we maintain a scale on each node; Multiply multiplies the scale, Rescale divides the scale, and the rest copy the scale from their parents. The pass asserts that both the parents of any Add node have the same scale. In the second pass, we maintain the rescale chain (vector of values) on each node; a Rescale node adds its rescale value to, a ModSwitch node adds to, and the rest copy from the rescale chain of their parents. The pass asserts that the rescale chains of both parents of any Add or Multiply node are conforming. If an assertion fails, the pass fails and an exception is thrown at compile time. The validation passes thus elide runtime exceptions in SEAL.
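The two validation passes can be sketched as follows; for brevity they are merged into one loop here. The program encoding (tuples of name, operation, parents, and argument), the `validate` function, log2 scales, and the use of `None` as the marker that a ModSwitch contributes to a rescale chain are all assumptions for this illustration.

```python
MODSWITCH = None   # stand-in for the marker a ModSwitch adds to a rescale chain

def conforming(a, b):
    """Two chains conform if they agree wherever neither entry is a ModSwitch."""
    return len(a) == len(b) and all(
        x == y or x is MODSWITCH or y is MODSWITCH for x, y in zip(a, b))

def validate(program):
    """program: list of (name, op, parents, arg) in topological order."""
    scale, chain = {}, {}
    for name, op, parents, arg in program:
        if op == "input":
            scale[name], chain[name] = arg, ()
            continue
        a = parents[0]
        base = chain[a]
        if op in ("add", "multiply"):
            b = parents[1]
            if not conforming(chain[a], chain[b]):
                raise ValueError(f"{name}: rescale chains of parents do not conform")
            if op == "add" and scale[a] != scale[b]:
                raise ValueError(f"{name}: parents have different scales")
            base = tuple(y if x is MODSWITCH else x
                         for x, y in zip(chain[a], chain[b]))
        if op == "multiply":
            scale[name] = scale[a] + scale[parents[1]]   # log2 scales add up
        elif op == "rescale":
            scale[name] = scale[a] - arg
        else:
            scale[name] = scale[a]
        if op == "rescale":
            base += (arg,)
        elif op == "modswitch":
            base += (MODSWITCH,)
        chain[name] = base
    return scale, chain

head = [("x", "input", [], 30), ("y", "input", [], 30),
        ("m", "multiply", ["x", "y"], None), ("r", "rescale", ["m"], 30)]
good = head + [("ys", "modswitch", ["y"], None), ("a", "add", ["r", "ys"], None)]
bad = head + [("a", "add", ["r", "y"], None)]   # y was never switched down

scale, chain = validate(good)
try:
    validate(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True
print(scale["a"], chain["a"], bad_rejected)
```

The rejected program fails at validation time rather than when the Add is executed on ciphertexts with different coefficient moduli.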
Encryption Parameter Selection Pass:
Similar to encryption parameter selection in CHET [17], the encryption parameter selection pass in EVA maintains the conforming rescale chain and the scale on each node. After the traversal, for each leaf, the leaf’s scale and desired output scale are multiplied and the maximum one is chosen among them, denoted as . Among the conforming rescale chains of the leaves in the graph, the maximum length one is chosen (without in the chain). To the maximum rescale chain, the maximum allowed rescale value ( in SEAL) is inserted at the beginning of the chain (it is called the special prime) because it is consumed during encryption. More rescale values are appended to this chain until is consumed. In other words, is factorized into such that is minimized and and is a power of two. For each element in the appended chain, is applied to obtain a vector of bit sizes, which is then returned.
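A rough sketch of the final step of this pass is given below, working directly with bit sizes (log2 values). The function name, its arguments, the 60-bit default for the maximum rescale value, and the greedy split of the remaining scale are assumptions for illustration; the greedy split is just one factorization that uses the fewest factors while keeping each factor at or below the maximum.

```python
def select_bit_sizes(leaf_chains, leaf_scale_bits, output_scale_bits, max_bits=60):
    """leaf_chains: one list of rescale bit sizes per leaf (markers already removed).
    leaf_scale_bits / output_scale_bits: log2 scale and desired output scale per leaf."""
    longest = max(leaf_chains, key=len)
    # Multiplying scales corresponds to adding their log2 values.
    remaining = max(s + o for s, o in zip(leaf_scale_bits, output_scale_bits))
    bits = [max_bits] + list(longest)   # special prime goes first
    while remaining > 0:                # append factors until the scale is consumed
        step = min(max_bits, remaining)
        bits.append(step)
        remaining -= step
    return bits

sizes = select_bit_sizes(
    leaf_chains=[[60], [60, 60]],
    leaf_scale_bits=[30, 40],
    output_scale_bits=[30, 30],
)
print(sizes, sum(sizes))
```

The length of the returned vector is the number of elements in the coefficient modulus, and the sum of its entries is the total bit size of their product.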
Rotation Keys Selection Pass:
Similar to rotation keys selection in CHET [17], the rotation keys selection pass in EVA maintains a set of rotation steps on each node; RotateLeft and RotateRight insert their step count to (RotateRight step count is normalized to RotateLeft step count) and the rest copy their step count from the set of their parents. After the traversal, the union of the sets for all the leaves in the graph is returned.
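The rotation keys selection pass can be sketched as below, assuming cyclic rotations over a vector of `vec_size` elements so that a right rotation by k equals a left rotation by `vec_size - k`. The program encoding and the function name are hypothetical.

```python
def rotation_steps(program, vec_size):
    """program: list of (name, op, parents, arg) in topological order.
    Returns the left-rotation steps for which rotation keys are needed."""
    steps = {}
    has_child = set()
    for name, op, parents, arg in program:
        acc = set()
        for p in parents:
            acc |= steps[p]          # copy the sets of the parents
            has_child.add(p)
        if op == "rotate_left":
            step = arg % vec_size
        elif op == "rotate_right":
            step = (vec_size - arg) % vec_size   # normalize to a left rotation
        else:
            step = 0
        if step:
            acc.add(step)
        steps[name] = acc
    leaves = [n for n in steps if n not in has_child]
    return set().union(*(steps[n] for n in leaves))

program = [
    ("x", "input", [], None),
    ("a", "rotate_left", ["x"], 1),
    ("b", "rotate_right", ["x"], 2),
    ("c", "add", ["a", "b"], None),
    ("d", "rotate_left", ["c"], 5),
]
needed = rotation_steps(program, vec_size=8)
print(sorted(needed))
```

Only one key per distinct normalized step is needed, so the size of the returned set bounds the number of rotation keys to generate.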
7 Frontends of EVA
The various transformations described so far for compiling an input EVA program into an executable EVA program make up the backend in the EVA compiler framework. In this section, we describe two frontends for EVA that make it easy to write programs for EVA.
7.1 PyEVA
We have built a generalpurpose frontend for EVA as a DSL embedded into Python, called PyEVA. Consider the PyEVA program in Figure 1 for Sobel filtering, which is a form of edge detection in image processing. The class Program is a wrapper for the Protocol Buffer [22] format for EVA programs mentioned in Section 3. It includes a context manager, such that inside a with program: block all operations are recorded in program. For example, the inputEncrypted function inserts an input node of type into the program currently in context and additionally returns an instance of class Expr, which stores a reference to the input node. The expression additionally overrides Python operators to provide the simple syntax seen here.
7.2 EVA for Neural Network Inference
CHET [17] is a compiler for evaluating neural networks on encrypted inputs. The CHET compiler receives a neural network as a graph of highlevel tensor operations, and through its kernel implementations, analyzes and executes these neural networks against FHE libraries. CHET lacks a proper backend and operates more as an interpreter coupled with automatically chosen highlevel execution strategies.
We have obtained the CHET source code and modified it to use the EVA compiler as a backend. CHET uses an interface called Homomorphic Instruction Set Architecture (HISA) as a common abstraction for different FHE libraries. In order to make CHET generate EVA programs, we introduce a new HISA implementation that instead of calling homomorphic operations inserts instructions into an EVA program. This decouples the generation of the program from its execution. We make use of CHET’s data layout selection optimization, but not its encryption parameter selection functionality, as this is already provided in EVA. Thus, EVA subsumes CHET.
8 Experimental Evaluation
Table 3: Deep neural networks used in our evaluation.

Network          Conv layers  FC layers  Act layers  # FP operations  Accuracy (%)
LeNet5small      2            2          4           159960           98.45
LeNet5medium     2            2          4           5791168          99.11
LeNet5large      2            2          4           21385674         99.30
Industrial       5            2          6           -                -
SqueezeNetCIFAR  10           0          9           37759754         79.38

Table 4: Input and output scales, and the accuracy of homomorphic inference in EVA.

Model            Cipher input scale  Vector input scale  Scalar input scale  Output scale  Accuracy (%)
LeNet5small      25                  15                  10                  30            98.45
LeNet5medium     25                  15                  10                  30            99.09
LeNet5large      25                  20                  10                  25            99.29
Industrial       30                  15                  10                  30            -
SqueezeNetCIFAR  25                  15                  10                  30            78.88

Table 5: Average latency (s) of CHET and EVA on 56 threads.

Model            CHET   EVA   Speedup from EVA
LeNet5small      3.7    0.6   6.2
LeNet5medium     5.8    1.2   4.8
LeNet5large      23.3   5.6   4.2
Industrial       70.4   9.6   7.3
SqueezeNetCIFAR  344.7  72.7  4.7

In this section, we first describe our experimental setup (Section 8.1). We then describe our evaluation of homomorphic neural network inference (Section 8.2) and homomorphic arithmetic, statistical machine learning, and image processing applications (Section 8.3).
8.1 Experimental Setup
All experiments were conducted on a 4-socket machine with Intel Xeon Gold 5120 2.2GHz CPUs with 56 cores in total (14 cores per socket) and 190GB memory. Our evaluation of all applications uses SEAL v3.3.1 [38], which implements the RNS variant of the CKKS scheme [12]. All experiments use the default 128-bit security level. All results reported are an average over 20 different test inputs, unless otherwise specified.
We evaluate a simple arithmetic application to compute the path length in 3-dimensional space. We also evaluate applications in statistical machine learning, image processing, and deep neural network (DNN) inferencing using the frontends that we built on top of EVA (Section 7). For DNN inferencing, we compare EVA with the state-of-the-art compiler for homomorphic DNN inferencing, CHET [17], which has been shown to outperform hand-tuned codes. For the other applications, no suitable compiler exists for comparison. Hand-written codes also do not exist for applications such as homomorphic image processing, as it is very tedious to write them manually. We evaluate these applications using EVA to show that EVA yields good performance with little programming effort.
8.2 Deep Neural Network (DNN) Inference
Networks:
We evaluate a set of deep neural network (DNN) architectures for image classification tasks that are summarized in Table 3:

The three LeNet5 networks are all for the MNIST [30] dataset and vary in the number of neurons. The largest one matches the one used in TensorFlow’s tutorials [39].

Industrial is a network from an industry partner for privacy-sensitive binary classification of images.

We obtain these networks (and the models) from the authors of CHET, so they match the networks evaluated in their paper [17]. Industrial is an FHE-compatible neural network that is proprietary, so the authors gave us only the network structure without the trained model (weights) or the test datasets. We evaluate this network using randomly generated numbers (between -1 and 1) for the model and the images. All the other networks were made FHE-compatible by the CHET authors using average-pooling and polynomial activations instead of max-pooling and ReLU activations. Table 3 lists the accuracies we observed for these networks using unencrypted inference on the test datasets. We evaluate encrypted image inference with a batch size of 1 (latency).

Table 6: Encryption parameters selected by CHET and EVA. For each compiler, the three columns are the log2 of the polynomial modulus, the number of bits in the coefficient modulus, and the number of elements in the coefficient modulus.

Model            CHET             EVA
LeNet5small      15  480   8      14  360   6
LeNet5medium     15  480   8      14  360   6
LeNet5large      15  740   13     15  480   8
Industrial       16  1222  21     15  810   14
SqueezeNetCIFAR  16  1740  29     16  1225  21

Table 7: Compilation, encryption context, encryption, and decryption time (s) in EVA.

Model            Compilation  Context  Encrypt  Decrypt
LeNet5small      0.14         1.21     0.03     0.01
LeNet5medium     0.50         1.26     0.03     0.01
LeNet5large      1.13         7.24     0.08     0.02
Industrial       0.59         15.70    0.12     0.03
SqueezeNetCIFAR  4.06         160.82   0.42     0.26

Table 8: Applications implemented in PyEVA, with their lines of code (LoC) and execution time on 1 thread.

Application                Vector Size  LoC  Time (s)
3dimensional Path Length   4096         45   0.394
Linear Regression          2048         10   0.027
Polynomial Regression      4096         15   0.104
Multivariate Regression    2048         15   0.094
Sobel Filter Detection     4096         35   0.511
Harris Corner Detection    4096         40   1.004

Scaling Factors:
The scaling factors, or scales for short, must be chosen by the user. For each network (and model), we use CHET’s profiling-guided optimization on the first 20 test images to choose the input scales as well as the desired output scale. There is only one output but there are many inputs. For the inputs, we choose one scale each for Cipher, Vector, and Scalar inputs. Both CHET and EVA use the same scales, as shown in Table 4. The scales impact both performance and accuracy. We evaluate EVA on all test images using these scales and report the accuracy of homomorphic inference in Table 4 (we do not evaluate CHET on all test images because it is much slower than EVA). When compared with the accuracy of unencrypted inference (Table 3), there is a negligible degradation. Higher values of scaling factors may improve the accuracy, but will also increase the latency of homomorphic inference.
Comparison with CHET Compiler:
Table 5 shows that EVA is at least 4.2× faster than CHET on 56 threads for all networks. Note that the average latency of CHET is higher than that reported in their paper [17]. This could be due to differences in the experimental setup. The input and output scales they use are different, and so is the SEAL version (3.1 vs. 3.3.1). We suspect the machine differences to be the primary reason for the slowdown because they use a smaller number of heavier cores (16 3.2GHz cores vs. 56 2.2GHz cores). In any case, our comparison of CHET and EVA is fair because both use the same input and output scales, SEAL version, Channel-Height-Width (CHW) data layout, and hardware. The differences between CHET and EVA are solely due to the benefits that accrue from EVA’s low-level optimizations. Thus, EVA is on average faster than CHET.
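As a quick sanity check, the speedup column of Table 5 can be recomputed as the ratio of the CHET latency to the EVA latency. The snippet below uses only the latencies listed in the table, in row order; the variable names are ours.

```python
# Latencies from Table 5, one entry per row, in the order the rows appear.
chet = [3.7, 5.8, 23.3, 70.4, 344.7]
eva = [0.6, 1.2, 5.6, 9.6, 72.7]

# Speedup from EVA = CHET latency / EVA latency, rounded to one decimal.
speedups = [round(c / e, 1) for c, e in zip(chet, eva)]
smallest = min(speedups)
print(speedups, smallest)
```

The recomputed ratios agree with the speedups listed in the table, and the smallest one belongs to LeNet5large.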
Strong Scaling:
To understand the performance differences between CHET and EVA, we evaluated them on 1, 7, 14, 28, and 56 threads. Figure 2 shows the strong scaling. We omit LeNet5small because it takes too little time, even on 1 thread. It is apparent that EVA scales much better than CHET. The parallelization in CHET is within a tensor operation or kernel using OpenMP. Such a static, bulk-synchronous schedule limits the available parallelism. In contrast, EVA dynamically schedules the directed acyclic graph of EVA (or SEAL) operations asynchronously. Thus, it exploits the parallelism available across tensor kernels, resulting in much better scaling. The average speedup of EVA on 56 threads over EVA on 1 thread is (excluding LeNet5small).
Encryption Parameters:
EVA is much faster than CHET, even on 1 thread (by on average). To understand this, we report the encryption parameters selected by CHET and EVA in Table 6. EVA selects a much smaller coefficient modulus, both in terms of the number of elements in it and their product. Consequently, the polynomial modulus is one power-of-2 lower in all networks, except LeNet5large and SqueezeNetCIFAR. This reduction lowers the cost (and the memory) of each homomorphic operation (and ciphertext) significantly. CHET relies on an expert-optimized library of homomorphic tensor kernels. However, even experts cannot optimize across different kernels as that information is not available to them. Consequently, the Rescale and ModSwitch operations used by these experts for a given tensor kernel may be suboptimal for the program. On the other hand, EVA performs global (interprocedural) analysis to minimize the length of the coefficient modulus, yielding much smaller encryption parameters.
Comparison with HandWritten LoLa:
LoLa [7] implements hand-tuned homomorphic inference for neural networks, but the networks they implement are different from the ones we evaluated (and the ones in CHET). Nonetheless, they implement networks for the MNIST and CIFAR10 datasets.
For the MNIST dataset, LoLa implements the highlytuned CryptoNets [20] network (which is similar in size to LeNet5small). This implementation has an average latency of seconds and has an accuracy of . EVA takes only seconds on a much larger network, LeNet5medium, with a better accuracy of . For the CIFAR10 dataset, LoLa implements a custom network that takes seconds and has an accuracy of . EVA takes only seconds on a much larger network with a better accuracy of .
LoLa uses SEAL 2.3 (which implements BFV [18]), which is less efficient than SEAL 3.3.1 (which implements RNS-CKKS [12]) but much easier to use. EVA is faster because it exploits a more efficient FHE scheme, which is much more difficult to write code for manually. Thus, EVA outperforms even highly tuned expert-written implementations like LoLa with very little programming effort.
Compilation Time:
We present the compilation time, encryption context time, encryption time, and decryption time for all networks in Table 7. The encryption context time includes the time to generate the public key, the secret key, the rotation keys, and the relinearization keys. This can take a lot of time, especially for large , like in SqueezeNetCIFAR. Compilation time, encryption time, and decryption time are negligible for all networks.
8.3 Arithmetic, Statistical Machine Learning, and Image Processing
We implemented several applications using PyEVA. To illustrate a simple arithmetic application, we implemented an application that computes the length of a given encrypted 3-dimensional path. This computation can be used as a kernel in several applications, such as secure fitness tracking on mobile devices. For statistical machine learning, we implemented linear regression, polynomial regression, and multivariate regression on encrypted vectors. For image processing, we implemented Sobel filter detection and Harris corner detection on encrypted images. All these implementations took very few lines of code (at most 45), as shown in Table 8.
Table 8 shows the execution time of these applications on encrypted data using 1 thread. Sobel filter detection takes half a second and Harris corner detection takes only a second. The rest take negligible time. We believe Harris corner detection is one of the most complex programs that has been homomorphically evaluated. EVA enables writing advanced applications in various domains with little programming effort, while providing excellent performance.
9 Related Work
Compilers for FHE:
To reduce the burden of writing FHE programs, compilers have been proposed that target different FHE libraries. Some of these compilers support general-purpose languages like Julia (cf. [2]), C++ (cf. [14]) and R (cf. [3]), but, unlike EVA, they are not amenable to incorporating domain-specific or target-specific optimizations. None of these compilers target the recent CKKS scheme [13, 12] (or SEAL library [38]), which is more complex to write or generate code for.
Some of the existing domain-specific compilers [17, 5, 4] target CKKS, but they rely on an expert-optimized runtime of high-level operations that hides the complexities of FHE operations. CHET [17] is a compiler for tensor programs that automates the selection of data layouts, and as we show, this can be used by a frontend of EVA. The nGraphHE [5] project introduced an extension to the Intel nGraph [16] deep learning compiler that allowed data scientists to make use of FHE with minimal code changes. The nGraphHE compiler uses runtime optimizations (e.g., detection of special plaintext values) and compile-time optimizations (e.g., use of ISA-level parallelism, graph-level optimizations) to achieve good performance. nGraphHE2 [4] is an extension of nGraphHE that uses a hybrid computational model – the server interacts with the client to perform non-HE compatible operations, which increases the communication overhead. Moreover, unlike EVA, neither nGraphHE nor nGraphHE2 provides automatic encryption parameter selection. In any case, CHET, nGraphHE, and nGraphHE2 can target EVA instead of the FHE scheme directly to benefit from low-level optimizations.
No existing compiler automatically inserts Relinearize, Rescale, or ModSwitch operations. EVA not only inserts them but also minimizes the coefficient modulus chain length.
Compilers for MPC:
Multiparty computation (MPC) [21, 41] is another technique for privacypreserving computation. The existing MPC compilers are mostly generalpurpose [23] and even though it is possible to use them for deep learning applications, it is hard to program against a generalpurpose interface. The EzPC compiler is a machine learning compiler that combines arithmetic sharing and garbled circuits and operates in a twoparty setting [9]. EzPC uses ABY as a cryptographic backend [33].
PrivacyPreserving Deep Learning:
CryptoNets, one of the first systems for neural network inference using FHE [20], and the subsequent work on LoLa, a low-latency CryptoNets [7], show the ever more practical use of FHE for deep learning. CryptoNets and LoLa, however, use kernels for neural networks that directly translate the operations to the cryptographic primitives of the FHE schemes. There are also other algorithms and cryptosystems specifically for deep learning that rely on FHE (CryptoDL [24], [8], [27]), MPC (Chameleon [36], DeepSecure [37], SecureML [34]), oblivious protocols (MiniONN [31]), or on hybrid approaches (Gazelle [28], SecureNN [40]). None of these provide the flexibility and the optimizations of a compiler approach.
10 Conclusions
This paper introduces a new language and intermediate representation called Encrypted Vector Arithmetic (EVA) for generalpurpose FullyHomomorphic Encryption (FHE) computation. EVA includes a Python frontend that can be used to write advanced programs with little programming effort, and it hides all the cryptographic details from the programmer. EVA includes an optimizing compiler that generates correct, secure, and efficient code, targeting the stateoftheart SEAL library. EVA is also designed for easy targeting of domain specific languages. The stateoftheart neural network inference compiler CHET, when retargeted onto EVA, outperforms its unmodified version by on average. EVA provides a solid foundation for a richer variety of FHE applications and domainspecific FHE compilers.
References
 [1] Martin Albrecht, Melissa Chase, Hao Chen, Jintai Ding, Shafi Goldwasser, Sergey Gorbunov, Shai Halevi, Jeffrey Hoffstein, Kim Laine, Kristin Lauter, Satya Lokam, Daniele Micciancio, Dustin Moody, Travis Morrison, Amit Sahai, and Vinod Vaikuntanathan. Homomorphic encryption security standard. Technical report, HomomorphicEncryption.org, Toronto, Canada, November 2018.
 [2] David W. Archer, José Manuel Calderón Trilla, Jason Dagit, Alex Malozemoff, Yuriy Polyakov, Kurt Rohloff, and Gerard Ryan. Ramparts: A programmerfriendly system for building homomorphic encryption applications. In Proceedings of the 7th ACM Workshop on Encrypted Computing & Applied Homomorphic Cryptography, WAHC’19, pages 57–68, New York, NY, USA, 2019. ACM.
 [3] Louis JM Aslett, Pedro M Esperança, and Chris C Holmes. A review of homomorphic encryption and software tools for encrypted statistical machine learning. arXiv preprint arXiv:1508.06574, 2015.
 [4] Fabian Boemer, Anamaria Costache, Rosario Cammarota, and Casimir Wierzynski. nGraphHE2: A highthroughput framework for neural network inference on encrypted data. In Proceedings of the 7th ACM Workshop on Encrypted Computing & Applied Homomorphic Cryptography, 2019.
 [5] Fabian Boemer, Yixing Lao, Rosario Cammarota, and Casimir Wierzynski. nGraphHE: A graph compiler for deep learning on homomorphically encrypted data. In Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019.
 [6] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. In Shafi Goldwasser, editor, ITCS 2012: 3rd Innovations in Theoretical Computer Science, pages 309–325, Cambridge, MA, USA, January 8–10, 2012. Association for Computing Machinery.
 [7] Alon Brutzkus, Ran GiladBachrach, and Oren Elisha. Low latency privacy preserving inference. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML, 2019.
 [8] Hervé Chabanne, Amaury de Wargny, Jonathan Milgram, Constance Morel, and Emmanuel Prouff. Privacypreserving classification on deep neural network. Cryptology ePrint Archive, Report 2017/035, 2017. http://eprint.iacr.org/2017/035.
 [9] Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma, and Shardul Tripathi. Ezpc: Programmable and efficient secure twoparty computation for machine learning. In IEEE European Symposium on Security and Privacy, EuroS&P, 2019.
 [10] Hao Chen. Optimizing relinearization in circuits for homomorphic encryption. CoRR, abs/1711.06319, 2017. https://arxiv.org/abs/1711.06319.
 [11] Jung Hee Cheon, Kyoohyung Han, Andrey Kim, Miran Kim, and Yongsoo Song. A full RNS variant of approximate homomorphic encryption. In Selected Areas in Cryptography – SAC 2018. Springer, 2018. LNCS 11349.
 [12] Jung Hee Cheon, Kyoohyung Han, Andrey Kim, Miran Kim, and Yongsoo Song. A full RNS variant of approximate homomorphic encryption. In Carlos Cid and Michael J. Jacobson Jr., editors, SAC 2018: 25th Annual International Workshop on Selected Areas in Cryptography, volume 11349 of Lecture Notes in Computer Science, pages 347–368, Calgary, AB, Canada, August 15–17, 2019. Springer, Heidelberg, Germany.
 [13] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yong Soo Song. Homomorphic encryption for arithmetic of approximate numbers. In Tsuyoshi Takagi and Thomas Peyrin, editors, Advances in Cryptology – ASIACRYPT 2017, Part I, volume 10624 of Lecture Notes in Computer Science, pages 409–437, Hong Kong, China, December 3–7, 2017. Springer, Heidelberg, Germany.
 [14] Cingulata. https://github.com/CEALIST/Cingulata, 2018.
 [15] David Corvoysier. Squeezenet for CIFAR10. https://github.com/kaizouman/tensorsandbox/tree/master/cifar10/models/squeeze, 2017.
 [16] Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, William Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar Vijay, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, and Tristan J. Webb. Intel ngraph: An intermediate representation, compiler, and executor for deep learning. CoRR, abs/1801.08058, 2018.
 [17] Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, and Todd Mytkowicz. Chet: An optimizing compiler for fullyhomomorphic neuralnetwork inferencing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019.
 [18] Junfeng Fan and Frederik Vercauteren. Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144, 2012. https://eprint.iacr.org/2012/144.
 [19] Galois system, 2019.
 [20] Ran GiladBachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of The 33rd International Conference on Machine Learning, ICML, 2016.

 [21] Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game or A completeness theorem for protocols with honest majority. In Alfred Aho, editor, 19th Annual ACM Symposium on Theory of Computing, pages 218–229, New York City, NY, USA, May 25–27, 1987. ACM Press.
 [22] Protocol buffer. https://developers.google.com/protocolbuffers. Google Inc.
 [23] Marcella Hastings, Brett Hemenway, Daniel Noble, and Steve Zdancewic. SoK: General purpose compilers for secure multiparty computation. In 2019 IEEE Symposium on Security and Privacy, pages 1220–1237, San Francisco, CA, USA, May 19–23, 2019. IEEE Computer Society Press.
 [24] Ehsan Hesamifard, Hassan Takabi, and Mehdi Ghasemi. Cryptodl: Deep neural networks over encrypted data. 2017.
 [25] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. Squeezenet: Alexnetlevel accuracy with 50x fewer parameters and <1mb model size. CoRR, abs/1602.07360, 2016. https://arxiv.org/abs/1602.07360.
 [26] Cryptography Lab in Seoul National University. Homomorphic encryption for arithmetic of approximate numbers (heaan). https://github.com/snucrypto/HEAAN.
 [27] Xiaoqian Jiang, Miran Kim, Kristin E. Lauter, and Yongsoo Song. Secure outsourced matrix computation and application to neural networks. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, ACM CCS 2018: 25th Conference on Computer and Communications Security, pages 1209–1222, Toronto, ON, Canada, October 15–19, 2018. ACM Press.
 [28] Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. GAZELLE: A low latency framework for secure neural network inference. In William Enck and Adrienne Porter Felt, editors, USENIX Security 2018: 27th USENIX Security Symposium, pages 1651–1669, Baltimore, MD, USA, August 15–17, 2018. USENIX Association.
 [29] Alex Krizhevsky. The CIFAR10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html, 2009.

 [30] Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
 [31] Jian Liu, Mika Juuti, Yao Lu, and N. Asokan. Oblivious neural network predictions via MiniONN transformations. In Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, ACM CCS 2017: 24th Conference on Computer and Communications Security, pages 619–631, Dallas, TX, USA, October 31 – November 2, 2017. ACM Press.
 [32] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. On ideal lattices and learning with errors over rings. In Henri Gilbert, editor, Advances in Cryptology – EUROCRYPT 2010, volume 6110 of Lecture Notes in Computer Science, pages 1–23, French Riviera, May 30 – June 3, 2010. Springer, Heidelberg, Germany.
 [33] Payman Mohassel and Peter Rindal. ABY: A mixed protocol framework for machine learning. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, ACM CCS 2018: 25th Conference on Computer and Communications Security, pages 35–52, Toronto, ON, Canada, October 15–19, 2018. ACM Press.
 [34] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacypreserving machine learning. In 2017 IEEE Symposium on Security and Privacy, pages 19–38, San Jose, CA, USA, May 22–26, 2017. IEEE Computer Society Press.
 [35] Donald Nguyen, Andrew Lenharth, and Keshav Pingali. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the TwentyFourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 456–471, New York, NY, USA, 2013. ACM.
 [36] M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, and Farinaz Koushanfar. Chameleon: A hybrid secure computation framework for machine learning applications. In Jong Kim, GailJoon Ahn, Seungjoo Kim, Yongdae Kim, Javier López, and Taesoo Kim, editors, ASIACCS 18: 13th ACM Symposium on Information, Computer and Communications Security, pages 707–721, Incheon, Republic of Korea, April 2–6, 2018. ACM Press.
 [37] Bita Darvish Rouhani, M. Sadegh Riazi, and Farinaz Koushanfar. Deepsecure: Scalable provablysecure deep learning. In Proceedings of the 55th Annual Design Automation Conference, DAC ’18, pages 2:1–2:6, New York, NY, USA, 2018. ACM.
 [38] Microsoft SEAL (release 3.3). https://github.com/Microsoft/SEAL, June 2019. Microsoft Research, Redmond, WA.
 [39] LeNet5like convolutional MNIST model example. https://github.com/tensorflow/models/blob/v1.9.0/tutorials/image/mnist/convolutional.py, 2016.
 [40] Sameer Wagh, Divya Gupta, and Nishanth Chandran. SecureNN: 3party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies, 2019(3):26–49, July 2019.
 [41] Andrew ChiChih Yao. How to generate and exchange secrets (extended abstract). In 27th Annual Symposium on Foundations of Computer Science, pages 162–167, Toronto, Ontario, Canada, October 27–29, 1986. IEEE Computer Society Press.