1. Introduction
Cryptography is an integral part of many security protocols, which in turn are used by numerous applications. However, despite strong theoretical guarantees, cryptosystems in practice are vulnerable to side-channel attacks, in which non-functional properties such as timing, power and electromagnetic radiation are exploited to gain information about sensitive data (Kocher, 1996; Chari et al., 1999; Messerges et al., 1999; Clavier et al., 2000; Quisquater and Samyde, 2001; Messerges et al., 2002; Brier et al., 2004; Zhou and Feng, 2005; Standaert et al., 2009; Hund et al., 2013). For example, if the power consumption of a device running an encryption algorithm depends on the secret key, statistical techniques such as differential power analysis (DPA) can be used to perform attacks reliably (Kocher et al., 1999; Clavier et al., 2000; Brier et al., 2004; Messerges, 2000; Moradi, 2014).
Although there are techniques for mitigating power side channels (Zhang et al., 2018; Eldib and Wang, 2014; Eldib et al., 2014; Bhunia et al., 2014; Almeida et al., 2013; Bayrak et al., 2013; Akkar and Goubin, 2003), they focus exclusively on the Boolean level, e.g., by targeting circuits in cryptographic hardware or software code that has been converted to bit-level representations (Ishai et al., 2003). This limits the use of such techniques in real compilers; as a result, none of them fits into modern compilers such as GCC and LLVM to directly handle the word-level intermediate representation (IR). In addition, code transformations in compilers may add new side channels, even if the input program is equipped with state-of-the-art countermeasures.
Specifically, compilers tend to use a limited number of the CPU’s registers to store a potentially large number of intermediate computation results of a program. When two masked, and hence desensitized, values are put into the same register, the masking countermeasure may be removed accidentally. We will show, as part of this work, that even provably secure techniques such as high-order masking (Barthe et al., 2015, 2016; Balasch et al., 2014) are vulnerable to such leaks. Indeed, we have found leaks in the compiled code produced by LLVM for the x86, MIPS and ARM platforms, regardless of whether the input program is equipped with high-order masking.
To solve the problem, we propose a secure compilation method with two main contributions. First, we introduce a type-inference system to soundly and quickly detect power side-channel leaks. By soundly, we mean that the system is conservative and guarantees not to miss real leaks. By quickly, we mean that it relies only on syntactic information of the program and thus can be orders of magnitude faster than formal verification (Eldib et al., 2014; Zhang et al., 2018). Second, we propose a mitigation technique for the compiler’s backend modules to ensure that, for each pair of intermediate variables that may cause side-channel leaks, the two values are always stored in different registers or memory locations.
Figure 1 shows an overview of our method, which takes a program as input and returns the mitigated code as output. It has two major steps. First, sound type inference is used to detect leaks by assigning each variable a distribution type. The user only provides an initial annotation of the input variables, i.e., public (e.g., plaintext), secret (e.g., key), or random (e.g., mask), while the types of other variables are inferred automatically. Based on the inferred types, we check each pair of variables to see whether their values may be stored in the same register and cause leaks. If a pair is found to be leaky, we constrain the compiler’s backend register allocation modules to ensure that the two variables are assigned to different registers or spilled to memory.
Our method differs from existing approaches in several aspects. First, it specifically targets power side-channel leaks caused by the reuse of CPU registers in compilers, which have been largely overlooked by prior work. Second, it leverages Datalog, together with a number of domain-specific optimizations, to achieve high efficiency and accuracy during leak detection, where the type inference rules are designed specifically to capture register-reuse-related leaks. Third, mitigation in the backend is systematic and leverages the existing production-quality modules in LLVM to ensure that the compiled code is secure by construction.
Unlike existing techniques that require a priori translation of the input program to a Boolean representation, our method works directly on the word-level IR and thus fits naturally into modern compilers. For each program variable, the amount of leak is quantified using the well-known Hamming Weight (HW) and Hamming Distance (HD) leakage models (Mangard et al., 2007; Mangard, 2002). Correlation between these models and leaks on real devices has been confirmed in prior works (see Section 2). We shall also show, via experiments, that leaks targeted by our method exist even in programs equipped with high-order masking (Barthe et al., 2015, 2016; Balasch et al., 2014).
To detect leaks quickly, we rely on type inference, which models the input program using a set of Datalog facts and codifies the type inference algorithm in a set of Datalog rules. Then, an off-the-shelf Datalog solver is used to deduce new facts. Here, a domain-specific optimization, for example, is to leverage the compiler’s backend modules to extract a map from variables to registers and utilize the map to reduce the computational overhead, e.g., by checking pairs of some (instead of all) variables for leaks.
Our mitigation in the compiler’s backend is systematic: it ensures that all leaks detected by type inference are eliminated. This is accomplished by constraining register allocation modules and then propagating the effect to subsequent modules, without having to implement any new backend module from scratch. Our mitigation is also efficient in that we add a number of optimizations to ensure that the mitigated code is compact and has low runtime overhead. While our implementation focuses on x86, the technique itself is general enough that it may be applied to other instruction set architectures (ISAs) such as ARM and MIPS as well.
We have evaluated our method on a set of cryptographic software benchmarks (Barthe et al., 2015; Bayrak et al., 2013), including implementations of well-known ciphers such as AES and MAC-Keccak. These benchmark programs are all protected by masking countermeasures but, still, we detected register-reuse-related leaks in the LLVM-compiled code. The code produced by our mitigation, also based on LLVM, is always leak-free. In terms of performance, our method significantly outperformed competing approaches such as high-order masking: our mitigated code not only is more compact and secure, but also runs significantly faster than code mitigated by high-order masking techniques (Barthe et al., 2015, 2016).
To summarize, we make the following contributions:

We show that register reuse introduces side-channel leaks even in software already protected by masking.

We propose a Datalog-based type inference system to soundly and quickly detect these side-channel leaks.

We propose a mitigation technique for the compiler’s backend modules to systematically remove the leaks.

We implement the method in LLVM and show its effectiveness on a set of cryptographic software.
The remainder of the paper is organized as follows. First, we illustrate the problem and the technical challenges associated with solving it in Section 2. Then, we review the background including the threat model and leakage model in Section 3. Next, we present our method for leak detection in Section 4 and leak mitigation in Section 5, followed by domainspecific optimizations in Section 6. We present our experimental results in Section 7, review the related work in Section 8, and give our conclusions in Section 9.
2. Motivation
We use examples to illustrate why register reuse may lead to side-channel leaks and the challenges in removing them.
2.1. The HW and HD Leaks
Consider the program Xor() in Figure 2, which takes the public txt and the secret key as input and returns the exclusive-or of them as output. Since logical 1 and 0 bits in a CMOS circuit correspond to different leakage currents, they affect the power consumption of the device (Mangard, 2002); such leaks were confirmed by prior works (Moradi, 2014; Brier et al., 2004) and are summarized in the Hamming Weight (HW) model. In program Xor(), variable t has a power side-channel leak because its register value depends on the secret key.
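The HW model can be made concrete with a small sketch (our illustration, not code from the paper; the helper names are hypothetical). It shows how an observed Hamming Weight of t = txt ⊕ key narrows down the key when txt is known:

```python
def hw(v: int) -> int:
    """Hamming Weight: the number of 1-bits in a register value."""
    return bin(v).count("1")

# An attacker who knows txt and observes HW(t) for t = txt ^ key can
# discard every key whose predicted weight disagrees with the observation.
def candidates(txt: int, observed_hw: int, bits: int = 4):
    return [k for k in range(2 ** bits) if hw(txt ^ k) == observed_hw]
```

For instance, observing a weight of 0 pins the key to txt itself; aggregating such constraints over many traces is the core idea behind DPA-style attacks.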

The leak may be mitigated by masking (Goubin, 2001; Akkar and Goubin, 2003) as shown in program SecXor(). The idea is to split a secret into randomized shares before using them; unless the attacker has all shares, it is theoretically impossible to deduce the secret. In first-order masking, the secret key may be split into {mask1, mk}, where mask1 is a random variable, mk = mask1 ⊕ key is the bitwise exclusive-or of mask1 and key, and thus mask1 ⊕ mk = key. We say that mk is masked and thus leak-free because it is statistically independent of the value of the key: if mask1 has a uniform random distribution, then so does mk. Therefore, when mk is aggregated over time, as in side-channel attacks, the result reveals no information about key.
Unfortunately, there can be leaks in SecXor() when the variables share a register and thus create second-order correlation. For example, the x86 assembly code of mk = mask1 ⊕ key is MOV mask1 %edx; XOR key %edx, meaning the values stored in %edx are mask1 and mask1 ⊕ key, respectively. Since bit-flips in the register also affect the leakage current, they lead to side-channel leaks. This is captured by the Hamming Distance (HD) power model (Brier et al., 2004): HD(mask1, mask1 ⊕ key) = HW(mask1 ⊕ (mask1 ⊕ key)) = HW(key), which reveals key. For example, if a register stores mask1 first and then updates its value to mask1 ⊕ key, the bits that flip in the register are exactly the 1-bits of key.
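The mask cancellation in the register transition can be checked exhaustively; the following sketch (ours, not from the paper) confirms that the register’s bit-flips equal the key for every 4-bit mask value:

```python
def hw(v: int) -> int:
    """Hamming Weight: the number of 1-bits in a register value."""
    return bin(v).count("1")

def hd(a: int, b: int) -> int:
    """Hamming Distance: HW of the bitwise XOR of two register values."""
    return hw(a ^ b)

# The mask cancels out of the register transition mask1 -> mask1 ^ key,
# so the number of flipped bits is exactly HW(key) for every mask value.
for key in range(16):
    for mask1 in range(16):
        assert hd(mask1, mask1 ^ key) == hw(key)
```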
In embedded systems, specialized hardware (Rührmair et al., 2010; Maiti and Schaumont, 2011; Anagnostopoulos et al., 2018) such as a physically unclonable function (PUF) and a true random number generator (TRNG) may produce key and mask1 and map them to the memory address space; thus, these variables are considered leak-free. Specialized hardware may also directly produce the masked shares {mask1, mk} without ever producing the unmasked key in the first place. This more secure approach is shown in program SecXor2(), where the masked shares are used to compute the result (txt ⊕ key), which is also masked, but by mask2 instead of mask1.
Inside SecXor2(), care has been taken to randomize the intermediate results by mask2 first, before de-randomizing them by mask1. Thus, the CPU’s registers never hold any unmasked result. However, there can still be HD leaks, for example, when the same register holds the following pairs at consecutive time steps: (mask1, mk), (mask1, t1), and (mask2, t3).
2.2. Identifying the HD Leaks
To identify these leaks, we need a scalable method. While there are techniques for detecting flaws in various masking implementations (Coron et al., 2013; Hou et al., 2017; Barthe et al., 2016, 2017; Duc et al., 2015; Bloem et al., 2018; Goubin, 2001; Blömer et al., 2004; Schramm and Paar, 2006; Canright and Batina, 2008; Rivain and Prouff, 2010; Prouff and Rivain, 2013; Eldib and Wang, 2014; Reparaz et al., 2015), none of them is scalable enough for use in real compilers, and none of them targets the HD leaks caused by register reuse. Our work bridges the gap.
First, we check if there are sensitive, unmasked values stored in a CPU’s register. Here, masked means that a value is made statistically independent of the secret using randomization. We say that a value is HW-sensitive if, statistically, it still depends on the secret. For example, in Figure 2, key is HW-sensitive whereas mk = mask1 ⊕ key has been masked. If there were a variable nk = mask1 ∧ key, it would be called HW-sensitive because the masking is not perfect.
Second, we check if there is any pair of values that, when stored in the same register, may cause an HD leak. That is, the Hamming Distance of the two values may statistically depend on the secret. For example, in Figure 2, mk and mask1 form an HD-sensitive pair.
Formal Verification
In general, deciding whether a variable is HW-sensitive, or a pair of variables is HD-sensitive, is NP-hard, since it corresponds to model counting (Zhang et al., 2018; Eldib et al., 2014). This is illustrated by Table 1, which shows the truth table of the Boolean functions t1, t2 and t3 in terms of the secret variable k and the random variables m1, m2 and m3. First, there is no HW leak because, regardless of whether k=0 or 1, there is a 50% chance of t1 and t2 being 1 and a 25% chance of t3 being 1. This can be confirmed by counting the number of 1’s in the top and bottom halves of the table.
When two values are stored in the same register, however, the bit-flip may depend on the secret. As shown in the HD(t1,t2) column of the table, when k=0, the bit is never flipped; whereas when k=1, the bit is always flipped. The existence of an HD leak for (t1,t2) can be decided by model counting over the function t1⊕t2: the number of solutions is 0/8 for k=0 but 8/8 for k=1. In contrast, there is no HD leak for (t2,t3) because the number of satisfying assignments (solutions) is always 2/8, regardless of whether k=0 or k=1.
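The model counting above can be reproduced directly from Table 1; a small sketch (ours) enumerates all 8 assignments of the random bits for each value of k:

```python
from itertools import product

def count(f, k):
    """Number of satisfying assignments of bit-function f for a fixed k."""
    return sum(f(k, m1, m2, m3) for m1, m2, m3 in product((0, 1), repeat=3))

t1 = lambda k, m1, m2, m3: m1 ^ m2
t2 = lambda k, m1, m2, m3: t1(k, m1, m2, m3) ^ k
t3 = lambda k, m1, m2, m3: t2(k, m1, m2, m3) & m3

hd12 = lambda k, m1, m2, m3: t1(k, m1, m2, m3) ^ t2(k, m1, m2, m3)
hd23 = lambda k, m1, m2, m3: t2(k, m1, m2, m3) ^ t3(k, m1, m2, m3)

assert (count(hd12, 0), count(hd12, 1)) == (0, 8)  # HD(t1,t2) reveals k
assert (count(hd23, 0), count(hd23, 1)) == (2, 2)  # HD(t2,t3) does not
```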
Type Inference
Since model counting is expensive, e.g., taking hours or longer even for small programs, it is not suitable for a compiler. Thus, we develop a fast, sound, and static type inference system to identify the HD-sensitive pairs in a program. By fast, we mean that our method relies only on syntactic information of the program or the platform (e.g., the mapping from variables to physical registers). By sound, we mean that our method is conservative: it may introduce false alarms, and thus may mitigate unnecessarily, but it never misses real leaks.
Specifically, we assign each variable one of three types: RUD, SID, or UKD (details in Section 3). Briefly, RUD means random uniform distribution, SID means secret-independent distribution, and UKD means unknown distribution. Therefore, a variable may have a leak only if it is of the UKD type.

k  m1  m2  m3  t1=m1⊕m2  t2=t1⊕k  t3=t2∧m3*  HD(t1,t2)=t1⊕t2  HD(t2,t3)=t2⊕t3
0  0  0  0  0  0  0  0  0 
0  0  0  1  0  0  0  0  0 
0  0  1  0  1  1  0  0  1 
0  0  1  1  1  1  1  0  0 
0  1  0  0  1  1  0  0  1 
0  1  0  1  1  1  1  0  0 
0  1  1  0  0  0  0  0  0 
0  1  1  1  0  0  0  0  0 
1  0  0  0  0  1  0  1  1 
1  0  0  1  0  1  1  1  0 
1  0  1  0  1  0  0  1  0 
1  0  1  1  1  0  0  1  0 
1  1  0  0  1  0  0  1  0 
1  1  0  1  1  0  0  1  0 
1  1  1  0  0  1  0  1  1 
1  1  1  1  0  1  1  1  0 
* Our Datalog-based type inference rules can infer t3 as SID instead of UKD.
In Table 1, for example, given t1 = m1 ⊕ m2, where m1 and m2 are random (RUD), it is easy to see that t1 is also random (RUD). For t3 = t2 ∧ m3, where t2 and m3 are RUD, however, t3 may not always be random, but we can still prove that t3 is SID; that is, t3 is statistically independent of the secret k. This type of syntactical inference is fast because it does not rely on any semantic information, although in general, it is not as accurate as the model counting based approach. Nevertheless, such inaccuracy does not affect the soundness of our mitigation.
Furthermore, we rely on a Datalog-based declarative analysis framework (Whaley et al., 2005; Zhang et al., 2014; Lam et al., 2005; Whaley and Lam, 2004; Bravenboer and Smaragdakis, 2009) to implement and refine the type inference rules, which can infer t3 as SID instead of UKD. We also leverage domain-specific optimizations, such as precomputing certain Datalog facts and using the compiler’s backend information, to reduce cost and improve accuracy.
2.3. Mitigating the HD Leaks
To remove the leaks, we constrain the register allocation algorithm using our inferred types. We focus on LLVM and x86, but the method is applicable to MIPS and ARM as well. To confirm this, we inspected the assembly code produced by LLVM for the example (t1,t2,t3) in Table 1 and found HD leaks on all three architectures. For x86, in particular, the assembly code is shown in Figure 2(a), which uses %eax to store all intermediate variables and thus has a leak in HD(t1,t2).
Figure 2(b) shows our mitigated code, where the HD-sensitive variables t1 and t2 are stored in different registers. Here, t1 resides in %eax and memory 20(%rbp), whereas t2 resides in %ecx and memory 16(%rbp). The stack and the value of %eax are shown in Figure 2(c), both before and after mitigation, when the leak may occur at lines 8–9. Since the value of k is used only once in the example, i.e., for computing t2, overwriting its value stored in the original memory location 16(%rbp) does not affect subsequent execution. If k were to be used later, our method would have made a copy in memory and redirected subsequent uses of k to that memory location.
Register allocation in real compilers is a highly optimized process, so care must be taken to maintain correctness and performance. For example, the naive approach of assigning all HD-sensitive variables to different registers does not work: the number of registers is small (x86 has 4 general-purpose registers while MIPS has 24) while the number of sensitive variables is often large, meaning many variables must be spilled to memory.
The instruction set architecture also adds constraints. In x86, for example, %eax overlaps with %ah and %al and thus cannot be assigned independently. Furthermore, binary operations such as Xor may require that the result and one operand share the same register or memory location. Therefore, for mk = mask1 ⊕ key, either mk and mask1 share a register, which causes a leak in HD(mk, mask1) = HW(key), or mk and key share a register, which causes a leak in HW(key) itself. Thus, while modifying the backend, multiple submodules must be constrained together to ensure the desired register and memory isolation (see Section 5).
2.4. Leaks in High-order Masking
Here, a question is whether the HD leak can be handled by second-order masking (which involves two variables). The answer is no: even with high-order masking techniques such as Barthe et al. (Barthe et al., 2015, 2016, 2017), the compiled code may still have HD leaks introduced by register reuse. We confirmed this through experiments, where the code compiled by LLVM for high-order masked programs from Barthe et al. (Barthe et al., 2015) was found to contain HD leaks.
Figure 4 illustrates this problem on a second-order arithmetic masking of the multiplication of txt (public) and key (secret) in a finite field, where ⊗ denotes finite-field multiplication. While there are many details, at a high level, the program relies on the same idea of secret sharing: random variables are used to split the secret key into three shares before these shares participate in the computation. The result is a masked triplet (res0, res1, res2) such that (res0 ⊕ res1 ⊕ res2) = key ⊗ txt.
The x86 assembly code in Figure 4 has leaks because the same register %edx stores both mask0 ⊕ mask1 and mask0 ⊕ mask1 ⊕ key. Let the two consecutive values be denoted %edx and %edx′; we have HD(%edx, %edx′) = HW(key). Similar leaks exist in the LLVM-generated assembly code of this program for ARM and MIPS as well, but we omit them for brevity.
3. Preliminaries
We define the threat model and then review the leakage models used for quantifying the power side channel.
3.1. The Threat Model
We assume the attacker has access to the software code, but not the secret data, and the attacker’s goal is to gain information of the secret data. The attacker may measure the power consumption of a device that executes the software, at the granularity of each machine instruction. A set of measurement traces is aggregated to perform statistical analysis, e.g., as in DPA attacks. In mitigation, our goal is to eliminate the statistical dependence between secret data and the (aggregated) measurement data.
Let P be the program under attack and the triplet ⟨x, k, r⟩ be its input: the sets x, k and r consist of public, secret, and random (mask) variables, respectively. Let x̂, k̂₁, k̂₂, and r̂ be valuations of these input variables. Then, ηₜ(P, x̂, k̂₁, r̂) denotes, at time step t, the power consumption of a device executing P under input x̂, k̂₁, and r̂. Similarly, ηₜ(P, x̂, k̂₂, r̂) denotes the power consumption of the device executing P under input x̂, k̂₂, and r̂. Between steps t and t+1, one instruction in P is executed.
We say P has a leak if there are x̂, k̂₁, and k̂₂ such that the distribution of ηₜ(P, x̂, k̂₁, r) differs from that of ηₜ(P, x̂, k̂₂, r). Let the random variables in r be uniformly distributed in their domain, and let the probability of each valuation r̂ be Pr[r̂]; we expect

(1)  Σᵣ̂ Pr[r̂] · ηₜ(P, x̂, k̂₁, r̂) = Σᵣ̂ Pr[r̂] · ηₜ(P, x̂, k̂₂, r̂)

For efficiency reasons, in this work, we identify sufficient conditions under which Formula 1 is implied. Toward this end, we focus on the leaks of individual variables, and pairs of variables, instead of the sum over all of them: if we remove all individual leaks, the leak-free property over the sum is implied.
3.2. The Leakage Model
In the Hamming Weight (HW) model (Mangard et al., 2007; Mangard, 2002), the leakage associated with a register value, which corresponds to an intermediate variable in the program, depends on the number of 1-bits. Let the value be v = vₙ₋₁…v₁v₀, where v₀ is the least significant bit, vₙ₋₁ is the most significant bit, and each bit vᵢ, where 0 ≤ i < n, is either 0 or 1. The Hamming Weight of v is HW(v) = Σᵢ vᵢ.
In the Hamming Distance (HD) model (Mangard et al., 2007; Mangard, 2002), the leakage depends not only on the current register value v but also on a reference value u. Let u = uₙ₋₁…u₁u₀. We define the Hamming Distance between v and u as HD(v, u) = Σᵢ (vᵢ ⊕ uᵢ), which is equal to HW(v ⊕ u), the Hamming Weight of the bitwise XOR of v and u. Another interpretation is to regard HW(v) as a special case of HD(v, u) where all bits in the reference value u are set to 0.
The widely used HW/HD models have been confirmed on various devices (Kocher et al., 1999; Clavier et al., 2000; Brier et al., 2004; Messerges, 2000; Moradi, 2014). The correlation between power variance and the number of 1-bits may be explained using the leakage current of a CMOS transistor, which is the foundation of modern computing devices. Broadly speaking, a CMOS transistor has two kinds of leakage currents: static and dynamic. Static leakage current exists all the time, but its volume depends on whether the transistor is on or off, i.e., storing a logical 1 or 0. Dynamic leakage current occurs only when a transistor is switched (0→1 or 1→0 flip). While static leakage current is captured by the HW model, dynamic leakage current is captured by the HD model (for details, refer to Mangard (Mangard, 2002)).

3.3. The Data Dependency
We consider two dependency relations: syntactical and statistical. Syntactical dependency is defined over the program structure: a function f syntactically depends on a variable x if x appears in the expression of f; that is, x is in the support of f, denoted x ∈ supp(f).
Statistical dependency is concerned with scenarios where random variables are involved. For example, when t = k ⊕ m, the probability of t being logical 1 (always 50%) does not depend on k. However, when t = k ∨ m, where m is a random variable uniformly distributed in {0, 1}, the probability of t being logical 1 is 100% when k is 1, but 50% when k is 0. In the latter case, we say that t statistically depends on k.
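As a concrete check of the two cases above (our sketch, not the paper’s code), enumerating the uniform mask m shows that k ⊕ m has a k-independent distribution while k ∨ m does not:

```python
def dist(f):
    """Distribution of f(k, m) over a uniform one-bit mask m, per secret k."""
    return {k: sum(f(k, m) for m in (0, 1)) / 2 for k in (0, 1)}

xor_dist = dist(lambda k, m: k ^ m)  # same for k=0 and k=1: independent of k
or_dist = dist(lambda k, m: k | m)   # differs across k: statistically dependent
```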
The relative strengths of the dependency relations are as follows: statistical dependency implies syntactical dependency; i.e., if f is syntactically independent of x, it is statistically independent of x. In this work, we rely on syntactical independence to infer statistical independence during type inference, since the detection of HD leaks must be both fast and sound.
4. Typebased Static Leak Detection
We use a type system that starts from the input annotation (public, secret, and random) and computes a distribution type for all variables. The type indicates whether a variable may statistically depend on the secret input.
4.1. The Type Hierarchy
The distribution type of a variable v may be one of the following kinds:

RUD, which stands for random uniform distribution, means v is either a random input or perfectly randomized (Blömer et al., 2004) by a random input, e.g., v = key ⊕ mask1.

SID, which stands for secret-independent distribution, means that, while not RUD, v is statistically independent of the secret variables in k.

UKD, which stands for unknown distribution, indicates that we are not able to prove that v is RUD or SID and thus have to assume that v may have a leak.
The three types form a hierarchy: UKD is the least desired because it means that a leak may exist. SID is better: although the variable may not be RUD, we can still prove that it is statistically independent of the secret, i.e., no leak. RUD is the most desired because the variable not only is statistically independent of the secret (same as SID), but also can be used like a random input, e.g., to mask other (UKD) variables. For leak mitigation purposes, it is always sound to treat an RUD variable as SID, or an SID variable as UKD, although it may force instructions to be unnecessarily mitigated.
In practice, we want to infer as many RUD and SID variables as possible. For example, if m1 and m2 are random inputs and t = m1 ⊕ m2, then t is RUD. If a is UKD and m is a random input that does not appear in the computation of a, then a ⊕ m is RUD because, although a may have any distribution, since m is RUD, a ⊕ m is statistically independent of the secret.
We prefer RUD over SID, when both are applicable to a variable a, because if a is XORed with a variable b, we can easily prove that a ⊕ b is RUD using local inference, as long as a is RUD and b is not randomized by the same input variable. However, if a is labeled not as RUD but as SID, local inference rules may not be powerful enough to prove that a ⊕ b is RUD or even SID; as a result, we have to treat a ⊕ b as UKD (leak), which is less accurate.
4.2. Datalog based Analysis
In the remainder of this section, we present type inference for individual variables first, and then for HDsensitive pairs.
We use Datalog to implement the type inference. Here, program information is captured by a set of relations called the facts, which include the annotation of the input variables as public, secret, or random. The inference algorithm is codified in a set of relations called the rules, which are steps for deducing types. For example, when t = a ⊕ m and m is RUD, t is also RUD regardless of the actual expression that defines a, as long as m does not appear in the support of a. This can be expressed as an inference rule.
After generating both the facts and the rules, we combine them to form a Datalog program, and solve it using an off-the-shelf Datalog engine. Inside the engine, the rules are applied to the facts to generate new facts (types); this iterative procedure continues until the set of facts reaches a fixed point.
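To make the fixed-point computation concrete, here is a deliberately simplified sketch in Python rather than Datalog (the encoding, the variable names, and the single XOR rule are ours; the paper’s rule set is richer):

```python
RUD, SID, UKD = "RUD", "SID", "UKD"

defs = {                 # v -> (op, operands); input variables have no entry
    "mk": ("xor", ("mask1", "key")),
    "t":  ("xor", ("mk", "txt")),
}
types = {"txt": SID, "key": UKD, "mask1": RUD}   # the user's annotation

def supp(v):
    """Syntactic support: the set of input variables v depends on."""
    if v not in defs:
        return {v}
    return set().union(*(supp(a) for a in defs[v][1]))

changed = True
while changed:           # apply the rule until a fixed point is reached
    changed = False
    for v, (op, (a, b)) in defs.items():
        if v in types or op != "xor":
            continue
        # rule: XOR with an RUD operand whose support is disjoint from the
        # other operand's support yields an RUD result
        for x, y in ((a, b), (b, a)):
            if types.get(x) == RUD and not (supp(x) & supp(y)):
                types[v] = RUD
                changed = True
                break
```

Running it infers mk and t as RUD, mirroring how the engine derives new type facts from the initial annotation.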
Since our type inference is performed on the LLVM IR, there are only a few instruction types to consider. For ease of presentation, we assume that a variable is defined by either a unary operator or a binary operator (n-ary operators may be handled similarly).

v = uop(a), where uop is a unary operator such as Boolean (or bitwise) negation.

v = a bop b, where bop is a binary operator such as Boolean (or bitwise) AND, OR, XOR, and GMul (finite-field multiplication).
For v = uop(a), v and a have the same type. For v = a bop b, the type of v depends on (1) whether bop is XOR, (2) whether a and b are RUD or SID, and (3) the sets of input variables upon which a and b depend.
4.3. Basic Type Inference Rules
Prior to defining the inference rules, we define two related functions, unq and dom, in addition to supp(v), which is the set of input variables upon which v depends syntactically.
Definition 4.1 (unq).
unq is a function that returns, for each variable v, a subset of the mask variables, defined as follows: if v ∈ r, unq(v) = {v}; but if v ∈ x ∪ k, unq(v) = ∅;

if v = uop(a), unq(v) = unq(a); and

if v = a bop b, unq(v) = (unq(a) ∪ unq(b)) \ (supp(a) ∩ supp(b)).

Given the data-flow graph of all instructions involved in computing v and an input variable m ∈ unq(v), there must exist a unique path from m to v in the graph. If there were more paths (or no path), m would not have appeared in unq(v).
Definition 4.2 (dom).
dom is a function that returns, for each variable v, a subset of the mask variables, defined as follows: if v ∈ r, dom(v) = {v}, but if v ∈ x ∪ k, then dom(v) = ∅;

if v = uop(a), dom(v) = dom(a); and

if v = a bop b, where operator bop is ⊕, then dom(v) = (dom(a) ∪ dom(b)) \ (supp(a) ∩ supp(b)); else dom(v) = ∅.

Given the data-flow graph of all instructions involved in computing v and an input variable m ∈ dom(v), there must exist a unique path from m to v, along which all binary operators are ⊕; if there were more such paths (or no such path), m would not have appeared in dom(v).
Following the definitions of supp, unq and dom, it is straightforward to arrive at the basic inference rules (Ouahma et al., 2017; Barthe et al., 2016; Zhang et al., 2018):
Here, the first rule says that if v = a ⊕ m, where m is a random input and a is not masked by m, then v has a random uniform distribution (RUD). This is due to the property of XOR. The second rule says that if v is syntactically independent of the variables in k, it has a secret-independent distribution (SID), provided that it is not RUD.
4.4. Inference Rules to Improve Accuracy
With the two basic rules only, any variable not assigned RUD or SID will be treated as UKD, which is too conservative. For example, for t3 = t2 ∧ m3 in Table 1, where t2 and m3 are RUD, t3 is actually SID. This is because m3 is random and the other component, t2, is secret independent. Unfortunately, the two basic rules cannot infer that t3 is SID. The following rules are added to solve this problem.
These rules mean that, for any v = a bop b, if one operand is RUD, the other operand is RUD or SID, and the two operands share no input variable, then v has a secret-independent distribution (SID). GMul denotes multiplication in a finite field. Here, the condition that the operands share no input is needed; otherwise, the common input may cause a problem. For example, if t1 = k ⊕ m and t2 = t1 ∧ m, then t2 has a leak because if k = 0, t2 = m; but if k = 1, t2 = 0.
Similarly, XOR may elevate a variable from UKD to SID, e.g., as in v = a ⊕ b where a and b are both RUD. Again, the no-shared-input condition is needed because, otherwise, there may be cases such as (k ⊕ m) ⊕ m, which is equivalent to k and thus has a leak.
Figure 5 shows the other inference rules used in our system. Since these rules are self-explanatory, we omit the proofs.
4.5. Detecting HDsensitive Pairs
Based on the variable types, we compute the HD-sensitive pairs. For each pair ⟨v1, v2⟩, we check if HD(v1, v2) = HW(v1 ⊕ v2) results in a leak when v1 and v2 share a register. There are two scenarios:

v1 and v2 are defined in two different instructions.

v = a bop b, where the result v and one operand are stored in the same register.
In the two-instruction case, we check v1 ⊕ v2 using the XOR-related inference rules. For example, if v1 = k ⊕ m and v2 = m, since m appears in the supports of both expressions, v1 ⊕ v2 is UKD. Such a leak is denoted SEN_HDᴰ(v1, v2), where D stands for “Double”.
In the single-instruction case, we check a ⊕ (a bop b) based on the operator type. When bop is AND, we have a ⊕ (a ∧ b) = a ∧ ¬b; when bop is OR, we have a ⊕ (a ∨ b) = ¬a ∧ b; when bop is ⊕ (Xor), we have a ⊕ (a ⊕ b) = b; and when bop is GMul, the result of a ⊕ (a ⊗ b) equals a ⊗ (1 ⊕ b). Since the type inference procedure is agnostic to the constant operand, the type of a ⊗ (1 ⊕ b) is computed in the same way as that of a ⊗ b; that is, it depends on the types of a and b. If there is a leak, it is denoted SEN_HDˢ(a, v), where S stands for “Single”.
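The Boolean identities used in the single-instruction case can be verified exhaustively over one-bit values (our sketch, not the paper’s code):

```python
def check_hd_identities():
    """Exhaustively verify the per-operator HD expressions on one-bit values."""
    for a in (0, 1):
        for b in (0, 1):
            assert a ^ (a & b) == a & (b ^ 1)  # AND: a XOR (a AND b) = a AND NOT b
            assert a ^ (a | b) == (a ^ 1) & b  # OR:  a XOR (a OR b)  = NOT a AND b
            assert a ^ (a ^ b) == b            # XOR: a XOR (a XOR b) = b
    return True

assert check_hd_identities()
```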
The reason why HD leaks are divided into SEN_HDᴰ and SEN_HDˢ is that they have to be mitigated differently. When a leak involves two instructions, it may be mitigated by constraining the register allocation algorithm such that v1 and v2 can no longer share a register. In contrast, when a leak involves a single instruction, it cannot be mitigated in this manner because in x86, for example, all binary instructions require the result to share the same register or memory location with one of the operands. Thus, mitigating the SEN_HDˢ leaks requires that we rewrite the instruction itself.
We also define a relation indicating that v1 and v2 indeed may share a register, and use it to filter the HD-sensitive pairs, as shown in the two rules below.
Backend information (Section 6.1) is required to define this relation; for now, assume that it holds for every pair of variables.
5. Mitigation during Code Generation
We mitigate leaks by using the two types of HD-sensitive pairs as constraints during register allocation.
Register Allocation
The classic approach, especially for static compilation, is based on graph coloring (Chaitin, 2004; George and Appel, 1996), whereas dynamic compilation may use faster algorithms such as lossy graph coloring (Cooper and Dasgupta, 2006) or linear scan (Poletto and Sarkar, 1999). We apply mitigation on both graph coloring and LLVM’s basic register allocation algorithms. For ease of comprehension, we use graph coloring to illustrate our constraints.
In graph coloring, each variable corresponds to a node and each edge corresponds to an interference between two variables, i.e., they may be in use at the same time and thus cannot occupy the same register. Assigning variables to registers is similar to coloring the graph with K colors, where K is the number of physical registers. To be efficient, variables may be grouped into clusters, or virtual registers, before they are assigned to physical registers (colors). In this case, each virtual register (vreg), as opposed to each variable, corresponds to a node in the graph, and multiple virtual registers may be mapped to one physical register.
5.1. Handling Two-Instruction SEN_HD Pairs
For each pair (x, y), where x and y are defined in two different instructions, we add the following constraints. First, x and y are not to be mapped to the same virtual register. Second, the virtual registers vx and vy (for x and y, respectively) are not to be mapped to the same physical register. Toward this end, we constrain the behavior of two backend modules: Register Coalescer and Register Allocator.
Our constraint on Register Coalescer states that vx and vy, which correspond to x and y, must never coalesce, although each of them may still coalesce with other virtual registers. As for Register Allocator, our constraint is on the formulation of the interference graph: for each HD-sensitive pair, we add a new interference edge to indicate that vx and vy must be assigned different colors.
During graph coloring, these new edges are treated the same as all other edges. Therefore, our constraints are added to the register allocator, and their impact is propagated automatically to all subsequent modules, regardless of the architecture (x86, MIPS or ARM). When variables cannot fit in the registers, some will be spilled to memory, and all references to them will be redirected to memory. Due to the constraints we added, there may be more spilled variables, but spilling is handled transparently by the existing algorithms in LLVM. This is an advantage of our approach: it identifies a way to constrain the behavior of existing modules in LLVM, without the need to rewrite any of them from scratch.
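The effect of the added edges can be seen in a small self-contained sketch (illustrative only; names and the greedy coloring are simplifications): two vregs whose live ranges never overlap would normally share a register, but an extra interference edge for the HD-sensitive pair forces them apart.

```python
# Sketch: mitigating an HD-sensitive pair (vx, vy) by adding an
# extra interference edge before coloring.
def greedy_color(nodes, edges, num_regs):
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    out = {}
    for n in nodes:
        used = {out[m] for m in adj[n] if m in out}
        out[n] = next((r for r in range(num_regs) if r not in used), None)
    return out

interferences = []  # vx and vy are never live at the same time
without = greedy_color(["vx", "vy"], interferences, 4)
with_constraint = greedy_color(["vx", "vy"],
                               interferences + [("vx", "vy")], 4)
```

Without the constraint both vregs receive the same register (creating an HD leak); with the added edge they are kept in distinct registers, exactly as the extra interference edge demands.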
5.2. Handling Single-Instruction SEN_HD Pairs
For each single-instruction pair, where x and y appear in the same instruction, we additionally constrain the DAG Combiner module to rewrite the instruction before constraining the register allocation modules. To see why, consider k ⊕ m, where k is a secret and m is a random mask, which compiles to
MOVL 4(%rbp), %ecx // 4(%rbp) = m (random)
XORL 8(%rbp), %ecx // 8(%rbp) = k (secret)
Here, 4(%rbp) and 8(%rbp) are memory locations for m and k, respectively. Although m and m ⊕ k are masked (no leak) when stored in %ecx, the transition from m to m ⊕ k, with HD(m, m ⊕ k) = HW(k), leaks the secret k.
To remove the leak, we must rewrite the instruction:
MOVL 4(%rbp), %ecx // 4(%rbp) = m
XORL %ecx, 8(%rbp) // 8(%rbp) = k, then m ⊕ k
While m still resides in %ecx, both k and m ⊕ k reside in the memory location 8(%rbp). There is no leak because %ecx only ever stores the random mask m; the transition in memory, from k to m ⊕ k, flips HW(m) bits, which is independent of the secret. Furthermore, the solution is efficient in that no additional memory is needed. If k were to be used subsequently, we would copy k to another memory location before the XORL and redirect subsequent uses of k to that location.
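The Hamming-distance argument above can be checked numerically. In this sketch (with hypothetical byte values), the register transition in the original code flips exactly HW(k) bits, a function of the secret alone, whereas in the rewritten code the register content never changes.

```python
def hw(x):
    # Hamming weight: number of set bits
    return bin(x).count("1")

def hd(a, b):
    # Hamming distance: number of bits flipped in the transition a -> b
    return hw(a ^ b)

m, k = 0b10110101, 0b01100011  # hypothetical mask and secret byte

# Original code: %ecx transitions from m to m ^ k,
# leaking HD(m, m ^ k) = HW(k), which depends only on the secret.
leak_original = hd(m, m ^ k)

# Rewritten code: %ecx holds m throughout, so the register
# transition reveals nothing about k.
leak_rewritten = hd(m, m)
```

The identity HD(m, m ⊕ k) = HW(k) holds for every m, since m ⊕ (m ⊕ k) = k.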
Example
Figure 6 shows a real program (Yao et al., 2018), where one input array stores sensitive data while the others store random masks. The compiled code (left) has leaks, whereas the mitigated code (right) is leak free. The original code (left) leaks because, prior to Line 8, %eax stores one value, whereas after Line 8 it stores another; the bit flips in %eax thus reflect the sensitive data.
During register allocation, one virtual register would correspond to the first value while another would correspond to the second. Due to a constraint from this SEN_HD pair, our method prevents the two from coalescing or sharing a physical register. After rewriting, one value shares the same memory location as the other, which remains unchanged. Thus, one value is kept in %al while the other is spilled to memory, which removes the leak.
6. Domain-specific Optimizations
While the method presented so far is fully functional, it can be made faster by domain-specific optimizations.
6.1. Leveraging the Backend Information
To detect HD leaks that may actually occur, we focus on pairs of variables that may share a register, as opposed to arbitrary pairs of variables. For example, if the live ranges of two variables overlap, they can never share a register, and we need not check them for HD leaks. Such information is readily available in the compiler's backend modules; e.g., in graph-coloring-based register allocation, variables associated with any interference edge cannot share a register.
Thus, we define share(x, y), meaning x and y may share a register. After inferring the distribution type of each variable, we use share(x, y) to filter the variable pairs subjected to the checks for SEN_HD leaks (see Section 4.5). We will show in experiments that such backend information allows us to dramatically shrink the number of HD-sensitive pairs.
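The filtering step can be sketched as follows, assuming live ranges are represented as half-open [start, end) intervals (a simplification of the segment-based live intervals a real backend maintains): share(x, y) holds exactly when the two ranges do not overlap, and only such pairs remain candidates for the HD checks.

```python
# Sketch: derive the share(x, y) relation from live ranges.
# Overlapping ranges can never share a register, so those pairs
# are excluded from the HD-sensitivity checks up front.
def overlaps(r1, r2):
    return r1[0] < r2[1] and r2[0] < r1[1]

def share_pairs(live):
    names = sorted(live)
    return {(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if not overlaps(live[a], live[b])}

live = {"x": (0, 4), "y": (2, 6), "z": (5, 9)}  # hypothetical ranges
candidates = share_pairs(live)  # pairs still subject to HD checks
```

Here x and y overlap and y and z overlap, so only (x, z) survives the filter; in a large program this pruning is what shrinks the number of HD-sensitive pairs so dramatically.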
6.2. Precomputing Datalog Facts
By default, only the input annotations and basic data flow (def-use relations) are encoded as Datalog facts, whereas the rest has to be deduced by inference rules. However, Datalog is not the most efficient way of computing sets of variables, such as the support of an expression, or performing set operations such as intersection.
In contrast, such sets can be computed explicitly in linear time (Ouahma et al., 2017). Thus, we choose to precompute them in advance and encode the results as Datalog facts. In this case, the precomputation results are used to jump-start the Datalog-based type inference. We will show, through experiments, that this optimization leads to faster type inference than the default implementation.
6.3. Efficient Encoding of Datalog Relations
There are different encoding schemes for Datalog. One way is to encode the sets using a relation whose arguments are variables and inputs:
While the size of