0.1 Introduction
Secure compilation is an emerging research field that combines techniques from security, programming languages, formal verification, and hardware architectures to devise compilation chains that protect various aspects of software and eliminate security vulnerabilities [sigplanblog, busi2019brief]. Like any other compiler, a secure one translates a source program, written in a high-level language, into efficient (low-level) object code; in addition, it provides mitigations that make exploiting security vulnerabilities more difficult and that limit the damage of an attack. Moreover, a secure compilation chain deploys mechanisms to enforce secure interoperability between code written in safe and unsafe languages, and makes it hard to extract confidential data from information gained by examining program runs (e.g., soft information such as specific outputs given certain inputs, or physical information such as power consumption or execution time).
An important requirement for making a compiler secure is that it must guarantee that the security properties at the source level are fully preserved at the object level or, equivalently, that every attack that can be carried out at the object level can also be carried out at the source level. In this way, it suffices to show that the program is secure at the source level, where reasoning is far more comfortable than at the low level.
In this paper we focus on obfuscating compilers, designed to protect software by obscuring its meaning and impeding the reconstruction of its original source code. Usually, the main concerns when defining such compilers are their robustness against reverse engineering and the performance of the produced code. Very few papers in the literature address the problem of proving their correctness, e.g., [blazy2016formal], and, to the best of our knowledge, there is no paper about the preservation of security policies. Here, we offer a first contribution in this direction: we consider a popular program obfuscation (namely control-flow flattening [laszlo2009obfuscating]) and a specific security policy (namely constant-time), and we prove that every program satisfying the policy still satisfies it after the transformation, i.e., the obfuscation preserves the policy.
For the sake of presentation, our source language is rather essential, as are our illustrative examples. The proof that control-flow flattening is indeed secure follows the approach of [barthe2018secure] (briefly presented in Section 0.2), and only needs paper and pencil in our neat, foundational setting. Intuitively, we prove that if two executions of a program on different secret values are indistinguishable (i.e., they take the same time), then the executions of its obfuscated version are indistinguishable as well (Section 0.3).
Actually, we claim that extending our results to a richer language only requires handling more details, with no relevant changes to the structure of the proof itself; similarly, other security properties, besides those already studied in [barthe2018secure], can be accommodated in this framework with no particular effort, and other program transformations can be proved to preserve security in the same manner.
Below, we present the security policy and the transformation of interest.
Constant-time policy
An intruder can extract confidential data by observing the physical behavior of a system, through so-called side-channel attacks. The idea is that the attacker can recover some pieces of confidential information, or can get indications on which parts are worth her cracking efforts, by measuring some physical quantity of the execution, e.g., power consumption and time. Many of these attacks, called timing-based attacks, exploit the execution time of programs [Kocher96]. For example, if the program branches on a secret, the attacker may restrict the set of values it may assume, whenever the two branches have different execution times and the attacker can measure and compare them. A toy example follows (in a sugared syntax), where a user enters her pin, which is then checked against the stored one character by character: here the policy is violated since checking a correct pin takes longer than checking a wrong one.
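The listing is in a sugared toy syntax; a minimal Python sketch of the same early-exit check (names and encoding are ours, not the paper's) makes the leak concrete. The number of loop iterations stands in for execution time: it reveals the length of the matching prefix of the guess.

```python
def check_pin_leaky(secret: str, guess: str):
    """Early-exit comparison: running time depends on the secret."""
    steps = 0                      # proxy for execution time
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps    # returns as soon as a mismatch is found
    return len(secret) == len(guess), steps
```

A guess that is wrong in its first character is rejected after one iteration, while a guess with a longer correct prefix takes more iterations: by timing, the attacker can recover the pin one character at a time.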
Many mitigations of timing-based attacks have been proposed, both hardware- and software-based. The program counter [MolnarPSW05] and the constant-time [Bernstein05] policies are software-based countermeasures, giving rise to the constant-time programming discipline. It makes programs constant-time w.r.t. secrets, i.e., the running times of programs are independent of secrets. The requirement to achieve is that neither the control flow of programs nor the sequence of memory accesses depends on secrets, e.g., the value of pin in our example. Usually, this is formalized as a form of information flow policy [GM82] w.r.t. an instrumented semantics that records information leakage. Intuitively, this policy requires that two executions started in equivalent states (from an attacker's point of view) yield equivalent leakage, making them indistinguishable to an attacker.
The following is a constant-time version of the above program that checks if a pin is correct:
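In the same Python-sketch style (again a hypothetical rendering, not the paper's listing), the fix is to compare every character unconditionally, so the iteration count no longer depends on where the first mismatch occurs:

```python
def check_pin_ct(secret: str, guess: str):
    """Constant-time comparison: for a guess of a given length, the number
    of steps does not depend on the secret."""
    ok = True
    steps = 0                      # proxy for execution time
    for s, g in zip(secret, guess):
        steps += 1
        ok = ok & (s == g)         # no early exit, no branch on the secret
    return ok & (len(secret) == len(guess)), steps
```

Correct and wrong guesses of the same length now take the same number of steps, so their timings are indistinguishable.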
Control-flow flattening
A different securing technique is code obfuscation, a program transformation that aims at hiding the intention and the logic of programs by obscuring (portions of) source or object code. It is used to protect software by making it more difficult to reverse engineer the (source/binary) code of the program, to which the attacker has access. In the literature, different obfuscations have been proposed. They range from simple syntactic transformations, e.g., renaming variables and functions, to more sophisticated ones that alter both the data, e.g., constant encoding and array splitting [collberg2010surreptitious], and the control flow of the program, e.g., using opaque predicates [collberg2010surreptitious] and inserting dead code.
Control-flow flattening is an advanced obfuscation technique, implemented in state-of-the-art and industrial compilers, e.g., [junod2015obfuscator]. Intuitively, this transformation reorganizes the Control Flow Graph (CFG) of a program by taking its basic blocks and putting them as cases of a selective structure that dispatches to the right case. In practice, CFG flattening breaks sequences of statements, nestings of loops and if-statements into single statements, and then hides them in the cases of a large switch statement, in turn wrapped inside a loop. In this way, statements originally at different nesting levels are now put next to each other. Finally, to ensure that the control flow during execution is the same as before, a new variable is introduced that acts as a program counter and is also used to terminate the loop. The switch statement dispatches the execution to one of its cases depending on the value of the program counter. When the execution of a case of the switch is about to complete, the program counter is updated with the value of the next statement to be executed.
The obfuscated version of our constant-time example follows.
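As a Python sketch (the concrete listing is in the paper's sugared syntax; the case numbering and the use of 0 as the exit value of the program counter are our own assumptions), the flattened constant-time check is a single loop dispatching on a program counter pc:

```python
def check_pin_ct_flat(secret: str, guess: str) -> bool:
    """Flattened constant-time pin check: the original control structure is
    broken into cases dispatched on pc; pc == 0 terminates the loop."""
    pc = 1
    i, ok = 0, True
    while pc != 0:
        if pc == 1:           # case 1: guard of the original loop
            pc = 2 if i < len(guess) else 3
        elif pc == 2:         # case 2: loop body, then back to the guard
            ok = ok & (i < len(secret) and secret[i] == guess[i])
            i += 1
            pc = 1
        elif pc == 3:         # case 3: final length check, then exit
            ok = ok & (len(secret) == len(guess))
            pc = 0
    return ok
```

Note how the loop and its body, originally nested, now sit as sibling cases; only the updates to pc reproduce the original control flow.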
Now the question is whether the new obfuscated program is still constant-time, which is indeed the case. In general, we would like guarantees that the attacks prevented by the constant-time countermeasure are not possible in the obfuscated versions.
0.2 Background: CT-simulations
Typically, for proving the correctness of a compiler one introduces a simulation relation between the computations at the source and at the target level: if such a relation exists, we have the guarantee that the source program and the target program have the same observable behavior, i.e., the same set of traces.
A general method for proving that constant-time is also preserved by compilation generalizes this approach and is based on the notion of CT-simulation [barthe2018secure]. It considers three relations: a simulation relation between source and target, and two equivalences, one between source computations and the other between target computations. The idea is to prove that two equivalent computations at the source level are simulated by two equivalent computations at the target level. Actually, CT-simulations guarantee the preservation of a particular form of noninterference, called observational noninterference. In the rest of this section, we briefly survey observational noninterference and how CT-simulations preserve it.
The idea is to model the behavior of programs using a labeled transition system whose transitions carry the leakage associated with the execution step between the two configurations involved. The semantics is assumed deterministic. Hereafter, we distinguish the configurations of the source programs from those of the target programs.^1 We will use dot notation to refer to the command and the state components of a configuration. (^1 Following the convention of secure compilation, we write in a blue, sans-serif font the elements of the source language, in a red, bold one those of the target, and in black those that are in common.)
The leakage represents what the attacker learns from the program execution. Formally, the leakage is a list of atomic leakages and is not cancellable. Observational noninterference is defined for complete executions (i.e., those reaching the set of final configurations) and w.r.t. an equivalence relation on configurations (e.g., states are equivalent when they agree on public variables): [Observational noninterference [barthe2018secure]] A program is observationally noninterferent w.r.t. such a relation iff any two complete executions starting from equivalent initial configurations produce equal leakages.
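Writing $c \Downarrow^{t} c'$ for a complete execution from $c$ to a final configuration $c'$ with cumulative leakage $t$, the definition can be rendered as follows (the symbols are our choice, following the shape of the definitions in [barthe2018secure]):

```latex
\mathrm{ONI}_{\varphi}(P) \iff
  \forall c_1, c_2, c'_1, c'_2, t_1, t_2.\;
    c_1 \mathrel{\varphi} c_2 \;\wedge\;
    c_1 \Downarrow^{t_1} c'_1 \;\wedge\;
    c_2 \Downarrow^{t_2} c'_2
    \implies t_1 = t_2
```

where $c_1, c_2$ range over the initial configurations of $P$ and $c'_1, c'_2$ over final ones.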
Hereafter, we denote the result of compiling a program by applying the compiler/transformation to it. Intuitively, a compiler preserves observational noninterference when, for every program that enjoys the property, its compilation does as well. Formally, [Secure compiler] A transformation preserves observational noninterference iff, for all programs, if the program is observationally noninterferent w.r.t. the source relation, then its compilation is observationally noninterferent w.r.t. the target relation.
To show that a compiler is secure, we follow [barthe2018secure] and build a general CT-simulation in two steps. First, we define a simulation, called a general simulation, that relates computations of the source and target languages. The idea is to consider a source and a target configuration related whenever, after they perform a certain number of steps, they end up in two configurations that are still related. Formally, [General simulation [barthe2018secure]] let num be a function mapping a pair of a source and a target configuration to a natural number, and let μ be a measure function from source configurations to natural numbers. A relation between source and target configurations is a general simulation w.r.t. num and μ whenever:

1. if a source configuration performs a step and is related to a target configuration, then the target configuration performs num-many steps, reaching a configuration related to the one reached at the source level;

2. whenever num is 0, the measure μ strictly decreases along the source step;

3. for any final source configuration and related target configuration, there exists a related final target configuration reachable from the latter.

Given two configurations in the simulation relation, the function num predicts how many steps the target has to perform to reach a configuration related with the one reached from the source in one step. When num is 0, a possibly infinite sequence of source steps could be simulated by an empty one at the target level. To avoid these situations, the measure function μ is introduced, and condition 2 of the above definition ensures that the measure of the source configuration strictly decreases whenever the corresponding target one stutters.
The second step consists of introducing two equivalence relations between configurations: one relates configurations at the source and one at the target. These two relations and the simulation relation form a general CT-simulation. Formally, [General CT-simulation [barthe2018secure]] a pair of equivalence relations is a general CT-simulation w.r.t. a general simulation and its functions num and μ whenever:

1. it is a many-steps CT-diagram, i.e., if two equivalent source configurations perform one step each with the same leakage, and they are simulated by two equivalent target configurations, then the two target configurations perform the predicted numbers of steps with the same leakage, and the configurations so reached are still in the simulation relation and still pairwise equivalent;

2. if two initial source configurations are equivalent, with corresponding initial target configurations, then each source configuration is simulated by its target, and the two target configurations are equivalent;

3. two equivalent configurations agree on the predicted number of target steps;

4. it is a final CT-diagram [barthe2018secure], i.e., if two equivalent source configurations are final and are simulated by two equivalent target configurations, then the two target computations complete with the same leakage, reaching configurations that are both final.

The idea is that the two equivalence relations are stable under reduction, i.e., preservation of observational noninterference is guaranteed. The following theorem, referred to in [barthe2018secure] as Theorem 6, gives a sufficient condition for establishing constant-time preservation. [Security] If the source program is constant-time w.r.t. the source equivalence and there is a general CT-simulation w.r.t. a general simulation, then the compiled program is constant-time w.r.t. the target equivalence.
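Writing $\mathrm{ONI}_{\varphi}(P)$ for "$P$ is observationally noninterferent w.r.t. $\varphi$" and $\llbracket P \rrbracket$ for the compiled program (notation ours), the statement can be sketched as:

```latex
\mathrm{ONI}_{\varphi}(P) \;\wedge\;
\text{a general CT-simulation w.r.t.\ a general simulation exists}
\;\implies\;
\mathrm{ONI}_{\psi}(\llbracket P \rrbracket)
```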
0.3 Proof of preservation
In this section, we present the proof that control-flow flattening preserves the constant-time policy. We first introduce a small imperative language, its semantics in the form of an LTS, and our leakage model. Then, we formalize our obfuscation as a function from syntax to syntax, and finally we prove the preservation of the security policy.
0.3.1 The language and its (instrumented) semantics
We consider a small imperative language with arithmetic and boolean expressions. Given a set of program identifiers, the syntax is
We assume that each command in the syntax carries a permanent color, either white or not. Also, we stipulate that each while statement and all its components get a unique non-white color, and that there is a function yielding the color of a statement.
Now, we define the semantics and instantiate the framework of [barthe2018secure] to the non-cancelling constant-time policy. For that, we define a leakage model describing the information that an attacker can observe during the execution. Recall from the previous section that the leakage is a list of atomic leaks, which we compose by list concatenation. Arithmetic and boolean expressions leak the sequence of operations required to evaluate them; we assume that there is an observable associated with each arithmetic operation being executed, but not with the logical ones (slightly simplifying [barthe2018secure]); the empty list denotes the absence of leaking. Our leakage model is defined by the following function that, given an expression (either arithmetic or boolean) and a state, returns the corresponding leakage:
Accesses to constants and identifiers leak nothing; boolean and relational expressions leak the concatenation of the leaks of their subexpressions; arithmetic expressions append the observable of the applied operator to the leaks of their subexpressions.
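To make the leakage model concrete, here is a hypothetical Python encoding: expressions are nested tuples such as ("add", e1, e2), constants are ints, identifiers are strings, and the operator names are our own. The state argument is unused because, in this model, leakage depends only on the shape of the expression:

```python
def leak(e, sigma):
    """Leakage of expression e in state sigma: the list of arithmetic
    observables in evaluation order."""
    if isinstance(e, (int, str)):          # constants and identifiers: no leak
        return []
    op, e1, e2 = e
    t = leak(e1, sigma) + leak(e2, sigma)  # concatenate subexpression leaks
    if op in ("add", "sub", "mul", "div"):
        t.append(op)                       # arithmetic ops append an observable
    return t                               # boolean/relational ops add nothing
```

For instance, leak(("add", "x", ("mul", 2, "y")), {}) yields ["mul", "add"], while a relational expression such as ("eq", "x", 1) leaks the empty list.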
We omit the semantics of arithmetic and boolean expressions because it is fully standard [nielson2007semantics]; we only assume that each syntactic arithmetic operator has a corresponding semantic operator.
The semantics of commands is given in terms of a transition relation between configurations, labeled with the leakage of the transition step. As usual, a configuration is a pair consisting of a command and a state assigning values to program identifiers. Given a program, its initial configurations pair the program with an arbitrary state, and its final configurations are those whose command has been fully executed.
Figure 1 reports the instrumented semantics of the language. Moreover, the semantics is assumed to preserve colors; in particular, in the rule for a colored while, all the components of the while in the target are also colored, avoiding color clashes (see the .pdf for colors).
0.3.2 Controlflow flattening formalization
Recall that the initial program being obfuscated is white. For the sake of presentation, we will adopt the sugared syntax used in Section 0.1 and represent a sequence of nested conditionals in the obfuscated program as a switch command over a list of cases, with the following semantics: if no case matches the value of the scrutinized expression in the current state, the switch terminates; otherwise, execution continues with the command of the matching case. In both rules, the step leaks the leakage of the scrutinized expression.
Now, let a fresh identifier, called the program counter, be given. Then, following [blazy2016formal], the obfuscated version of the command is
where
with the flattening function defined as follows
The obfuscated version of a program is a loop whose condition tests the program counter and whose body is a switch statement. The condition is on the values of the program counter, and the cases of the switch correspond to the flattened statements, obtained from the flattening function. This function returns a list containing the cases of the switch and is inductively defined on the syntax of commands: the first parameter is the identifier to use as the program counter; the second is the command to be flattened; the third represents the value of the guard of the case generated for the first statement of the command; the last represents the value to be assigned to the program counter by the last case generated. For example, the flattening of a sequence generates the cases corresponding to its two components and then concatenates them. Note that the values of the program counter for the cases of the second component start from the value assigned by the last case generated for the first, computed through a function that returns the "length" of a command. For a program, we use a distinguished initial value of the program counter and a distinguished exit value as the last value to be assigned, so as to leave the loop.
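A Python sketch of the flattening function may clarify the bookkeeping. The command encoding, the use of closures for expressions, and the choice of 1 and 0 as initial and exit values of the program counter are our own assumptions, not the paper's definitions:

```python
def size(c):
    """Number of switch cases generated for command c."""
    tag = c[0]
    if tag == "assign":
        return 1
    if tag == "seq":
        return size(c[1]) + size(c[2])
    if tag == "if":
        return 1 + size(c[2]) + size(c[3])
    if tag == "while":
        return 1 + size(c[2])

def flatten(c, n, nxt):
    """Map pc values to steps: step(store) runs one flattened statement and
    returns the next pc. n is the pc of c's first case, nxt the value the
    last case assigns."""
    tag = c[0]
    if tag == "assign":
        _, x, f = c
        def step(s, x=x, f=f, nxt=nxt):
            s[x] = f(s)
            return nxt
        return {n: step}
    if tag == "seq":
        _, c1, c2 = c
        m = n + size(c1)                       # c2's cases start after c1's
        cases = flatten(c1, n, m)              # c1 falls through to c2
        cases.update(flatten(c2, m, nxt))
        return cases
    if tag == "if":
        _, g, c1, c2 = c
        m = n + 1 + size(c1)                   # first case of the else branch
        cases = {n: (lambda s, g=g, t=n + 1, f=m: t if g(s) else f)}
        cases.update(flatten(c1, n + 1, nxt))
        cases.update(flatten(c2, m, nxt))
        return cases
    if tag == "while":
        _, g, body = c
        cases = {n: (lambda s, g=g, t=n + 1, f=nxt: t if g(s) else f)}
        cases.update(flatten(body, n + 1, n))  # body jumps back to the guard
        return cases

def run_flat(c, store):
    """Dispatch loop of the obfuscated program: pc == 0 exits."""
    cases, pc = flatten(c, 1, 0), 1
    while pc != 0:
        pc = cases[pc](store)
    return store
```

Running the flattened factorial of 4 through run_flat yields r = 24, showing that dispatching on the program counter reproduces the original control flow.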
0.3.3 Correctness and security
Since obfuscation does not change the language (apart from sugaring nested commands), the operational semantics is deterministic, and there are no unsafe programs (i.e., a program gets stuck iff its execution has completed), the correctness of the obfuscation directly follows from the existence of a general simulation between the source and the target languages [barthe2018secure]. For that, inspired by [blazy2016formal], we define the relation between source and target configurations shown in Figure 2. Intuitively, the relation matches source and target configurations with the same behaviour, depending on whether they are final (third rule), and on whether their execution originated from a loop (Rule (Colored)) or not (Rule (White)). Note that we differentiate the white and colored cases so as to avoid circular reasoning in the derivations of the relation. More specifically, our relation matches a configuration in the source with a corresponding one in the target. Actually, the command of the target configuration is the loop of the obfuscated program (fourth premise in Rule (White) and third in Rule (Colored)), whereas its state is equal to the source one, except for the value of the program counter. That value is mapped to the case of the switch corresponding to the next command to be executed in the source (first premise in Rule (White) and fifth in Rule (Colored)).
To understand how our simulation works, recall the example from Section 0.1. By Rule (White) we relate the configuration reached at a given line at the source level with that of the obfuscated program at the corresponding case, with a state equal to that of the source level plus the additional binding for the program counter. Similarly, we relate the configuration reached within the loop at the source level and its obfuscated counterpart, using Rule (Colored) and noting that the source configuration derives from the execution of a loop.
The following theorem ensures that the relation is a general simulation. [Theorem] For all programs, the relation of Figure 2 is a general simulation.
The correctness of the obfuscation is now a corollary of Theorem 0.3.3. [Correctness] For all commands and stores, a command and its obfuscated version exhibit the same observable behavior.
The next step is showing that the controlflow flattening obfuscation preserves the constanttime programming policy.
For that, we define below an equivalence between configurations and we show that, together with the simulation relation, it forms a general CT-simulation, as required by Theorem 0.2.
Let two (source or obfuscated) configurations be given; then they are equivalent iff their command components coincide syntactically.
We prove the following:
[Theorem] The pair of equivalences is a general CT-simulation w.r.t. the simulation relation, num and μ.
The main result of our paper directly follows from the theorem above, because the transformation in Section 0.3.2 satisfies Definition 0.2:
[Constant-time preservation]
The control-flow flattening obfuscation preserves the constant-time policy.
The proofs of the theorems above are in Appendix .5.
0.4 Conclusions
In this paper we applied a methodology from the literature [barthe2018secure] to the advanced obfuscation technique of control-flow flattening and proved that it preserves the constant-time policy. For that, we first defined what programs leak. Then, we defined the relation between source and target configurations, which roughly relates configurations with the same behavior, and proved that it adheres to the definition of general simulation. Finally, we proved that the obfuscation preserves constant-time by showing that our pair of equivalences is a general CT-simulation, as required by the framework we instantiated.
Future work will address proving the security of other obfuscation techniques and considering other security properties, e.g., general safety or hypersafety properties. Here we just considered a passive attacker that can only observe the leakage; an interesting problem would be to explore whether our result and the current proof technique scale to a setting with active attackers that also interfere with the execution of programs. Indeed, new secure compilation principles have recently been proposed to take active attackers into account [abate2018exploring].
Related Work
Program obfuscations are widespread code transformations [laszlo2009obfuscating, collberg2010surreptitious, junod2015obfuscator, tigress, uglifyjs2, binaryen] designed to protect software in settings where the adversary has physical access to the program and can compromise it by inspection or tampering. A great deal of work has been done on obfuscations that are resistant against reverse engineering, making the life of attackers harder. However, we do not discuss these papers because they do not consider formal properties of the proposed transformations; we refer the interested reader to [hosseinzadeh2018] for a recent survey.
Since, to the best of our knowledge, ours is the first work addressing the problem of security preservation, here we focus only on those proposals that formally studied the correctness of obfuscations. In [dallapreda2005control, dallapreda2009semantics] a formal framework based on abstract interpretation is proposed to study the effectiveness of obfuscating techniques. This framework not only characterizes when a transformation is correct but also measures its resilience, i.e., the difficulty of undoing the obfuscation. More recently, other work went in the direction of fully verified, obfuscating compilation chains [blazy2012towards, blazy2016formal, blazy2019formal]. Among these, [blazy2016formal] is the most similar to ours, but it only focusses on the correctness of the transformation and studies it in the setting of the CompCert C compiler. Differently, here we adopted a more foundational approach by considering a core imperative language and proving that the considered transformation preserves security.
As for secure compilation, we can essentially distinguish two different approaches. The first one only considers passive attackers (as we do) that do not interact with the program but try to extract confidential data by observing its behaviour. Besides [barthe2018secure], there has recently been increasing interest in preserving the verification of the constant-time policy, e.g., a version of the CompCert C compiler has been released that guarantees the preservation of the policy in each compilation step [barthe2020formal]. The second approach in secure compilation considers active attackers that are modeled as contexts in which a program is plugged. Traditionally, this approach reduces proving security preservation to proving that the compiler is fully abstract [patrignani2019formal]. However, new proof principles have recently emerged; see [abate2018exploring, patrignani2018robustly] for an overview.
References
.5 Proof
Here we report a proof sketch that includes the most significant cases.
Before proving the correctness of the obfuscation, we prove the following lemma, which relates the termination of the source program with the assignment to the program counter causing the termination of the obfuscated version.
(As above, the i-th element of the list generated by the obfuscation is referred to as .)
Let be a program with , and .
If then .
Proof.
Easily proved by induction on . ∎
The binary relation in Figure 2 is a general simulation according to Definition 0.2. Before showing that, we first give the definition of the function num, which maps a source and a target configuration into a natural number expressing the number of steps that the target needs to perform to reach a configuration in relation with the one reached from the source in one step [barthe2018secure]. In our case the definition is syntax-directed and is as follows:
We also define the measure μ, used to guarantee that an infinite number of steps at the source level is not matched by a finite number of steps at the target [barthe2018secure, blazy2016formal]:
Note that the measure was built with the specific requirements of the proof of Theorem 0.3.3 in mind, i.e., to ensure that it strictly decreases whenever the target stutters.
Proof.
(Sketch)

. This proof goes by induction on the rules of the operational semantics. We only consider the most interesting cases, the others being similar: two base cases and the only inductive one.
Also, note that – by definition of – any configuration related with another must be such that and
for some and .
 Case: .

By definition of we know that:
We have two exhaustive cases, depending on :
 Case: .

By definition of we know that:
Again, we have two exhaustive cases, depending on :

Case . By Rule (White) of we know that , i.e. that .
Since , . We must then show that and, given that it derives from , it suffices to show the following facts

and with that directly follows from ;

. The thesis follows from and by definition of since , because , and by choosing ;

directly follows from the hypotheses.


Case . Analogous to the above.

 Case: , .

The induction hypothesis (IHP) reads as follows
and we have to prove that
Again, we have two exhaustive cases, depending on :

Case . Note that it must be since they coincide both on commands (by definition of ) and on the store. Also, by the premises of Rule (Colored) we have and . Since , the operational semantics is deterministic and , we have that . So, since , to prove that , it remains to prove the following:

holds by hypothesis with ;

that follows by (IHP) that guarantees that and by the condition that ensures .


Case . Analogous to the case above.


. By construction of the measure function .

For any final source configuration and obfuscated configuration there exists a final
obfuscated configuration such that . Trivial.
∎
We then show that the control-flow flattening obfuscation preserves the constant-time programming policy. Following [barthe2018secure], we show that our pair of equivalences forms a many-steps and a final CT-diagram w.r.t. the relations above.
To prove that adheres to the definitions above we need two lemmata: Let be source or target configurations. If , and then .
Proof.
Follows directly by case analysis on (equality follows from Definition 0.3.3). ∎
If , then .
Proof.
This lemma follows from the fact that the function num is defined explicitly in the proof of Theorem 0.3.3 and depends just on the syntax of the configurations involved. ∎
Finally, we can show the following theorem:
Proof.
(Sketch)

is a many-steps CT-diagram. The definition of many-steps CT-diagram requires that, if two equivalent source configurations perform one step each with the same leakage, and they are simulated by two equivalent target configurations, then the two target configurations perform the predicted numbers of steps with the same leakage, and the configurations so reached are still in the simulation relation and still pairwise equivalent.
The equality of the two target leakages directly follows from the fact that the two target configurations are syntactically the same by hypothesis and are the obfuscated versions of two configurations that generate the same observable. From Lemma .5 we can derive the next thesis. Finally, Lemma .5 entails the last two theses.


. Follows from the definition of that just requires syntactic equality between configurations.

. Again, follows directly from definition of and of .

is a final CT-diagram. The definition of final CT-diagram requires that, if two equivalent source configurations are final and are simulated by two equivalent target configurations, then the two target computations complete with the same leakage, reaching configurations that are both final.
Since the two source configurations are final and related to the target ones, the program counter must hold the exit value in both target states. Thus, the two target computations terminate with steps that just include the check of the loop condition. The other theses can be derived following the same proof structure as above.

∎