Secure compilation is concerned with ensuring that the security properties that hold at the source level are preserved at the target level or, equivalently, that every attack that can be carried out at the target level is also possible at the source level. In this way, it is enough to reason at the source level to rule out attacks altogether.
Consider a functional and reactive source language with I/O primitives, but none for communication, and a compiler to a target language that relies on system calls for managing the I/O (on screen, network, etc.). A run of the compiler transforms the source program
into the target program
(we highlight in the elements of the source language, and in the elements of the target language for better readability). Although correct, this compilation does not preserve the security property requiring a program to never send a value on the network, which enjoys in any context – an expression with a single hole. This property still holds when is plugged into a non-evil target context that correctly implements the system calls. Instead things go wrong when the context is evil, i.e. it maliciously implements the system calls. For example, the property is violated when we plug into
Our idea is to provide a method, inspired by translation validation (TV) (pnueli1998translation), that we call secure translation validation (STV). It automatically decides if a compiler preserves a family of hyperproperties of interest, for a given program . STV is carried out at load time, and we argue that this is the right time. On the one hand, it is not too early, because one typically wants some security guarantees on a module, e.g. a library, before launching a program that uses it. On the other hand, it is not too late, since executing the same program in different contexts results in different security guarantees.
2. Our proposal
TV checks the correctness of the compilation of a given program , rather than proving the compiler correct for all inputs. Roughly, it works as follows: first, the source and the target languages are endowed with semantics sharing the same observables; then a suitable simulation is defined between the result of the compilation and the corresponding source program, so that if such a simulation exists, the compilation of that program is correct; finally, an algorithm effectively computes the required simulation, if any. Remarkably, this algorithm gives a fully automatic way of checking the correctness of real compilers (necula2000translation). A tempting approach would be to also prove the security of a compiler mechanically, by showing the existence of a suitable simulation between the source and the target program. However, constructing the required simulation, if any, is undecidable when the program at hand is not finite-state (deng2016securing). Static analysis comes to our rescue and allows us to devise a mechanical, approximate procedure to deal with this problem.
More precisely, we proceed as follows. At load time we plug the compiled program into the (target) context, obtaining , the behaviour of which is safely over-approximated by a static analysis. An approximation is a history expression (bartoletti2009local), i.e. a (finite-state) process of a basic process algebra (bergstra1985algebra), whose actions are the observables of the trace semantics of the target and source languages. For example, in the code above the observable of the primitive print will be display.
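To make the notion of history expression more concrete, the following is a minimal sketch in Python, assuming a recursion-free fragment with only the empty history, single actions, sequencing and choice. The class names and the `send` action are hypothetical and chosen for illustration; only the observable `display` comes from the text. Full history expressions also include recursion, which this sketch omits.

```python
from dataclasses import dataclass


# Hypothetical, recursion-free fragment of history expressions.
@dataclass(frozen=True)
class Eps:
    """The empty history: no observable action."""


@dataclass(frozen=True)
class Act:
    """A single observable action, e.g. "display"."""
    name: str


@dataclass(frozen=True)
class Seq:
    """Sequential composition: first, then second."""
    first: object
    second: object


@dataclass(frozen=True)
class Choice:
    """Non-deterministic choice between two histories."""
    left: object
    right: object


def traces(h):
    """Return the set of complete traces of h as tuples of action names."""
    if isinstance(h, Eps):
        return {()}
    if isinstance(h, Act):
        return {(h.name,)}
    if isinstance(h, Seq):
        return {s + t for s in traces(h.first) for t in traces(h.second)}
    if isinstance(h, Choice):
        return traces(h.left) | traces(h.right)
    raise TypeError(f"not a history expression: {h!r}")


# The primitive print is abstracted by the observable "display".
h_print = Act("display")

# A history that displays something and then either sends on the
# network ("send" is an assumed action name) or stops.
h_example = Seq(Act("display"), Choice(Act("send"), Eps()))
```

Because the fragment has no recursion, `traces` always terminates and returns a finite set, which is enough to illustrate the examples discussed in the remainder of this section.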
Once the history expression for is computed, we verify on it if the compilation process broke some of the properties of interest. The actual verification depends on the family of properties we are interested in. A first principle that one might consider is full abstraction (FA) (abadi1999protection):
$$\forall P_1, P_2.\quad P_1 \simeq_S P_2 \iff [\![P_1]\!] \simeq_T [\![P_2]\!]$$
where $P_1$ and $P_2$ are programs, while $\simeq_S$ and $\simeq_T$ are suitable notions of behavioural equivalence. However, FA has well-known shortcomings (patrignani2017secure) and does not fit well with STV because of the universal quantification over pairs of programs.
Actually, the principles proposed by Abate et al. (abate2018exploring) are more appropriate for STV purposes, in particular robustly safe compilation (RSC):
Intuitively, RSC considers finite traces produced by the compiled program when plugged in a possibly evil context: the compiler preserves all the safety properties iff there exists a context in which the source program also produces the same finite trace. The operator returns the set of the prefixes of the traces of its argument.
We can effectively check this principle by using STV. Indeed, we can get rid of the universal quantifiers on programs and contexts because STV only considers a single program at a time and is performed at load time. Given a program and a context , it then suffices to verify the following
where is the history expression associated with and that associated with .
Since history expressions safely approximate the behaviour of programs, their semantics includes the set of traces of the program they are associated with. Also, since the properties of interest are defined in terms of traces, STV succeeds when a with the desired property can be proved to exist starting from . Note that, since history expressions are processes of a basic process algebra, it is decidable whether a prefix belongs to the semantics of a history expression. However, there is a price to pay in order to have an effective procedure. False negatives may be produced, and we may fail to prove a compilation secure because history expressions over-approximate the behaviour of programs.
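The decidability remark can be illustrated with a small membership check. The sketch below assumes a recursion-free fragment of history expressions encoded as nested tuples (a hypothetical encoding, not the one used by the analysis) and decides whether a finite sequence of actions is a prefix of some trace by repeatedly computing the residual expressions after each action.

```python
# Assumed encoding of a recursion-free fragment of history expressions:
#   ("eps",)          empty history
#   ("act", a)        single action a
#   ("seq", h1, h2)   h1 followed by h2
#   ("alt", h1, h2)   choice between h1 and h2


def nullable(h):
    """True if h can terminate without performing any action."""
    tag = h[0]
    if tag == "eps":
        return True
    if tag == "act":
        return False
    if tag == "seq":
        return nullable(h[1]) and nullable(h[2])
    return nullable(h[1]) or nullable(h[2])  # "alt"


def step(h, a):
    """Set of residual expressions reachable from h by performing a."""
    tag = h[0]
    if tag == "eps":
        return set()
    if tag == "act":
        return {("eps",)} if h[1] == a else set()
    if tag == "seq":
        residuals = {("seq", r, h[2]) for r in step(h[1], a)}
        if nullable(h[1]):
            residuals |= step(h[2], a)
        return residuals
    return step(h[1], a) | step(h[2], a)  # "alt"


def is_prefix(h, actions):
    """Decide whether the given action sequence is a prefix of a trace of h."""
    states = {h}
    for a in actions:
        states = set().union(*(step(s, a) for s in states))
        if not states:
            return False
    # Every expression in this fragment has at least one trace,
    # so a non-empty set of residuals means the prefix is extendable.
    return True


h = ("seq", ("act", "display"), ("alt", ("act", "send"), ("eps",)))
```

Each call to `step` returns expressions no larger than its argument, so the procedure terminates on every input in this fragment. Handling recursion would require the finite-state construction alluded to in the text.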
To illustrate the idea intuitively, recall the example above and assume that we design our history expressions to track the I/O actions of a program. Consider now the history expression associated with plugged into the evil context :
Intuitively it represents that writes on the screen and sends something on the network, or does nothing (). The prefix of has no counterpart in any source level context (recall that the source language cannot perform I/O on the network). So, this program and context combination is rejected by our analysis.
Instead, plugging into the following non-evil context:
results in the history expression
that has an acceptable counterpart in the source contexts. Our property thus holds and security is preserved.
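The two outcomes above can be replayed with a small check. This is only a sketch under stated assumptions: behaviours are given directly as finite sets of traces, the action names `display` and `send` are illustrative, and the set of candidate source behaviours is a hand-written stand-in for what source contexts can produce (no network action, since the source language has none).

```python
def prefixes(traces):
    """All finite prefixes of the traces in the given set."""
    return {t[:i] for t in traces for i in range(len(t) + 1)}


def source_prefixes(candidates):
    """Prefixes producible by at least one candidate source behaviour."""
    return set().union(*(prefixes(c) for c in candidates))


def stv_accepts(target_traces, candidates):
    """Accept iff every target prefix has a counterpart at the source level."""
    return prefixes(target_traces) <= source_prefixes(candidates)


def offending_prefixes(target_traces, candidates):
    """Target prefixes with no source-level counterpart, for diagnostics."""
    return sorted(prefixes(target_traces) - source_prefixes(candidates))


# Assumed over-approximations of the two target behaviours in the text.
evil = {("display", "send"), ()}      # display then send, or nothing
non_evil = {("display",), ()}         # display, or nothing

# Assumed source-level behaviours: screen output only.
source_candidates = [{("display",), ()}]

evil_ok = stv_accepts(evil, source_candidates)
non_evil_ok = stv_accepts(non_evil, source_candidates)
witness = offending_prefixes(evil, source_candidates)
```

With these inputs the evil combination is rejected because the prefix ending in `send` has no source counterpart, while the non-evil one is accepted.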
Our approach also works for program optimizations. For example, consider a source program that has a choice between two behaviours, both prefixed by an output on the screen, e.g. a warning to the user. Its history expression will essentially be as follows:
An optimizing compiler can detect that both branches share the same output and factor it out of the choice. The history expression associated with the optimized program will be
Plugging into a non-evil target context results in (with history expression ). It is not difficult to find a source level context such that any prefix of has a counterpart in (with history expression ). The task is easy because and have the same semantics. In this case the equivalence is trivial, while in more complex cases one can use the equational theory over history expressions, which is decidable (bartoletti2009local).
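The equivalence used in the optimization example can be checked concretely on trace sets. The sketch below assumes that the shared output is the observable `display` and uses two hypothetical action names, `b1` and `b2`, for the two branches; it models sequencing and choice directly as operations on sets of traces.

```python
def act(name):
    """Trace set of a single observable action."""
    return {(name,)}


def seq(x, y):
    """Sequential composition: every trace of x followed by every trace of y."""
    return {s + t for s in x for t in y}


def alt(x, y):
    """Choice: the traces of either operand."""
    return x | y


warn = act("display")   # the output shared by both branches
left = act("b1")        # hypothetical first branch
right = act("b2")       # hypothetical second branch

# Before optimization: the output is repeated in each branch.
unoptimized = alt(seq(warn, left), seq(warn, right))

# After optimization: the output is factored out of the choice.
optimized = seq(warn, alt(left, right))

same_traces = unoptimized == optimized
```

The two expressions denote the same trace set, so every prefix of the optimized behaviour trivially has a source-level counterpart. This set-based comparison only covers finite, recursion-free behaviours; the general case relies on the equational theory mentioned above.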
We briefly discussed examples showing how the preservation of safety properties can be effectively checked. We are currently extending STV to deal with safety hyperproperties, and we are confident that other families of properties can also fit our proposal.