An Instrumenting Compiler for Enforcing Confidentiality in Low-Level Code

11/30/2017 ∙ by Ajay Brahmakshatriya, et al. ∙ 0

We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300KLoC) for programmer overhead required to restructure the application when protecting the sensitive data such as passwords. We find that performance overheads introduced by our instrumentation are moderate (average 12 is only about 160 LoC.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

We consider the problem of protecting confidentiality of secret data in applications written in low-level languages like C. Common examples of such applications include web servers that compute with sensitive data such as passwords and private files, medical software that work with private medical records, database query engines, etc.

Such applications often use tools like encryption to protect the sensitive data outside of the application. For example, webservers use SSL to protect the data from a network adversary and databases can store encrypted data on disk. However, in order to enable computation over it, sensitive data must often be present in the clear in the application’s memory. This makes the data vulnerable to logical bugs and exploitable vulnerabilities in the application.

Broadly, this is a problem of information flow control [20] in the presence of active adversaries. In this setting, a program has (conceptually) separated public and private data, and the goal is to ensure that private data does not leak to an unprotected channel. This is a challenging problem because leaks can occur due to many different reasons. To start, the program may have bugs in its logic that accidentally leak private data to a public channel (e.g. network_send (passwd)). Further, low-level languages like C do not provide memory safety or control flow integrity, making it possible for an attacker to actively craft an exploit, hijack control of the application and steal private data. The Heartbleed bug in OpenSSL [5] is one prominent example of a buffer overflow vulnerability that can be exploited to obtain sensitive data, but many other vulnerabilities and attacks have been reported [9, 4, 1, 45, 39, 2].

Existing approaches

for information flow control can be broadly classified into static and dynamic. Static approaches 

[24, 40] label each variable in the program as public or private, and a static analysis, such as a type system, prohibits flows from private variables to public variables. Soundness of the analysis guarantees that the program is information flow secure. While this results in zero runtime overhead, the programmer often has to work hard to pass the conservative static analysis. Furthermore, unless memory- and type-safety, and control flow integrity are added to the compiled code, this approach is not effective against active adversaries.

Safe dialects of C such as CCured [34] and Deputy [17] compile with such guarantees, but they require additional annotations and program restructuring causing significant programming overhead; sometimes even resulting in the programmer giving up on the exercise of using these languages [34, 31]. Further, they attach metadata (such as bounds) to pointers causing backward-compatibility issues. Although techniques such as SoftBound [33] can get past this issue, the runtime overheads are still significant: 82% on average for CCured on SPECINT95 (maximum 115%) and approximately 77% on average for SoftBound on SPEC2000 (maximum 160%).

Dynamic approaches rely on (a) program instrumentation to track the flow of sensitive data as the program executes, and (b) runtime checks to ensure that it does not leak through the public channels. These approaches typically maintain a taint map, that is used to track the taint associated with each memory address. As data is copied from one address to the other, the taint map is updated accordingly. At the public channels, the runtime checks ensure that the taint associated with the output buffers is public. To guard against control flow integrity (CFI) attacks, measures such as shadow stacks, and stack canaries are deployed. While dynamic approaches prevent leaks even against active attacks, and incur only negligible programmer overhead, they have substantial runtime overheads [42, 29, 52, 18], for instance TaintCheck [36] reports up to overhead for CPU-bound applications.

Our approach combines the best of static and dynamic techniques, building on two key insights. First, complete memory-safety is neither sufficient nor necessary for preventing leaks. Like dynamic approaches, we do not rely on memory safety. Second, we use a novel compiler-enforced memory-partitioning scheme to keep the runtime cost under control and avoid fine-grained taint tracking.

We require the programmer to only do minor program refactoring and add private type annotations in top-level functions and globals definitions in the program to indicate private data. The programmer is free to use all of the C language. Our overheads are 12% on average for SPEC-CPU-2006 benchmarks (maximum 24%), small enough to be used in production and significantly better than previous approaches. We highlight our main ideas below.

Flow analysis and a novel memory partitioning scheme.

Our compiler performs standard dataflow analysis, statically propagating taint from global variables and function arguments that have been marked private by the programmer, to detect any information leaks. However, in languages without memory-safety or in the presence of low-level attacks, it is possible that a pointer, which the compiler assumed to be pointing to a public value statically, actually points to a private value at runtime (or viceversa). This can cause a leak. To prevent this eventuality, it is essential to instrument memory reads and writes to check that they read public or private data as expected by the compiler. Our compiler partitions the program’s memory into a contiguous public region and a disjoint contiguous private region. Checks for public or private are then, simple, efficient range checks on pointers. We describe two partitioning schemes, one based on the Intel MPX ISA [6], and the other based on segment registers (§ 3).

Information flow-aware control flow integrity.

Hijacking of a program’s control can be used by an adversary to bypass runtime checks or even jump into the middle of an instruction to execute arbitrary code. We introduce a novel taint-aware CFI scheme that prevents such attacks from leaking private data.

Trusted components.

Our scheme strictly prevents any private data from flowing into public variables. In practice, an application may want to selectively declassify data at specific points. To support such declassification, we allow the programmer to re-factor trusted declassification functions into a separate component, which we call . The remaining untrusted application, in contrast, is called . Code in is not subject to any flow checks and can copy data from the private to the public region. In fact, can be compiled using a vanilla compiler. However, our scheme isolates the untrusted application from by giving its own private stack and heap, and re-using range checks on memory accesses in to prevent them from reading or writing to ’s stack or heap.

This code partition scheme fits in naturally with design principles that have been proposed for building secure enclave applications [48]. Such applications leverage trusted hardware (such as Intel SGX [26]) for protection against a malicious OS. Once ported to our scheme, it was near trivial to further start using enclaves (§ 5.3) and offer protection even against adversaries that may gain root privilege.

ConfLLVM and ConfVerify.

We have implemented our scheme using the LLVM compiler framework [30], called ConfLLVM. ConfLLVM performs flow analysis on annotated code, compiles it using the memory partitioning scheme mentioned above, and outputs a binary instrumented with the required runtime checks.

We have also developed a lightweight static verifier ConfVerify to check that the binary output by our compiler indeed has all the required instrumentation. ConfVerify disassembles the binary (using hints provided by ConfLLVM) and performs dataflow analysis to ensure that there are no data leaks. Importantly, this allows us to remove the compiler from our Trusted Computing Base (§ 5, § 6).


We have evaluated our scheme on standard benchmarks and several applications, both new and existing, the largest of which is OpenLDAP (300,000 LoC). We find that performance overheads introduced by our instrumentation are moderate, and the programmer effort to port OpenLDAP is only about 160 LoC (§ 7).

2. Overview

Threat model

We consider C applications that work with both private and public data. Applications interact with the external world using the network, disk and other channels. They communicate public data in clear, but want to protect the confidentiality of the private data by, for example, encrypting it before sending it out. However, the application could have logical or memory errors, or exploitable vulnerabilities that may cause private data to be leaked out in clear.

The attacker actively interacts with the application by sending inputs that are crafted to trigger any bugs in the application. The attacker can also observe all the external communication of the application. Our goal is to prevent the private data of the application from leaking out in clear. Specifically, we address explicit information flow: i.e., any data directly derived from private data is also treated as private111While our technique currently does not handle implicit flows, we can extend it by disallowing branches on private data, or moving them to the trusted component after a careful audit.. Side-channels (such as execution time and memory-access patterns) are outside the scope of this work.

Our scheme can also be used for integrity protection in a setting where an application computes over trusted and untrusted data. Any data (explicitly) derived from untrusted inputs cannot be supplied to a sink that expects trusted data (Section 7.2 shows an example).

1void handleReq (char *uname, char *upasswd, char *fname,
                char *out, int out_size)
  char passwd[SIZE], fcontents[SIZE];
5  read_password (uname, passwd, SIZE);
  if(!(authenticate (uname, upasswd, passwd))) {
7    return;
9  read_file(fname, fcontents, SIZE);
  //(out_size > SIZE) can leak passwd to out
11  memcpy(out, fcontents, out_size);
  //a bug in the fmt string can print stack contents
13  sprintf(out + SIZE, fmt, "Request complete");
1#define SIZE 512
3int main (int argc, char **argv)
5  ... //variable declarations
  while (1) {
7    n = recv(fd, buf, buf_size);
    parse(buf, uname, upasswd_enc, fname);
9    decrypt(upasswd_enc, upasswd);
    handleReq(uname, upasswd, fname, out,
11              size);
    format(out, size, buf, buf_Size);
13    send(fd, buf, buf_size);
Figure 1. Request handling code for a web server

Sample application.

Consider the code for a web server in  Figure 1. The server receives requests from the user (main:Figure 1), where the request contains the username and a file name (both in clear text), and the encrypted user password. The server decrypts the password and calls the handleReq helper routine that copies the (public) file contents into the out buffer. The server finally prepares the formatted response (format), and sends the response (buf), in clear, to the user.

The handleReq function allocates two local buffers, passwd and fcontents (handleReq:1). It reads the actual user password (e.g., from a database) into passwd, and authenticates the user. On successful authentication, it reads the file contents into fcontents, copies them to the out buffer, and appends a message to it signalling the completion of the request.

The code has several bugs that can cause it to leak the user password. First, at line 1, memcpy will read out_size bytes from fcontents and copy them to out. If out_size is greater than SIZE, this can cause the contents of passwd to be copied to out because an overflow past fcontents would go into the passwd buffer. Second, if the format string fmt in the sprintf call at line 1 contains extra formatting directives, it can print stack contents into out ([47]). The situation is worse if out_size or fmt can be influenced by the attacker.

Our goal is to prevent such vulnerabilities from leaking out sensitive application data. Below we discuss the three main components of our approach.

(a) Partitioning the application into and .

The programmer begins by partitioning the application into untrusted and trusted components, and respectively. is a library that provides a small set of trusted routines to such as: (1) communication interfaces to the external world (e.g. networking, I/O, and other system calls), (2) memory allocation functions (alloc, free, etc.), (3) cryptographic primitives (for valid private-to-public conversions), and (4) possibly a small set of trusted declassification functions. All the other code becomes part of .

A good practice is to contain most of the application logic to and limit to a library of generic routines that can be hardened over time, possibly even verified manually [48].

In the web server example from Figure 1, would consist of: recv, send, read_file (network, I/O), decrypt (cryptographic primitive), and read_passwd (source of sensitive data). The remaining web server code (parse, format, and even sprintf and memcpy) remains in and is not trusted.

(b) Partitioning of memory by ConfLLVM.

The programmer compiles with a compiler of her choice (or uses existing binaries), and with our compiler ConfLLVM.

ConfLLVM partitions the memory of into two regions, one for public data and one for private data, with each region having its own stack and heap. To help the compiler lay out data in these regions, the programmer can annotate the sensitive data in using a new keyword private. We require the programmer to annotate private data only in top-level definitions, i.e., globals, function signatures, and struct definitions, and in the prototypes of all functions exported by to . ConfLLVM infers annotations for locals (§ 5). The annotations within are untrusted. If the programmer adds incorrect annotations (e.g., does not use the private annotation for a global variable that is assigned private data), then either the compiler will reject the program or it will crash at runtime, but the confidentiality of the private data will not be compromised. In contrast, annotations in the prototypes of exported functions are trusted.

Using these annotations, ConfLLVM lays out stack and heap data in their corresponding regions (§ 3). In our example, the untrusted annotated signatures in are:

void handleReq(char *uname, private char *upasswd,
               char *fname, char *out, int out_sz);
int authenticate(char *uname, private char *upass,
                 private char *pass);

ConfLLVM can automatically infer that, for example, passwd is a private buffer. The trusted annotated signatures of against which ConfLLVM compiles are the following:

int recv(int fd, char *buf, int buf_size);
int send(int fd, char *buf, int buf_size);
void decrypt(char *ciphertxt, private char *data);
void read_passwd(char *uname, private char *pass,
                 int size);

(c) Runtime checks.

ConfLLVM instruments with runtime checks. The runtime checks ensure that, (a) the pointers belong to their annotated or inferred regions (e.g. a private char * actually belongs to the private region), (b) does not read or write beyond its own memory (i.e., it does not read or write to memory), and (c) follows a weak form of CFI that prevents circumvention of the checks.

In addition, functions must appropriately check their arguments to ensure that the data passed by has the correct sensitivity label. For example, the read_passwd function would check that the range [passwd, passwd+SIZE-1] falls inside the private memory segment of . (Note that this is different from a buffer-overflow check; need not keep track of the actual size of allocation of the passwd buffer.)

Our scheme requires that code be placed in read-only memory and that the remaining memory not be executable (§ 6). We do not yet support dynamic code generation.

Trusted Computing Base (TCB).

We designed a static verifier, ConfVerify, to confirm that a binary output by ConfLLVM has enough checks in place to guarantee confidentiality (§ 5). ConfVerify guards against bugs in the compiler.

Our TCB, and thus the security of our scheme, does not depend on the untrusted application code and the compiler. We trust the library code and the static verifier. We also trust the operating system and other components running at a privileged level (although, we can potentially make use of trusted hardware like Intel SGX to isolate the memory of our application from the OS [48]).

3. Memory Partitioning Schemes

ConfLLVM uses the programmer-supplied annotations, and with the help of type inference, statically determines the taint of each memory access (§ 5), i.e., for every memory load and store, it knows statically if the address contains private or public data. It is possible for the type-inference to detect a problem (for instance, when a variable holding private data is passed to a method expecting a public argument), in which case, a type error is reported back to the programmer. On successful inference, no data leaks are guaranteed, provided the programmer-supplied annotations on exported functions are correct. We have developed two different schemes for enforcing annotation correctness at runtime, each of which relies on a careful layout of memory. The key idea in each scheme is to contain all private and all public data to their own respective contiguous regions of memory. The rest of this section outlines these two techniques and discusses their relative pros and cons.

(a) Segment scheme
(b) MPX scheme
Figure 4. Memory layout of

MPX scheme.

This scheme relies on the Intel MPX ISA extension [6] and uses the memory layout shown in Figure (b)b. The memory is partitioned into a public region and a private region, each with its own heap, stack and global segments. The ranges of these regions are determined by the values stored in the MPX bound registers bnd0 and bnd1, respectively, but they must be contiguous to each other (bnd0.lower == bnd1.upper). The user selects the maximum stack size OFFSET at compile time (at most ). The scheme maintains the public and private stacks in lock-step: their respective top-of-stack are always at offset OFFSET to each other. The concrete values stored in bnd0 and bnd1 can be determined at load time (§ 6).

(a) A sample procedure (b) Assembly code under segment scheme
(c) Assembly code under MPX scheme
Figure 8. The (unoptimized) assembly generated by ConfLLVM for an example procedure.

Consider the procedure in Figure (a)a. The generated (unoptimized) assembly under the MPX scheme (using virtual registers for simplicity) is shown in Figure (c)c. ConfLLVM automatically infers that x is a private int and places it on the private stack, whereas y is kept on the public stack. The stack pointer rsp points to the top of the public stack. Because the two stacks are kept at constant OFFSET to each other, x is accessed simply as rsp+4+OFFSET. Each memory access is preceded with MPX instructions (bndcu and bndcl) that check their first argument against the (upper and lower) bounds of their second argument.

Segmentation scheme.

x64 memory operands are in the form , where base and index are -bit unsigned registers, scale is a constant with maximum value of , and displacement is a -bit signed constant. The architecture also provides two segment registers fs and gs for the base address computation, i.e. fs:base simply adds fs to the base value222The segment registers also carry a limit but it is unused in x64..

We use these segment registers to store the lower bounds of the public and private memory regions, respectively, and follow the memory layout shown in Figure (a)a. The public and private regions are separated by (at least) GB of guard space (unmapped pages that cause a fault when accessed). The guard sizes are chosen so that any memory operand whose base is prefixed with fs cannot escape the public segment, and any memory operand prefixed with gs cannot escape the private segment.

The segments are each aligned to a GB boundary. The usable space within each segment is also GB. We access the base address stored in a -bit register, say a private value stored in rax, as fs+eax, where eax is the lower 32 bits of rax. Thus, in fs+eax, the lower bits come from eax and the upper 32 bits come from fs (because fs is GB aligned). This additionally implies that the maximum offset within a segment that can access is GB (). This is rounded up to GB for GB alignment, with GB of usable space and GB of guard space. Since the displacement value can be negative, the maximum negative offset is GB, for which we have the guard space below the public segment.

The usable parts of the segments are restricted to GB to avoid translating pointers when control is passed to , thus avoiding the need to change or recompile . Generated code for our example under this scheme is shown in Figure (b)b. The figure uses the convention that e (resp., esp) represents the lower bits of the register r (resp., rsp). The public and private stacks are still maintained in lock-step. Taking the address of a private stack variable requires extra support: the address of variable x in our example is rsp+4+size, where size is the total segment size (GB).

The segmentation scheme has a lower runtime overhead than the MPX scheme as it avoids doing bound-checks. However, it restricts the segment size to GB.

Multi-threading support.

Our scheme supports
multi-threading. All inserted runtime checks (including those in § 4) are thread-safe because they check values of registers. However, we do need to introduce additional support to use thread-local storage (TLS). Typically, TLS is accessed via the segment register gs: the base of TLS is obtained at a constant offset from gs. The operating system takes care of setting gs on a per-thread basis. However, and operate in different trust domains, thus they cannot share the same TLS buffer.

We let continue to use gs for accessing its own TLS. ConfLLVM changes the compilation of to access TLS in a different way. The multiple (per-thread) stacks in are all allocated inside the stack regions; the public and private stacks for each thread are still at a constant offset to each other. Each thread stack is, by default, of maximum size MB and its start is aligned to a MB boundary (configurable at compile time). We keep the per-thread TLS buffer at the beginning of the stack. simply masks the lower -bits of rsp to zeros to obtain the base of the stack and access TLS.

The segment-register scheme further requires switching of the gs register as control transfers between and . We use appropriate wrappers to achieve this switching, however needs to reliably identify the current thread-id when called from (so that cannot force two different threads to use the same stack in ). ConfLLVM achieves this by instrumenting an inlined-version of the _chkstk routine333 to make sure that rsp does not escape its stack boundaries.

4. Information Flow Aware CFI

We design a custom, information-flow aware CFI scheme to ensure that an attacker cannot alter the control flow of to circumvent the instrumented checks and leak sensitive data.

Typical low-level attacks that can hijack the control flow of a program include overwriting the return address, or the targets of function pointers and indirect jumps. Existing approaches use a combination of shadow stacks or stack canaries to prevent overwriting the return address, or use fine-grained taint tracking to ensure that the value of a function pointer is not derived from user (i.e. attacker-controlled) inputs [18, 28, 48]. While these techniques may prevent certain attacks, our goal is purely to ensure confidentiality. Thus, we designed a custom taint-aware CFI scheme.

Our CFI scheme ensures that for each indirect transfer of control: (a) the target address is some valid jump location, i.e., the target of an indirect call is some valid procedure entry, and the target of a return is some valid return site, (b) the register taints expected at the target address match the current register taints (e.g., when the rax register holds a private value then a ret can only go to a site that expected a private return value). Our scheme does not ensure, for instance, that a return matches the previous call. We use a magic-sequence based scheme to achieve this CFI.

CFI for function calls.

We follow the x64 calling convention for Windows that has argument registers and one return register. Our scheme picks two bit sequences and of length each that appear nowhere else in ’s binary. Each procedure in the binary is preceded with a string that consists of followed by a -bit sequence encoding the expected taints of the argument registers and the return register, as per the function signature. Similarly, each valid return site in the binary is preceded by followed by -bit encoding of the taint of the return value register, again according to the callee’s signature. To keep the length of the sequences uniform at

bits, the return site taint is padded with four zeros.

Callee-save registers are also live at function entry and exit and their taints cannot be determined statically by the compiler. ConfLLVM forces their taint to be public by making the caller save and clear all the private-tainted callee-saved registers before making a call. All dead registers (e.g. unused argument registers and caller-saved registers at the beginning of a function) are conservatively marked private to avoid accidental leaks. The -bit instrumented sequences are collectively referred to as magic sequences. We note that our scheme can be extended easily to support other calling conventions.

Consider the following :

private int add (private int x) { return x + 1; }
int incr (int *p, private int x) {
  int y = add (x); *p = y; return *p; }

The compiled code for these functions is instrumented with magic sequences as follows. The taint bits for add are 11111 as its argument x is private, unused argument registers are conservatively treated as private, and its return type is also private. On the other hand, the taint bits for incr are 0111 because its first argument is public, second argument is private, unused argument registers are private, and the return value is public. The sample instrumentation is as shown below:

  ...  ;assembly code for add
  ... ;assembly code of incr
  call foo
  #M_ret#00001# ;private-tainted ret with padded 0s
  ... ;assembly code for rest of incr

Our CFI scheme adds runtime checks using these sequences as follows. Each ret instruction is instrumented to instead fetch the return address and confirm that its target location has followed by the taint-bit of the return register. For our example, the return of add is replaced as follows:

  r1 = pop ;fetch return address
  r2 = #M_ret_inverted#11110# ;we use bitwise nega-
  r2 = not r2                 ;-tion of magic string
  cmp [r1], r2                ;to retain uniqueness
  jne fail                    ;of M_ret in the binary
  r1 = add r1, 8 ;skip magic sequence
  jmp r1 ;return
fail: call __debugbreak

For direct calls, ConfLLVM statically verifies that the register taints match between the call site and the call target. At indirect calls, the instrumentation is similar to that of a ret: check that the target location contains followed by taint bits that match the register taints at the call site.

Indirect jumps.

ConfLLVM does not generate indirect jumps in . Indirect jumps are mostly required for jump-table optimizations, which we currently disable. We can conceptually support them as long as the jump tables are statically known and placed in read-only memory.

The insertion of magic sequences increases code size but it makes the CFI-checking more lightweight than shadow stack schemes. The unique sequences and are created at load time when the entire binary is available (§ 6).

5. ConfLLVM and ConfVerify

We implemented ConfLLVM as part of LLVM [30], currently targeting the Windows x64 platform.

Compiler front-end.

We introduce a new type qualifier, private, in the language that the programmers can use to annotate types. For example, a private integer-typed variable can be declared as private int x, and a (public) pointer pointing to a private integer as private int *p. The struct fields inherit their outermost annotation from the corresponding struct-typed variable. For example, consider a declaration struct st \{ private int *p; \}, and a variable x of type struct st. Then x.p inherits its qualifier from x: if x is declared as private st x;, then x.p is a private pointer pointing to a private integer. This convention ensures that despite the memory partitioning into public and private, structs are still laid out contiguously in the memory in one of the regions.

We modified the Clang [3] frontend to parse the private type qualifier and generate LLVM Intermediate Representation (IR) instrumented with this additional metadata. Once the IR is generated, ConfLLVM runs standard LLVM IR optimizations that are part of the LLVM toolchain. Most of the optimizations work as-is and don’t require any change. Optimizations that change the metadata (e.g. remove-dead-args changes the function signatures), need to be modified. While we found that it is not much effort to modify an optimization, we chose to modify only the most important ones in order to bound our effort. Rest of the optimizations are disabled, though we anticipate that some of them can be supported with further engineering effort.

LLVM IR and type inference.

After all the optimizations are run, our compiler runs a type qualifier inference [21] pass over the IR. This inference pass propagates the type qualifier annotations to local variables, and outputs an IR where all the intermediates are annotated with optional private qualifiers. The inference is implemented using the standard algorithm, where the dataflows in the program generate qualifier subtyping constraints, which are then solved using an SMT solver [19] at the backend. If the constraints are unsatisfiable, an error is reported to the user. We refer the reader to [21] for details of the algorithm.

After type inference, ConfLLVM knows the taint of each memory operand for load and store instructions. With a simple dataflow analysis [12], the compiler statically determines the taint of each register at each instruction.

Register splilling and code generation.

We made the register allocator be taint-aware: when a register is to be spilled on the stack, the compiler appropriately chooses the private or the public stack depending on the taint of the register. Once the LLVM IR is lowered to machine IR, ConfLLVM emits the assembly code inserting all the checks for memory bounds and CFI.

MPX Optimizations.

ConfLLVM optimizes the bounds-checking in the MPX scheme. MPX instruction operands are identical to x64 memory operands, therefore one can check bounds of a complex operand using a single instruction. However, we found that bounds-checking a register is faster than bounds-checking a memory operand (perhaps because using a memory operand requires an implicit lea).

ConfLLVM optimizes the checks to be on a register as much as possible. It reserves MB of space around the public and private regions as guard regions and eliminates the displacement in each memory operand if its absolute value is smaller than . Usually the displacement value is a small constant (for accessing structure fields or doing stack accesses) and this optimization applies to a large degree. Further, by enabling the _chkstk enforcement for the MPX scheme also (§ 3), ConfLLVM eliminates checks on stack accesses altogether because the rsp value is bound to be within the public region (and rsp+OFFSET is bound to be within the private region).

ConfLLVM further coalesces MPX checks within a basic block. Before adding a check, it confirms if the same check was already added previously in the same block without subsequent modifications to its the base or index registers.

5.1. ConfVerify

We have developed ConfVerify to check that a binary produced by ConfLLVM has the required instrumentation in place to guarantee that there are no (explicit) private data leaks. The design goal of ConfVerify is to guard against bugs in ConfLLVM; it is not a general-purpose verifier meant for arbitrary binaries. ConfVerify actually helped us catch bugs in ConfLLVM during its development.

ConfVerify is only 1500 LOC in addition to an off-the-shelf disassembler that it uses for CFG construction444We use the LLVM disassembler. (as compared to MLOC for ConfLLVM), and therefore, much easier to manually audit. It lacks all the complexities of ConfLLVM (such as register allocation, optimizations, etc.), and uses a simple dataflow analysis to check all the flows. Consequently, it provides a higher degree of assurance for the security of our scheme.

ConfVerify requires the unique prefixes of magic sequences (§ 4) as input and uses them to identify procedure entries in the binary. It starts disassembling the procedures and constructs their control-flow graph (CFG). ConfVerify assumes that the binary satisfies CFI, which makes it possible to reliably identify all instructions in a procedure. If the disassembly fails, the binary is rejected. Otherwise, ConfVerify checks its assumptions: that the magic sequences were indeed unique in the procedures identified and that they have enough CFI checks.

Next, ConfVerify performs a dataflow analysis on the constructed CFG to determine the taints of all the registers at each instruction. It starts by picking the taint bits of the magic sequence preceding the procedure (which makes the analysis modular). It looks for MPX checks or the use of segment registers to identify the taints of memory operands; if it cannot find a check in the same basic block, the verification fails. For each store instruction, it checks that the taint of the destination operand matches the taint of the source register. For direct calls, it checks that the expected taints of the arguments, as encoded in the magic sequence at the callee, matches the taints of the argument registers at the callsite (this differs from ConfLLVM which uses functions signatures). For indirect control transfers (indirect calls and ret), ConfVerify confirms that there is a check for the magic sequence at the target site and that its taint bits match the inferred taints for registers. After a call instruction, ConfVerify picks up taint of the return register from the magic sequence (there should be one), marks other all caller-save registers as private, and callee-save registers as public (following ConfLLVM’s convention).

ConfVerify additionally makes sure that a direct or a conditional jump can only go a location in the same procedure. ConfVerify rejects a binary that has an indirect jump, a system call, or if it modifies a segment register. ConfVerify also confirms correct usage of _chkstk to ensure that rsp is kept within stack bounds. When using the segment-register scheme, ConfVerify additionally checks that each memory operand uses the lower -bits of registers.

5.2. Security analysis

We present an informal security argument for ConfVerify, leaving the formal security proofs as future work.

For every program instruction, ConfVerify maintains the invariant that there is no (explicit) data flow from private operands to public operands. To do this, ConfVerify statically computes the taints of the instruction operands.

For registers, ConfVerify computes the incoming and outgoing taint map for each instruction using a data flow analysis. For memory operands, instead of relying on user annotations or pointer analyses, ConfVerify uses the runtime bounds check that precedes the instruction. Thus, at runtime, either a bounds check fails, causing the program to crash, or the memory operands have taints that ConfVerify computes for them.

ConfVerify analysis is modular: it is a per-procedure analysis that starts from the taint summary magic sequence preceding the procedure, and checks for the invariant; the summaries are statically checked at the call sites. Finally, the CFI checks enforce the invariant for indirect control transfers. At runtime, either the CFI checks fail, or the registers have taints as expected by the target program location. Thus, CFI attacks could hijack the control flow, but they cannot subvert the enforced register taints, and hence the invariant.

5.3. Protection against a malicious Operating System

Our scheme, by itself, does not offer protection against an adversary with root privileges. However, our setup lends itself naturally to leverage hardware-isolation technology like Intel SGX [26] for protection against a malicious operating system. SGX allows a user-mode application to create enclaves, which are regions of memory that appear encrypted to the OS. Code executing outside the enclave cannot access memory inside the enclave, although the reverse is allowed.

When using our techniques, an application can stop private data from leaking to the OS by simply mapping the private region of memory inside an enclave. Further, the code for both and , as well as ’s stack and heap, must be inside the enclave. However, any application code that does not require access to private data can be placed outside the enclave and executed without instrumentation on the public stack: ConfLLVM guarantees that corruption of public data or the public stack cannot induce a leak of private data. Further, the hardware guarantees that code outside the enclave cannot touch (private) data inside the enclave.

In our experience, once the - partitioning of code is done, very little work is required to use enclaves (§ 7.2).

6. Toolchain

This section describes the overall flow to launch an application using our toolchain.

Compiling using ConfLLVM.

Recall that the only external functions in the code are functions; all external communication of happens via . The code is compiled with an (auto-generated) stub file, that implements each of these functions as an indirect jump from a table externals, located at a constant position in (e.g. jmp (externals + offset) for the -th function). The table externals is initialized with zeroes at this point, and ConfLLVM links all the files to produce a dll.

The dll is then postprocessed to patch all the references to globals, so that they correspond to the correct (private or public) region. The globals themselves are relocated by the loader. The postprocessing pass also sets the -bit prefix for the magic sequences (used for CFI, § 4). We find these sequences by generating random bit sequences and checking for uniqueness; usually a small number of iterations suffice.

Wrappers for functions.

For each of the functions in ’s interface exported to , we write a small wrapper that: (a) performs the necessary checks for the arguments (e.g. the send wrapper would check that its argument buffer is contained in the public region), (b) copies arguments to ’s stack, (c) switches gs, (d) switches rsp to ’s stack, and (e) calls the corresponding function underneath (e.g. send in libc). On return, it (f) switches gs and rsp back and jumps to in a similar manner as our CFI return instrumentation. Additionally, the wrappers include the magic sequence similar to those in so that the CFI checks in do not fail when calling . These wrappers are compiled with the dll, and the output dll exports the interface functions.

Memory allocation.

To support the public-private memory partitioning of , must offer allocation routines for obtaining memory in the public and private sections, respectively. We created wrappers of malloc in to offer this functionality.

Loading the and dlls.

When loading the and dlls, the loader: populates the externals table in with addresses of the wrapper functions in , relocates the globals in to their respective, private or public regions, sets the MPX bound registers for the MPX scheme or the segment registers for the segment-register scheme, and initializes the heaps and stacks in all the regions, marks them non-executable, and jumps to the main routine.

7. Evaluation

The goal of our evaluation is three-fold: (a) Quantify the performance overheads of ConfLLVM’s instrumentation, both for enforcing bounds and for enforcing CFI; (b) Quantify necessary code changes in existing applications to enable them to be compiled by ConfLLVM; (c) Check that our scheme actually stops confidentiality exploits in applications.

7.1. CPU benchmarks

We measured the overheads of ConfLLVM’s instrumentation on the standard SPEC CPU 2006 benchmarks [10]. We treat the code of the benchmarks as untrusted (in ), but libc functions are trusted (in ). Since these benchmarks use no private data, we added no annotations to the benchmarks, which makes all data public by default. Nonetheless, the code emitted by ConfLLVM ensures that all memory accesses are actually in the public region, it enforces CFI, and switches stacks when calling functions, so this experiment accurately captures ConfLLVM’s overheads. We ran the benchmarks in the following configurations.

  • Base: Benchmarks compiled with vanilla LLVM, with O2 optimizations. This is the baseline for evaluation.555O2 is the standard optimization level for performance evaluation. Higher levels include “optimizations” that don’t always speed up the program.

  • : ConfLLVM requires the use a customized memory allocator since the Windows system allocator does not support multiple heaps. The configuration is benchmarks compiled with vanilla LLVM but running with our custom allocator.

  • : Compiled with ConfLLVM, but without any runtime instrumentation. However, all optimizations unsupported by ConfLLVM are disabled and stacks are switched in calling functions from .

  • : Like but additionally with CFI instrumentation, but no memory bounds enforcement.

  • : Full ConfLLVM, using MPX scheme.

  • : Full ConfLLVM, using segmentation scheme.

Briefly, the difference between and is the cost of our CFI instrumentation. The difference between (resp. ) and is the cost of enforcing bounds using MPX (resp. segment registers).

All benchmarks were run on a Microsoft Surface Pro-4 Windows 10 machine with an Intel Core i7-6650U 2.20 GHz 64-bit processor with 2 cores (4 logical cores) and 8 GB RAM. Table 9

shows the results, averaged over ten runs. The numbers in the first column are run times in seconds. The remaining columns report percentage changes, relative to the first column. The standard deviations were all below 3%.

The overhead of ConfLLVM using MPX () is up to 74.03%, while that of ConfLLVM using segmentation () is up to 24.5%.666The overheads of MPX may seem high, especially given that MPX was designed to make memory-bounds checking efficient. However, Oleksenko et al.’s recent, systematic investigation of MPX overheads has found very similar overheads [37]. As expected, the overheads are almost consistently significantly lower when using segmentation than when using MPX. Looking further, some of the overhead (up to 10.2%) comes from CFI enforcement (), although the average CFI overhead is , competitive with best known techniques [18]. Stack switching and disabled optimizations () account for the remaining overhead. The overhead due to our custom memory allocator () is negligible and, in many benchmarks, the custom allocator improves performance.

 Name Base(s)
 gcc 272.36 -5.92% 0.17% 3.37% 5.48% 12.32%
 gobmk 395.62 0.08% 4.32% 13.66% 20.90% 33.13%
 hmmer 128.63 0.16% -2.54% -2.50% 1.16% 65.09%
 h264ref 365.13 -0.63% 6.96% 17.12% 22.20% 74.03%
 lbm 208.80 0.82% 6.78% 7.06% 10.63% 11.84%
 bzip2 406.33 -0.47% 3.39% 4.62% 10.19% 40.95%
 mcf 280.06 1.32% 2.11% 4.17% 3.69% 4.02%
 milc 438.60 -8.81% -7.61% -7.17% -6.40% -3.22%
 libquantum 263.8 2.76% 3.831% 6.25% 17.89% 40.75%
 sjeng 424.46 0.01% 12.61% 19.75% 24.5% 48.9%
 sphinx3 438.6 0.93% 1.74% 5.26% 9.93% 30.30%
Figure 9. SPEC CPU 2006 benchmarks in different configurations. Percentages are overheads relative to the first column.

We further comment on some seemingly odd results. On mcf, the cost of CFI alone (

, 4.17%) seems to be higher than that of the full MPX-based instrumentation (

, 4.02%). We verified that this is due to an outlier in

experiment. On hmmer, the overhead of is negative because the optimizations that ConfLLVM disables actually slow it down. Finally, on milc, the overhead of ConfLLVM is negative because this benchmark benefits significantly from the use of our custom memory allocator. Indeed, relative to , the remaining overheads follow expected trends.

7.2. Simple applications

Next, we consider two simple, end-to-end applications and measure specific aspects of performance on them.

A mongoose-based web server

We consider a simple, single-threaded web server that uses, and ships as part of, the Mongoose embedded web server library [8] v6.7. This standard library provides all the useful functions; and the web server is a thin wrapper around it. We modified the library and the web server to serve public and private text files, identified by the extensions ‘.txt’ and ‘.ptxt’, respectively. The server encrypts private files prior to transmission. To read data from a private file, we wrote a trusted wrapper function read_priv around the standard library function read, which reads a file. The new function read_priv behaves exactly like read, but returns the data read in a private buffer, which is passed as the second argument.

size_t read_priv(int fd, private char* buffer,
                 size_t to_read);

We also wrote an encryption function in that takes a private buffer with plain text and returns a public buffer with cipher text. This function is used to encrypt private files read by read_priv. The rest of the server, including the Mongoose library, remains in . This ensures that private data returned via read_priv cannot be leaked unencrypted. Overall, we changed 5 LoC in the Mongoose library code, and 20 lines in the web server code.

We ran performance (throughput) measurements on this simple web server, by compiling it with avanilla LLVM and with ConfLLVM, using segmentation for enforcing bounds. In both cases, we the same machine that we used for the SPEC CPU 2006 benchmarks.

We created 6 processes on the server machine, each running one instance of the server, and connected to them using 36 parallel client threads on a different machine, over a 1Gbps direct link. The clients request medium-sized (16KB) files in succession. All files are already cached in memory. This setup suffices to saturate the server’s CPU in both the baseline and with ConfLLVM. Over 10 runs, the baseline is able to handle, on average 6,703 requests per second (std. dev. 2.1%), while the server compiled with ConfLLVM (segmentation) is able to handle 6,629 requests per second (std. dev. 2.3%). This amounts to a throughput degradation of approximately 1.1%, which is within the experimental error. The reason for this extremely low overhead is that while contains the most complex and error-prone parts of the web server (like the http request parser), the dominant cost is that of copying data in read/write libraries (which are in ) and in the network stack, neither of which is affected by our instrumentation. Consequently, the overheads are insignificant.

Enclaves. We used the Mongoose web server to further test our design in combination with SGX enclaves as described in Section 5.3. It only required a day’s worth of additional effort to use Intel’s SDK for SGX [27] and run the web server inside an enclave (the SDK now becomes part of ). The private memory segment was mapped inside the enclave, supported by the SDK’s memory allocator, whereas the public stack and heap were located outside. The use of enclaves reduced the throughput by 44%, due to the cost of switching control in and out of the enclave to make system calls (which happen frequently in a web server). However, this additional overhead is orthogonal to our techniques and can be reduced with further engineering effort.

Isolating system library services

Our design enables implementing a system service like a file system as a shared userspace library, and isolating it from an untrusted client program in the same address space. As an example, we wrote a small multithreaded, shared userspace library that offers file read and write functions. Internally, the library memory-maps open files, and serves and updates file contents through memory copy. The library also provides integrity by maintaining a Merkle hash tree of the file system’s contents, at the granularity of file system blocks (512 bytes).

The obvious security concern is that malicious or buggy applications may clobber the hash tree and nullify the integrity guarantees. To alleviate this concern we compile both the library and its clients using ConfLLVM (i.e. as part of ). All data within the client is marked private, while the integrity-sensitive datastructures of the library, including the hash tree, are marked public (the library can also work with the private data). While marking the library data public and the client data private may sound backwards, note that the goal is here is to protect the integrity, not the confidentiality, of library data. Marking the data this way prevents compromised clients from clobbering library data structures. It also prevents the library from accidentally copying client data into its own data structures. Finally, we also prevent clients from calling internal library functions directly (for this, we use different magic sequences for the client and the library). Client calls to the library functions are mediated through wrapper functions in , that switch back and forth between the library’s public stack and the client’s stack777An alternative design could be to move the entire library to , while keeping the application in . Compared to our design, this design has the disadvantage that it does not prevent bugs in the library from accidentally copying application data into its critical data structures..

We experiment with this library on a Windows 10 machine with an Intel i7-6700 CPU (4 cores, 8 hyperthreaded cores) and 32 GB RAM. Our client program creates between 1 and 6 parallel threads, all of which read a 2 GB file concurrently. The file is memory-mapped within the library and cached previously. This gives us a CPU-bound workload. We measure the total time taken to perform the reads in three configurations: Base, and . Table 10 shows the total runtime (in seconds) as a function of the number of threads and the configuration, averaged across 5 runs. The standard deviations are negligible, all below 3%. Until the number of threads exceeds the number of cores (4), the time and relative overhead of our scheme remain nearly constant. This establishes linear scaling with the number of threads. The actual overhead of is below 10% and that of is below 17% in this experiment.

Threads Base
1 16.0 17.5 (9.38%) 18.6 (16.25%)
2 16.4 18.0 (9.76%) 19.1 (16.46%)
4 17.5 19.1 (9.14%) 20.2 (15.43%)
6 26.1 28.7 (9.96%) 30.5 (16.85%)
Figure 10. Total time, in seconds, for reading a 2GB file in parallel, as a function of the number of threads. Numbers in parenthesis are overheads relative to Base.

7.3. OpenLDAP

In order to show that ConfLLVM can scale to large applications, we use it to compile the OpenLDAP project [38]. OpenLDAP is an open source implementation of the Lightweight Directory Access Protocol (LDAP) proposed by IETF RFC 4511 [44]. LDAP provides a standardized way of organizing information in an application-defined hierarchical structure, as well as accessing the information over a network. We use OpenLDAP version 2.4.45. This version has 300,000 lines of C code, across 728 source files. We configure OpenLDAP as a multi-threaded server (the default) with a memory-mapped backing store (also the default), and simple username/password authentication.

By default, OpenLDAP stores two types of passwords: 1) A password for each user in the database; authenticating with this password provides access to specific parts of the database, and 2) A root password that gives access to the entire backing store. We modified the source code to protect both types of passwords. This was achieved by modifying 100 lines of code, adding 52 lines of new code, and moving 9 lines of existing code into . In total, these constitute about 0.5% of the existing code base.

We briefly describe our code changes and additions. In OpenLDAP, the root password is stored in a configuration file that is read during initialization. To protect the root password, we moved the root password to a different location and added a function to read the password from this location into a private buffer. Subsequently, ConfLLVM guarantees that this password cannot leak to the public region. To protect user passwords, we changed OpenLDAP to encrypt these passwords when they are written to the backing store and to decrypt them when they are read back. The functions to encrypt and decrypt are in . Importantly, the decryption function returns the decrypted password in a private buffer. ConfLLVM then prevents this password from leaking.

We perform two throughput experiments on OpenLDAP compiled with ConfLLVM. We use MPX for bounds checks, as opposed to segmentation, since we know from the SPEC CPU benchmarks that MPX is worse for ConfLLVM. We use the same machine as for the SPEC CPU benchmarks to host an OpenLDAP server configured to run 6 concurrent threads. The server is pre-populated with 10,000 random directory entries and its caches are warmed ahead of time.

In the first experiment, 80 concurrent clients, running on a separate machine connected to the server over a 100Mbps direct Ethernet link, issue concurrent requests for directory entries that do not exist. Across three trials, the server handles on average 26,254 and 22,908 requests per second in the baseline (Base) and ConfLLVM using MPX (). This corresponds to a throughput degradation of 12.74%, which is moderate. The server CPU remains nearly saturated throughout the experiment. The standard deviations are very small (1.7% and 0.2% in Base and , respectively).

Our second experiment is identical to the first, except that 60 concurrent clients issue small requests for entries that do exist on the server. Now, the baseline and ConfLLVM using MPX handle 29,698 and 26,895 queries per second, respectively. This is a throughput degradation of 9.44%. The standard deviations are small (less than 0.2% in both cases).

The reason for the difference in overheads in these two experiments is that OpenLDAP does less work (in ) looking for directory entries that exist than it does looking for directory entries that don’t exist. Although the overheads are already moderate, we expect they can be reduced further by using segmentation in place of MPX.

7.4. Vulnerability-injection experiments

To test that ConfLLVM actually stops data extraction vulnerabilities from being exploited, we hand-crafted vulnerabilities in three applications. First, in the Mongoose-based web server of Section 7.2, we added a buffer-bounds vulnerability to the code path for serving a public file. The vulnerability transmits any amount of stale data from the stack, unencrypted. We wrote a client that exploits the vulnerability by first requesting a private file, which causes some contents of the private file to be written to the stack in clear text, and then requesting a public file with the exploit. This causes stale private data from the first request to be leaked. Using ConfLLVM stops the exploit since ConfLLVM separates the private and public stacks. The contents of the private file are written to the private stack but the vulnerability reads from the public stack.

Second, we modified Minizip [7], a file compression tool, to explicitly leak the file encryption password to a log file. ConfLLVM’s type inference detects this leak once we annonate the password as private. To make it harder for ConfLLVM, we added several pointer type casts on the password, which make it impossible to detect the leak statically. But then, the dynamic checks inserted by ConfLLVM prevent the leak.

Third, we wrote a simple function with a standard format string vulnerability in its use of printf. printf is a vararg function, whose first argument, the format string, determines how many subsequent arguments the function tries to print. If the format string has more directives than the number of arguments, potentially due to adversary provided input, printf ends up reading other data from either the argument registers or the stack. If any of this data is private, it results in a data leak. ConfLLVM prevents this vulnerability from being exploited if we include printf’s code in : since printf tries to read all arguments into buffers marked public, the bounds enforcement of ConfLLVM prevents it from reading any private data.

8. Related Work

Our work bears similarities to information release confinement (IRC) [48] that proposed a design methodology for programming secure enclaves (e.g., ones that use Intel SGX instructions for memory isolation). The code that is loaded inside an enclave is divided into and , where the former is obtained via a special instrumenting compiler [16]. This work argued for a minimal amount of trusted code and sandboxed to restrict all its communication to happen via . This is similar in principle to our - division. However, there are several differences. First, IRC does not track taints and all data is encrypted by before releasing it externally. Thus, the application is incapable of carrying out plain-text communication without losing the security guarantee. Second, their implementation does not support multi-threading (it relies on page protection to isolate from ). Third, it maintains a bitmap of writeable memory locations for enforcing CFI (resulting in time and memory overheads). Our CFI is taint aware and without these overheads. Finally, their verifier does not scale even for SPEC benchmarks, whereas our verifier is much faster and shown to scale to all binaries that we have tried so far, including SPEC benchmarks.

In a parallel effort to ours, Carr et al. [15] present DataShield with similar goals to ConfLLVM. DataShield, however, trusts the user-annotations and the compiler, whereas in our work, the user-annotations are untrusted and ConfVerify removes ConfLLVM from the TCB. Unlike ConfLLVM, DataShield does not protect against control-flow hijacking attacks; it assumes CFI enforcement through some other technique. Our CFI scheme is central to designing a modular verifier ConfVerify. DataShield maintains and checks pointer bounds at the object level for sensitive data, thereby providing integrity for sensitive data, which is outside the goals of ConfLLVM. We have evaluated ConfLLVM on a much larger benchmark (OpenLDAP, 300 KLOC) than DataShield (mbedTLS, 30 KLOC). Further, by making sure that there is no network or disk IO involved, our reported overheads reflect the true cost of the instrumentation whereas DataShield’s evaluation had IO, which masked their overheads, as reported by the authors.

Region-based memory partitioning has been explored before in the context of safe and efficient memory management [50, 22], but our use of private and public regions in is the first application of regions to enforce confidentiality. The regions obviate the use of dynamic taint tracking, which has been the subject of much research in recent years [46, 32, 41]. TaintCheck [36] first proposed the idea of dynamic taint tracking, and is used as the basis for the dynamic binary rewriting tool Valgrind [35]. DECAF [25] is a whole system binary analysis framework for program analysis that includes a taint tracking mechanism. However, whole system dynamic taint trackers incur heavy performance overhead. For instance, DECAF has an average 600% overhead, and TaintCheck can impose a x performance hit for CPU-bound applications. Suh et al. [49] report sub overheads for their dynamic information flow tracking scheme, but unlike our scheme, they require a custom hardware. While most dynamic taint tracking systems handle explicit flow of information, [51] proposes a binary translation scheme to convert all implicit flows into explicit flows.

Static analyses for security [13, 14, 23, 43] use source code to try to prove some correctness criteria, e.g. safe downcasts in C++, or correct use of variadic arguments. When proofs are not possible, runtime checks are inserted to enforce relevant security policy at runtime. In ConfLLVM, it is not always possible to determine statically whether an address belongs to the private or public partition, so we add runtime checks.

Our CFI mechanism is similar to Abadi et al. [11] and Zeng et al. [53] in its use of magic sequences, but our magic sequences are taint aware. As already mentioned in Section 1, memory safety techniques for C such as CCured [34] and SoftBound [33] do not provide confidentiality and already have much higher overheads than ConfLLVM. Adding confidentialtity on top of them is bound to add even more programmer or runtime overhead. Techniques such as Code-pointer integrity [28] prevent control flow hijacks but not data leaks. And so, their problem statement and even the details of the solution are very different from ConfLLVM.


  • [1] 2011 CWE/SANS Top 25 Most Dangerous Software Errors.
  • [2] Chrome owned by exploits in hacker contests, but google’s $1m purse still safe.
  • [3] Clang: A C language family frontend for LLVM.
  • [4] Cve-2012-0769, the case of the perfect info leak.
  • [5] The heartbleed bug.
  • [6] Intel mpx explained.
  • [7] Minizip.
  • [8] Mongoose.
  • [9] Smashing the stack for fun and profit.
  • [10] SPEC CPU 2006.
  • [11] Martín Abadi, Mihai Budiu, Ulfar Erlingsson, and Jay Ligatti. Control-flow integrity. In Proceedings of the 12th ACM conference on Computer and communications security, pages 340–353. ACM, 2005.
  • [12] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.
  • [13] Priyam Biswas, Alessandro Di Federico, Scott A. Carr, Prabhu Rajasekaran, Stijn Volckaert, Yeoul Na, Michael Franz, and Mathias Payer. Venerable variadic vulnerabilities vanquished. In 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, 2017. USENIX Association.
  • [14] Fraser Brown, Andres Nötzli, and Dawson Engler. How to build static checking systems using orders of magnitude less code. SIGPLAN Not., 51(4):143–157, March 2016.
  • [15] Scott A. Carr and Mathias Payer. Datashield: Configurable data confidentiality and integrity. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 193–204, New York, NY, USA, 2017. ACM.
  • [16] Miguel Castro, Manuel Costa, Jean-Philippe Martin, Marcus Peinado, Periklis Akritidis, Austin Donnelly, Paul Barham, and Richard Black. Fast byte-granularity software fault isolation. In Symposium on Operating Systems Principles (SOSP), 2009.
  • [17] Jeremy Condit, Matthew Harren, Zachary Anderson, David Gay, and George C. Necula. Dependent types for low-level programming. In Proceedings of the 16th European Symposium on Programming, ESOP’07, pages 520–535, Berlin, Heidelberg, 2007. Springer-Verlag.
  • [18] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. The performance cost of shadow stacks and stack canaries. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’15, pages 555–566, New York, NY, USA, 2015. ACM.
  • [19] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag.
  • [20] Dorothy E. Denning. A lattice model of secure information flow. Commun. ACM, 19(5):236–243, May 1976.
  • [21] Jeffrey S. Foster, Manuel F"ahndrich, and Alexander Aiken. A Theory of Type Qualifiers. In Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 192–203, Atlanta, Georgia, May 1999.
  • [22] Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. Region-based memory management in cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 282–293, New York, NY, USA, 2002. ACM.
  • [23] Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Cristiano Giuffrida, Herbert Bos, and Erik van der Kouwe. Typesan: Practical type confusion detection. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, pages 517–528, New York, NY, USA, 2016. ACM.
  • [24] Nevin Heintze and Jon G. Riecke. The slam calculus: Programming with secrecy and integrity. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’98, pages 365–377, New York, NY, USA, 1998. ACM.
  • [25] A. Henderson, L. K. Yan, X. Hu, A. Prakash, H. Yin, and S. McCamant. Decaf: A platform-neutral whole-system dynamic binary analysis platform. IEEE Transactions on Software Engineering, 43(2):164–184, Feb 2017.
  • [26] Intel Software Guard Extensions Programming Reference. Available at, 2014.
  • [27] Intel Software Guard Extensions SDK. Available at, 2016.
  • [28] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, and Dawn Song. Code-pointer integrity. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, pages 147–163, Berkeley, CA, USA, 2014. USENIX Association.
  • [29] Lap Chung Lam and Tzi-cker Chiueh. A general dynamic information flow tracking framework for security applications. In Proceedings of the 22Nd Annual Computer Security Applications Conference, ACSAC ’06, pages 463–472, Washington, DC, USA, 2006. IEEE Computer Society.
  • [30] Chris Lattner and Vikram Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO ’04, pages 75–, Washington, DC, USA, 2004. IEEE Computer Society.
  • [31] Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. Bugbench: Benchmarks for evaluating bug detection tools. In In Workshop on the Evaluation of Software Defect Detection Tools, 2005.
  • [32] Jiang Ming, Dinghao Wu, Jun Wang, Gaoyao Xiao, and Peng Liu. Straighttaint: Decoupled offline symbolic taint analysis. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pages 308–319, New York, NY, USA, 2016. ACM.
  • [33] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. SoftBound: highly compatible and complete spatial memory safety for C. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, Dublin, Ireland, June 15-21, 2009, pages 245–258, 2009.
  • [34] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. Ccured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst., 27(3):477–526, 2005.
  • [35] Nicholas Nethercote and Julian Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. SIGPLAN Not., 42(6):89–100, June 2007.
  • [36] James Newsome and Dawn Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. 2005.
  • [37] Oleksii Oleksenko, Dmitrii Kuvaiskii, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. Intel MPX explained: An empirical study of intel MPX and software-based bounds checking approaches. CoRR, abs/1702.00719, 2017.
  • [38] OpenLDAP Project. Openldap.
  • [39] Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage. Return-oriented programming: Systems, languages, and applications. ACM Trans. Inf. Syst. Secur., 15(1):2:1–2:34, March 2012.
  • [40] A. Sabelfeld and A. C. Myers. Language-based information-flow security. IEEE J.Sel. A. Commun., 21(1):5–19, September 2006.
  • [41] D. Schoepe, M. Balliu, B. C. Pierce, and A. Sabelfeld. Explicit secrecy: A policy for taint tracking. In 2016 IEEE European Symposium on Security and Privacy (EuroS P), pages 15–30, March 2016.
  • [42] Edward J Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Security and privacy (SP), 2010 IEEE symposium on, pages 317–331. IEEE, 2010.
  • [43] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. Addresssanitizer: A fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC’12, pages 28–28, Berkeley, CA, USA, 2012. USENIX Association.
  • [44] J. Sermersheim. Lightweight Directory Access Protocol (LDAP): The Protocol. RFC 4511, June 2006.
  • [45] Hovav Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS ’07, pages 552–561, New York, NY, USA, 2007. ACM.
  • [46] Zhiyong Shan. Suspicious-taint-based access control for protecting OS from network attacks. CoRR, abs/1609.00100, 2016.
  • [47] Umesh Shankar, Kunal Talwar, Jeffrey S. Foster, and David Wagner. Detecting format string vulnerabilities with type qualifiers. In Proceedings of the 10th Conference on USENIX Security Symposium - Volume 10, SSYM’01, Berkeley, CA, USA, 2001. USENIX Association.
  • [48] Rohit Sinha, Manuel Costa, Akash Lal, Nuno P. Lopes, Sriram K. Rajamani, Sanjit A. Seshia, and Kapil Vaswani. A design and verification methodology for secure isolated regions. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, pages 665–681, 2016.
  • [49] G. Edward Suh, Jae W. Lee, David Zhang, and Srinivas Devadas. Secure program execution via dynamic information flow tracking. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XI, pages 85–96, New York, NY, USA, 2004. ACM.
  • [50] Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Inf. Comput., 132(2):109–176, February 1997.
  • [51] Neil Vachharajani, Matthew J Bridges, Jonathan Chang, Ram Rangan, Guilherme Ottoni, Jason A Blome, George A Reis, Manish Vachharajani, and David I August. Rifle: An architectural framework for user-centric information-flow security. In Microarchitecture, 2004. MICRO-37 2004. 37th International Symposium on, pages 243–254. IEEE, 2004.
  • [52] Wei Xu, Sandeep Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: A practical approach to defeat a wide range of attacks. In Proceedings of the 15th Conference on USENIX Security Symposium - Volume 15, USENIX-SS’06, Berkeley, CA, USA, 2006. USENIX Association.
  • [53] Bin Zeng, Gang Tan, and Greg Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In Proceedings of the 18th ACM conference on Computer and communications security, pages 29–40. ACM, 2011.