Confidential Attestation: Efficient in-Enclave Verification of Privacy Policy Compliance

07/20/2020 ∙ by Weijie Liu, et al. ∙ Shanghai Jiao Tong University ∙ Indiana University Bloomington ∙ Indiana University ∙ Rice University

A trusted execution environment (TEE) such as Intel Software Guard Extensions (SGX) runs a remote attestation to prove to a data owner the integrity of the initial state of an enclave, including the program to operate on her data. For this purpose, the data-processing program is supposed to be open to the owner, so its functionality can be evaluated before trust can be established. However, there are increasingly many application scenarios in which the program itself needs to be protected, so its compliance with privacy policies as expected by the data owner should be verified without exposing its code. To this end, this paper presents CAT, a new model for TEE-based confidential attestation. Our model is inspired by Proof-Carrying Code, where a code generator produces a proof together with the code and a code consumer verifies the proof against the code on its compliance with security policies. Given that the conventional solutions do not work well under the resource-limited and TCB-frugal TEE, we propose a new design in which an untrusted out-enclave generator analyzes the source code of a program when compiling it into binary, and a trusted in-enclave consumer efficiently verifies the correctness of the instrumentation and the presence of other protection before running the binary. Our design strategically moves most of the workload to the code generator, which is responsible for producing well-formatted and easy-to-check code, while keeping the consumer simple. Also, the whole consumer can be made public and verified through a conventional attestation. We implemented this model on Intel SGX and demonstrate that it adds only a small amount of code to the TCB. We also thoroughly evaluated its performance on micro- and macro-benchmarks and real-world applications, showing that the new design only incurs a small overhead when enforcing several categories of security policies.




1. Introduction

Recent years have witnessed the emergence of hardware trusted execution environments (TEEs) that enable efficient computation on untrusted platforms. A prominent example is Intel SGX (McKeen et al., 2013), a TEE widely deployed on commercial-off-the-shelf (COTS) desktop and server processors. It provides secure memory regions, called enclaves, to host confidential computing on sensitive data, which is protected from an adversary in control of the operating system and even one with physical access to the data-processing machine. Such a computing model is already supported by major cloud providers today, including Microsoft Azure and Google Cloud (Russinovich, 2017; asy, 2019), and its further adoption has been facilitated by the Confidential Computing Consortium (ccc, 2019), a Linux Foundation project that brings together major technology companies such as Intel, Google, Microsoft and IBM. However, before TEEs can see truly wide deployment for real-world confidential computing, key technical barriers still need to be overcome, remote attestation in particular.

Remote attestation. At the center of a TEE’s trust model is remote attestation (RA), which allows the user of confidential computing to verify that the enclave code processing her sensitive data is correctly built and operates on a genuine TEE platform, so her data is well protected. This is done on SGX through establishing a chain of trust rooted at a platform attestation key owned by the hardware manufacturer and using the key to generate a Quote – a signed report that contains the measurement of the code and data in an enclave; the Quote is delivered to the data owner and checked against the signature and the expected measurement hash. This trust building process is contingent upon the availability of the measurement, which is calculated from the enclave program either by the data owner when the program is publicly available or by a trusted third party working on the owner’s behalf. This becomes problematic when the program itself is private and cannot be exposed. Programs may have exploitable bugs or they may write information out of the enclave through corrupted pointers easily. For example, different banks and financial agencies would like to jointly calculate a person’s credit score based on each other’s data, without disclosing their individual data and the proprietary algorithm processing it. As another example, the pharmaceutical companies want to search for suitable candidates for their drug trial without directly getting access to plaintext patient records or exposing their algorithm (carrying sensitive genetic markers discovered with million dollar investments) to the hospital. With applications of this kind on the rise, new techniques for protecting both data and code privacy are in great demand.

Confidential attestation: challenges. To address this problem, we present in this paper a novel Confidential ATtestation (CAT) model to enable verification of an enclave program’s compliance with user-defined security policies without exposing its source or binary code to unauthorized parties involved. Under the CAT model, a bootstrap enclave, whose code is public and verifiable through Intel’s remote attestation, is responsible for performing the compliance check on behalf of the participating parties, who, even without access to the code or data to be attested, can be convinced that desired policies are faithfully enforced. However, building a system to support the CAT model turns out to be nontrivial, for four reasons: the complexity of static analysis of an enclave binary for policy compliance; the need to keep the verification mechanism, which is inside the enclave’s trusted computing base (TCB), small; the demand for a quick turnaround from the enclave user; and the limited computing resources today’s SGX provides (about 96 MB physical memory on most commercial hardware (Chakrabarti et al., 2019)). Simply sandboxing the enclave code significantly increases the size of the TCB, rendering it less trustworthy, and also brings in performance overheads incurred by confinement and checkpoint/rollback (Hunt et al., 2018b).

A promising direction we envision that could lead to a practical solution is proof-carrying code (PCC), a technique that enables a verification condition generator (VCGen) (Colby et al., 2000; Leroy, 2006; Pirzadeh et al., 2010) to analyze a program and create a proof that attests the program’s adherence to policies, and a proof checker to verify the proof and the code. The hope is to push the heavy-lifting part of the program analysis to the VCGen outside the enclave while keeping the proof checker inside the enclave small and efficient. The problem is that this cannot be achieved by existing approaches, which utilize formal verification (such as (Necula and Rahul, 2001; Pirzadeh et al., 2010)) to produce a proof that could be 1000 times larger than the original code. In fact, even after years of development, today’s formal verification techniques, theorem proving in particular, still scale poorly and are unable to handle large code blocks (e.g., over 10000 instructions) when constructing a security proof.

Our solution. In our research, we developed a new technique to instantiate the CAT model on SGX. Our approach, called CAT-SGX, is a PCC-inspired solution, which relies on out-of-enclave targeted instrumentation for lightweight in-enclave information-flow confinement and integrity protection, instead of heavyweight theorem proving. More specifically, CAT-SGX operates an untrusted code producer as a compiler to build the binary code for a data-processing program (called target program) and instrument it with a set of security annotations for enforcing desired policies at runtime, together with a lightweight trusted code consumer running in the bootstrap enclave to statically verify whether the target code indeed carries properly implanted security annotations.

To reduce the TCB and in-enclave computation, CAT-SGX is designed to simplify the verification step by pushing out most computing burden to the code producer running outside the enclave. More specifically, the target binary is expected to be well formatted by the producer, with all its indirect control flows resolved, all possible jump target addresses specified on a list and enforced by security annotations. In this way, the code consumer can check the target binary’s policy compliance through lightweight Recursive Descent Disassembly to inspect its complete control flow (Section 7), so as to ensure the presence of correctly constructed security annotations in front of each critical operation, such as read, store, enclave operations like OCall, and stack management (through a shadow stack). Any failure in such an inspection causes the rejection of the program. Also, since most code instrumentation (for injecting security annotations) is tasked to the producer, the code consumer does not need to make any change to the binary except relocating it inside the enclave. As a result, we only need a vastly simplified disassembler, instead of a full-fledged, complicated binary analysis toolkit, to support categories of security policies, including data leak control, control-transfer management, self-modifying code block and side/covert channel mitigation (Section 4.2). A wider spectrum of policies can also be upheld by an extension of CAT-SGX, as discussed in the paper (Section 7).
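The consumer-side check described above can be illustrated with a minimal Python sketch of a recursive-descent pass over a toy instruction listing. This is not the CAT-SGX verifier: the mnemonics (`check_store`, `store`, `jmp`, `jcc`, `ijmp`, `halt`), the one-address-per-instruction layout and the function name `verify` are assumptions made purely for illustration. The sketch only shows the general shape of the idea: follow every reachable control-flow path, using the producer-supplied list for indirect targets, and reject the program if any store is not guarded by an annotation or if any branch could land on a store and skip its guard.

```python
def verify(program, entry, indirect_targets):
    """Return True if every reachable store is guarded by an annotation.

    program: dict mapping address -> (mnemonic, operand); toy encoding in
             which each instruction occupies exactly one address (assumption).
    entry: address where execution starts.
    indirect_targets: producer-supplied list of all indirect jump targets.
    """
    landing = set(indirect_targets) | {entry}
    worklist = list(landing)
    visited = set()
    while worklist:
        addr = worklist.pop()
        if addr in visited:
            continue
        if addr not in program:
            return False  # control flow leaves the inspected code
        visited.add(addr)
        op, arg = program[addr]
        if op == "store":
            guard = program.get(addr - 1, (None, None))[0]
            # The annotation must sit right before the store, and no
            # entry point or listed indirect target may land on the store.
            if guard != "check_store" or addr in landing:
                return False
        if op in ("jmp", "jcc"):
            if program.get(arg, (None, None))[0] == "store":
                return False  # a direct branch would skip the annotation
            worklist.append(arg)
        if op in ("jmp", "ijmp", "halt"):
            continue  # no fall-through successor
        worklist.append(addr + 1)
    return True
```

In this toy setting, a listing such as `{0: ("check_store", None), 1: ("store", None), 2: ("halt", None)}` with entry 0 is accepted, while the same listing with the annotation removed, or with address 1 on the indirect target list, is rejected.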

We implemented CAT-SGX in our research, building the code producer on top of the LLVM compiler infrastructure and the code consumer based upon the Capstone disassembly framework (cap, [n.d.]) and the core disassembling engine for the x86 architecture. Using this unbalanced design, our in-enclave program has only 2000 lines of source code, and together with all the libraries involved, it is compiled into a 1.9 MB binary. This is significantly smaller than NaCl’s core library used by Ryoan, whose binary is around 19 MB, and the theorem prover Z3, with 26 MB. We further evaluated our implementation on micro-benchmarks (nBench), as well as macro-benchmarks, including credit scoring, an HTTPS server, and basic biomedical analysis algorithms (sequence alignment, sequence generation, etc.) over various sizes of genomic data (1000 Genomes Project (100, [n.d.])), under the scenario of confidential computing as a service (Section 3.2). CAT-SGX incurs on average (calculated by geometric mean) 20% performance overhead and less than 30% storage overhead enforcing all the proposed security policies, and leads to around 10% performance overhead and less than 20% storage overhead without side/covert channel mitigation. We have released our code on GitHub (our, [n.d.]).
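For reference, an average overhead reported as a geometric mean can be computed as sketched below. The function name is hypothetical, and the sketch assumes the mean is taken over per-benchmark slowdown ratios (instrumented running time divided by baseline running time), which the text does not spell out.

```python
from statistics import geometric_mean


def mean_overhead(slowdown_ratios):
    """Geometric-mean overhead from per-benchmark slowdown ratios.

    A ratio of 1.25 means the instrumented run took 25% longer than the
    baseline. The result is a fraction (0.25 means 25% overhead).
    """
    return geometric_mean(slowdown_ratios) - 1.0
```

The geometric mean is the usual choice for averaging ratios because a single slow benchmark does not dominate the result the way it would with an arithmetic mean.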

Contributions. The contributions of the paper are outlined as follows:

Confidential attestation model. We propose CAT, a new model that extends today’s TEE to maintain the data owner’s trust in protection of her enclave data, without exposing the code of the data-processing program. This is achieved through enforcing a set of security policies through a publicly verifiable bootstrap enclave. This new attestation model enables a wide spectrum of applications with great real-world demand in the confidential computing era.

New techniques for supporting CAT on SGX. We present the design for instantiating CAT on SGX, following the idea of PCC. Our approach utilizes out-of-enclave code analysis and instrumentation to minimize the workload for the in-enclave policy compliance check, which just involves a quick pass over a well-formatted target binary to inspect correct instrumentation. This simple design offers support for critical policies, ranging from memory leak prevention to side channel mitigation, through a much smaller TCB compared with sandbox solutions.

Implementation and evaluation. We implemented our design of CAT-SGX and extensively evaluated our prototype on micro- and macro- benchmarks, together with popular biomedical algorithms on human genomic data. Our experiments show that CAT-SGX effectively enforces various security policies at small cost, with the delay incurred by memory leak prevention around 20% and side-channel mitigation usually no more than 35%.

2. Background

Intel SGX. Intel SGX (McKeen et al., 2013) is a user-space TEE, which is characterized by flexible process-level isolation: a program component can get into an enclave mode and be protected by execution isolation, memory encryption and data sealing, against the threats from the untrusted OS and processes running in other enclaves. Such protection, however, comes with in-enclave resource constraints. Particularly, only 128 MB encryption protected memory (called Enclave Page Cache or EPC) is reserved for enclaves for each processor. Although the virtual memory support is available, it incurs significant overheads in paging.

Another problem caused by SGX’s flexible design is a large attack surface. When an enclave program contains memory vulnerabilities, attacks can compromise the enclave’s privacy protection. Prior research demonstrates that a Return-Oriented-Programming (ROP) attack can succeed in injecting malicious code inside an enclave, which can be launched by the OS, Hypervisor, or BIOS (Lee et al., 2017a; Biondo et al., 2018; Schwarz et al., 2019). Another security risk is side-channel leak (Schwarz et al., 2017; Lee et al., 2017b; Gras et al., 2018), caused by the thin software stack inside an enclave (for reducing TCB), which often has to resort to the OS for resource management (e.g., paging, I/O control). Particularly, an OS-level adversary can perform a controlled side channel attack (e.g., (Xu et al., 2015)). Also in the threat model is the physical adversary, such as a system administrator, who tries to gain unauthorized access to a TEE’s computing units to compromise its integrity or confidentiality.

SGX remote attestation. As mentioned earlier, attestation allows a remote user to verify that the enclave is correctly constructed and runs on a genuine SGX-enabled platform. In Intel’s attestation model, three parties are involved: (1) the Independent Software Vendor (ISV), who is registered with Intel as the enclave developer; (2) the Intel Attestation Service (IAS), hosted by Intel to help enclave verification; and (3) the SGX-enabled platform that operates SGX enclaves. The attestation begins with the ISV sending an attestation request challenge, which can be generated by an enclave user or a data owner who wants to perform the attestation with the enclave to check its state. Upon receipt of the challenge, the enclave generates a verification report including the enclave measurement, which can be verified by a quoting enclave (QE) through local attestation. The QE signs the report using the attestation key, and the generated quote is forwarded to the IAS. The IAS then checks the quote and signs the verification result using Intel’s private key. The ISV can then validate the attestation result based upon the signature and the enclave measurement.
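The verifier-side logic of this exchange, checking a signature over the report and comparing the reported measurement with the expected one, can be sketched in Python as follows. This is a deliberately simplified illustration and not the SGX protocol: an HMAC under a shared key stands in for the attestation signature, a plain SHA-256 hash of the enclave image stands in for the hardware measurement, and all function names are hypothetical.

```python
import hashlib
import hmac


def measure(enclave_image: bytes) -> str:
    """Toy stand-in for the enclave measurement (assumption: plain SHA-256)."""
    return hashlib.sha256(enclave_image).hexdigest()


def make_quote(measurement: str, challenge: bytes, attestation_key: bytes) -> dict:
    """Toy quote: the measurement and challenge, authenticated with an HMAC.

    Real quotes carry an asymmetric signature; HMAC is used here only so
    the sketch needs nothing beyond the standard library.
    """
    body = measurement.encode() + challenge
    tag = hmac.new(attestation_key, body, hashlib.sha256).hexdigest()
    return {"measurement": measurement, "challenge": challenge, "tag": tag}


def verify_quote(quote: dict, expected_measurement: str,
                 challenge: bytes, attestation_key: bytes) -> bool:
    """Accept only a fresh, authentic quote for the expected enclave."""
    body = quote["measurement"].encode() + quote["challenge"]
    expected_tag = hmac.new(attestation_key, body, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(quote["tag"], expected_tag)
            and quote["challenge"] == challenge
            and hmac.compare_digest(quote["measurement"], expected_measurement))
```

The point relevant to CAT is the last comparison: the verifier must know the expected measurement in advance, which is why conventional attestation presupposes that the enclave program is available to the data owner or a trusted third party.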

PCC. PCC is a software mechanism that allows a host system to verify an application’s properties with a formal proof accompanying the application’s executable code. Using PCC, the host system is expected to quickly check the validity of the proof, and compare the conclusions of the proof to its own security policies to determine whether the application is safe to run. Traditional PCC schemes tend to utilize formal verification for proof generation and validation. Techniques for this purpose include verification condition generator/proof generator (Homeier and Martin, 1995; Colby et al., 2000), theorem prover/proof assistant (Paulson, 2000; De Moura and Bjørner, 2008; Bertot and Castéran, 2013), and proof checker/verifier (Appel et al., 2003), which typically work on type-safe intermediate languages (IL) or higher level languages. A problem here is that, to our knowledge, no formal tool today can automatically transform a binary to IL for in-enclave verification. BAP (Brumley et al., 2011) disassembles binaries and lifts x86 instructions to a formal format, but it does not have a runtime C/C++ library for static linking, as required for an enclave program.

Moreover, the PCC architecture today relies on the correctness of the VCGen and the proof checker, so a direct application of PCC to confidential computing needs to include both in TCB. This is problematic due to their complicated designs and implementations, which are known to be error-prone (Necula and Rahul, 2001). Particularly, today’s VCGens are built on interpreter/compiler or even virtual machine (Leroy, 2006), and therefore will lead to a huge TCB. Prior attempts (Appel, 2001) to move VCGen out of TCB are found to have serious performance impacts, due to the significantly increased proof size.

Actually, the proof produced by formal verification is typically large, growing exponentially with the size of the program that needs to be certified (Necula, 1997). It is common to have a proof 1000 times larger than the code (Pirzadeh et al., 2010). Although techniques exist to reduce the proof size (Appel, 2001; Pirzadeh et al., 2010), they are complicated and increase the TCB size (Appel et al., 2003). Therefore, as far as we are aware, no existing PCC techniques can be directly applied to enable the CAT model on today’s TEE.

3. Confidential Attestation

Consider an organization that provides data-processing services, such as image editing (Pixlr), tax preparation (TurboTax), personal health analysis (23andMe) and deep learning inference as a service. To use the services, its customers need to upload their sensitive data, such as images, tax documents, and health data, to the hosts operated by the organization. To avoid exposing the data, the services run inside SGX enclaves and need to prove to the customers that they are only accessible to authorized service programs. However, the organization may not want to release the proprietary programs to protect its intellectual property. This problem cannot be addressed by today’s TEE design. In this section, we present the Confidential ATtestation (CAT) model to allow the data owner to verify that the enclave code satisfies predefined security policy requirements without undermining the privacy of the enclave code.

Figure 1. The CAT model

3.1. The CAT Model

The CAT model can be described by the interactions among 4 parties, as follows:

Attestation service. Attestation service (AS) assists in the remote attestation process by helping the data owner and/or the code provider verify the quote generated by an enclave, as performed by the Intel attestation service for SGX.

Bootstrap enclave. The bootstrap enclave is a built-in control layer on the software stack of an enclave supporting CAT (see Figure 1). Its code is public and its initial state is measured by hardware for generating an attestation quote, which is later verified by the data owner and the code provider with the help of the AS. This software layer is responsible for establishing secure channels with enclave users, and for authenticating and dynamically loading the binary of the target program from the code provider and data from its owner. Further, it verifies the code to ensure its compliance with predefined security policies before bootstrapping the computation. During the computation, it also controls the data entering or exiting the enclave, e.g., through SGX ECalls and OCalls to perform data sanitization.

Data owner. The data owner uploads sensitive data (e.g., personal images) to use in-enclave services (e.g., an image classifier) and intends to keep her data secret during the computation. To this end, the owner runs a remote attestation with the enclave to verify the code of the bootstrap enclave, and sends in data through a secure channel only when convinced that the enclave is in the right state, so that the expected policy compliance check will be properly performed on the target binary from the code provider. Note that there could be more than one data owner providing data.

Code provider. The code provider (owner) can be the service provider (Scenario 1 in Section 3.2), and in this case, her target binary (the service code) can be directly handed over to the bootstrap enclave for compliance check. In general, however, the code provider is a different party and may not trust the service provider. So, similar to the data owner, she can also request a remote attestation to verify the bootstrap enclave before delivering her binary to the enclave for a compliance check.

3.2. Application Scenarios

The CAT model can be applied to the following scenarios to protect both data and code privacy in computing.

Figure 2. Scenarios

Scenario 1: Confidential Computing as a Service. We consider confidential computing as a service (CCaaS) as a privacy extension of today’s online data processing services like machine-learning as a service (Russinovich, 2017), as in the example presented at the beginning of the section. CCaaS is hosted by the party that operates its own target binary on the data provided by its owner (e.g., an online image classifier to label uploaded user photos). The outcome of the computation will be sent back to the data owner. Here, the target binary cannot be released for verification, so it needs to go through an in-enclave compliance check.

Scenario 2: Confidential Data as a Service. In this scenario (CDaaS), it is the data owner who hosts the online service. The code provider dispatches her program (the target binary) to analyze the data and get the result back, all through a secure channel. An example is that a pharmaceutical company inspects the electronic medical records on a hospital’s server to seek suitable candidates for a drug trial. Here, the code provider wants to ensure that her algorithm will be properly executed and will not be released, which is done through a remote attestation to verify the bootstrap loader. The data owner also needs to put a policy in place to control the amount of information that can be given to the code provider.

Scenario 3: Confidential Data Computing Market. Another scenario (called CDCM) is that the enclave is hosted by an untrusted third party, a market platform, to enable data sharing and analysis. In this case, both the data owner and the code provider upload to the platform their individual content (data or code) through secure channels. They all go through remote attestations to ensure the correctness of the bootstrap enclave, which could also arrange payment transactions between the data owner and the code provider through a smart contract.

3.3. Requirements for a CAT System

To instantiate the CAT model on a real-world TEE such as SGX, we expect the following requirements to be met by the design:

Minimizing TCB. In the CAT model the bootstrap enclave is responsible for enforcing security and privacy policies and for controlling the interfaces that import and export code/data for the enclave. So it is critical for trust establishment and needs to be kept as compact as possible for code inspection or verification.

Reducing resource consumption. Today’s TEEs operate under resource constraints. Particularly, SGX is characterized by limited EPC. To maintain reasonable performance, we expect that the software stack of the CAT model controls its resource use.

Controlling dynamic code loading. The target binary is dynamically loaded and inspected by the bootstrap enclave. However, the binary may further sideload other code during its runtime. Some TEE hardware, SGX in particular, does not allow dynamic changes to an enclave page’s RWX properties. So the target binary, itself loaded dynamically, is executed on the enclave’s heap space. Preventing it from sideloading requires a data execution prevention (DEP) scheme to guarantee the W⊕X privilege.

Preventing malicious control flows. Since the target binary is not trusted, the CAT software stack should be designed to prevent the code from escaping policy enforcement by redirecting its control flow or tampering with the bootstrap enclave’s critical data structures. Particularly, previous work shows that special SGX instructions like ENCLU could become unique gadgets for control flow redirecting (Biondo et al., 2018), which therefore need proper protection.

Minimizing performance impact. In all application scenarios, the data owner and the code provider expect a quick turnaround from code verification. Also the target binary’s performance should not be significantly undermined by the runtime compliance check.

3.4. Threat Model

The CAT model is meant to establish trust between the enclave and the code provider, as well as the data owner, under the following assumptions:

We do not trust the target binary (service code) and the platform hosting the enclave. In CCaaS, the platform may deliberately run vulnerable target binary to exfiltrate sensitive data, by exploiting the known vulnerabilities during computation. The binary can also leak the data through a covert channel (e.g., page fault (Xu et al., 2015)).

Under the untrusted service provider, our model does not guarantee the correctness of the computation, since it is not meant to inspect the functionalities of the target binary. Also, although TEE is designed to prevent information leaks to the untrusted OS, denial of service can still happen, which is outside the scope of the model.

We assume that the code of the bootstrap enclave can be inspected to verify its functionalities and correctness. Also we consider the TEE hardware, its attestation protocol, and all underlying cryptographic primitives to be trusted.

Our model is meant to protect data and code against different kinds of information leaks, not only explicit but also implicit. However, side channel for a user-land TEE (like SGX) is known to be hard to eliminate. So our design for instantiating the model on SGX (Section 4.2) can only mitigate some types of side-channel threats.

4. Enhancing SGX with CAT

In this section we present our design, called CAT-SGX, that elevates the SGX platform with the support for the CAT model. This is done using an in-enclave software layer (the bootstrap enclave running the code consumer) and an out-enclave auxiliary (the code generator). In the following, we first describe the general idea behind our design and then elaborate on the policies it supports, its individual components and potential extensions.

4.1. CAT-SGX: Overview

Idea. Behind the design of CAT-SGX is the idea of PCC, which enables efficient in-enclave verification of the target binary’s policy compliance on the proof generated for the code. A direct application of the existing PCC techniques, however, fails to serve our purpose, as mentioned earlier, due to the huge TCB introduced, the large proof size and the exponential time with regards to the code size for proof generation. To address these issues, we design a lightweight PCC-type approach with an untrusted code producer and a trusted code consumer running inside the bootstrap enclave. The producer compiles the source code of the target program (for service providing), generates a list of its indirect jump targets, and instruments it with security annotations for runtime mediation of its control flow and key operations, in compliance with security policies. The list and security annotations constitute a “proof”, which is verified by the consumer after loading the code into the enclave and before the target binary is activated.

Figure 3. System overview

Architecture. The architecture of CAT-SGX is illustrated in Figure 3. The code generator, the binary and the proof it produces are all considered untrusted. Only the code consumer is in the TCB, with two components: a dynamic loader operating a rewriter for relocating the target binary, and a proof verifier running a disassembler for checking the correct instrumentation of security annotations. These components are all made public and can therefore be measured for a remote attestation (Section 7). Their code size is kept minimal by moving most of the workload to the code producer.

Figure 4. Detailed framework and workflow

We present the workflow of CAT-SGX in Figure 4. The target program (the service code) is first instrumented by the code producer, which runs a customized LLVM-based compiler (step 1). Then the target binary with the proof (security annotations and the jump target list) is delivered to the enclave through a secure channel. The code is first parsed (step 2) and then disassembled from the binary’s entry along its control flow traces. After that, the proof and the disassembled code are inspected by the verifier (step 3). If they are correct, some immediates are rewritten (step 4), and the binary is then relocated and activated by the dynamic loader (step 5). Finally, after the bootstrap enclave transfers the execution to the target program, the service begins and policies are checked at runtime.

4.2. Security Policies

Without exposing its code for verification, the target binary needs to be inspected for compliance with security policies by the bootstrap enclave. These policies are meant to protect the privacy of sensitive data, to prevent its unauthorized disclosure. The current design of CAT-SGX supports the policies in the following five categories:

Enclave entry and exit control. CAT-SGX can mediate the content imported to or exported from the enclave, through the ECall and OCall interfaces, for the purposes of reducing the attack surface and controlling information leaks.

P0: Input constraint, output encryption and entropy control. We restrict the ECall interfaces to just serving the purposes of uploading data and code, which perform authentication, decryption and optionally input sanitization (or a simple length check). Also only some types of system calls are allowed through OCalls. Particularly, all network communication through OCalls should be encrypted with proper session keys (with the data owner or the code provider). For CCaaS, the data owner can demand that only one OCall (for sending back results to the owner) be allowed. For CDaaS, the data owner can further impose the constraint on the amount of information (number of bits) that can be returned to the code provider: e.g., one bit to indicate whether suitable patients for a drug trial exist or one byte to tell the number.
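The entropy-control part of P0 can be pictured as a budget on the number of bits that may leave the enclave. The following Python sketch is a hypothetical illustration of that idea only: the class name `OutputBudget` and its interface are assumptions, and the encryption of outgoing data under a session key, which P0 also requires, is left out.

```python
class OutputBudget:
    """Cap the total number of bits released to the outside (illustrative)."""

    def __init__(self, max_bits: int):
        self.remaining_bits = max_bits

    def release(self, payload: bytes) -> bytes:
        """Return the payload if it fits in the remaining budget.

        Raises PermissionError otherwise, leaving the budget unchanged.
        A real gate would also encrypt the payload before it leaves.
        """
        bits = 8 * len(payload)
        if bits > self.remaining_bits:
            raise PermissionError("output budget exceeded")
        self.remaining_bits -= bits
        return payload
```

With a budget of 8 bits, for example, a single one-byte result (such as the number of matching patients) can be returned, and any further output attempt is refused.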

Memory leak control. Information leak can happen through unauthorized write to the memory outside the enclave, which should be prohibited through the code inspection.

P1: Preventing explicit out-enclave memory stores. This policy prevents the target binary from writing outside the enclave, which could be used to expose sensitive data. It can be enforced by security annotations through mediation on the destination addresses of memory store instructions (such as MOV) to ensure that they are within the enclave address range (ELRANGE).

P2: Preventing implicit out-enclave memory stores. Illicit RSP register save/spill operations can also leak sensitive information to the out-enclave memory by pushing a register value to the address specified by the stack pointer, which is prohibited through inspecting the RSP content.

P3: Preventing unauthorized change to security-critical data within the bootstrap enclave. This policy ensures that the security-critical data would never be tampered with by the untrusted code.

P4: Preventing runtime code modification. Since the target code is untrusted and loaded into the enclave during its operation, under SGXv1, the code can only be relocated to the pages with RWX properties. So software-based DEP protection should be in place to prevent the target binary from changing itself or uploading other code at runtime.
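The checks behind P1 to P4 all reduce to address-range comparisons made before a store or a stack operation. The Python sketch below models them on plain integers. It is an illustration under stated assumptions, not the instrumentation CAT-SGX emits: the `Region` and `EnclaveLayout` types, the idea of a single contiguous region each for critical data, code and stack, and the 8-byte stack slot are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int
    end: int  # exclusive

    def contains(self, addr: int, size: int = 1) -> bool:
        return self.start <= addr and addr + size <= self.end

    def overlaps(self, addr: int, size: int) -> bool:
        return addr < self.end and addr + size > self.start


@dataclass(frozen=True)
class EnclaveLayout:
    elrange: Region   # whole enclave address range
    critical: Region  # bootstrap enclave's security-critical data
    code: Region      # pages holding the loaded target binary
    stack: Region     # stack area the target program may use


def store_allowed(layout: EnclaveLayout, addr: int, size: int) -> bool:
    """Check a store destination against P1, P3 and P4."""
    if not layout.elrange.contains(addr, size):
        return False  # P1: no write outside the enclave
    if layout.critical.overlaps(addr, size):
        return False  # P3: no tampering with security-critical data
    if layout.code.overlaps(addr, size):
        return False  # P4: no runtime code modification (software DEP)
    return True


def rsp_allowed(layout: EnclaveLayout, rsp: int) -> bool:
    """P2: the stack pointer must stay inside the permitted stack area."""
    return layout.stack.contains(rsp, 8)
```

Keeping each check to a handful of comparisons matters here, since an annotation of this kind runs in front of every store in the target binary.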

Control-flow management. To ensure that security annotations and other protection cannot be circumvented at runtime, the control flow of the target binary should not be manipulated. For this purpose, the following policy should be enforced:

P5: Preventing manipulation of indirect branches to violate policies P1 to P4. This policy is to protect the integrity of the target binary’s control flow, so security annotations cannot be bypassed. To this end, we need to mediate all indirect control transfer instructions, including indirect calls and jumps, and return instructions.

AEX based side/covert channel mitigation. SGX’s user-land TEE design exposes a large side-channel surface, which cannot be easily eliminated. In the meantime, prior research shows that many side-channel attacks cause Asynchronous Enclave Exits (AEXs). Examples include the controlled side channel attack (Xu et al., 2015), which relies on triggering page faults, and the attacks on L1/L2 caches (Wang et al., 2017), which require context switches to schedule between the attack thread and the enclave thread when Hyper-threading is turned off or a co-location test is performed before running the binary (Chen et al., 2018). CAT-SGX is capable of integrating existing solutions to mitigate the side- or covert-channel attacks in this category.

P6: Controlling the AEX frequency. This policy requires the total number of AEX occurrences to stay below a threshold during the whole computation. Once AEXs become too frequent and exceed the threshold, the execution is terminated to prevent further information leak.

4.3. Policy-Compliant Code Generation

As mentioned earlier, the design of CAT-SGX is to move the workload from in-enclave verification to out-enclave generation of policy-compliant binary and its proof (security annotations and the list of indirect jump targets). In this section we describe the design of the code generator, particularly how it analyzes and instruments the target program so that security policies (P1~P6, see Section 4.2) can be enforced during the program’s runtime. Customized policies for purposes other than privacy can also be translated into proof and be enforced flexibly.

Enforcing P1. The code generator is built on top of the LLVM compiler framework (Section 5.1). When compiling the target program (in C) into binary, the code generator identifies (through the LLVM API MachineInstr::mayStore()) all memory storing operation instructions (e.g., MOV, Scale-Index-Base (SIB) instructions) and further inserts annotation code before each instruction to check its destination address and ensure that it does not write outside the enclave at runtime. The boundaries of the enclave address space can be obtained during dynamic code loading, which is provided by the loader (Section 4.4). The correct instrumentation of the annotation is later verified by the code consumer inside the enclave.
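The check performed by a P1 annotation can be modeled in a few lines. The Python sketch below is illustrative only and is not the prototype's code: the enclave base and size, and the names store_is_allowed and guarded_store, are hypothetical. It assumes that the whole written range must fall inside the enclave address range.

```python
# Illustrative model of the P1 guard; all constants are hypothetical.
ELRANGE_BASE = 0x7F00_0000_0000   # assumed enclave base address
ELRANGE_SIZE = 0x0800_0000        # assumed enclave size (128 MiB)


def store_is_allowed(dest: int, width: int,
                     base: int = ELRANGE_BASE,
                     size: int = ELRANGE_SIZE) -> bool:
    """Return True only if [dest, dest + width) lies inside the enclave range."""
    if width <= 0:
        return False
    return base <= dest and dest + width <= base + size


def guarded_store(memory: dict, dest: int, value: int, width: int = 8) -> None:
    """Perform the store only after the annotation-style check passes."""
    if not store_is_allowed(dest, width):
        # The real annotation jumps to an exit label; this model raises instead.
        raise PermissionError(f"store to {dest:#x} is outside the enclave")
    memory[dest] = value
```

In this model a store that straddles the upper boundary is rejected as well, since the check covers the last written byte and not only the start address.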

Enforcing P2. The generator locates all instructions that explicitly modify the stack pointer (the RSP in X86 arch) from the binary (e.g., a MOV changing its content) and inserts annotations to check the validity of the stack pointer after them. This protection, including the content of the annotations and their placement, is verified by the code consumer (Section 4.4). Note that RSP can also be changed implicitly, e.g., through pushing oversized objects onto the stack. This violation is prevented by the loader (Section 4.4), which adds guard pages (pages without permission) around the stack.

Enforcing P3. Similar to the enforcement of P1 and P2, the code generator inserts security annotations to prevent (both explicit and implicit) memory write operations on security-critical enclave data (e.g., SSA/TLS/TCS) once the untrusted code is loaded and verified. These annotation instructions are verified later by the verifier.

Enforcing P4. To prevent the target binary from changing its own code at runtime, the code generator instruments all its write operations (as identified by the APIs readsWritesVirtualRegister() and mayStore()) with annotations that disallow alteration of code pages. Note that the code of the target binary has to be placed on RWX pages by the loader under SGXv1 and its stack and heap are assigned to RW pages (see Sec. 4.4), so runtime code modification cannot be stopped solely by page-level protection (though code execution from the data region is defeated by the page permissions).

Enforcing P5. To control indirect calls and indirect jumps in the target program, the code generator extracts all labels from its binary during compilation and instruments security annotations before the related instructions to ensure that only these labels can serve as legitimate jump targets. The locations of these labels must not allow instrumented security annotations to be bypassed. Also, to prevent backward-edge control flow manipulation (through RET), the generator injects annotations after the entry into and before the return from every function call to operate on a shadow stack (see Figure 14), which is allocated during code loading. All the legitimate labels are replaced by the loader when relocating the target binary. Such annotations are then inspected by the verifier when disassembling the binary to ensure that protection will not be circumvented by control-flow manipulation (Section 4.4).
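The two mechanisms behind P5, a list of legitimate indirect branch targets and a shadow stack for return addresses, can be sketched as follows. This is a toy Python model under stated assumptions: the class and method names are hypothetical, and the real annotations are assembly sequences rather than function calls.

```python
class ControlFlowGuard:
    """Toy model of the P5 annotations: a target allowlist plus a shadow stack."""

    def __init__(self, legitimate_targets):
        self.targets = frozenset(legitimate_targets)
        self.shadow_stack = []

    def check_indirect_branch(self, target: int) -> int:
        """Annotation placed before an indirect call or jump."""
        if target not in self.targets:
            raise RuntimeError(f"illegal indirect branch target {target:#x}")
        return target

    def on_function_entry(self, return_address: int) -> None:
        """Annotation placed after function entry: record the return address."""
        self.shadow_stack.append(return_address)

    def on_function_return(self, return_address: int) -> int:
        """Annotation placed before RET: the address must match the shadow copy."""
        if not self.shadow_stack or self.shadow_stack.pop() != return_address:
            raise RuntimeError("return address does not match the shadow stack")
        return return_address
```

A return address overwritten on the regular stack no longer matches the copy kept on the shadow stack, so the mismatch is caught before the RET executes.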

Enforcing P6 with SSA inspection. When an exception or interrupt takes place during enclave execution, an AEX is triggered by the hardware to save the enclave context (such as general registers) to the state saving area (SSA). This makes the occurrence of the AEX visible (Gruss et al., 2017; Chen et al., 2018). Specifically, the code generator can enforce the side-channel mitigation policy by instrumenting every basic block with an annotation that sets a marker in the SSA and monitors whether the marker is overwritten, which happens when the enclave context in the area has been changed, indicating that an AEX has occurred. By counting the number of consecutive AEXs, the protected target binary can be aborted in the presence of anomalously frequent interrupts. This protection can also be verified by the code consumer before the binary is allowed to run inside the enclave.
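The marker-based detection can be illustrated with a small simulation. The Python sketch below is a model under assumptions, not the prototype's annotation: the marker value, the class name AexMonitor and the simple total counter are hypothetical, and the SSA is represented by a single field that a simulated AEX overwrites.

```python
MARKER = 0xDEADBEEF  # hypothetical marker value written into the SSA


class AexMonitor:
    """Toy model of the per-basic-block P6 annotation."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.aex_count = 0
        self.ssa_slot = MARKER  # stands in for one word of the SSA

    def simulate_aex(self) -> None:
        """Model the hardware saving context into the SSA, clobbering the marker."""
        self.ssa_slot = 0

    def basic_block_annotation(self) -> None:
        """Check the marker, count an AEX if it was overwritten, then re-arm it."""
        if self.ssa_slot != MARKER:
            self.aex_count += 1
            if self.aex_count > self.threshold:
                raise RuntimeError("too many AEXs: aborting the target binary")
            self.ssa_slot = MARKER
```

One limitation of this scheme is visible in the model: several AEXs between two consecutive annotations are counted as one, so the count is a lower bound on the real number of exits.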

Code loading support. Loading the binary is a procedure that links the binary to external libraries and relocates the code. For a self-contained function (i.e., one that does not use external elements), compiling and sending the bytes of the assembled code is enough. However, if the function needs external elements that are not supported inside an enclave (e.g., a system call), a distributed code loading support mechanism is needed. In our design, the loading procedure is divided into two parts, one (linking) outside and the other (relocation) inside the enclave.

Our code generator assembles all the symbols of the entire code (including necessary libraries and dependencies) into one relocatable file via static linking. While linking all object files generated by LLVM, it keeps all symbols and relocation information held in relocatable entries. This relocatable file, i.e., the above-mentioned target binary, is later loaded into the enclave and relocated there (Section 4.4).

4.4. Configuration, Loading and Verification

With the annotations instrumented and legitimate jump targets identified, the in-enclave workload undertaken by the bootstrap enclave has been significantly reduced. Still, the enclave needs to be properly configured to enforce the policy (P0) that cannot be implemented by the code generator, to load and relocate the target binary so that the instrumented protection can be properly executed, and to verify the “proof” for policy compliance by efficiently disassembling and inspecting the binary. Below we elaborate on how these critical operations are supported by our design.

Enclave configuration to enforce P0. To enforce the input constraint, we need to configure the enclave by defining certain public ECalls in Enclave Definition Language (EDL) files for secure delivery of data and code. Note that such a configuration, together with other security settings, can be attested to the remote data owner or code provider. The computation result of the in-enclave service is encrypted using a session key (with the data owner or code provider) after the remote attestation and is sent out through a customized OCall. For this purpose, CAT-SGX only defines allowed system calls (e.g., send/recv) in the EDL file, together with their wrappers for security control. Specifically, the wrapper for send encrypts the message to be delivered and pads it to a fixed length.

To support the CCaaS setting, only send and recv are allowed to communicate with the data owner. When necessary, the wrappers of these functions can pad the encrypted output and ensure that the inter-packet timings are constant to mitigate the side-channel risk. For CDaaS, we only permit a send OCall to be invoked once to deliver the computing result to the code provider, which can be enforced by the wrapper of the function through a counter. Further the wrapper can put a constraint on the length of the result to control the amount of information disclosed to the code provider: e.g., only 8 bits can be sent out.
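The output control described for CDaaS (a single send, a bounded result length and fixed-size messages) can be sketched as a small wrapper. The Python model below is an assumption-laden illustration: the class SendWrapper, its parameters and the one-byte length prefix are hypothetical, encryption is delegated to a caller-supplied function rather than implemented, and padding before encryption is an assumed ordering.

```python
class SendWrapper:
    """Toy model of the OCall send wrapper for the CDaaS setting."""

    def __init__(self, transport, max_calls: int = 1,
                 max_payload_bytes: int = 1, padded_len: int = 64):
        if not 0 < max_payload_bytes < min(256, padded_len):
            raise ValueError("payload limit must fit the frame and its 1-byte length")
        self.transport = transport          # callable standing in for the OCall
        self.max_calls = max_calls
        self.max_payload_bytes = max_payload_bytes
        self.padded_len = padded_len
        self.calls = 0

    def send(self, payload: bytes, encrypt) -> None:
        """Enforce the call budget and length limit, then pad, encrypt and send."""
        if self.calls >= self.max_calls:
            raise PermissionError("send OCall budget exhausted")
        if len(payload) > self.max_payload_bytes:
            raise ValueError("result exceeds the permitted output length")
        # Length-prefix and pad so that every message has the same size.
        framed = (bytes([len(payload)]) + payload).ljust(self.padded_len, b"\x00")
        self.calls += 1
        self.transport(encrypt(framed))
```

With the defaults, exactly one message carrying at most one byte of result leaves the wrapper, which corresponds to the "only 8 bits can be sent out" example in the text.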

Dynamic code loading and unloading. The target binary is delivered into the enclave as data through an ECall, processed by the wrapper placed by CAT-SGX, which authenticates the sender and then decrypts the code before handing it over to the dynamic loader. The primary task of the loader is to rebase all symbols of the binary according to its relocation information (Section 4.3). For this purpose, the loader first parses the binary to retrieve its relocation tables, then updates symbol offsets, and further reloads the symbols to designated addresses. During this loading procedure, the indirect branch label list is “translated” to in-enclave addresses, which are considered to be legitimate branch targets and later used for policy compliance verification.
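The rebasing step can be illustrated with a deliberately simplified model. The Python sketch below assumes only absolute 64-bit relocations (symbol address plus addend written little-endian at the patch site), whereas a real relocatable file contains several relocation types; the function names and the tuple layout of the relocation entries are hypothetical.

```python
import struct


def relocate(image: bytearray, load_base: int,
             symbols: dict, relocations: list) -> dict:
    """Rebase symbols and patch each relocation site with an absolute address.

    symbols maps a name to its offset in the image; relocations is a list of
    (patch_offset, symbol_name, addend) entries.
    """
    resolved = {name: load_base + off for name, off in symbols.items()}
    for patch_offset, name, addend in relocations:
        struct.pack_into("<Q", image, patch_offset, resolved[name] + addend)
    return resolved


def translate_branch_labels(labels: list, resolved: dict) -> list:
    """Turn the indirect branch label list into sorted in-enclave addresses."""
    return sorted(resolved[name] for name in labels)
```

The second function mirrors the "translation" of the indirect branch label list: once symbols are resolved, each label becomes a concrete address that the verifier can treat as a legitimate target.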

As mentioned earlier (Section 4.3), the code section of the target binary is placed on pages with RWX privileges, since under SGXv1 the page permissions cannot be changed during an enclave’s operation, while the data sections (stack, heap) are assigned to pages with RW privileges. These code pages for the binary are guarded against any write operation by the annotations for enforcing P4. Other enclave code, including that of the code consumer, is under RX protection through enclave configuration. Further, the loader assigns two non-writable blank guard pages right before and after the target binary’s stack for enforcing P2, and also reserves pages for hosting the list of legitimate branch targets and the shadow stack for enforcing P5.

Just-enough disassembling and verification. After loading and relocating, the target binary is passed to the verifier for a policy compliance check. Such a verification is meant to be highly efficient, together with a lightweight disassembler. Specifically, our disassembler is designed to leverage the assistance provided by the code generator. It starts from the program entry discovered by the parser and follows its control flow until an indirect control flow transfer, such as indirect jump or call, is encountered. Then, it utilizes all the legitimate target addresses on the list to continue the disassembly and control-flow inspection. In this way, the whole program will be quickly and comprehensively examined.
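The traversal strategy can be sketched on a toy instruction map. In the Python model below, the program is a dictionary from address to (kind, size, target), which stands in for what a real disassembler would decode; the instruction kinds and the function name are hypothetical. Direct branches are followed as usual, and when an indirect transfer is met, every address on the legitimate target list is added to the worklist.

```python
def recursive_descent(program: dict, entry: int, indirect_targets: set) -> set:
    """Return the set of instruction addresses reachable from entry.

    Kinds: "op" (falls through), "jmp" (direct, no fall-through),
    "jcc" (conditional), "call", "ijmp", "icall", "ret".
    """
    visited = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in visited:
            continue
        if addr not in program:
            raise ValueError(f"control flow reaches undecodable address {addr:#x}")
        visited.add(addr)
        kind, size, target = program[addr]
        if kind in ("jmp", "jcc", "call"):
            worklist.append(target)
        if kind in ("ijmp", "icall"):
            # Continue from every legitimate indirect target on the list.
            worklist.extend(indirect_targets)
        if kind in ("op", "jcc", "call", "icall"):
            worklist.append(addr + size)
    return visited
```

Bytes that are never reached this way (for example, data placed after an unconditional jump) are simply not decoded, which keeps the in-enclave work proportional to the reachable code.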

For each indirect branch, the verifier checks the annotation code (Figure 4.3) right before the branch operation, which ensures that the target is always on the list at runtime. Also, these target addresses, together with direct branch targets, are compared with all guarded operations in the code to detect any attempt to evade security annotations. With such verification, we will have the confidence that no hidden control transfers will be performed by the binary, allowing further inspection of other instrumented annotations. These annotations are expected to be well formatted and located around the critical operations as described in Section 4.3. Figure 6 presents an example and more details are given in Section 5.1 and Appendix.

5. Implementation

We implemented the prototype on Linux/X86 arch. Specifically, we implemented the code generator with LLVM 9.0.0, and built other parts on an SGX environment.

We implemented one LLVM back-end pass consisting of several types of instrumentations for the code generator, about 1200 lines of C++ code in total. Besides, we implemented the bootstrap enclave with over 1900 lines of code based on Capstone (cap, [n.d.]) as the disassembler.

5.1. Assembly-level Instrumentation

Figure 5. Detailed workflow of the code generator

The code generator we built is mainly based on LLVM (Fig. 5), and the assembly-level instrumentation is the core module. To address the challenge of limited computing resources described in Section 3.3, the code generator is designed to do the heavy lifting so that the policy verifier can stay small and simple. More specifically, we implemented modules for checking memory writing instructions, RSP modification and indirect branches, and for building the shadow stack. We also adapted an instrumentation module to generate side-channel-resilient annotations. Our framework not only enforces efficiently the security policies for several real-world scenarios; annotation generation modules for customized functionalities can also be integrated into the code generator. For convenience, each module can be switched on or off.

Here is an example. The main function of the module for checking explicit memory write instructions (P1) is to insert annotations before them. Suppose the target program contains a memory write instruction such as ‘mov reg, [reg+imm]’. The structured annotation first sets the upper and lower bounds as two temporary Imms (0x3FFFFFFFFFFFFFFF and 0x4FFFFFFFFFFFFFFF, as in Figure 6), and then compares the address of the destination operand with the bounds. The real upper/lower bounds are filled in by the loader later. If the instrumentation finds that the memory write instruction tries to write data to an illegal address, it causes the program to exit at runtime. The code snippet (structured format of the annotation) is shown in Figure 6. More details can be found in Appendix .1.

pushq   %rbx                        ;save execution status
pushq   %rax
leaq    [reg+imm], %rax             ;load the operand
movq    $0x3FFFFFFFFFFFFFFF, %rbx   ;set bounds
cmpq    %rbx, %rax
ja      exit_label
movq    $0x4FFFFFFFFFFFFFFF, %rbx   ;set bounds
cmpq    %rbx, %rax
jb      exit_label
popq    %rax
popq    %rbx
Figure 6. Store instruction instrumentation
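Because the annotation has a fixed shape, checking that it is present reduces to matching a template in front of every store. The Python sketch below illustrates this idea under assumptions: instructions are represented as (mnemonic, immediate) pairs instead of decoded machine code, the names P1_TEMPLATE and store_is_guarded are hypothetical, and operand registers are not checked.

```python
# Mnemonic sequence of the annotation in Figure 6.
P1_TEMPLATE = ("pushq", "pushq", "leaq", "movq", "cmpq", "ja",
               "movq", "cmpq", "jb", "popq", "popq")
UPPER_PLACEHOLDER = 0x3FFFFFFFFFFFFFFF
LOWER_PLACEHOLDER = 0x4FFFFFFFFFFFFFFF


def store_is_guarded(instructions: list, store_index: int) -> bool:
    """Check that the store at store_index is preceded by the P1 annotation.

    instructions is a list of (mnemonic, immediate_or_None) pairs.
    """
    start = store_index - len(P1_TEMPLATE)
    if start < 0:
        return False
    window = instructions[start:store_index]
    if tuple(mnemonic for mnemonic, _ in window) != P1_TEMPLATE:
        return False
    # The two movq instructions must load the placeholder bounds.
    return (window[3][1] == UPPER_PLACEHOLDER
            and window[6][1] == LOWER_PLACEHOLDER)
```

A fuller check would also compare registers and the jump destinations; the sketch only shows why a structured, fixed-format annotation is cheap to verify.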

Although the code generator automatically produces an instrumented object file, some issues that may affect practical usage still need to be dealt with manually. Following the workflow described in Figure 4, the first step in using the CAT system is preparing the target binary. Service-specific libraries and some dependencies should also be built and linked against the target program (detailed in Appendix .2).

5.2. Building Bootstrap Enclave

Following the design in Section 4.4, we implemented a Dynamic Loading after RA mechanism for the bootstrap enclave. During the whole service, the data owner can only see the attestation messages related to the bootstrap enclave’s quote, but nothing about the service provider’s code.

Figure 7. Detailed workflow of the dynamic loader

Remote attestation. Once the bootstrap enclave is initiated, it needs to be attested. We leverage the original RA routine (ori, [n.d.]) and adjust it to our design. The original RA routine requires that the host, which is assumed to run the enclave as the ‘client’, initiates the attestation towards the ‘server’, who owns the data. In the CCaaS scenario, however, the service runs in the enclave while the remote user owns the data. So, we modify this routine to enable a remote CCaaS user to initiate the attestation.

The RA procedures can be invoked by calling sgx_ra_init() inside the service provider’s enclave after secret provision between the remote user and the service provider. After obtaining an enclave quote of the bootstrap enclave which is signed with the platform’s EPID key, the remote data owner can submit the quote to IAS and obtain an attestation report.

Dynamic loader. When the RA is finished, the trust between data owner and the bootstrap enclave is established. Then the user can locally/remotely call the Ecall (ecall_receive_binary) to load the service binary instrumented with security annotations and the indirect branch list without knowing the code. User data is loaded from untrusted memory into the trusted enclave memory when the user remotely calls Ecall (ecall_receive_userdata), to copy the data to the section reserved for it.

Then, the dynamic loader in the bootstrap enclave loads and relocates the generated code. The indirect branch list, which is composed of symbol names that will be checked in indirect branch instrumentations, is resolved at the very beginning. In our implementation, 4M of memory space is reserved for storing indirect branch targets and another 4M for the shadow stack. We also reserve 64M of memory space for the received binary and for the ‘.data’ section. The heap base address lies slightly above the end of the received binary, and 0x27000 Bytes (156 KB) of space is reserved for the loader’s own heap. The detailed memory layout after relocation and some key steps are shown in Figure 7.

Policy verifier. The policy-compliance verifier is composed of three components: a clipped disassembler, a verifier, and an immediate operand rewriter.

Clipped disassembler. We enforce each policy mostly at the assembly level. Thus, we incorporate a lightweight disassembler inside the enclave. To implement the disassembler, we removed unused components of Capstone, an existing widely used framework, and use Recursive Descent Disassembly to traverse the code. We used the diet mode, in which some non-critical data are removed, thus making the engine size at least 40% smaller (Quynh, 2014).

Policy verifier. The verifier and the rewriter that follows do their work right after the target binary is disassembled, according to the structured guard formats provided by our code generator. The verifier uses a simple scanning algorithm to ensure that the policies are applied in the assembly language instrumentation. Specifically, the verifier scans the whole assembly recursively along with the disassembler. It follows the clipped disassembler to check that the instrumentations before/after certain instructions are in place, and checks whether any branch target points between instructions in those instrumentations.
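The last check, that no branch target lands inside an annotation, amounts to an interval test. The Python sketch below is a hypothetical illustration: each guard is described by the address where its annotation starts and the address of the instruction it protects, and the function name is an assumption. A target equal to the annotation start is fine because the whole check then runs; any target after the start, up to and including the guarded instruction, would skip part or all of the check.

```python
def find_evading_targets(guards: list, branch_targets: set) -> list:
    """Return branch targets that would bypass a security annotation.

    guards is a list of (annotation_start, guarded_instruction_addr) pairs.
    """
    evading = []
    for target in sorted(branch_targets):
        for start, guarded in guards:
            if start < target <= guarded:
                evading.append(target)
                break
    return evading
```

A binary passes this part of the verification only when the returned list is empty for the union of direct targets and listed indirect targets.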

Imm rewriter. The last step before executing the target binary code is to resolve and replace the Imm operands in instrumentations, including the base of the shadow stack and the addresses of indirect branch targets (i.e., legal jump addresses). For example, the genuine base address of the shadow stack is the start address __ss_start of the memory space reserved by the bootstrap enclave for the shadow stack. The ranges are determined using functions of the Intel SGX SDK during dynamic loading (Section 4.4).

We use the simplest way to rewrite Imm operands. Table 1 shows the specific values before and after rewriting. The first column of Table 1 shows the target we need to rewrite while loading. For instance, the upper bound address of the data section is decided during loading; it is 3ffffffffffffffff (shown in the second column) during proof generation and is then modified to the real upper data bound address. The third column shows the variable name used in our prototype.

| Target imm description | From | To |
|---|---|---|
| Upper bound of data section | 3ffffffffffffffff | upper_data_bound |
| Lower bound of data section | 4ffffffffffffffff | lower_data_bound |
| Upper bound of stack | 5ffffffffffffffff | upper_stack_bound |
| Lower bound of stack | 6ffffffffffffffff | lower_stack_bound |
| Upper bound of code section | 7ffffffffffffffff | upper_code_bound |
| Lower bound of code section | 8ffffffffffffffff | lower_code_bound |
| # of indirect branch targets | 1ffffffff | branch_target_idx |
| Addr. of branch target list | 1ffffffffffffffff | __branch_target |
| Addr. of the shadow stack | 2ffffffffffffffff | __ss_start |
Table 1. Fields to be rewritten
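Replacing a placeholder immediate can be sketched as a byte-level patch. The Python example below is a simplified model, not the prototype's rewriter: it uses the 64-bit placeholder that appears in Figure 6, scans the raw code bytes for its little-endian encoding, and the function name rewrite_imm and the replacement address are hypothetical. A real rewriter would patch only the operands of annotations that the verifier has already located, rather than scanning blindly.

```python
import struct


def rewrite_imm(code: bytearray, placeholder: int, real_value: int) -> int:
    """Replace every 8-byte little-endian placeholder with real_value.

    Returns the number of patched occurrences.
    """
    old = struct.pack("<Q", placeholder)
    new = struct.pack("<Q", real_value)
    count = 0
    pos = code.find(old)
    while pos != -1:
        code[pos:pos + 8] = new
        count += 1
        pos = code.find(old, pos + 8)
    return count
```

Since the placeholder and its replacement are both 8 bytes wide, the patch never changes instruction lengths, so the addresses resolved during relocation stay valid.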

6. Evaluation

In this section we report our security analysis and performance evaluation of CAT-SGX.

6.1. Security Analysis

TCB analysis. The hardware TCB of CAT-SGX includes the TEE-enabled platform, i.e. the SGX hardware. The software TCB includes the following components to build the bootstrap enclave.

Loader and verifier. The loader we implemented consists of less than 600 lines of code (LoCs) and the verifier includes less than 700 LoCs, which also integrates SGX SDK and part of Capstone libraries.

ECall/OCall stubs for supporting P0. This was implemented in less than 500 LoCs.

Simple RA protocol realization. The implementation (Section 5.2) introduces about 200 LoCs.

Altogether, our software TCB contains less than 2000 LoCs and some dependencies, which was compiled into a self-contained binary with 1.9 MB in total.

Policy analysis. Here we show how the policies on the untrusted code, once enforced, prevent information leaks from the enclaves. In addition to side channels, there are two possible ways for a data operation to go across the enclave boundaries: bridge functions (Van Bulck et al., 2019) and memory write.

Bridge functions. With the enforcement of P0, the loaded code can only invoke our OCall stubs, which prevents the leak of plaintext data through encryption and controls the amount of information that can be sent out (to the code provider in CDaaS).

Memory write operations. All memory writes, both direct memory store and indirect register spill, are detected and blocked. Additionally, software DEP is deployed so the code cannot change itself. Also the control-flow integrity (CFI) policy, P5, prevents the attacker from bypassing the checker with carefully constructed gadgets by limiting the control flow to only legitimate target addresses.

As such, possible ways of information leak to the outside of the enclave are controlled. As proved by previous works (Sinha et al., 2015; Sinha et al., 2016) the above-mentioned policies (P1~P5) guarantee the property of confidentiality. Furthermore the policy (P5) of protecting return addresses and indirect control flow transfer, together with preventing writes to outside has been proved to be adequate to construct the confinement (Schuster et al., 2015; Sinha et al., 2016). So, enforcement of the whole set of policies from P0 to P5 is sound and complete in preventing explicit information leaks. In the meantime, our current design is limited in side-channel protection. We can mitigate the threats of page-fault based attacks and exploits on L1/L2 cache once Hyper-threading is turned off or HyperRace (Chen et al., 2018) is incorporated (P6). However, defeating the attacks without triggering interrupts, such as inference through LLC is left for future research.

6.2. Performance Evaluation

| Application Name | Size | Size (P1~P5) | Size (P1~P6) | Execution Time | Execution Time (P1~P5) | Execution Time (P1~P6) |
|---|---|---|---|---|---|---|
| bm_clock | 209KB | 217KB (+3.83%) | 218KB (+4.31%) | 1.271s | 1.457s (+14.6%) | 1.469s (+15.8%) |
| bm_malloc_and_magic | 227KB | 237KB (+4.41%) | 239KB (+5.29%) | 1.343s | 1.537s (+14.4%) | 1.638s (+22.0%) |
| bm_malloc_memalign | 229KB | 240KB (+4.80%) | 242KB (+5.68%) | 1.278s | 1.467s (+14.8%) | 1.567s (+22.6%) |
| bm_malloc_and_sort | 208KB | 222KB (+6.73%) | 225KB (+8.17%) | 1.270s | 1.473s (+16.0%) | 1.620s (+27.6%) |
| bm_memcpy | 4.7KB | 7.9KB (+68.1%) | 11KB (+134%) | 1.211s | 1.247s (+2.97%) | 1.396s (+15.3%) |
| bm_memchr | 5.2KB | 8.3KB (+59.6%) | 11KB (+116%) | 1.210s | 1.251s (+3.39%) | 1.391s (+15.0%) |
| bm_sprintf | 70KB | 74KB (+5.71%) | 76KB (+8.57%) | 1.218s | 1.299s (+6.65%) | 1.440s (+18.2%) |
| bm_sort_and_binsearch | 89KB | 98KB (+10.1%) | 102KB (+14.6%) | 1.234s | 1.314s (+6.48%) | 1.460s (+18.3%) |
Table 2. Binary code size and execution time of simple applications
| Program Name | Baseline | P1~P5 | P1~P6 |
|---|---|---|---|
| NUMERIC SORT | 1487μs | 1588μs (+6.79%) | 1665μs (+12.0%) |
| STRING SORT | 8460μs | 9507μs (+12.4%) | 10.02ms (+18.4%) |
| BITFIELD | 46.83ns | 54.10ns (+15.5%) | 55.23ns (+17.9%) |
| FP EMULATION | 14.93ms | 14.98ms (+0.33%) | 15.73ms (+5.36%) |
| FOURIER | 34.22μs | 35.21μs (+2.89%) | 36.77μs (+7.45%) |
| ASSIGNMENT | 43.52ms | 54.41ms (+25.0%) | 60.85ms (+39.8%) |
| IDEA | 342.2μs | 352.9μs (+3.13%) | 385.7μs (+12.1%) |
| HUFFMAN | 550.1μs | 649.6μs (+18.1%) | 667.1μs (+21.3%) |
| NEURAL NET | 49.44ms | 59.43ms (+20.2%) | 60.84ms (+23.1%) |
| LU DECOMPOSITION | 1024μs | 1123μs (+9.67%) | 1255μs (+22.6%) |
Table 3. Performance evaluation on nBench

Testbed setup. In our research, we evaluated the performance of our prototype and tested its code generation and code execution. All experiments were conducted on Ubuntu 18.04 (Linux kernel version 4.4.0) with SGX SDK 2.5 installed on Intel Xeon CPU E3-1280 with 64GB memory. Also we utilized GCC 5.4.0 to build the bootstrap enclave and the SGX application, and the parameters ‘-fPIC’, ‘-fno-asynchronous-unwind-tables’, ‘-fno-addrsig’, and ‘-mstackrealign’ to generate X86 binaries.

Performance on simple applications. We used the applications provided by the SGX-Shield project (Seo et al., 2017) as a micro-benchmark. In our experiment, we ran each test case 10 times, measured the resource consumption in each case and reported the median value. Specifically, we first set the baseline as the performance of an uninstrumented program running through a pure loader (a loader that only does the dynamic loading but no policy-compliance checking). Then we compared the baseline with the performance of instrumented programs to measure the overheads. The compilation time of each micro-benchmark varies from several seconds to tens of seconds, which is negligible compared with conventional PCC methods (2~5 (Necula and Rahul, 2001)).

Table 2 illustrates the overheads of our approach. From the table, we can see that the instrumented binaries (aka. the “code + proof”) are 18.1% larger than the original code and their executions are delayed by 9.8% on average when only P1~P5 are enforced. When all policies, including P6, are enforced, the binaries grow to 130% of the original size and 119% of the original execution time. Note that the benchmarks in this batch are mostly ‘first-simple-processing-then-syscall’ programs. In the worst case, ‘bm_malloc_and_sort’, CAT-SGX showed a 27.6% overhead in execution time.

Performance on nBench. We instrumented all applications in the SGX-nBench (sgx, [n.d.]), and ran each testcase of the nBench suites under a few settings, each for 10 times. These settings include just explicit memory write check (P1), both explicit memory write check and implicit stack write check (P1+P2), all memory write and indirect branch check (P1~P5), and together with side channel mitigation (P1~P6).

Table 3 shows the average execution time under different settings. Without side channel mitigation (P1~P5), CAT-SGX introduces an overhead ranging from 0.33% (on FP Emulation) to 25.0% (on Assignment). The store instruction instrumentation alone (P1) does not cause a large performance overhead, with the largest being 6.7%. Also, when P1 and P2 are applied together, the overhead is only slightly higher than when P1 is enforced alone. Besides, almost all benchmarks in nBench perform well under the CFI check P5 (less than 3%), except for Bitfield (whose overhead is about 4%) and Assignment (about 10%, due to its frequent memory access pattern).

Performance on real-world applications. We further evaluated our prototype on various real-world applications, including personal health data analysis, personal financial data analysis, and Web servers. We implemented those macro-benchmarks and measured the differences between their baseline performance (without instrumentation) in enclave and the performance of our prototype.

Sensitive health data analysis. We studied the following two applications:

1) Sequence Alignment. We implemented the Needleman–Wunsch algorithm (Needleman and Wunsch, 1970) that aligns two human genomic sequences in the FASTA format (fas, [n.d.]b) taken from the 1000 Genomes project (100, [n.d.]). The algorithm uses dynamic programming to compute recursively a two dimensional matrix of similarity scores between subsequences; as a result, it takes $O(n^2)$ memory space, where $n$ is the length of the two input sequences.
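For readers unfamiliar with the algorithm, a minimal Python sketch of the Needleman–Wunsch score computation follows. It is not the benchmark code used in the evaluation: the scoring parameters (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, and only the final alignment score is returned.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Fill the full (len(a)+1) x (len(b)+1) score matrix and return its last cell."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

Every cell is written once, which is why the workload is dominated by memory stores and is a natural stress test for the P1 instrumentation.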

Again, we measured the program execution time under the aforementioned settings. Figure 8 shows the performance of the algorithm with different input lengths (x-axis). The overall overhead (including all kinds of instrumentations) is no more than 20% (with the P1 alone no more than 10%), when input size is small (less than 200 Bytes). When input size is greater than 500 Bytes, the overhead of P1+P2 is about 19.7% while P1~P5 spends 22.2% more time than the baseline.

Figure 8. Performance on sequence alignment

Figure 9. Performance on sequence generation

2) Sequence Generation. We re-implemented the FASTA benchmark (fas, [n.d.]a), which is designed to simulate DNA sequences based on pre-defined nucleotide frequencies. The program can output nucleotide sequences of length 2n, 3n and 5n, where n is used to measure the output size. Figure 9 shows the performance when the output size (x-axis) varies from 1K to 500K nucleotides. Enforcing P1 alone results in 5.1% and 6.9% overheads when 1K and 100K are set as the output lengths. When the output size is 200K, our prototype yields less than 20% overhead. Even when the side channel mitigation is applied, the overhead becomes just 25%. With the increase of processing data size, the overhead of the system also escalates; however, the overall performance remains acceptable.

Personal credit score analysis. We further looked into another realistic and more complicated workload. Credit scoring is a typical application that needs to be protected more carefully - both the credit card holder’s cost calendar and card issuer’s scoring algorithm need to be kept secret. In our study, we implemented a BP neural network-based credit scoring algorithm (Jensen, 1992) that calculates user’s credit scores. The input file contains users’ history records and the output is a prediction whether the bank should approve the next transaction. The model was trained on 10000 records and then used to make prediction (i.e., output a confidence probability) on different test cases.

Figure 10. Performance on credit scoring

As shown in Figure 10, on 1000 and 10000 records, enforcement of P1~P5 yields around 15% overhead. When processing more than 50000 records, the overhead of the full check does not exceed 20%. The overhead of P1~P6 does not exceed 10% when processing 100K records.

Figure 11. Performance on HTTPS server

HTTPS server. We also built an HTTPS server to run in enclave using the mbed TLS library (mbe, [n.d.]). Our protection only allows two system calls (send/recv) to be executed via the OCall stubs for client/server communication. A client executes a stress test tool - Siege (sie, [n.d.]) - on another host in an isolated LAN. Siege was configured to send continuous HTTPS requests (with no delay between two consecutive ones) to the web server for 10 minutes. We measured its performance in the presence of different concurrent connections to understand how our instrumented HTTPS server implementation would perform.

Figure 11 shows the response times and throughput when all policies are applied to the HTTPS server benchmark. When there are fewer than 75 concurrent connections, the instrumented HTTPS server performs similarly to the in-enclave HTTPS server without instrumentation. When the concurrency increases to 100, the performance goes down to some extent, and once the concurrency reaches 150, the response time of the instrumented server goes up significantly. On average, enforcing P1~P6 results in 14.1% overhead in the response time. As for throughput, when the number of concurrent connections is between 75 and 200, the overhead is less than 10%. These experiments on realistic workloads show that all policies, including side-channel mitigation, can be enforced at a reasonable cost.

7. Discussion

In previous sections we have shown that the design of CAT offers lightweight and efficient in-enclave verification of privacy policy compliance. Here we discuss some extensions.

Supporting other side/covert channel defenses. In Section 4.3, we discussed policy enforcement approaches for side-channel resilience. That discussion demonstrated that our framework can adopt various side-channel mitigation approaches to generate code that carries proof. Besides the AEX-based mitigation that we borrowed from HyperRace (Chen et al., 2018), others (Doychev et al., 2015; Almeida et al., 2016; Shih et al., 2017; Gruss et al., 2017; Wu et al., 2018; Wang et al., 2019; Orenbach et al., 2019) can also be transformed and incorporated into the CAT design. Even though new attacks continue to be proposed and there is perhaps no definitive and practical solution to all side/covert channel attacks, we believe that such efforts can eventually be integrated into our work.

Supporting SGXv2. Our approach currently relies on SGXv1 instructions, which do not allow page permissions to be changed dynamically, so we enforce DEP in software. The design could be simplified with SGXv2 instructions (McKeen et al., 2016), since dynamic memory management inside an enclave is allowed and protected by SGXv2 hardware. However, Intel has not shipped SGXv2 CPUs widely, so we implement the CAT model on SGXv1 to maximize its compatibility.

Supporting multi-user. Currently we support only single-user scenarios. For multi-user scenarios, we can add a data cleansing policy which ensures that once the task for one data owner ends, all her data is removed from the enclave, together with the content of the SSA and registers, before the next owner’s data is loaded, without destroying the bootstrap enclave after use. Further, to fully support multi-user in-enclave services, we need to ensure that each user’s session key remains secret and to conduct remote attestation for every user when they switch. Hardware features like Intel MPX (Shen et al., 2018) can be applied to enforce memory permission integrity (Zhao et al., 2020) as a supplementary boundary checking mechanism.

Supporting multi-threading. When multi-threading is taken into account, the proof generation process becomes more complicated and cumbersome (Guo et al., 2007). Furthermore, multi-threading can introduce serious bugs (Weichbrodt et al., 2016). Auditing memory read operations from other threads appears to address multi-threading leakage once and for all. However, unless the attacks described in CONFIRM (Xu et al., 2019) are prevented, the proof enforcement of CFI is still broken due to a time-of-check to time-of-use (TOCTTOU) problem. To cope with that, we can have all CFI metadata read from a register instead of the stack, and guarantee that the instrumented proof cannot be written by any thread (Burow et al., 2019).

Supporting on-demand policies. Our framework is highly flexible, which makes adding new policies to the current design straightforward. Different on-demand policies can be appended or withdrawn to serve various goals. For example, when new side/covert channels or security flaws are disclosed, we can attach additional instrumentation to the code and the corresponding policy enforcement to the in-enclave verifier. CAT thus makes quick patching possible at the software level, much like the emergency fixes people apply to 1-day vulnerabilities. Besides, users can also customize the policy according to their needs, e.g., to verify code logic and functionality.

8. Related Work

Secure computing using SGX. Many existing works propose using SGX to secure cloud computing systems, e.g., VC3 (Schuster et al., 2015) and TVM (Hynes et al., 2018), by using sandboxing (Hunt et al., 2018b), containers (Arnautov et al., 2016), and library OSes (Tsai et al., 2017; Shen et al., 2020). These systems rely on remote attestation to verify the platform and the enclave code. As a result, they either do not protect code privacy or consider a one-party scenario, i.e., the code and data needed for the computation come from the same participant. In contrast, we consider three real-world scenarios (CCaaS, CDaaS and CDCM) that protect code and data from multiple distrustful parties.

Data confinement with SFI. Most related to our work are data confinement technologies, which confine untrusted code with confidentiality and integrity guarantees. Ryoan (Hunt et al., 2018b) and its follow-up work (Hunt et al., 2018a) provide an SFI-based distributed sandbox by porting NaCl to the enclave environment, confining untrusted data-processing modules to prevent leakage of the user’s input data. However, the overhead of Ryoan turns out to be huge (e.g., 100% on genes data), and it was evaluated on a software emulator supporting SGXv2 instructions. XFI (Erlingsson et al., 2006) is the most representative unconventional PCC work based on SFI; it places a verifier at the OS level instead of in a lightweight TEE. Occlum (Shen et al., 2020) is an SGX-based library OS that enforces in-enclave task isolation with an MPX-based multi-domain SFI scheme. As the goal of these SFI schemes is not to prevent information leakage from untrusted code, none of them employs protections against side-channel leakage.

Code privacy. Code secrecy is an easily overlooked but very important issue (Mazmudar, 2019; Küçük et al., 2019). DynSGX (Silva et al., 2017) and SGXElide (Bauman et al., 2018) both make it possible for developers to execute their code privately in public cloud environments, enabling developers to better manage scarce memory resources. However, they only care about the developer’s privacy and ignore the confidentiality of data belonging to users.

Confidentiality verification of enclave programs. Using formal verification tools, Moat (Sinha et al., 2015) and its follow-up work (Sinha et al., 2016) can verify whether an enclave program risks leaking data. Their major focus is to verify the confidentiality of an SGX application formally and independently, outside the enclave. Although the verification could in principle be performed within a “bootstrap enclave”, the TCB would then include the IR-level language (BoogiePL) interpreter (Barnett et al., 2005) and a theorem prover (De Moura and Bjørner, 2008). Moreover, neither of them can avoid the large overhead introduced by instruction modeling and assertion proving when large-scale real-world programs are verified.

Side channel attacks and defenses. Side channels pose serious threats to secure computing using SGX as attackers can use them to circumvent explicit security defenses implemented by SGX. A rich literature has focused on discovering SGX side channels (Lee et al., 2017b; Wang et al., 2017; Van Bulck et al., 2018; Chen et al., 2019) and their defenses (Shinde et al., 2016; Shih et al., 2017; Oleksenko et al., 2018; Sinha et al., 2017; Chen et al., 2018). Existing SGX secure computing work often assumes side channels as an orthogonal research topic (Sinha et al., 2015; Subramanyan et al., 2017; Shen et al., 2020). Our framework is designed with side channels in mind and we have shown that it can flexibly support integration of instrumentation based side channel defenses.

9. Conclusion

In this paper we proposed CAT, a remote attestation model that allows the user to verify the code and data provided by untrusted parties without undermining their privacy and integrity. We also instantiated the design with a code generator and a code consumer (the bootstrap enclave), which together form a lightweight PCC-type verification framework. Because of the differences between normal binaries and SGX binaries, we retrofitted the PCC-type framework to fit into SGX while keeping the framework’s TCB as small as possible. Our work does not use a formal certificate to validate the loaded private binary; instead, it leverages data/control flow analysis to verify whether a binary can leak data, allowing our solution to scale to real-world software. Moreover, our method provides a new paradigm for PCC, using a TEE (rather than the OS) as the execution environment, which provides more powerful protection.


  • 100 ([n.d.]) [n.d.]. 1000 Genomes Project.
  • cap ([n.d.]) [n.d.]. Capstone - The Ultimate Disassembler.
  • our ([n.d.]) [n.d.]. CAT-SGX.
  • fas ([n.d.]a) [n.d.]a. Fasta Benchmark.
  • fas ([n.d.]b) [n.d.]b. FASTA format.
  • ori ([n.d.]) [n.d.]. Intel SGX RA Sample.
  • mbe ([n.d.]) [n.d.]. mbed TLS.
  • mus ([n.d.]) [n.d.]. Musl Libc.
  • sgx ([n.d.]) [n.d.]. SGX nBench.
  • sie ([n.d.]) [n.d.]. Siege.
  • ccc (2019) 2019. Confidential Computing Consortium.
  • asy (2019) 2019. Google. Asylo.
  • Almeida et al. (2016) José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, François Dupressoir, and Michael Emmi. 2016. Verifying constant-time implementations. In 25th USENIX Security Symposium (USENIX Security 16). 53–70.
  • Appel (2001) Andrew W Appel. 2001. Foundational proof-carrying code. In Proceedings 16th Annual IEEE Symposium on Logic in Computer Science. IEEE, 247–256.
  • Appel et al. (2003) Andrew W Appel, Neophytos Michael, Aaron Stump, and Roberto Virga. 2003. A Trustworthy Proof Checker. Journal of Automated Reasoning 31, 3-4 (2003), 231–260.
  • Arnautov et al. (2016) Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O’keeffe, Mark L Stillwell, et al. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 689–703.
  • Barnett et al. (2005) Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K Rustan M Leino. 2005. Boogie: A modular reusable verifier for object-oriented programs. In International Symposium on Formal Methods for Components and Objects. Springer, 364–387.
  • Bauman et al. (2018) Erick Bauman, Huibo Wang, Mingwei Zhang, and Zhiqiang Lin. 2018. SGXElide: enabling enclave code secrecy via self-modification. In Proceedings of the 2018 International Symposium on Code Generation and Optimization. ACM, 75–86.
  • Bertot and Castéran (2013) Yves Bertot and Pierre Castéran. 2013. Interactive theorem proving and program development: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media.
  • Biondo et al. (2018) Andrea Biondo, Mauro Conti, Lucas Davi, Tommaso Frassetto, and Ahmad-Reza Sadeghi. 2018. The Guard’s Dilemma: Efficient Code-Reuse Attacks Against Intel SGX. In 27th USENIX Security Symposium (USENIX Security 18). 1213–1227.
  • Brumley et al. (2011) David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J Schwartz. 2011. BAP: A binary analysis platform. In International Conference on Computer Aided Verification. Springer, 463–469.
  • Burow et al. (2019) Nathan Burow, Xinping Zhang, and Mathias Payer. 2019. SoK: Shining light on shadow stacks. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 985–999.
  • Chakrabarti et al. (2019) Somnath Chakrabarti, Matthew Hoekstra, Dmitrii Kuvaiskii, and Mona Vij. 2019. Scaling Intel® Software Guard Extensions Applications with Intel® SGX Card. In Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy. 1–9.
  • Chen et al. (2019) Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang Lin, and Ten H Lai. 2019. Sgxpectre: Stealing intel secrets from sgx enclaves via speculative execution. In 2019 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 142–157.
  • Chen et al. (2018) Guoxing Chen, Wenhao Wang, Tianyu Chen, Sanchuan Chen, Yinqian Zhang, XiaoFeng Wang, Ten-Hwang Lai, and Dongdai Lin. 2018. Racing in hyperspace: Closing hyper-threading side channels on sgx with contrived data races. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 178–194.
  • Colby et al. (2000) Christopher Colby, Peter Lee, George C Necula, Fred Blau, Mark Plesko, and Kenneth Cline. 2000. A certifying compiler for Java. In ACM SIGPLAN Notices, Vol. 35. ACM, 95–107.
  • De Moura and Bjørner (2008) Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340.
  • Doychev et al. (2015) Goran Doychev, Boris Köpf, Laurent Mauborgne, and Jan Reineke. 2015. Cacheaudit: A tool for the static analysis of cache side channels. ACM Transactions on Information and System Security (TISSEC) 18, 1 (2015), 1–32.
  • Erlingsson et al. (2006) Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C Necula. 2006. XFI: Software guards for system address spaces. In Proceedings of the 7th symposium on Operating systems design and implementation. USENIX Association, 75–88.
  • Gras et al. (2018) Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2018. Translation leak-aside buffer: Defeating cache side-channel protections with TLB attacks. In 27th USENIX Security Symposium (USENIX Security 18). 955–972.
  • Gruss et al. (2017) Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan Haller, and Manuel Costa. 2017. Strong and efficient cache side-channel protection using hardware transactional memory. In 26th USENIX Security Symposium (USENIX Security 17). 217–233.
  • Guo et al. (2007) Yu Guo, Xinyu Jiang, Yiyun Chen, and Chunxiao Lin. 2007. A certified thread library for multithreaded user programs. In First Joint IEEE/IFIP Symposium on Theoretical Aspects of Software Engineering (TASE’07). IEEE, 117–126.
  • Homeier and Martin (1995) Peter V. Homeier and David F. Martin. 1995. A mechanically verified verification condition generator. Comput. J. 38, 2 (1995), 131–141.
  • Hunt et al. (2018a) Tyler Hunt, Congzheng Song, Reza Shokri, Vitaly Shmatikov, and Emmett Witchel. 2018a. Chiron: Privacy-preserving machine learning as a service. arXiv preprint arXiv:1803.05961 (2018).
  • Hunt et al. (2018b) Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, and Emmett Witchel. 2018b. Ryoan: A distributed sandbox for untrusted computation on secret data. ACM Transactions on Computer Systems (TOCS) 35, 4 (2018), 13.
  • Hynes et al. (2018) Nick Hynes, Raymond Cheng, and Dawn Song. 2018. Efficient deep learning on multi-source private data. arXiv preprint arXiv:1807.06689 (2018).
  • Jensen (1992) Herbert L Jensen. 1992. Using neural networks for credit scoring. Managerial finance 18, 6 (1992), 15–26.
  • Küçük et al. (2019) Kubilay Ahmet Küçük, David Grawrock, and Andrew Martin. 2019. Managing confidentiality leaks through private algorithms on Software Guard eXtensions (SGX) enclaves. EURASIP Journal on Information Security 2019, 1 (2019), 14.
  • Lee et al. (2017a) Jaehyuk Lee, Jinsoo Jang, Yeongjin Jang, Nohyun Kwak, Yeseul Choi, Changho Choi, Taesoo Kim, Marcus Peinado, and Brent ByungHoon Kang. 2017a. Hacking in darkness: Return-oriented programming against secure enclaves. In 26th USENIX Security Symposium (USENIX Security 17). 523–539.
  • Lee et al. (2017b) Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Marcus Peinado. 2017b. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. In 26th USENIX Security Symposium (USENIX Security 17). 557–574.
  • Leroy (2006) Xavier Leroy. 2006. Formal Certification of a Compiler Back-End or: Programming a Compiler with a Proof Assistant. In ACM SIGPLAN Notices, Vol. 41. ACM, 42–54.
  • Mazmudar (2019) Miti Mazmudar. 2019. Mitigator: Privacy policy compliance using Intel SGX. Master’s thesis. University of Waterloo.
  • McKeen et al. (2016) Frank McKeen, Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon Johnson, Rebekah Leslie-Hurd, and Carlos Rozas. 2016. Intel® software guard extensions (intel® sgx) support for dynamic memory management inside an enclave. In Proceedings of the Hardware and Architectural Support for Security and Privacy 2016. ACM, 10.
  • McKeen et al. (2013) Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V Rozas, Hisham Shafi, Vedvyas Shanbhogue, and Uday R Savagaonkar. 2013. Innovative Instructions and Software Model for Isolated Execution. Hasp, isca 10, 1 (2013).
  • Necula (1997) George C Necula. 1997. Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 106–119.
  • Necula and Rahul (2001) George C Necula and Shree Prakash Rahul. 2001. Oracle-based checking of untrusted software. In ACM SIGPLAN Notices, Vol. 36. ACM, 142–154.
  • Needleman and Wunsch (1970) Saul B Needleman and Christian D Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology 48, 3 (1970), 443–453.
  • Oleksenko et al. (2018) Oleksii Oleksenko, Bohdan Trach, Robert Krahn, Mark Silberstein, and Christof Fetzer. 2018. Varys: Protecting SGX enclaves from practical side-channel attacks. In 2018 USENIX Annual Technical Conference (USENIXATC 18). 227–240.
  • Orenbach et al. (2019) Meni Orenbach, Yan Michalevsky, Christof Fetzer, and Mark Silberstein. 2019. CoSMIX: a compiler-based system for secure memory instrumentation and execution in enclaves. In 2019 USENIX Annual Technical Conference (USENIXATC 19). 555–570.
  • Paulson (2000) Lawrence C Paulson. 2000. Isabelle: The next 700 theorem provers. arXiv preprint cs/9301106 (2000).
  • Pirzadeh et al. (2010) Heidar Pirzadeh, Danny Dubé, and Abdelwahab Hamou-Lhadj. 2010. An extended proof-carrying code framework for security enforcement. In Transactions on computational science XI. Springer, 249–269.
  • Priebe et al. (2019) Christian Priebe, Divya Muthukumaran, Joshua Lind, Huanzhou Zhu, Shujie Cui, Vasily A Sartakov, and Peter Pietzuch. 2019. SGX-LKL: Securing the host OS interface for trusted execution. arXiv preprint arXiv:1908.11143 (2019).
  • Quynh (2014) Nguyen Anh Quynh. 2014. Capstone: Next-gen disassembly framework. Black Hat USA (2014).
  • Russinovich (2017) Mark Russinovich. 2017. Introducing Azure confidential computing. Seattle, WA: Microsoft (2017).
  • Schuster et al. (2015) Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. 2015. VC3: Trustworthy data analytics in the cloud using SGX. In 2015 IEEE Symposium on Security and Privacy. IEEE, 38–54.
  • Schwarz et al. (2019) Michael Schwarz, Samuel Weiser, and Daniel Gruss. 2019. Practical enclave malware with Intel SGX. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 177–196.
  • Schwarz et al. (2017) Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, and Stefan Mangard. 2017. Malware guard extension: Using SGX to conceal cache attacks. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 3–24.
  • Seo et al. (2017) Jaebaek Seo, Byoungyoung Lee, Seong Min Kim, Ming-Wei Shih, Insik Shin, Dongsu Han, and Taesoo Kim. 2017. SGX-Shield: Enabling Address Space Layout Randomization for SGX Programs.. In NDSS.
  • Shen et al. (2018) Youren Shen, Yu Chen, Kang Chen, Hongliang Tian, and Shoumeng Yan. 2018. To Isolate, or to Share?: That is a Question for Intel SGX. In Proceedings of the 9th Asia-Pacific Workshop on Systems. ACM, 4.
  • Shen et al. (2020) Youren Shen, Hongliang Tian, Yu Chen, Kang Chen, Runji Wang, Yi Xu, Yubin Xia, and Shoumeng Yan. 2020. Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 955–970.
  • Shih et al. (2017) Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs.. In NDSS.
  • Shinde et al. (2016) Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek Saxena. 2016. Preventing page faults from telling your secrets. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. 317–328.
  • Shinde et al. (2017) Shweta Shinde, Dat Le Tien, Shruti Tople, and Prateek Saxena. 2017. Panoply: Low-TCB Linux Applications With SGX Enclaves.. In NDSS.
  • Shinde et al. (2020) Shweta Shinde, Shengyi Wang, Pinghai Yuan, Aquinas Hobor, Abhik Roychoudhury, and Prateek Saxena. 2020. BesFS: A POSIX Filesystem for Enclaves with a Mechanized Safety Proof. (2020).
  • Silva et al. (2017) Rodolfo Silva, Pedro Barbosa, and Andrey Brito. 2017. DynSGX: A Privacy Preserving Toolset for Dinamically Loading Functions into Intel (R) SGX Enclaves. In 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 314–321.
  • Sinha et al. (2016) Rohit Sinha, Manuel Costa, Akash Lal, Nuno P Lopes, Sriram Rajamani, Sanjit A Seshia, and Kapil Vaswani. 2016. A Design and Verification Methodology for Secure Isolated Regions. In ACM SIGPLAN Notices, Vol. 51. ACM, 665–681.
  • Sinha et al. (2015) Rohit Sinha, Sriram Rajamani, Sanjit Seshia, and Kapil Vaswani. 2015. Moat: Verifying confidentiality of enclave programs. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1169–1184.
  • Sinha et al. (2017) Rohit Sinha, Sriram Rajamani, and Sanjit A Seshia. 2017. A compiler and verifier for page access oblivious computation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 649–660.
  • Subramanyan et al. (2017) Pramod Subramanyan, Rohit Sinha, Ilia Lebedev, Srinivas Devadas, and Sanjit A Seshia. 2017. A Formal Foundation for Secure Remote Execution of Enclaves. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2435–2450.
  • Tsai et al. (2017) Chia-Che Tsai, Donald E Porter, and Mona Vij. 2017. Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX. In 2017 USENIX Annual Technical Conference (USENIXATC 17). 645–658.
  • Van Bulck et al. (2018) Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F Wenisch, Yuval Yarom, and Raoul Strackx. 2018. Foreshadow: Extracting the keys to the intel SGX kingdom with transient out-of-order execution. In 27th USENIX Security Symposium (USENIX Security 18). 991–1008.
  • Van Bulck et al. (2019) Jo Van Bulck, David Oswald, Eduard Marin, Abdulla Aldoseri, Flavio D Garcia, and Frank Piessens. 2019. A tale of two worlds: Assessing the vulnerability of enclave shielding runtimes. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1741–1758.
  • Wang et al. (2019) Shuai Wang, Yuyan Bao, Xiao Liu, Pei Wang, Danfeng Zhang, and Dinghao Wu. 2019. Identifying cache-based side channels through secret-augmented abstract interpretation. In 28th USENIX Security Symposium (USENIX Security 19). 657–674.
  • Wang et al. (2017) Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A Gunter. 2017. Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2421–2434.
  • Weichbrodt et al. (2016) Nico Weichbrodt, Anil Kurmus, Peter Pietzuch, and Rüdiger Kapitza. 2016. AsyncShock: Exploiting synchronisation bugs in Intel SGX enclaves. In European Symposium on Research in Computer Security. Springer, 440–457.
  • Wu et al. (2018) Meng Wu, Shengjian Guo, Patrick Schaumont, and Chao Wang. 2018. Eliminating timing side-channel leaks using program repair. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. 15–26.
  • Xu et al. (2019) Xiaoyang Xu, Masoud Ghaffarinia, Wenhao Wang, Kevin W Hamlen, and Zhiqiang Lin. 2019. CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software. In 28th USENIX Security Symposium (USENIX Security 19). 1805–1821.
  • Xu et al. (2015) Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In 2015 IEEE Symposium on Security and Privacy. IEEE, 640–656.
  • Zhao et al. (2020) Wenjia Zhao, Kangjie Lu, Yong Qi, and Saiyu Qi. 2020. MPTEE: bringing flexible and efficient memory protection to Intel SGX. In Proceedings of the Fifteenth European Conference on Computer Systems. 1–15.


.1. Instrumentation Details

Here we illustrate other instrumentation modules in our code generator.

RSP modification instrumentation. Since RSP spilling would cause illegal implicit memory writes, RSP modification instructions should also be checked. This module first locates all RSP modification instructions in the program and then inserts assembly code after each of them to check whether the RSP value is out of bounds. As with the store instruction instrumentation, the upper and lower boundaries of RSP are specified by the loader and written into the assembly instructions by the rewriter, while the compiler only fills them with special immediates (5ffffffffffff and 6ffffffffffff).

When the instrumentation finds that the stack pointer has been modified to an illegal address, it causes the program to exit. Fig. 12 shows the eight instructions inserted after an ANDQ instruction, which is typically used to align the stack pointer to a 16-byte boundary when reserving new stack space. Implicit modifications of the stack pointer through PUSH and POP are handled by the dynamic loader, which adds guard pages (pages with no permission granted).

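    andq    $-16, %rsp                  # RSP-modifying instruction being guarded
    pushq   %rax                        # save RAX, used as scratch below
    movabsq $0x5FFFFFFFFFFFFFFF, %rax   # upper-bound placeholder (patched by the rewriter)
    cmpq    %rax, %rsp
    ja      exit_label                  # RSP above the upper bound: exit
    movabsq $0x6FFFFFFFFFFFFFFF, %rax   # lower-bound placeholder (patched by the rewriter)
    cmpq    %rax, %rsp
    jb      exit_label                  # RSP below the lower bound: exit
    popq    %rax                        # restore RAX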
Figure 12. RSP Modifying Instrumentation
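
The interplay between the compiler's placeholder immediates and the loader's rewriter can be illustrated with a minimal Python sketch. It assumes that the placeholders are carried by `movabsq $imm64, %rax` instructions (encoded as bytes 0x48 0xB8 followed by a little-endian 64-bit immediate) and, as a simplification, scans raw bytes instead of disassembling the binary. The names rewrite_immediates and rsp_in_bounds are hypothetical and are not part of CAT-SGX.

```python
import struct
from typing import Dict

# Encoding of `movabsq $imm64, %rax`: REX.W prefix 0x48, opcode 0xB8,
# followed by the 8-byte little-endian immediate.
MOVABS_RAX = b"\x48\xb8"

# Placeholder immediates as they appear in Fig. 12.
UPPER_PLACEHOLDER = 0x5FFFFFFFFFFFFFFF
LOWER_PLACEHOLDER = 0x6FFFFFFFFFFFFFFF


def rewrite_immediates(code: bytes, bounds: Dict[int, int]) -> bytes:
    """Replace placeholder immediates in `movabsq $imm, %rax` with real bounds.

    Assumption: a raw byte scan stands in for the disassembly-driven
    rewriting that a real loader would perform.
    """
    out = bytearray(code)
    i = 0
    while i + 10 <= len(out):
        if out[i:i + 2] == MOVABS_RAX:
            (imm,) = struct.unpack_from("<Q", out, i + 2)
            if imm in bounds:
                struct.pack_into("<Q", out, i + 2, bounds[imm])
                i += 10  # skip past the patched instruction
                continue
        i += 1
    return bytes(out)


def rsp_in_bounds(rsp: int, lower: int, upper: int) -> bool:
    """Mirror of the ja/jb pair: execution continues only if lower <= rsp <= upper."""
    return lower <= rsp <= upper
```

Because the patched immediate has the same width as the placeholder, the rewriting leaves the code length and all instruction offsets unchanged.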

Indirect branch instrumentation. To check indirect branches, we first extract all legal target names at the assembly level and output them to a list. We then insert a call to an inspection function before every indirect branch instruction (Fig. 13) to perform a forward-edge CFI check at runtime. Specifically, the inspection function CFICheck is written and included in the target binary; it checks whether the indirect branch target is on that list, thereby ensuring that indirect branches conform to the program’s control flow.

Instrumentation before callq *%reg:

    movq  %reg, %rdi      # pass the branch target to CFICheck
    callq CFICheck

Instrumentation before callq *(%reg):

    movq  (%reg), %rdi    # pass the target loaded from memory to CFICheck
    callq CFICheck
Figure 13. Indirect Call Instrumentation
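
The membership test performed by the inspection function can be sketched as a binary search over a sorted list of legal entry addresses. The Python sketch below only models the logic; the names cfi_check, guarded_indirect_call and CfiViolation are hypothetical, and the exception stands in for the program exit that the instrumented binary would perform.

```python
from bisect import bisect_left
from typing import Sequence


class CfiViolation(RuntimeError):
    """Stands in for terminating the program on an illegal branch target."""


def cfi_check(target: int, legal_targets: Sequence[int]) -> bool:
    """Binary-search a sorted list of legal entry addresses for `target`."""
    i = bisect_left(legal_targets, target)
    return i < len(legal_targets) and legal_targets[i] == target


def guarded_indirect_call(target: int, legal_targets: Sequence[int]) -> int:
    """Model of `movq %reg, %rdi; callq CFICheck` placed before `callq *%reg`."""
    if not cfi_check(target, legal_targets):
        raise CfiViolation(hex(target))
    return target  # a real program would now transfer control to this address
```

The list must be sorted once, when it is generated, so that each runtime check costs a logarithmic number of comparisons in the number of legal targets.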

Shadow stack. For function returns, the code generator instruments instructions to support a shadow call stack, which is a fully precise mechanism for protecting backwards edges (Burow et al., 2019). The shadow stack’s base address is specified by the loader, and will be rewritten by the Imm rewriter (to replace the imm filled in by the compiler in advance).

As shown in Fig. 14, at every function entry we insert instructions (before the function’s stack alignment) that update the shadow stack top pointer and push the function’s return address onto the shadow stack. Similarly, the instructions inserted before each function return update the shadow stack top pointer and pop the saved return address. The saved return address is then compared with the actual return address on the stack, so that every RET is checked.

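Instrumentation before stack alignment:

    movabsq $0x2FFFFFFFFFFF, %r11   # shadow stack base (placeholder, patched by the Imm rewriter)
    addq    $8, (%r11)              # advance the top offset stored at the base
    movq    (%r11), %r10            # r10 = top offset
    addq    %r10, %r11              # r11 = address of the new top slot
    movq    (%rsp), %r10            # r10 = return address on the real stack
    movq    %r10, (%r11)            # save it on the shadow stack
    pushq   %rbp
    movq    %rsp, %rbp

Instrumentation before function return:

    movabsq $0x2FFFFFFFFFFF, %r11   # shadow stack base
    movq    (%r11), %r10            # r10 = top offset
    addq    %r11, %r10              # r10 = address of the top slot
    subq    $8, (%r11)              # pop: move the top offset back
    movq    (%r10), %r11            # r11 = saved return address
    cmpq    %r11, (%rsp)            # compare with the return address about to be used
    jne     exit_label              # mismatch: exit

Instrumentation for exit label:

exit_label:
    movl    $0xFFFFFFFF, %edi       # exit status -1
    callq   exit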
Figure 14. Structured Guard Formats of Shadow Stack
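
The bookkeeping performed by these guards can be modeled in a few lines of Python. This is only a sketch of the logic in Fig. 14: the class and method names are hypothetical, a dictionary stands in for the shadow stack memory, and an exception stands in for the jump to exit_label.

```python
class ReturnAddressMismatch(RuntimeError):
    """Raised where the instrumented code would jump to exit_label."""


class ShadowStack:
    """Software model of the shadow stack guards in Fig. 14."""

    def __init__(self) -> None:
        self.top_offset = 0   # the word stored at the shadow stack base
        self.slots = {}       # offset from the base -> saved return address

    def on_function_entry(self, return_address: int) -> None:
        self.top_offset += 8                          # addq $8, (%r11)
        self.slots[self.top_offset] = return_address  # movq %r10, (%r11)

    def on_function_return(self, return_address_on_stack: int) -> None:
        slot = self.top_offset                        # top slot before the pop
        self.top_offset -= 8                          # subq $8, (%r11)
        if self.slots.get(slot) != return_address_on_stack:  # cmpq %r11, (%rsp)
            raise ReturnAddressMismatch(hex(return_address_on_stack))
```

A return address overwritten on the regular stack no longer matches the copy saved at function entry, so the check fails before RET can transfer control.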

SSA monitoring instrumentation. As demonstrated in previous works (Gruss et al., 2017; Chen et al., 2018), AEX can be detected by monitoring the SSA. Therefore, to enforce P6, we instrument every basic block to set a marker in the SSA and monitor whether the marker is overwritten by an AEX within the basic block. The execution is terminated once the number of AEXes within the basic block exceeds a preset threshold.

A function is also implemented to obtain the interrupt context information in the bootstrap enclave’s SSA area. At the beginning of each basic block, we call this function through instrumentation to check whether too many interruptions have occurred during execution. When a basic block is too large, this function is also called in the middle of the basic block, every k instructions. We count the number of interrupts/AEXes that occurred between the last check and the current one. When 22 or more are triggered, the target program aborts.
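
The marker-based detection can be sketched as follows. This Python model is a deliberate simplification under stated assumptions: the marker value, the class SsaMonitor, and the choice to keep a running count across checks are all hypothetical, since the text does not spell out how the count is accumulated or reset. Only the threshold of 22 is taken from the description above.

```python
class TooManyInterrupts(RuntimeError):
    """Stands in for aborting the target program."""


MARKER = 0xC0FFEE  # hypothetical marker value written into the SSA


class SsaMonitor:
    """Toy model of SSA monitoring: an AEX overwrites the marker in the SSA."""

    def __init__(self, threshold: int = 22) -> None:
        self.threshold = threshold
        self.ssa_slot = MARKER   # the monitored location inside the SSA
        self.aex_count = 0       # assumption: running count across checks

    def simulate_aex(self, saved_context: int = 0) -> None:
        # On an AEX the hardware saves the interrupted context into the SSA,
        # which clobbers whatever marker the enclave code placed there.
        self.ssa_slot = saved_context

    def check(self) -> None:
        """Instrumented check at a basic block start (or every k instructions)."""
        if self.ssa_slot != MARKER:
            self.aex_count += 1
            self.ssa_slot = MARKER   # re-arm the marker for the next interval
        if self.aex_count >= self.threshold:
            raise TooManyInterrupts(self.aex_count)
```

Checking more often shortens the window in which several AEXes can hide behind a single overwritten marker, which is one reason to insert additional checks inside large basic blocks.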

Alternatives. To mitigate AEX-based side-channel risks, CAT-SGX provides an alternative enforcement mechanism based on TSX, which can be selected when compiling the target program. The TSX approach builds upon T-SGX (Shih et al., 2017): it places transactional memory protection on each basic block and runs a fall-back function to keep track of the number of interrupts observed. Just as in T-SGX, when more than 10 consecutive AEXes happen, the computation aborts out of concern that a side-channel attack is ongoing. The protection is instrumented by the generator and its presence is verified by the code consumer in the enclave.

We have implemented a function in which XBEGIN and XEND are called and the fallback is specified. Around each branch and CALL/RET instruction, and at the beginning/end of each basic block, we call this function so that the program leaves the last transaction and enters a new one whenever a possible control flow transfer occurs and completes. Some code snippets are shown in Figure 15.

To deal with the compatibility problems caused by calling functions that do not need to be checked (e.g., the system calls via OCall stubs), we implemented a non-TSX wrapper for external functions. For instance, our pass generates an alternative function wrapper_foo to replace the original function foo, so as to avoid the TSX instrumentation.

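    movq  %rax, %r15            # save the original RAX
    lahf                        # load the status flags into AH
    movq  %rax, %r14            # keep the flags (in AH) in R14
    callq transactionEndBegin   # end the current transaction and begin a new one
    movq  %r14, %rax
    sahf                        # restore the status flags from AH
    movq  %r15, %rax            # restore the original RAX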
Figure 15. TSX instrumentation

.2. Preparing Target Binary

Libc. To manage interactions with the untrusted operating system, we provide OCall stubs for system calls. Related works (Shinde et al., 2017; Tsai et al., 2017; Priebe et al., 2019; Shinde et al., 2020) provide a variety of OCall interfaces, but some of them still require additional interface sanitization. We use parts of Musl Libc (mus, [n.d.]) to complete the code loading support (Section 4.3). The Musl Libc code must itself be instrumented. It can then be linked statically against other necessary libraries, e.g., mbedTLS for building an HTTPS server.

Stack and heap. We also reserve customized stack and heap space for the target program’s execution. During the above-mentioned loading phase, the CAT-SGX system initializes a 4 MB memory space for the stack and links against a customized and instrumented malloc function for later heap usage. In the current version of our prototype, the memory ranges of the additional stack and the heap provided for the target program are fixed, for efficient boundary checking.

Other necessary functions. The instrumented proof includes more than assembly instructions: some necessary functions and objects must also be compiled and linked. Since we need an algorithm to check whether the address of an indirect branch target is on the legal entry label list (for P5 enforcement), a binary search function CFICheck is inserted into the target program. Similarly, to enforce P6, functions for SSA monitoring are linked in and called frequently. These objects are also disassembled and checked during proof verification, to ensure that they cannot be compromised when they are called.