For most of computer history, designing a computer architecture around the CPU allowed extracting the most performance benefits from Moore’s law. Nowadays, however, the demand for increased computation power is usually satisfied with special-purpose hardware: GPUs are orders of magnitude more efficient than a CPU can be at handling parallel workloads such as graphics and machine learning, and FPGAs often achieve similar gains for custom workloads. Tasks such as machine learning are even pervasive enough to justify the investment into fully custom ASICs (Jouppi et al., 2017). In these modern platform architectures, the CPU’s main job is to set up the computation for certain workloads (Juckeland et al., 2015) in the relevant specialized hardware and then collect the results, possibly to feed them to yet another specialized hardware device. Effectively, the CPU’s primary role is shifting towards that of a mere coordinator of the available specialized hardware in the platform. Cloud computing architectures are even adopting a disaggregated model (Katrinis et al., 2016) in which data centers no longer just consist of a number of connected servers, but of functional blocks connected with high-speed interconnects. Each block provides a pool of a particular resource, be it GPUs, CPUs, storage, or FPGAs, to allow for fine-grained resource allocation and acceleration. When more resources are requested, only a particular block needs to be augmented, rather than requiring the provisioning of full-fledged monolithic servers.
At the same time, the security of modern systems has also come under scrutiny due to the numerous vulnerabilities related to the high complexity of operating systems and hypervisors (Checkoway and Shacham, 2013; Suzaki et al., 2011). Because of this, it has become more attractive to rely on smaller and lower layers, i.e., firmware or even immutable hardware to enforce security and to reduce the underlying trusted computing base (TCB). Most notably, this has led to the rise in trusted execution environments (TEEs). TEE designs vary to a large degree but, in general, they isolate execution environments without having to trust operating systems and hypervisors (Costan and Devadas, 2016; Winter, 2008; Costan et al., 2016). TEEs rely on hardware primitives of the CPU and only consider the CPU package to be trusted, while all the other hardware components of the platform are explicitly assumed malicious.
These two developments present an apparent disconnect: on one side, modern computer architectures are increasingly relying on specialized hardware for performance and scalability. On the other, TEEs provide strong security guarantees only if code and data are confined within the CPU. Using specialized hardware in existing TEEs requires either trusting the OS (e.g., SGX (Costan and Devadas, 2016)) or bloating hypervisors or a specialized OS (TrustZone Secure OS (Ltd., 2021)) with drivers. E.g., the keyboard input to an SGX enclave can be read and altered by the untrusted OS, whereas in the case of TrustZone, the security of that input depends on the large TCB, including drivers for unused peripherals. Since the hardware TCB of a TEE is statically decided at design time by the CPU manufacturer, end-users not only need to rely on a fixed hardware TCB, but potentially also need to add drivers of other devices into the software TCB if other enclaves want to make use of them. In other words, current TEEs struggle to support specialized hardware while adhering to the principle of least privilege.
We propose a TEE with a configurable software and hardware TCB including arbitrary specialized hardware, a concept that we name platform isolation environment (PIE). PIE executes applications in platform-wide enclaves, which are analogous to the enclaves provided by TEEs, except that they span several hardware components. For example, a platform-wide enclave can be composed of a GPU (or only some GPU cores) and the CPU, and the custom code running on them. Like in traditional enclaves, a platform-wide enclave can be remotely attested. However, the PIE attestation not only reports a measurement of the software TCB but also of the hardware components that are part of the platform-wide enclave.
The shift towards configurable hardware and software TCBs has wide-ranging implications concerning integrity, confidentiality, and attestation of a platform-wide enclave. Attestation, for one, should cover all individual components of a platform-wide enclave atomically to defend against an attacker that changes the configuration in between attestations of separate components. Moreover, the untrusted OS may remap a specialized hardware device at runtime and replace it with an untrustworthy device, which should not receive access to sensitive data. We carefully design PIE so that it is not vulnerable to such attacks and present an in-depth security analysis.
We mitigate the above-mentioned attacks with two new properties of platform-wide enclaves: platform-wide attestation and platform awareness. Platform-wide attestation expands the attestation to cover all components within a platform-wide enclave, and platform awareness enables enclaves to react to changes in their ecosystem, i.e., remapping by the OS. We achieve this by introducing two new events into the enclave lifecycle, connect and disconnect, which allow one enclave to track the liveness of another.
We validate the challenges and design choices in a prototype that we develop on top of RISC-V and Keystone (Lee et al., 2020). We make the key design decision to facilitate the communication between specialized hardware and the CPU with shared memory. This not only reduces the cost of context switches in enclave-to-enclave communication but also allows enclaves to communicate directly with specialized hardware, as these devices are memory-mapped, and allows existing drivers to be reused. In particular, our prototype modifies the way Keystone uses the RISC-V physical memory protection (PMP) to let enclave memory overlap, which enables shared memory. We perform an extensive security analysis of our prototype, analyzing the implications of our design with respect to side channels, the enclaves’ interactions with peripherals and their life cycles, and how attestation can now be extended to reflect the configuration of a platform and thus form a dynamic hardware TCB.
We further evaluate PIE in two case studies: first, we demonstrate an end-to-end prototype on an FPGA with simple peripherals emulated on a microcontroller; and second, we take an existing accelerator (Zaruba et al., 2020) and integrate it into PIE, adding support for multi-tenant isolation. In the first case study, we developed a prototype on top of an FPGA. The TCB of Keystone increased by around 600 lines of code (LoC), and the additional logic in the context switch added 200 cycles (from around 4700 to 4900 cycles). In our implementation, to provide more flexibility, we confine drivers in what we call controller enclaves. This allows multiple platform-wide enclaves to use the same specialized hardware concurrently and to enforce access control to the peripheral, e.g., rate-limiting or granting exclusive access, while still giving meaningful isolation guarantees to remote verifiers. In the second case study, we demonstrate how to adapt an existing accelerator (Zaruba et al., 2020) so that it can support multi-tenant isolation and remote attestation in PIE.
In summary, the contributions of our paper are the following:
We extend traditional TEEs with a dynamic hardware TCB, i.e., the enclave’s TCB only includes the driver and firmware of the specialized hardware it uses. We call these new systems platform isolation environments (PIE). We identify key security properties for PIE, namely platform-wide attestation and platform awareness. Additionally, we propose a software design for PIE that abstracts the underlying hardware layer and tries to integrate with the existing driver ecosystem.
We analyze the security aspects of PIE in detail. This includes the security implications of PIE’s design decisions and a number of relevant side channels.
We demonstrate two case studies: first, we present an end-to-end prototype based on Keystone (Lee et al., 2020) on an FPGA running a RISC-V processor (Zaruba and Benini, 2019) including multiple external peripherals emulated by Arduino microcontrollers. Our modifications to the software TCB of Keystone only amount to around 600 LoC. Second, we perform a case study based on a GPU-style accelerator (Zaruba et al., 2020) and integrate it within PIE while also enabling multi-tenant isolation.
Keystone (Lee et al., 2020) is a TEE framework based on RISC-V that utilizes physical memory protection (PMP) to provide isolation. PMP is part of the RISC-V privilege standard (Waterman et al., 2019) and allows specifying access policies that individually allow or deny reading, writing, and executing for a certain memory range. E.g., PMP can be used to restrict the operating system (OS) from accessing the memory of the bootloader. Every access request to a prohibited range gets trapped precisely in the core and results in a hardware exception. Keystone relies on low-level firmware with the highest privilege, called the security monitor (SM), to manage the PMP.
The SM maintains its own memory separate from the OS and protected by a PMP entry. It also facilitates all enclave calls, e.g., it creates, runs, and destroys enclaves. The SM configures the PMP so that the OS can no longer access the enclave’s private memory. Upon a context switch, the SM reconfigures the PMP to allow or block access to the enclave. E.g., during a context switch from an enclave to the OS, the SM changes the PMP configuration such that access to the enclave memory is prohibited. Conversely, on a context switch back to the enclave, the PMP gets reconfigured to allow accesses to enclave memory. Because the SM is critical for the security of any enclave and the whole system, it aims to be minimal and lean. As such, the SM is orders of magnitude smaller than hypervisors and operating systems (15k LoC vs. millions of LoC (Torvalds et al., 2020; Barham et al., 2003)). There are also efforts to create formal proofs for such an SM (Lebedev et al., 2019). Keystone also provides extensions for cache side-channel protections using page coloring or dynamic enclave memory.
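The PMP toggling described above can be illustrated with a small model. The following is a minimal sketch, not Keystone's actual code: the names `PmpEntry`, `pmp_check`, and `ToySecurityMonitor` are hypothetical, and the matching rule is simplified to "the first matching entry decides, no match denies". It only models the enclave entry being opened and closed on a context switch and omits how the remaining memory is handled while an enclave runs.

```python
from dataclasses import dataclass


@dataclass
class PmpEntry:
    base: int
    size: int
    perms: str  # subset of "rwx"; the empty string denies every access

    def matches(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def pmp_check(entries, addr, access):
    """Return True if `access` ("r", "w" or "x") to `addr` is allowed.

    Simplification (assumption): the first matching entry decides and an
    address that matches no entry is denied.
    """
    for entry in entries:
        if entry.matches(addr):
            return access in entry.perms
    return False


class ToySecurityMonitor:
    """Hypothetical stand-in for the SM, used for illustration only."""

    def __init__(self, enclave_base, enclave_size, mem_size):
        # Higher-priority entry for the enclave, closed while the OS runs.
        self.enclave = PmpEntry(enclave_base, enclave_size, "")
        # Lower-priority background entry that opens the rest of memory.
        self.background = PmpEntry(0, mem_size, "rwx")
        self.entries = [self.enclave, self.background]

    def switch_to_enclave(self):
        # Context switch into the enclave: open its private memory.
        self.enclave.perms = "rwx"

    def switch_to_os(self):
        # Context switch back to the OS: close the enclave memory again.
        self.enclave.perms = ""
```

In this model, an access by the OS to an address inside the enclave range is rejected by `pmp_check` until `switch_to_enclave` is called, and it is rejected again after `switch_to_os`.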
The device tree is a list that accurately describes the physical memory mappings of a platform. It describes the central processor, i.e., its speed, its ISA, and at what address its cache starts. It also includes the DRAM base address and various other components on the die, such as various internal and external buses. It is usually used by the bootloader and the OS to bootstrap the system. As some peripherals cannot be detected automatically, they must be present in the device tree, as otherwise they will not get recognized by the OS. The device tree is usually burnt into ROM and available to the bootloader and the OS. It can therefore be considered trusted.
3. Problem Statement
Modern platforms are composed of complex, heterogeneous specialized hardware, from simple sensors that measure temperature or humidity to complex accelerators for machine learning. All these components are connected to the CPU over buses (e.g., PCI, USB, etc.). Many modern applications are critically dependent on such specialized hardware, and often they handle sensitive data, e.g., patient records for machine learning. Thus, the authenticity and integrity of this specialized hardware are critical, and the data it handles must remain confidential.
Applications that handle sensitive data and use a specialized hardware device can be deployed securely with one of the following three existing approaches: 1) designing a fully dedicated system, 2) renting a dedicated virtual machine and placing trust in the hypervisor, or 3) relying on the OS. None of these approaches is satisfactory due to lack of generality, cost, and the need to trust codebases with millions of lines of code (Torvalds et al., 2020; Barham et al., 2003). Existing TEEs such as Intel SGX, RISC-V Keystone, ARM TrustZone, etc., provide security guarantees only to the applications running on the CPU cores, leaving specialized hardware unprotected. Moreover, SGX and Keystone enclaves rely on the untrusted OS to communicate with specialized hardware. ARM TrustZone, on the other hand, provides isolated communication between the enclaves and components such as a touchscreen or fingerprint sensor, but requires trusting the entire secure OS, including device drivers not used by the enclave.
3.1. Attacker Model
The attacker model is tightly coupled with the type of specialized hardware. We separate the specialized hardware into two main classes due to their distinct effect on the attacker model:
Specialized hardware with physical interaction: Specialized hardware that interacts with its environment ranges from input-only devices, such as input peripherals (e.g., mouse, keyboard) and sensors (e.g., temperature sensor), to output-only devices (e.g., monitor) and combined IO devices (e.g., touchscreen). For any such device, a local physical adversary can manipulate the environment and thus the input (and potentially the output). E.g., a physical adversary can point a laser at a light sensor, thus changing the sensor’s reading but not the room’s overall light intensity. Hence, any specialized hardware that interacts with its physical environment cannot tolerate a physical adversary.
Specialized hardware without physical interaction: There are specialized hardware units that do not explicitly interact with their environment. They draw power and produce heat, but their input and output are not related to the environment. GPUs and other accelerators are the prime examples of this class of specialized hardware, for which a local physical adversary can be tolerated.
In this paper, we assume a remote attacker that remotely controls the entire software stack, including the OS and hypervisor. While the remote attacker model is a weaker assumption compared to the local physical one considered in the existing TEEs, the former covers a wide class of specialized hardware (e.g., specialized hardware with physical interaction) that cannot tolerate physical attackers. Hence, the attacker cannot access the platform of the specialized hardware physically or hot-swap a device. Note that the untrusted OS is still in charge of managing specialized hardware devices, and thus is able to remap the devices or send a reset or power-off signal. We assume that the CPU firmware is trusted. Similar to existing TEE proposals, side channel attacks remain out of scope (Costan and Devadas, 2016) in our adversary model. However, we will discuss the implications of our proposal on existing side channel attacks and defenses in Section 6. Finally, we consider denial-of-service attacks to be out of scope in this paper.
As mentioned above, several approaches could be pursued to integrate specialized hardware into a TEE. Among them, we investigate approaches that reuse components of existing systems as much as possible, both in terms of software and in terms of hardware. This approach leaves the OS in a supervisor role, liaising between the isolated environments and specialized hardware, similar to memory in traditional TEEs (cf. Section 2). However, this decision leaves leeway for privileged adversaries to break the system’s isolation. We therefore need to consider both existing threats to traditional TEEs and emerging threats due to the nature of a reconfigurable hardware TCB. We analyze these in more detail in the next three paragraphs.
Traditionally, the OS or the hypervisor acts as the bridge between applications and specialized hardware. They are responsible for setting up these communication links properly, and can not only observe the data exchanged between different components, but also tamper with it. As they are not trusted in our attacker model, we need to ensure that each component establishes a secure link with the others. This is not trivial, as the OS is untrusted and may not cooperate. Finally, the fact that several accelerators may need to support a form of multi-tenant isolation (e.g., multiple tasks on a GPU) requires careful consideration, as sensitive data within an isolated environment in PIE should remain confidential irrespective of what else is running on the system.
Remote attestation is a key part of any TEE. However, with multiple specialized hardware devices and enclaves on the CPU making up a distributed enclave, the straightforward approach of individually attesting every component is vulnerable to time-of-check-to-time-of-use attacks. Several attestations (one for each component of the TEE) must be linked with a guarantee that nothing has changed in the components already attested since the last attestation. Without this guarantee, an attacker could tamper with the configuration of already attested enclaves and thus trick the remote verifier.
Remapping attacks are also relevant during runtime, as the OS still manages the memory. Well-timed disconnects or memory remappings could result in leakage of confidential data, e.g., if an adversary remaps a specialized hardware device and replaces it with a malicious device, the CPU enclave should not share sensitive data with the new device.
4. Overview of Our Approach
In this section, we provide an overview of our approach PIE and introduce platform-wide enclaves. Platform-wide enclaves dynamically extend the TCB of traditional TEEs running on the CPU to the specialized hardware. Platform-wide enclaves consist of multiple distributed enclaves that run on various hardware components such as the CPU and specialized hardware as shown in Figure 1. Platform-wide enclaves aim to provide similar security properties as traditional enclaves, such as integrity, attestation, and data isolation from other enclaves and the attacker-controlled OS.
4.1. Enclaves within a platform-wide enclave
A platform-wide enclave consists of multiple enclaves that run on different hardware components and securely communicate with each other. A platform-wide enclave typically contains several interconnected processor-local enclaves and specialized hardware enclaves. In the following, we describe the two main enclave types that form a platform-wide enclave.
Processor-local enclaves are equivalent to traditional enclaves and their runtime memory must be isolated from the OS and should only be accessible to the enclave itself. To achieve that, we use physical memory protection (PMP) from the RISC-V privilege standard (Waterman et al., 2019) as introduced by Keystone.
We further differentiate two types of processor-local enclaves: application enclaves and controller enclaves, which encapsulate the application-specific logic and the driver logic, respectively. Figure 2 shows the application enclave and the controller enclave of the blue-outlined platform-wide enclave of Figure 1. The controller enclave also provides isolation between the application enclaves in a scenario where multiple application enclaves want to access a certain specialized hardware device. Therefore, the controller enclave enforces access control on the connected application enclaves in terms of how a certain specialized hardware device can be accessed, e.g., exclusive access to a keyboard or shared concurrent access to a GPU.
Enclaves on specialized hardware
Most specialized hardware runs some firmware or even some custom code (e.g., graphics shaders), which has to be included in the TCB of a platform-wide enclave. E.g., the GPU and its firmware in Figure 1 are part of the yellow platform-wide enclave. Since a remote verifier also wants to attest the specialized hardware, the hardware has to be modified to support attestation. However, we stress that these modifications remain rather small (cf. Section 7.1.2) and usually only involve small changes in the device firmware.
4.2. Communication with specialized hardware
To enable processor-local enclaves and specialized hardware enclaves to securely communicate, we make the observation that these devices generally communicate over mapped address regions: They either use an address range that is not reflected in DRAM, so-called memory-mapped input-output (MMIO) registers, or a shared DRAM region accessed via direct memory access (DMA). To maximize compatibility with existing drivers and specialized hardware, we chose not to change this behavior. Instead, we isolate the address regions that are used in this communication. Existing hardware mechanisms like PMP already allow restricting access to a specific address region. Until now, such hardware mechanisms have been predominantly used to restrict memory access, but in our design, they also allow restricting access to other address regions that are not in the DRAM range (e.g., DRAM could occupy the address range 0x8000000 - 0xF0000000, whereas other specialized hardware such as UART could reside at 0x4000000 - 0x4001000). Note that these address regions from specialized hardware are either i) static, i.e., hardcoded and provided to the SM in the form of a trusted device tree file, or ii) dynamic, i.e., configured at runtime by the SM. In our design, the SM always maintains a complete overview of all such regions and only allows a single enclave to access an address region of a specialized hardware device.
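The bookkeeping that this paragraph attributes to the SM (a complete overview of device address regions, each accessible to a single enclave) could look roughly as follows. This is a minimal sketch under stated assumptions: `RegionRegistry` and its methods are hypothetical names, the real SM would enforce the decision through PMP entries rather than a Python lookup, and the UART range is the illustrative one mentioned in the text.

```python
class RegionRegistry:
    """Toy overview of device address regions and their single owner."""

    def __init__(self):
        self._regions = {}  # region name -> (base, end), end exclusive
        self._owner = {}    # region name -> enclave id

    def add_region(self, name, base, end):
        # Reject overlapping regions so that ownership stays unambiguous.
        for other, (b, e) in self._regions.items():
            if base < e and b < end:
                raise ValueError(f"{name} overlaps {other}")
        self._regions[name] = (base, end)

    def grant(self, name, enclave_id):
        if name not in self._regions:
            raise KeyError(name)
        current = self._owner.get(name)
        if current is not None and current != enclave_id:
            # Only a single enclave may access a device region.
            raise PermissionError(f"{name} is already owned by {current}")
        self._owner[name] = enclave_id

    def revoke(self, name):
        self._owner.pop(name, None)

    def may_access(self, enclave_id, addr):
        # Addresses outside every registered region are denied here; DRAM
        # handling is intentionally left out of this sketch.
        for name, (b, e) in self._regions.items():
            if b <= addr < e:
                return self._owner.get(name) == enclave_id
        return False
```

For example, after `add_region("uart", 0x4000000, 0x4001000)` and `grant("uart", "controller-enclave")`, a second `grant` to a different enclave raises `PermissionError` until the region is revoked.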
While we made the changes mentioned above to the SM to support specialized hardware with both MMIO and DMA, they also enable a new way for enclaves to communicate: shared memory. This reflects a major difference to traditional TEEs because, until now, most traditional enclaves could only communicate through the untrusted OS (concurrent work (Yu et al., 2020) has also shown how shared memory can improve the performance of enclaves significantly).
4.3. Changes within a platform-wide enclave
The untrusted OS manages specialized hardware devices; hence the OS could remap any device or send a reset signal. E.g., a GPU that is handling sensitive data could be shut down by the OS and remapped to a different GPU during runtime. In such a scenario, the enclave should stop sending sensitive data to the GPU until the remote verifier re-attests the new GPU. Hence, the enclave has to react to these external events, i.e., it has to be aware of the platform’s state. In traditional TEEs, enclaves are self-sufficient isolated entities and are only dependent on themselves. Therefore, they can only be in two states: running or stopped. Platform-wide enclaves are more complex since they can contain multiple enclaves, all of which could be running, stopped, or even killed. Platform-wide enclaves have to react correctly upon any of these events to keep the data confidential. We achieve this by expanding the enclave lifecycle and adding two new events: connect and disconnect. The asynchronous nature of these events requires a detailed analysis of the security of the entire system, e.g., a well-timed disconnect could lead to data leaks across shared memory regions. We solve this issue by assigning ownership of the shared memory among the enclaves that are accessing that memory. Upon any external event, if one of the participating enclaves dies, sole ownership is transferred to the remaining one (more details in Section 5.2). Therefore, the components in the platform-wide enclave are platform-aware since they are aware of any change within their ecosystem.
4.4. Attestation of a platform-wide enclave
Since a platform-wide enclave consists of multiple distributed enclaves, attestation poses another challenge. Individual attestations of each enclave that makes up a platform-wide enclave could be vulnerable to well-timed manipulations by an adversary that cause time-of-check-to-time-of-use (TOCTOU) issues. To provide a platform-wide attestation, we need to chain the attestation reports of all the components of a platform-wide enclave. This includes the attestation reports of the enclaves and of the specialized hardware firmware. Attestation of the specialized hardware firmware is achieved by signing a challenge message with the key embedded in the specialized hardware (refer to Section 4.1). The attestation of a platform-wide enclave could either be a one-time attestation that results in a huge chain of reports or individual attestations of all entities that can be combined by the verifier. We show that individual attestations provide more flexibility for the verifier and are secure against TOCTOU attacks when unique identifiers are added to enclaves and the IDs of all connected enclaves are appended to each attestation report.
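To illustrate how unique identifiers and appended connection lists let a verifier combine individual reports, consider the following minimal sketch. It is an illustration under several assumptions, not the report format of PIE: the field names, `make_report`, and `verify_platform` are hypothetical, and an HMAC with a single shared key stands in for the signatures that an SM or a device key would produce.

```python
import hashlib
import hmac
import json


def _encode(body):
    # Canonical encoding so signer and verifier hash the same bytes.
    return json.dumps(body, sort_keys=True).encode()


def make_report(key, enclave_id, measurement, connected_ids, nonce):
    """Build a report that lists the unique IDs of all connected peers."""
    body = {
        "id": enclave_id,
        "measurement": measurement,
        "connected": sorted(connected_ids),
        "nonce": nonce,
    }
    # Assumption: HMAC replaces a real digital signature in this sketch.
    tag = hmac.new(key, _encode(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_platform(reports, key, nonce):
    """Accept only authentic, fresh, and mutually consistent reports."""
    by_id = {}
    for report in reports:
        body = report["body"]
        expected = hmac.new(key, _encode(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, report["tag"]):
            return False  # report was forged or modified
        if body["nonce"] != nonce:
            return False  # report is stale
        by_id[body["id"]] = body
    # Every listed peer must be attested too and must list us in return.
    for enclave_id, body in by_id.items():
        for peer in body["connected"]:
            if peer not in by_id or enclave_id not in by_id[peer]["connected"]:
                return False
    return True
```

If a component is swapped between two individual attestations, the replacement carries a different unique ID, so the ID recorded in the earlier report no longer matches any attested peer and the combined check fails.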
4.5. Summary of Interactions
To summarize all interactions between the components of a platform-wide enclave, we present an example scenario in Figure 3 with an application enclave, a controller enclave, and a specialized hardware device. The scenario is as follows:
The OS creates and configures the controller enclave and hands over control to the SM. The SM then revokes the OS’s access permissions to the private memory regions of the controller enclave.
The OS requests the SM to connect the controller enclave with the specialized hardware device. The SM sets up a new shared memory region and enables access only to the controller enclave.
Similar to the controller enclave, the OS creates and configures the application enclave, and then, once again, the SM revokes access to the private memory of the enclave.
The OS calls the SM to establish a shared memory region between the controller enclave and the application enclave.
After a remote verifier attests to all enclaves (using the application enclave as the entry point), sensitive data can be transmitted, and the normal operation starts.
Any disruption, e.g., a disconnection of the specialized hardware device, leads to an asynchronous disconnect, where the sole ownership of the shared memory between the device and the controller enclave moves to the controller enclave (see Section 5.2). Moreover, the enclaves may halt execution until re-attested.
5. Platform Isolation Environment
In this section, we describe the platform isolation environment, or PIE, in detail. PIE is based on the idea of platform-wide enclaves that integrate specialized hardware devices with the traditional processor-core enclaves while maintaining a small hardware and software TCB. First, we discuss the changes that must be incorporated into the specialized hardware to make it compatible with PIE. Then we introduce a shared memory model that allows enclaves to communicate securely with each other and with specialized hardware. Next, we discuss how the enclave life cycle changes given these modifications and how a remote verifier can get proof of the state of a platform-wide enclave. Finally, we provide a software design that makes PIE easy for software developers to adopt.
Changes to specialized hardware
There exists a wide range of specialized hardware devices that have unique behavior and integrate differently into PIE. In this paper, we try to cover most devices but stress that some special cases require further analysis. We go from the simplest specialized hardware device we can imagine, a simple sensor, to one of the most complex, a sophisticated accelerator for a data center. Most other specialized hardware devices should fall in between these two examples and thus may require modifications between these two extremes.
1. Simple Sensors
Simple sensors, e.g., a temperature sensor, only require a minimal form of attestation to be integrated into PIE. They must contain some key material to sign statements about themselves. This is mandatory for (remote) attestation of a platform-wide enclave that includes an attestation report of such a sensor. Usually, these sensors do not contain any secret data from a processor-local enclave and hence do not need to protect such data.
Accelerators, on the other hand, tend to be very complex and require more extensive modifications. Like simple sensors, they must support attestation, but they may also need to support isolation for multiple enclaves’ secret data. Let us assume data-center applications, where multiple stakeholders want to move multiple compute-intensive tasks from the CPU to the accelerator. The individual tasks’ data should remain confidential and isolated, not only on the CPU but also on the accelerator. Thus, such an accelerator requires isolated and attestable domains, in other words, enclaves that run on the specialized hardware.
5.1. Shared memory
Shared memory is a common mechanism for software to communicate across multiple cores or with the DMA regions of specialized hardware. In our prototype, shared memory regions are centrally maintained by the CPU to enforce isolation, as all specialized hardware is connected to the CPU.
5.1.1. Shared Memory between Processor-local Enclaves
As mentioned before, processor-local enclaves rely on PMP entries for isolation. We reuse the functionality of PMP to also protect shared memory regions. Therefore, our proposal does not require any changes to the processor itself, as PMP is already part of the RISC-V standard (Waterman et al., 2019), and thus, it is already part of many processors. The SM, however, requires some modifications. For example, it must store the configuration details of every shared memory region in its local memory, and it needs to reconfigure the PMP entries on a context switch, similar to stock Keystone. It also must guarantee that at most two entities have access to the same shared buffer at a time. Additionally, the SM flushes the buffer’s content when one enclave is destroyed so as not to leak stale data.
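The two guarantees named above (at most two entities per shared buffer, and flushing when one party is destroyed) can be sketched as follows. This is a minimal model with hypothetical names (`SharedBuffer`, `attach`, `on_destroy`); in the prototype the SM would enforce access through PMP entries, which this sketch replaces with an explicit membership check.

```python
class SharedBuffer:
    """Toy shared buffer with at most two parties and flush on destroy."""

    MAX_PARTIES = 2

    def __init__(self, size):
        self.data = bytearray(size)
        self.parties = set()

    def attach(self, enclave_id):
        if enclave_id not in self.parties and len(self.parties) >= self.MAX_PARTIES:
            raise PermissionError("buffer is already shared by two entities")
        self.parties.add(enclave_id)

    def _require(self, enclave_id):
        if enclave_id not in self.parties:
            raise PermissionError(f"{enclave_id} has no access to this buffer")

    def write(self, enclave_id, offset, payload):
        self._require(enclave_id)
        if offset < 0 or offset + len(payload) > len(self.data):
            raise IndexError("write outside of the shared buffer")
        self.data[offset:offset + len(payload)] = payload

    def read(self, enclave_id, offset, length):
        self._require(enclave_id)
        if offset < 0 or length < 0 or offset + length > len(self.data):
            raise IndexError("read outside of the shared buffer")
        return bytes(self.data[offset:offset + length])

    def on_destroy(self, enclave_id):
        # When one party is destroyed, zero the buffer so that no stale
        # data remains for whoever accesses the region next.
        if enclave_id in self.parties:
            self.parties.discard(enclave_id)
            self.data[:] = bytes(len(self.data))
```

A third call to `attach` with a new identifier raises `PermissionError`, and after `on_destroy` the remaining party reads only zero bytes.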
5.1.2. Shared Memory with specialized hardware
Specialized hardware is connected to the CPU over buses, which are, in turn, controlled by bus controllers. As mentioned in Section 4, specialized hardware communicates over memory-mapped address ranges either in the form of MMIO registers or DMA memory regions. To isolate these address regions, we rely on existing hardware mechanisms, mainly PMP. Until now, such hardware mechanisms have predominantly been used to restrict memory access, but they can also be used to restrict access to any other address that is not in the DRAM range. Our prototype reuses the concepts from shared memory between processor-local enclaves by treating the specialized hardware as another processor-local enclave. As such, it can directly share an address range with a real processor-local enclave. The SM represents a specialized hardware device internally as a special case of a processor-local enclave that cannot be scheduled or called, but that may share some address regions with other enclaves.
Polling and Interrupts
Specialized hardware is synchronized with the processor through either polling or interrupts. Polling requires the CPU to check at a predetermined rate if new data is available from the specialized hardware, and thus, it can immediately be used in PIE. Interrupts, on the other hand, are more complicated, as they enable the specialized hardware to notify the CPU that new data is available with the processor’s hardware support. In RISC-V specifically, interrupts can be delegated from the highest privilege mode to lower ones. So, in our prototype of PIE, the SM can delegate individual types of interrupts either to an enclave or to the OS. (In RISC-V, external interrupts are handled by the platform-level interrupt controller (PLIC) and then multiplexed on top of the external interrupt signals to the core. Thus, the SM has to contain a driver for the PLIC to figure out which specialized hardware the interrupt is from.) Therefore, enclaves could also contain interrupt handlers to, e.g., handle interrupts for a specific specialized hardware device. Note that in our prototype, we only focus on polling.
5.2. Enclave life cycle
A traditional enclave’s life cycle includes three distinct states: idle, running, and paused. E.g., the enclave is first created and started in the idle state. Then the enclave transitions to the running state after a call from a user. Due to a timer interrupt by the OS scheduler, it is paused. It is resumed as soon as the scheduler yields back to the enclave.
5.2.1. Attaching specialized hardware
Before going into the life cycle details, it is crucial to understand how specialized hardware is attached to the platform and initialized. There are two types of initialization procedures: statically compiled into the device tree or dynamically mapped by a bus controller. The device tree describes the specific address ranges and model numbers of all statically connected specialized hardware devices. It is usually stored in on-chip ROM and is provided to the OS by a zero-stage boot loader, and thus it can be considered trusted. Dynamically mapped devices are mapped to a DMA region by a bus controller and its driver. In our proposal, the bus controller’s driver, which sets up the DMA region, has to be trusted.
5.2.2. Changes during runtime
In PIE, we introduce two additional life cycle events, connect and disconnect, to describe what happens when a shared memory region is altered. These events are needed due to the asynchronous nature of specialized hardware, which can prompt a disconnect event at any time.
Asynchronous disconnects are critical, as an enclave could end up continuing to use a memory region that is no longer protected due to a disconnect. Additionally, enclaves might want to provide graceful degradation and should not crash completely upon a disconnect. We solve both issues by splitting the disconnect event into an asynchronous disconnect and a synchronous disconnect. We consider the two entities (enclaves or specialized hardware) attached to a shared memory region to have shared ownership over that region. If one of the entities dies, the other entity gains sole ownership of the memory region. As such, an asynchronous disconnect leads to sole ownership of a previously shared memory region. In turn, the untrusted OS can issue a synchronous disconnect command to the SM to free the shared memory region and notify the enclave of the disconnect. We mandate that before any connect command, the enclave must first receive a synchronous disconnect. If this were not the case, an adversary could disconnect a benign specialized hardware device and reconnect a malicious one without the enclave noticing.
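The ownership rules above can be summarized as a small state model. The following is a sketch under assumptions, not the SM's implementation: the class and method names are hypothetical (they only echo the connect and disconnect events), entities are plain strings, and memory itself is not modeled.

```python
class LifecycleError(Exception):
    pass


class SharedRegion:
    """Shared memory between two entities (enclaves or specialized hardware)."""

    def __init__(self, a, b):
        self.owners = {a, b}
        self.notified = False  # set once the survivor has been told about the disconnect


class SecurityMonitor:
    def __init__(self):
        self.regions = {}   # frozenset({a, b}) -> SharedRegion
        self.stale = set()  # entities still holding a region whose peer has died

    def connect(self, a, b):
        # A connect is refused until the pending synchronous disconnect has happened.
        if a in self.stale or b in self.stale:
            raise LifecycleError("synchronous disconnect required before connect")
        self.regions[frozenset({a, b})] = SharedRegion(a, b)

    def async_disconnect(self, dead):
        """An entity died: every region it shared becomes solely owned by the survivor."""
        for region in self.regions.values():
            if dead in region.owners and len(region.owners) == 2:
                region.owners.discard(dead)
                self.stale.update(region.owners)

    def sync_disconnect(self, survivor):
        """Issued by the untrusted OS: notify the survivor and free its orphaned regions."""
        for key in [k for k, r in self.regions.items() if r.owners == {survivor}]:
            self.regions.pop(key).notified = True
        self.stale.discard(survivor)


sm = SecurityMonitor()
sm.connect("E", "P_benign")
sm.async_disconnect("P_benign")       # the device disappears at an arbitrary time
try:
    sm.connect("E", "P_malicious")    # silent swap attempt by the adversary
    swapped = True
except LifecycleError:
    swapped = False
sm.sync_disconnect("E")               # the enclave is notified, the region is freed
sm.connect("E", "P_new")              # only now may a new device be attached
```

The point of the split is visible in the `stale` set: between the asynchronous and the synchronous disconnect, the survivor keeps exclusive access to the region, and no new peer can be attached behind its back.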
We illustrate the behavior of a platform-wide enclave in various circumstances using an example scenario. Enclave 1 is connected to enclave 2, which in turn is connected to a specialized hardware device. There are two shared memory spaces: one shared between enclave 1 and enclave 2, and one shared between enclave 2 and the specialized hardware.
1. Enclave 1 is killed
In such a situation, the shared memory space between enclave 1 and enclave 2 should be destroyed. To do that, the SM performs an asynchronous disconnect of this shared memory space for enclave 1, resulting in its sole ownership by enclave 2. Upon the following synchronous disconnect, the shared memory space gets fully destroyed.
A specific application may require any sensitive data from enclave 1 that is still on the specialized hardware to be cleared. In such a scenario, enclave 2 will tell the specialized hardware to clear this data on the following synchronous disconnect. Note that how the specialized hardware handles this call also depends on the implementation of that specialized hardware’s firmware enclave. For example, a specialized hardware device that handles sensitive user data may decide to terminate the session completely (by zeroing out all internal state) and destroy the shared memory between the specialized hardware and enclave 2.
2. Enclave 2 is killed
All shared memory regions associated with enclave 2 (this includes the shared memory spaces with both enclave 1 and the specialized hardware) are immediately modified by the SM during the asynchronous disconnect. They are now solely owned by enclave 1 and the specialized hardware, respectively. Zeroing out the shared memory space with the specialized hardware also implicitly notifies the specialized hardware that enclave 2 has died, forcing it to reset.
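3. The specialized hardware is killed/disconnected
In the asynchronous disconnect, the SM immediately transfers the shared memory space between enclave 2 and the specialized hardware to the sole ownership of enclave 2. At some later point, the OS must issue a synchronous disconnect, which invalidates this shared memory space. This also results in the destruction of the shared memory space between enclave 1 and enclave 2 in case enclave 1 accesses the specialized hardware through enclave 2. From then on, enclave 2 is available to connect to a new specialized hardware device (after attestation).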
5.3. Attestation of a platform-wide enclave
We extend the existing notion of attestation from processor-local enclaves to platform-wide enclaves that run on multiple components of the platform. Traditionally, attestation establishes the current state of an enclave through a measurement of its code. The standard attestation report of a traditional enclave contains the measurements of both the enclave and the low-level firmware (e.g., the security monitor in RISC-V Keystone), both of which are signed by the platform key (known as the device root key). In contrast, an attestation of a platform-wide enclave must also reflect all included components. One potential attestation mechanism for a platform-wide enclave would be a lengthy report containing all the components’ measurements. Instead, we let the verifier decide which other enclaves to attest. When the verifier attests a specific component of a platform-wide enclave, a list of identifiers of all the connected components is provided alongside the attestation report. These identifiers are assigned by the SM on the processor and can be used to specify which enclave one wants to attest. The verifier can then choose to attest some or all of the connected enclaves from the list of identifiers.
5.3.1. Enclave identifiers
Upon creation of a new processor-local enclave, the SM assigns a unique identifier to it. These identifiers uniquely determine the enclaves participating in a specific shared memory region. When the enclave is killed, the identifier may be reused for other enclaves (cf. Section 6).
5.3.2. Attestation Flow
Figure 4 depicts an example platform-wide enclave and the sequence of attestations between its different components. The PIE enclave contains three components: enclave 1, enclave 2, and a specialized hardware firmware. The platform-wide attestation process starts from the verifier, who initiates a remote attestation request of enclave 1. The attestation report of enclave 1 includes a list of connected enclaves’ identifiers, notably that of enclave 2. The verifier then executes a series of individual remote attestations of all connected enclaves. Note that the individual attestations of enclave 1 and enclave 2 include each other’s identifier in their lists of connected components. Both attestation reports are signed by the same platform key, which proves to the remote verifier that both enclaves are running on the same platform.
For specialized hardware, the attestation mechanism is different. First, a specialized hardware device needs to contain some key material and a signed certificate from the manufacturer. This allows a verifier to check the legitimacy of the specialized hardware. Second, the verifier from Figure 4 needs to be able to verify that the specialized hardware is directly talking to the enclave it is connected to. This is facilitated by the SM, which checks the address regions for MMIO registers. DMA regions can even be established by an untrusted entity such as the OS. However, the attestation reports of both the specialized hardware and its connected enclave contain the physical memory region that they share.
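The enclave-to-enclave part of this flow can be sketched as a traversal over the identifier lists. This is an illustration under loud assumptions: an HMAC with a demo key stands in for the platform-key signature (a real platform signs with an asymmetric key), the report layout and all function names are hypothetical, the verifier here chooses to attest every connected enclave, and the certificate-based attestation of specialized hardware is not modeled.

```python
import hashlib
import hmac
import json

PLATFORM_KEY = b"demo-platform-key"  # stand-in for the device root key (assumption)


def make_report(ident, code: bytes, connected):
    """What the SM would produce: measurement, connected identifiers, and a tag."""
    body = {"id": ident,
            "measurement": hashlib.sha256(code).hexdigest(),
            "connected": sorted(connected)}
    payload = json.dumps(body, sort_keys=True).encode()
    body["tag"] = hmac.new(PLATFORM_KEY, payload, hashlib.sha256).hexdigest()
    return body


def verify_report(report, expected_code: bytes) -> bool:
    """Check the tag over the report body and compare the code measurement."""
    body = {k: v for k, v in report.items() if k != "tag"}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(PLATFORM_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(tag, report["tag"])
            and body["measurement"] == hashlib.sha256(expected_code).hexdigest())


def attest_platform_wide(start, request_report, expected):
    """Attest `start`, then follow the identifier lists to every connected enclave."""
    pending, verified = [start], set()
    while pending:
        ident = pending.pop()
        if ident in verified:
            continue
        report = request_report(ident)
        if ident not in expected or not verify_report(report, expected[ident]):
            return False
        verified.add(ident)
        pending.extend(report["connected"])
    return True


code = {1: b"enclave 1 binary", 2: b"enclave 2 binary"}
links = {1: [2], 2: [1]}
ok = attest_platform_wide(1, lambda i: make_report(i, code[i], links[i]), code)
bad = attest_platform_wide(
    1, lambda i: make_report(i, b"evil" if i == 2 else code[i], links[i]), code)
```

Because every report is tagged with the same platform key, a successful traversal also indicates that all visited enclaves run on the same platform; a verifier interested in only some components could simply stop following the list earlier.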
5.4. Software design
In this section, we introduce PIE’s software design, which is one possible way for application, driver, and firmware developers to adapt their software to be compatible with PIE without making significant changes.
5.4.1. Software components
PIE’s software design consists of three entities: application enclaves, controller enclaves, and specialized hardware firmware, as shown in Figure 2. Application enclaves and controller enclaves are processor-local enclaves. Specialized hardware devices are the components that are connected to the platform over buses. In contrast to a monolithic design where the application and driver reside in one big enclave, our modular approach aims to provide high flexibility and increase code reuse.
1. Application enclaves
Application enclaves are similar to the traditional enclaves in Intel SGX or Keystone. In such TEEs, the enclaves cannot access specialized hardware without using the OS as a mediator, as the OS handles all drivers. In PIE, application enclaves also cannot communicate with a specialized hardware device directly. Instead, they use shared memory to communicate with a controller enclave, which is a hardware-specific enclave containing the driver logic. The rationale for separating the driver from the application logic is two-fold: i) developers do not have to ship driver code with their application, and ii) having one controller enclave per specialized hardware device allows multiple application enclaves to communicate with that device in parallel.
2. Controller enclave
The controller enclave contains the driver that facilitates communication with a specialized hardware device. Note that application enclaves, standard non-enclave applications, and the OS cannot access the specialized hardware directly. The only way to communicate with a specialized hardware device is through a device-specific controller enclave. Such a design choice isolates the specialized hardware drivers: one compromised driver does not affect other specialized hardware. The controller enclave maintains an isolated communication channel over shared memory to application enclaves and the specialized hardware (e.g., in RISC-V, the PMP entry corresponding to a shared memory region ensures that only participating enclaves have access to it). To simplify the configuration, we assume that only one active controller enclave per specialized hardware device exists at a time. However, any controller enclave can be replaced at the user’s request.
5.4.2. Isolation of multi-application enclave session
In PIE, multiple application enclaves could connect to a single controller enclave to have simultaneous access to a specialized hardware device. In such a scenario, the controller enclave keeps separate state for each of the application enclaves. Note that this is first a functional requirement and second a security requirement, as operations in one application enclave could affect the state of computation of another application enclave if there is no isolation. For some specialized hardware, the controller enclave may need to reset the state of the hardware when it switches to a session with a different application enclave (temporal separation). However, for specialized hardware such as GPUs that support multiple isolated workloads in parallel, the state does not have to be reset.
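Temporal separation in a controller enclave could be sketched as follows. All names are hypothetical, and `FakeHardware` is a toy device whose result depends on everything it has seen. Replaying a session's earlier inputs after a reset is one possible way to restore that session's state and is an assumption of this sketch, not something the design prescribes.

```python
class FakeHardware:
    """Hypothetical device with internal state and no built-in multi-tenant isolation."""

    def __init__(self):
        self.state = []
        self.resets = 0

    def reset(self):
        self.state = []
        self.resets += 1

    def submit(self, value):
        self.state.append(value)
        return sum(self.state)  # the result depends on all inputs since the last reset


class ControllerEnclave:
    def __init__(self, hardware, supports_parallel_isolation=False):
        self.hw = hardware
        # Devices such as GPUs that isolate workloads themselves would skip the reset.
        self.parallel = supports_parallel_isolation
        self.sessions = {}  # application enclave identifier -> private session state
        self.active = None

    def request(self, app_id, value):
        session = self.sessions.setdefault(app_id, [])
        if not self.parallel and self.active != app_id:
            # Temporal separation: wipe device state, then replay this session's inputs.
            self.hw.reset()
            for old in session:
                self.hw.submit(old)
            self.active = app_id
        session.append(value)
        return self.hw.submit(value)


hw = FakeHardware()
controller = ControllerEnclave(hw)
r1 = controller.request(app_id=1, value=1)
r2 = controller.request(app_id=2, value=10)
r3 = controller.request(app_id=1, value=2)
```

Without the reset, the third request would observe the input of application enclave 2. With it, each application enclave only ever sees results derived from its own inputs.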
5.4.3. Platform-wide attestation in the software design
The platform-wide attestation enables a remote verifier to verify the state of all platform-wide enclave components. The attestation proceeds as follows:
1. Remote attestation of the application enclave
This is the first step of the platform-wide attestation to ensure that the platform is running the intended version of the application enclave. The application enclave attestation report includes the list of identifiers of the controller enclaves that have shared memory channels with that application enclave.
2. Remote attestation of the connected controller enclaves
The user then executes a series of individual remote attestations of the controller enclaves. Each controller enclave sends its own attestation report along with the certificate that it received from the connected specialized hardware. These reports are signed by the same platform key as the application enclave’s attestation report. This proves that the application enclave and the connected controller enclaves are running on the same physical platform. Additionally, the controller enclave’s report states that the initiating application enclave has a shared memory channel with it.
6. Security Analysis
In this section, we perform an informal security analysis of PIE. We split the security analysis in three separate parts. First, we show how isolation from a malicious OS and other malicious specialized hardware is achieved. Then we analyze the attacker-controlled life cycle events of platform-wide enclaves, and finally, we discuss the security of platform-wide attestation.
6.1.1. Malicious OS
In PIE, the address regions that are used by platform-wide enclaves are protected using PMP entries (Waterman et al., 2019). Recall that in stock Keystone (Lee et al., 2020), PMP is based on the physical memory range and only allows the specific enclave to access its private memory. On top of this, we use additional PMP entries to protect shared memory regions. Note that only the highest privilege level, i.e., the SM, can modify PMP entries. During a context switch, the SM reconfigures all PMP entries such that the correct memory ranges are available again. The SM has a complete overview of all enclaves and shared memory regions and sets up all PMP entries on its own. The processor throws an access fault exception upon any memory access into protected memory regions. The hardware page table walker must also obey the configured PMP rules. Therefore, misconfigured page tables cannot be used to leak any data from protected memory ranges.
The SM enforces that a shared memory region is strictly shared between two entities (e.g., a processor-local enclave and a specialized hardware device). The SM also verifies that no overlap exists between the memory ranges, similar to stock Keystone.
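The two checks in the previous paragraph (exactly two participants, no overlapping ranges) can be written down compactly. This is a sketch with hypothetical names; it models regions as half-open `(base, size)` ranges and is not the Keystone or PIE code.

```python
def overlaps(a, b):
    """Half-open ranges (base, size) overlap if each starts before the other ends."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


class RegionTable:
    def __init__(self):
        self.entries = []  # list of (base, size, participants)

    def add_shared(self, base, size, participants):
        """Register a shared region after checking participants and overlap."""
        if len(set(participants)) != 2:
            raise ValueError("a shared region must have exactly two participants")
        if size <= 0:
            raise ValueError("empty region")
        for other_base, other_size, _ in self.entries:
            if overlaps((base, size), (other_base, other_size)):
                raise ValueError("region overlaps an existing protected region")
        self.entries.append((base, size, frozenset(participants)))


table = RegionTable()
table.add_shared(0x8000_0000, 0x1000, [1, 2])
table.add_shared(0x8000_1000, 0x1000, [2, 3])  # adjacent, not overlapping
```

A request for a region that intersects an existing one, or that names one or three participants, raises `ValueError` in this model instead of being installed.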
6.1.2. Rogue DMA requests
Malicious peripherals can try to access protected memory through rogue DMA requests. Mechanisms to restrict DMA requests already exist in other architectures, e.g., AMD IOMMU (AMD, 2007), Intel VT-d (Abramson et al., 2006), and ARM SMMU (Holdings, 2013). These mechanisms process every DMA request and verify its validity according to some access policy. Any memory access attempt that does not fit the access policy is blocked. Currently, the RISC-V standard does not contain a mechanism to limit such DMA requests. However, an input-output variant of a PMP called IOPMP (sifive, 2019) is an upcoming proposal in RISC-V. IOPMP enforces the configured PMP rules for non-RISC-V peripherals. Since our current prototype does not have any peripheral interface open to DMA requests, we do not need such protection. However, platforms that support DMA could implement mechanisms like IOPMP.
6.1.3. Malicious application or controller enclaves
The attacker-controlled OS can spawn malicious application enclaves and controller enclaves. Users remotely attest before providing any secret to the application enclave. During the platform-wide attestation, the user checks the attestation reports of both the application enclave and the controller enclave and aborts if they do not match the intended enclave measurements. The platform-wide attestation also reveals any misconfiguration of communication links by an adversary. Note that this only verifies the static configuration of communication links. Upon any change to this setup, the external verifier might need to re-attest (cf. Section 6.2).
We require the controller enclave to provide isolation between multiple connected application enclaves (cf. Section 5.4.2). Hence, an attacker-controlled application enclave cannot access the confidential data of other application enclaves connected to the same controller enclave.
Vulnerabilities within any of these enclaves could break the isolation guarantees of the data in that specific enclave. However, such an attack remains contained in the compromised enclave and cannot spread to connected enclaves. E.g., if a vulnerability in a controller enclave is found, only the data within that enclave is revealed. Any data that does not pass through the compromised controller enclave remains confidential. In this way, we provide defense-in-depth and reduce the potential impact of vulnerabilities.
6.1.4. Malicious specialized hardware
If an adversary manages to compromise the exact device that is used by an enclave, then any data on the specialized hardware is forfeit. However, any data not passed to the malicious device remains confidential.
We stress that certain manipulations of specific peripherals are always possible for an adversary. Consider, for example, a temperature sensor. Any local physical adversary can increase the real-world temperature and thus manipulate the sensor reading. However, as we describe in our attacker model in Section 3.1, the physical attacker is out of scope of this paper. Note that this only applies to sensors; accelerators cannot be tampered with in this manner.
6.1.5. Side channel attacks
While we do not evaluate any defenses against side channel attacks, we discuss potential side channel attacks against our proposal and how they could be mitigated. Many parts of PIE remain the same as in traditional TEEs, where side channels have been widely investigated (Brasser et al., 2017, 2019a; Gruss et al., 2017). However, we note that PIE creates some new side channels that may not be present in traditional TEEs, such as bus contention.
1. Traditional side channel attacks against TEEs
Microarchitectural side channels in traditional TEEs leverage shared resources such as the cache (Brasser et al., 2017), branch predictor (Lee et al., 2017), and memory translation (Xu et al., 2015). There exist several defenses against such attacks. Spatial partitioning of the cache in the form of cache coloring can fully defend against all cache based side channel attacks (Costan et al., 2016; Zhang et al., 2009; Zhao et al., 2020). Similarly, other proposals have called for cache randomization (Brasser et al., 2019a; Werner et al., 2019). Processor features such as transactional memory have also been shown to mitigate cache attacks with low overhead (Gruss et al., 2017). To the best of our knowledge, all of these proposals can be applied to PIE due to the similar internal structure to traditional TEEs.
2. Side channel attacks within specialized hardware
3. Bus contention
The introduction of peripherals into TEEs also introduces the bus as a new shared resource. An adversary could measure the throughput of her connection over the bus and observe any contention on the bus that reduces throughput. Bus contention, however, only exposes the access patterns of the peripherals. In extreme cases, the timing of bus contention could leak data, e.g., when one side of a branch performs bus accesses while the other does not. However, in normal cases, the data exchanged between the peripherals and their corresponding processor-local enclaves remains inaccessible to the attacker.
6.2. Lifecycle events
As described in Section 5.2, there are two additional events for platform-wide enclaves in PIE. Connect is used to connect two entities over a shared buffer. Disconnect facilitates a disconnect between the two entities. The disconnect is split into a synchronous and an asynchronous event. The asynchronous disconnect only occurs when one of the entities unexpectedly dies and results in the transfer of sole ownership of the memory region to the remaining enclave. This enclave can then try to continue its execution. However, it will realize that the other entity has died, as it does not react to any activity on the shared memory region. At a later point, the untrusted OS can issue a synchronous disconnect to notify the enclave and free the shared memory officially. Note that the SM mandates a synchronous disconnect before another connect command. Due to this architecture, a stale shared buffer will never be made accessible to any untrusted entity until a synchronous disconnect occurs, during which the enclave is officially notified. The separate handling of synchronous and asynchronous disconnect events enforces protection for any secret data during an enclave’s entire life cycle.
When the remote verifier attests an enclave, they receive the identifiers of all connected enclaves. The SM generates these identifiers and makes sure that no two running enclaves share the same identifier. Since uniqueness is only enforced among running enclaves, an enclave could be assigned an identifier that belonged to another enclave in the past. Strictly increasing identifiers implemented with monotonic counters could be used instead, but such a solution needs non-volatile storage on the CPU, which is expensive.
Now assume that the adversary kills an enclave and launches a different enclave with the exact same identifier, i.e., she can kill enclave $E$ and launch $E'$ ($\mathrm{code}(E) \neq \mathrm{code}(E')$) with the same identifier ($\mathrm{ID}(E) = \mathrm{ID}(E')$). However, when a remote verifier attests $E'$, they see that the measurements mismatch, as $\mathrm{code}(E) \neq \mathrm{code}(E')$, and reject it.
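The simple identifier-reuse case can be replayed in a few lines. This is an illustrative sketch: `IdAllocator` is a hypothetical stand-in for the SM's identifier management, SHA-256 stands in for the measurement function, and immediate reuse of a freed identifier is chosen deliberately as the worst case for the verifier.

```python
import hashlib


class IdAllocator:
    """Hands out identifiers unique among running enclaves; freed ones may be reused."""

    def __init__(self, capacity):
        self.free = list(range(capacity))
        self.running = {}  # identifier -> code measurement

    def launch(self, code: bytes) -> int:
        ident = self.free.pop(0)
        self.running[ident] = hashlib.sha256(code).hexdigest()
        return ident

    def kill(self, ident):
        del self.running[ident]
        self.free.insert(0, ident)  # worst case for the verifier: immediate reuse


sm = IdAllocator(capacity=4)
expected = hashlib.sha256(b"benign enclave").hexdigest()

victim = sm.launch(b"benign enclave")
sm.kill(victim)
impostor = sm.launch(b"attacker enclave")

same_id = impostor == victim                  # the identifier is reused
accepted = sm.running[impostor] == expected   # but the measurement no longer matches
```

Here `same_id` holds while `accepted` does not, which is the mismatch the verifier relies on: an identifier alone carries no trust, only an identifier together with a fresh measurement does.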
Let us assume a more complex scenario with two pairs of enclaves: and , where but . A remote verifier attests to an enclave that is connected to and and establishes a shared secret with . Before the verifier attests to , the attacker kills . The attacker then spawns a new enclave where ID()ID(). The remote verifier will then attest to and find that the code measurement looks fine. However, we stress that cannot be connected to because then would need to receive a synchronous disconnect and would need to be re-attested (due to the configuration of ). Now the attacker kills and replaces with (where ID()ID()) and connects and . The verifier then sees that has the correct measurement and is connected to the identifier of (as ID()ID()). However, the verifier will want to provide its data to using the shared secret they have established in the previous attestation. Obviously, this cannot succeed, as the new enclave cannot know the secret.
7. Implementation and Evaluation
In this section, we describe our prototype of PIE and its evaluation.
7.1.1. FPGA prototype
We implemented an end-to-end prototype of PIE that is based on the Keystone enclave framework (Lee et al., 2020). Figure 5 shows a case study on a platform that consists of an FPGA emulating the central processor connected to several Arduino boards that emulate specialized hardware.
We base our system on the Ariane core (Zaruba and Benini, 2019), an open-source 64-bit RISC-V core that supports commodity operating systems such as Linux. It is an RV64GC 6-stage application-class core that has been taped out multiple times and can operate at up to 1.5 GHz. We run this core on a Digilent Genesys 2 FPGA board (① in Figure 5).
We added PMP capability to the core, which originally did not support PMP, using 160 lines of SystemVerilog. The PMP unit is formally verified against a handwritten specification with yosys (Wolf, 2016). Two of these units are inserted into the memory management unit (MMU) and are responsible for checking data accesses and instruction fetches. An additional unit is placed in the hardware page table walker to check page table accesses. Our implementation has a configurable number of PMP entries up to the maximum number of 16 mandated by the standard (Waterman et al., 2019). Our modifications have been contributed to the Ariane project and are open source (Zaruba, 2020). Note that PMP is part of the RISC-V privileged specification and as such is already available on many other cores (Asanovic et al., 2016; Lowrisc, 2020).
We modified the SM to be able to connect two enclaves or an enclave and specialized hardware. Specifically, we added three new interfaces to the SM called connect, sync_disconnect, and async_disconnect. These interfaces can be used to set up shared regions between two enclaves or specialized hardware specified by their identifier. We also modified Keystone’s attestation procedure to include a list of identifiers for all connected enclaves. Our modifications only amount to 390 additional or modified lines of code. The SM consists of around 2000 lines of code excluding SHA3 and ed25519 implementations that contribute around 4000 additional lines of code.
In Keystone, every enclave runs on top of a minimal runtime that handles syscalls and manages virtual memory. Hence, its code is critical and part of the TCB. For our prototype, we added support to dynamically map shared memory regions into the virtual address space of an enclave. We modified 213 LoC out of a total of 3600 LoC for Keystone’s runtime.
On the untrusted OS side, there are many components to make it easier to create and run enclaves such as an SDK and a kernel driver. These components also required numerous changes. However, they are not trusted and as such do not increase the TCB.
Simple specialized hardware
In our prototype, we emulated a number of simple specialized hardware devices (e.g., keyboards, mice, simple sensors, etc.) on the Arduino Due microcontroller prototyping board (② in Figure 5) using the Arduino HID library. The Due’s GPIO pins are connected to the FPGA’s PMOD pins over two pairs of wires for bi-directional data. We modify the protocol to communicate data between the Due and the FPGA. The physical limitations of the PMOD pins restrict the channel’s frequency to MHz yielding 1 MB/s bandwidth. In the real world, the physical interfaces between the specialized hardware and the platform could be diverse, such as USB, PCI-E, etc. As a concrete example, we implemented a keyboard with the Arduino board and wrote a simple keyboard driver that interprets the GPIO signal from the Arduino. Additionally, we use a PMOD interface-based seven-segment display unit as an output peripheral (③ in Figure 5). The driver contains around 50 LoC and is incorporated into our example controller enclave. Additionally, we use the USBHost library, which can emulate a number of USB peripheral devices on the Arduino. We use the Arduino cryptographic library for signing the challenge messages from the controller enclave during the local attestation. The Due uses 128-bit AES (CTR mode) for encryption, HMAC_SHA256 for message authentication, Curve25519 for key exchange, and SHA3 as the hash function. We use the DueFlashStorage library to implement the NVM flash that contains the key material for the peripheral attestation. Our prototype implementation is approximately 2.5K lines of code.
We conduct another case study to show how complex specialized hardware such as a GPU-scale accelerator (Zaruba et al., 2020) can be extended to support PIE. The accelerator is a 4096-core RISC-V platform that has comparable performance to current state-of-the-art machine learning accelerators. It is organized in clusters each with 8 individual single-stage RISC-V cores (Zaruba et al., 2020), each of which is accompanied by a double precision floating point unit capable of two double precision and four single precision flops per cycle. To hide memory latency, all clusters have access to a scratchpad memory and a large L2 data cache.
To provide multi-tenant isolation on the accelerator, we introduce a shared PMP unit with 4 entries into every cluster. The PMP entries can only be configured by one of the eight cores, but the access policies are enforced on all of them. With this additional hardware support, we were able to implement a small firmware that configures the PMP entries according to the specifications from the host and then runs a task in user mode. Upon a context switch, the scratchpad memory that was in use by the previous task must be flushed and the PMP entries must be reconfigured. The firmware consists of 143 lines of assembly and 73 lines of C code. Implementation and verification took around 3 weeks.
We list PIE’s performance in the following categories:
1. Performance of enclave communication
Since PIE supports shared memory to communicate, its communication speed is the same as what the memory bus provides. This is much faster compared to traditional TEEs, where enclaves communicate through the OS requiring extra encryption steps. Concurrent work also demonstrates the performance gains that can be extracted from enclave to enclave communication using shared memory (Yu et al., 2020).
2. Context switches
Context switches are critical for any system and determine its responsiveness and a part of its performance. We performed experiments for various sizes of the shared memory region and report the resulting context switch latencies in Figure 6. We also measured the time of enclave creation, which is mostly dominated by copying all the enclave data from the untrusted OS to the protected memory region and thus is expected to be linear in terms of shared memory size.
These measurements highlight that context switches are independent of the shared memory size. The absolute context switch time increases from 4730 for stock Keystone to 4950 for PIE.
3. PMP overhead
We measure the hardware overhead of PMP units in terms of the logic, the caches, and the total amount in NAND2 gate equivalents within the processor pipeline for 0, 8, and 16 PMP entries, and present them in Table 1. We instantiate the Ariane core (Zaruba and Benini, 2019) with the default configuration: including the floating point unit, 32KiB L1 data cache, 16KiB L1 instruction cache, branch history table of size 64, and a 16-entry branch target buffer. We synthesized this instantiation of the core in a 22nm technology at 1GHz.
| PMP entries | Logic | Caches | Total |
|---|---|---|---|
| 0 | 472k GE | 686k GE | 1141k GE |
| 8 | 497k GE | 686k GE | 1164k GE |
| 16 | 531k GE | 686k GE | 1197k GE |
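To put the numbers of Table 1 into perspective, the relative area increase can be computed directly. The sketch below assumes that the columns are, in order, logic, caches, and total, as listed in the text above; the variable and function names are illustrative.

```python
# Gate equivalents (in thousands) from Table 1: entries -> (logic, total).
# Assumption: the first value column is logic and the last one is the total.
table = {0: (472, 1141), 8: (497, 1164), 16: (531, 1197)}


def relative_increase(entries):
    """Return the (logic, total) area increase relative to a core without PMP."""
    base_logic, base_total = table[0]
    logic, total = table[entries]
    return (logic - base_logic) / base_logic, (total - base_total) / base_total


for n in (8, 16):
    logic_inc, total_inc = relative_increase(n)
    print(f"{n:2d} entries: logic +{logic_inc:.1%}, total +{total_inc:.1%}")
```

Under that reading, 16 PMP entries add 12.5% to the logic area and roughly 5% to the total, while the cache area is unaffected.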
4. Simple specialized hardware
The communication overhead between the platform and the peripheral device emulated by the Arduino Due is very small. At initialization time, the peripheral and the platform exchange handshake messages to perform local attestation. The initial handshake message is bytes. Every message of our modified protocol is 32 bytes in size. The combined latency introduced by signing averages around 60 s.
Our modification of the accelerator cores increases their size by around 15% and slows them down from 750MHz to 666MHz due to the impact of the PMP access checks on the critical path. Note that this may not reflect the general case but rather represents an upper bound. The change in area of a single core complex (core, FPU, and an integer subsystem) can be found in Figure 7. In total, the area of the entire accelerator only increased by around 0.6%, with most of the area being occupied by the floating point units.
Supporting local physical attacker
Supporting a local physical attacker requires two hardware modifications, in both the CPU and the specialized hardware. First, the CPU needs to support memory encryption and integrity, a typical mechanism that many TEEs already employ. Second, the communication channel between specialized hardware and the CPU, i.e., the bus, must provide confidentiality and integrity. Existing proposals from industry and research (Gueron, 2016; Kaplan et al., 2016; Suh et al., 2003) report high performance for memory and bus encryption. In some cases, encrypting the bus is not performance sensitive, e.g., simple sensors only send very little data over the bus. Thus, it might be possible to perform the encryption purely in software without any hardware modifications. Other specialized hardware, such as accelerators that require high throughput, probably needs hardware encryption engines to accelerate its encrypted communication.
Limitation of the number of PMP entries
The number of PMP entries in the RISC-V privileged specification is limited to 16 (proposals for 64 entries are in discussion) due to the overhead of such entries on the CPU size. However, this limits the number of enclaves and shared memory regions that may coexist on a system. With one shared memory region per enclave, only a limited number of enclaves can exist at a time (16 entries support 7 enclaves). However, we stress that the isolation of enclaves can also be achieved using the memory management unit (MMU) in a similar fashion as Intel SGX (Costan and Devadas, 2016) or Sanctum (Costan et al., 2016). (There are efforts towards the hypervisor extension in RISC-V that would allow MMU-based isolation without non-standard modifications, but as these are not ratified, they are hard to evaluate.) MMU-based isolation can also easily be extended to shared memory ranges and removes any limitation on the maximum number of enclaves.
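The entry budget can be made concrete with a small calculation. The accounting below is an assumption chosen to reproduce the figure of 7 enclaves for 16 entries: two entries are taken to be reserved (for example for the SM and for the OS), and each enclave is taken to consume two entries, one for its private memory and one for its shared memory region. The function name and parameters are illustrative.

```python
def max_enclaves(pmp_entries, reserved=2, entries_per_enclave=2):
    """Upper bound on coexisting enclaves under the assumed PMP entry accounting."""
    usable = pmp_entries - reserved
    return max(usable // entries_per_enclave, 0)


limit_16 = max_enclaves(16)  # the configuration discussed in the text
limit_64 = max_enclaves(64)  # the proposed extension to 64 entries
```

Under these assumptions, the 64-entry proposal would raise the limit to 31 enclaves, which is still a hard cap, unlike the MMU-based alternative.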
Enhanced privacy mode
Using an end-to-end secure channel between the application enclave and the specialized hardware, we can enable an enhanced privacy mode in a platform-wide enclave. After the remote attestation of the platform-wide enclave, the remote verifier receives the attestation reports of the individual components, including their public keys. Using these keys, the application enclave and the specialized hardware can establish a TLS session using the controller enclave as an untrusted transport layer. Developers need to enable this feature in the specialized hardware firmware. Moreover, cryptographic operations executed in software may result in lower performance. This enhanced privacy mode can work alongside the regular mode of operation (trusted controller enclave).
9. Related Work
A number of solutions exist for integrating external devices with widely deployed TEEs. SGXIO (Weiser and Werner, 2017) aims to allow Intel SGX enclaves to interact with input-output devices under the remote adversary model. SGXIO uses a trusted hypervisor that virtualizes peripherals. SGXIO is static, i.e., all the peripherals have to be set up at boot time and no changes (e.g., connecting new peripherals) are allowed at runtime. It is not clear how enclaves are created and how they get access to a peripheral while preserving the confidentiality of previous enclaves that used said peripheral.
Graviton (Volos et al., 2018) is a TEE that enables isolated concurrent enclaves on a graphics card. Graviton would fit very well within a PIE as it is an excellent example of an enclave on specialized hardware and it shows that even some of the most powerful accelerators can be extended with a local TEE. Visor (Poddar et al., 2020) builds upon Graviton (Volos et al., 2018) and proposes a hybrid TEE that spans both the CPU and the GPU. The CPU and GPU enclaves communicate with each other securely. Visor is aimed at privacy-preserving video analytics, where the computation pipeline is shared between the CPU (non-CNN workloads) and the GPU (CNN workloads) to increase efficiency.
ARM TrustZone is a system TEE provided by ARM for their systems-on-chip (SoCs) (Winter, 2008). TrustZone applications run on top of a secure OS that is trusted and isolated from the standard OS (also known as the rich OS). TrustZone only provides the low-level isolation property between the rich OS and the secure OS by means of an extra bit on the bus. Everything else, e.g., isolation between TrustZone applications and remote attestation, has to be added to the secure OS (Ning, 2014). Due to this limitation, mobile phone manufacturers usually only allow TrustZone applications that are signed by them. Sanctuary (Brasser et al., 2019b) extends TrustZone with user-space enclaves. Sanctuary achieves isolation by running enclaves in their own address space in the normal world. However, Sanctuary does not extend to external specialized hardware.
Some proposals (Ying et al., 2018; Li et al., 2014; Lentz et al., 2018; Li et al., 2018) enable additional security properties such as a trusted path by directly pairing peripherals (e.g., the touchscreen) with the TrustZone application. However, these are only geared towards IO operations for a trusted path and cannot be dynamically extended to support generic devices.
Other isolation methods
Minimal hypervisors or operating systems (Herder et al., 2006; Klein et al., 2009) can also achieve isolation, and some are even formally verified (Klein et al., 2009). Usually, such hypervisors do not include attestation, but the cost of adding it should be low. A PIE could also be based on a microkernel such as seL4. One would have to add an interface for the malicious OS running in a virtual machine to interact with enclaves that run directly on top of seL4, similar to other pure hypervisor-based isolation systems (Criswell et al., 2014; Chen et al., 2008; Hofmann et al., 2013; McCune et al., 2010; Ta-Min et al., 2006; Garfinkel et al., 2003). It might even be possible to formally verify such modifications to provide an even stronger assurance of isolation. However, the hypervisor is also in charge of scheduling, resulting in a significantly bigger TCB than PIE. Moreover, none of the hypervisor-based proposals consider platform awareness or platform-wide attestation. Isolation is the sole objective of these proposals.
Bump in the wire-based solutions
Fidelius (Eskandarian et al., 2019), ProtectIOn (Dhar et al., 2020), IntegriScreen (Sluganovic et al., 2020), FPGA-based overlays (Brandon and Trimarchi, 2017), and IntegriKey (Dhar et al., 2017) are trusted path solutions that use external trusted hardware devices as intermediaries between the platform and IO devices. These external devices create a trusted path between a remote user and the peripheral and enable the user to exchange sensitive data securely with the peripheral in the presence of an attacker-controlled OS. Such solutions provide a loose notion of platform awareness and are focused on IO devices. Platform-wide attestation and strong isolation guarantees are out of scope for such proposals.
10. Conclusion
We introduce PIE, a secure platform design with a configurable hardware and software TCB. PIE allows specialized hardware to be integrated into TEEs, something that was previously not possible without violating the TEEs’ adversary model. PIE provides two new security properties: platform-wide attestation and platform awareness. The former expands on the traditional notion of attestation to provide a complete view of the platform’s state, while the latter provides mechanisms that allow the enclave to cope with changes to the platform, such as the disconnection of a specialized hardware component. We present a prototype based on RISC-V Keystone, which shows that PIE is feasible and adds only a small number of lines of code to the TCB.
References
- Abramson et al. (2006) Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel Virtualization Technology for Directed I/O. Intel technology journal 10, 3 (2006), 179–192.
- AMD (2007) AMD. 2007. AMD I/O Virtualization Technology (IOMMU) Specification. (2007).
- Asanovic et al. (2016) Krste Asanovic, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser, Adam Izraelevitz, et al. 2016. The rocket chip generator. Technical Report. University of California, Berkeley.
- Barham et al. (2003) Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. ACM SIGOPS operating systems review 37, 5 (2003), 164–177.
- Brandon and Trimarchi (2017) A. Brandon and M. Trimarchi. 2017. Trusted display and input using screen overlays. In 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig). IEEE, 1–6.
- Brasser et al. (2019a) Ferdinand Brasser, Srdjan Capkun, Alexandra Dmitrienko, Tommaso Frassetto, Kari Kostiainen, and Ahmad-Reza Sadeghi. 2019a. DR.SGX: Automated and Adjustable Side-Channel Protection for SGX Using Data Location Randomization. In Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC ’19). Association for Computing Machinery, New York, NY, USA, 788–800.
- Brasser et al. (2019b) Ferdinand Brasser, David Gens, Patrick Jauernig, Ahmad-Reza Sadeghi, and Emmanuel Stapf. 2019b. SANCTUARY: ARMing TrustZone with User-space Enclaves. In NDSS.
- Brasser et al. (2017) Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. 2017. Software grand exposure: SGX cache attacks are practical. In 11th USENIX Workshop on Offensive Technologies (WOOT 17).
- Checkoway and Shacham (2013) Stephen Checkoway and Hovav Shacham. 2013. Iago attacks: why the system call API is a bad untrusted RPC interface. ACM SIGARCH Computer Architecture News 41, 1 (2013), 253–264.
- Chen et al. (2008) Xiaoxin Chen, Tal Garfinkel, E. Christopher Lewis, Pratap Subrahmanyam, Carl A. Waldspurger, Dan Boneh, Jeffrey Dwoskin, and Dan R.K. Ports. 2008. Overshadow: A Virtualization-Based Approach to Retrofitting Protection in Commodity Operating Systems. SIGOPS Oper. Syst. Rev. 42, 2 (March 2008), 2–13.
- Costan and Devadas (2016) Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. IACR Cryptology ePrint Archive 2016, 086 (2016), 1–118.
- Costan et al. (2016) Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal hardware extensions for strong software isolation. In 25th USENIX Security Symposium (USENIX Security 16). 857–874.
- Criswell et al. (2014) John Criswell, Nathan Dautenhahn, and Vikram Adve. 2014. Virtual ghost: Protecting applications from hostile operating systems. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
- Dhar et al. (2020) Aritra Dhar, Enis Ulqinaku, Kari Kostiainen, and Srdjan Capkun. 2020. ProtectIOn: Root-of-Trust for IO in Compromised Platforms. In 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. The Internet Society. https://www.ndss-symposium.org/ndss-paper/protection-root-of-trust-for-io-in-compromised-platforms/
- Dhar et al. (2017) Aritra Dhar, Der-Yeuan Yu, Kari Kostiainen, and Srdjan Capkun. 2017. IntegriKey: End-to-End Integrity Protection of User Input. IACR Cryptol. ePrint Arch. 2017 (2017), 1245.
- Eskandarian et al. (2019) Saba Eskandarian, Jonathan Cogan, Sawyer Birnbaum, Peh Chang Wei Brandon, Dillon Franke, Forest Fraser, Gaspar Garcia Jr., Eric Gong, Hung T. Nguyen, Taresh K. Sethi, Vishal Subbiah, Michael Backes, Giancarlo Pellegrino, and Dan Boneh. 2019. Fidelius: Protecting User Secrets from Compromised Browsers. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. IEEE, 264–280.
- Garfinkel et al. (2003) Tal Garfinkel, Ben Pfaff, Jim Chow, Mendel Rosenblum, and Dan Boneh. 2003. Terra: A Virtual Machine-Based Platform for Trusted Computing. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP ’03). Association for Computing Machinery, New York, NY, USA, 193–206.
- Gruss et al. (2017) Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan Haller, and Manuel Costa. 2017. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC, 217–233. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/gruss
- Gueron (2016) Shay Gueron. 2016. Memory encryption for general-purpose processors. IEEE Security & Privacy 14, 6 (2016), 54–62.
- Herder et al. (2006) Jorrit N Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S Tanenbaum. 2006. MINIX 3: A highly reliable, self-repairing operating system. ACM SIGOPS Operating Systems Review 40, 3 (2006), 80–89.
- Hofmann et al. (2013) Owen S. Hofmann, Sangman Kim, Alan M. Dunn, Michael Z. Lee, and Emmett Witchel. 2013. InkTag: Secure Applications on an Untrusted Operating System. SIGPLAN Not. 48, 4 (March 2013), 265–278.
- Holdings (2013) ARM Holdings. 2013. ARM system memory management unit architecture specification—SMMU architecture version 2.0. (2013).
- Jouppi et al. (2017) Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. SIGARCH Comput. Archit. News 45, 2 (June 2017), 1–12.
- Juckeland et al. (2015) Guido Juckeland, William Brantley, Sunita Chandrasekaran, Barbara Chapman, Shuai Che, Mathew Colgrove, Huiyu Feng, Alexander Grund, Robert Henschel, Wen-Mei W. Hwu, Huian Li, Matthias S. Müller, Wolfgang E. Nagel, Maxim Perminov, Pavel Shelepugin, Kevin Skadron, John Stratton, Alexey Titov, Ke Wang, Matthijs van Waveren, Brian Whitney, Sandra Wienke, Rengan Xu, and Kalyan Kumaran. 2015. SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, Stephen A. Jarvis, Steven A. Wright, and Simon D. Hammond (Eds.). Springer International Publishing, 46–67.
- Kaplan et al. (2016) David Kaplan, Jeremy Powell, and Tom Woller. 2016. AMD memory encryption. White paper (2016).
- Katrinis et al. (2016) K. Katrinis, D. Syrivelis, D. Pnevmatikatos, G. Zervas, D. Theodoropoulos, I. Koutsopoulos, K. Hasharoni, D. Raho, C. Pinto, F. Espina, S. Lopez-Buedo, Q. Chen, M. Nemirovsky, D. Roca, H. Klos, and T. Berends. 2016. Rack-scale disaggregated cloud data centers: The dReDBox project vision. In 2016 Design, Automation Test in Europe Conference Exhibition (DATE). 690–695.
- Klein et al. (2009) Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. 2009. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. 207–220.
- Lebedev et al. (2019) Ilia Lebedev, Kyle Hogan, Jules Drean, David Kohlbrenner, Dayeol Lee, Krste Asanović, Dawn Song, and Srinivas Devadas. 2019. Sanctorum: A lightweight security monitor for secure enclaves. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1142–1147.
- Lee et al. (2020) Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanović, and Dawn Song. 2020. Keystone: An open framework for architecting trusted execution environments. In Proceedings of the Fifteenth European Conference on Computer Systems. 1–16.
- Lee et al. (2017) Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Marcus Peinado. 2017. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. In 26th USENIX Security Symposium (USENIX Security 17). 557–574.
- Lentz et al. (2018) Matthew Lentz, Rijurekha Sen, Peter Druschel, and Bobby Bhattacharjee. 2018. SeCloak: ARM Trustzone-Based Mobile Peripheral Control. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’18). Association for Computing Machinery, New York, NY, USA, 1–13.
- Li et al. (2018) Wenhao Li, Shiyu Luo, Zhichuang Sun, Yubin Xia, Long Lu, Haibo Chen, Binyu Zang, and Haibing Guan. 2018. VButton: Practical Attestation of User-Driven Operations in Mobile Apps. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’18). Association for Computing Machinery, New York, NY, USA, 28–40.
- Li et al. (2014) Wenhao Li, Mingyang Ma, Jinchen Han, Yubin Xia, Binyu Zang, Cheng-Kang Chu, and Tieyan Li. 2014. Building Trusted Path on Untrusted Device Drivers for Mobile Devices. In Proceedings of 5th Asia-Pacific Workshop on Systems (APSys ’14). Association for Computing Machinery, New York, NY, USA, Article 8, 7 pages.
- Lowrisc (2020) Lowrisc. 2020. Ibex RISC-V Core. https://github.com/lowRISC/ibex. (2020).
- Ltd. (2021) Arm Ltd. 2021. Learn the Architecture: TrustZone for AArch64. https://developer.arm.com/architectures/learn-the-architecture/trustzone-for-aarch64/trustzone-in-the-processor. (2021).
- McCune et al. (2010) J. M. McCune, Y. Li, N. Qu, Z. Zhou, A. Datta, V. Gligor, and A. Perrig. 2010. TrustVisor: Efficient TCB Reduction and Attestation. In 2010 IEEE Symposium on Security and Privacy. 143–158.
- Naghibijouybari et al. (2018) Hoda Naghibijouybari, Ajaya Neupane, Zhiyun Qian, and Nael Abu-Ghazaleh. 2018. Rendered insecure: GPU side channel attacks are practical. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2139–2153.
- Ning (2014) Peng Ning. 2014. Samsung Knox and enterprise mobile security. In Proceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices. 1–1.
- Poddar et al. (2020) Rishabh Poddar, Ganesh Ananthanarayanan, Srinath Setty, Stavros Volos, and Raluca Ada Popa. 2020. Visor: Privacy-Preserving Video Analytics as a Cloud Service. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association.
- Ramesh et al. (2018) Chethan Ramesh, Shivukumar B Patil, Siva Nishok Dhanuskodi, George Provelengios, Sébastien Pillement, Daniel Holcomb, and Russell Tessier. 2018. FPGA side channel attacks without physical access. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 45–52.
- sifive (2019) sifive. 2019. RISC-V Security Architecture Introduction. https://sifive-china.oss-cn-zhangjiakou.aliyuncs.com/%E8%A5%BF%E5%AE%89%E7%8F%A0%E6%B5%B7%E6%9D%AD%E5%B7%9E%E5%90%88%E8%82%A5ppt/04%20hujin%20RISC-V%20Security%20Architecture%20Introduction_4%20City.pdf. (2019).
- Sluganovic et al. (2020) Ivo Sluganovic, Enis Ulqinaku, Aritra Dhar, Daniele Lain, Srdjan Capkun, and Ivan Martinovic. 2020. IntegriScreen: Visually Supervising Remote User Interactions on Compromised Clients. (2020). arXiv:cs.CR/2011.13979
- Suh et al. (2003) G Edward Suh, Dwaine Clarke, Blaise Gassend, Marten Van Dijk, and Srinivas Devadas. 2003. AEGIS: architecture for tamper-evident and tamper-resistant processing. In ACM International Conference on Supercomputing 25th Anniversary Volume. 357–368.
- Suzaki et al. (2011) Kuniyasu Suzaki, Kengo Iijima, Toshiki Yagi, and Cyrille Artho. 2011. Memory deduplication as a threat to the guest OS. In Proceedings of the Fourth European Workshop on System Security. 1–6.
- Ta-Min et al. (2006) Richard Ta-Min, Lionel Litty, and David Lie. 2006. Splitting Interfaces: Making Trust between Applications and Operating Systems Configurable. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI ’06). USENIX Association, USA, 279–292.
- Torvalds et al. (2020) Linus Torvalds et al. 2020. Linux kernel source tree. Git repository (2020).
- Volos et al. (2018) Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. 2018. Graviton: Trusted execution environments on GPUs. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 681–696.
- Waterman et al. (2019) Andrew Waterman, Yunsup Lee, Rimas Avizienis, David A Patterson, and Krste Asanović. 2019. The RISC-V instruction set manual volume II: Privileged architecture. EECS Department, University of California, Berkeley (2019).
- Weiser and Werner (2017) Samuel Weiser and Mario Werner. 2017. SGXIO: Generic trusted I/O path for Intel SGX. In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. 261–268.
- Werner et al. (2019) Mario Werner, Thomas Unterluggauer, Lukas Giner, Michael Schwarz, Daniel Gruss, and Stefan Mangard. 2019. ScatterCache: Thwarting cache attacks via cache set randomization. In 28th USENIX Security Symposium (USENIX Security 19). 675–692.
- Winter (2008) Johannes Winter. 2008. Trusted computing building blocks for embedded Linux-based ARM TrustZone platforms. In Proceedings of the 3rd ACM workshop on Scalable trusted computing. 21–30.
- Wolf (2016) Clifford Wolf. 2016. Yosys open synthesis suite. (2016).
- Xu et al. (2015) Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In 2015 IEEE Symposium on Security and Privacy. IEEE, 640–656.
- Ying et al. (2018) Kailiang Ying, Amit Ahlawat, Bilal Alsharifi, Yuexin Jiang, Priyank Thavai, and Wenliang Du. 2018. TruZ-Droid: Integrating TrustZone with Mobile Operating System. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’18). Association for Computing Machinery, New York, NY, USA, 14–27.
- Yu et al. (2020) Zhijingcheng Yu, Shweta Shinde, Trevor E Carlson, and Prateek Saxena. 2020. Elasticlave: An Efficient Memory Model for Enclaves. arXiv preprint arXiv:2010.08440 (2020).
- Zaruba (2020) Florian Zaruba. 2020. Ariane RISC-V CPU. https://github.com/openhwgroup/cva6. (2020).
- Zaruba and Benini (2019) Florian Zaruba and Luca Benini. 2019. The cost of application-class processing: Energy and performance analysis of a Linux-ready 1.7-GHz 64-bit RISC-V core in 22-nm FDSOI technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 11 (2019), 2629–2640.
- Zaruba et al. (2020) Florian Zaruba, Fabian Schuiki, and Luca Benini. 2020. A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 1–24.
- Zaruba et al. (2020) F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini. 2020. Snitch: A tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads. IEEE Trans. Comput. (2020), 1–1. https://doi.org/10.1109/TC.2020.3027900
- Zhang et al. (2009) Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. 2009. Towards practical page coloring-based multicore cache management. In Proceedings of the 4th ACM European conference on Computer systems. 89–102.
- Zhao et al. (2020) Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. 2020. SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine. (May 2020).