SpectreRewind: A Framework for Leaking Secrets to Past Instructions

by Jacob Fustos, et al.

Transient execution attacks, such as Spectre and Meltdown, utilize micro-architectural covert channels to leak secrets that should not have been accessible during logical program execution. Commonly used micro-architectural covert channels in such attacks are those that leave lasting footprints in the micro-architectural state, for example, a cache state change. This lasting footprint has led attackers to utilize an attack framework where secrets are transmitted into the covert channel during transient execution and read from the covert channel later, after transient execution is complete. This has led to the proposal of high performance hardware defenses that track potential secret data during transient execution and either discard or revert micro-architectural changes once transient execution has completed. In this work, we create a new framework for transient execution attacks that we call SpectreRewind. Our framework allows the attacker to both transmit and receive the secret before transient execution has completed, bypassing defenses that try to revert changes caused by the attack. Unlike similar techniques utilizing hyper-threading, SpectreRewind is designed to be performed on a single hardware thread, making it viable on systems where the attacker cannot utilize SMT. We accomplish this by reading from the covert channel with instructions that come logically before the transient execution in program order. Using our framework, we are even able to utilize simultaneous covert channels from a single hardware thread, and we show this by creating a channel that utilizes contention on the floating point division unit of modern commodity processors.




1 Introduction

Modern out-of-order microprocessors support speculative execution to improve performance. In speculative execution, instructions can be executed speculatively before knowing whether they are in the correct program execution path. If the speculation was wrong, the instructions that were executed incorrectly, known as transient instructions [13], are squashed and the processor then simply fetches and executes the correct instruction stream. Unfortunately, it turned out that these transient instructions can potentially bypass both software and hardware defenses to access secrets. The disclosure of Spectre [13], Meltdown [15], and many other subsequently disclosed transient execution attacks [12, 16, 14, 9, 5, 10, 20, 23, 27, 24, 17, 19, 6] has shown the danger of these transient instructions, as the secrets they had access to could be encoded and transmitted into microarchitectural covert channels, from which normal, non-speculative instructions could then read, allowing the secrets to be visible to the attacker.

All known transient execution attacks share the same three basic steps: (1) the attacker initiates speculative execution where the secret is read improperly from memory or registers; (2) the secret-dependent transient instructions then encode and transmit the secret to a micro-architectural covert channel; (3) finally, the secret is recovered from the covert channel by normal (non-transient) receiver instructions. The use of well-known micro-architectural covert channels that leave lasting footprints in micro-architectural state, such as the cache state, allowed attackers to follow a standard framework for these attacks, where the final step of reading from the covert channel could be accomplished after transient execution had completed. This framework led to the proposal of hardware defenses that track micro-architectural state changes, either by storing the changes caused by transient instructions into so-called shadow buffers [11, 28, 8] and then only updating the standard microarchitectural structures when the instructions become architecturally visible, or by tracking the microarchitectural changes and reverting [18] them when a transient instruction is squashed. Such a mitigation strategy is attractive from a performance standpoint, as the transient instructions are allowed to execute normally, retaining many of the performance benefits of speculative execution.

These types of defenses are effective at blocking attacks that utilize the previously stated framework, as the attacker does not read from the covert channel until after transient execution has completed. Unfortunately, these techniques cannot be used to block attacks that both transmit into and read from the covert channel before transient instructions have been squashed. SmotherSpectre [3] was the first to demonstrate such an attack. It utilized a simultaneous covert channel in a simultaneous multi-threading (SMT) setup, where contention on issue ports within the processor was used as a covert channel to transmit the secret between hardware threads in the context of a Spectre-based attack. Such contention cannot be buffered or reverted, as instructions have already waited to use the issue ports, affecting their execution time.

In this paper, we create a new framework for performing transient execution attacks, which we call SpectreRewind. Much like SmotherSpectre, SpectreRewind allows the attacker to both transmit and receive secret data before transient execution has completed, allowing the attacker to bypass most defense mechanisms that attempt to revert or hide micro-architectural changes caused by the attack. However, unlike SmotherSpectre, SpectreRewind does not require the attacker to utilize SMT. Instead, the attack can be executed from a single hardware thread. While traditional transient attacks locate the instructions that will read from the covert channel logically after the instruction that triggers the transient execution (e.g., a branch), SpectreRewind takes the opposite approach and locates these instructions logically before the triggering instruction. This structure allows the transmitting and receiving instructions to execute concurrently on a modern out-of-order core and communicate the secret even before the transient execution completes.

We start by presenting our framework and examining the unique challenges associated with it (Section 4). We then apply our framework to create a covert channel, which utilizes contention on a floating point division unit in commodity Intel and AMD processors (Section 5). Next, we analyze how SpectreRewind can be integrated into different transient execution attacks (Section 6). Finally, we discuss the security implications that this attack has on currently proposed and implemented hardware and software defenses (Section 7). Our evaluation results show that our technique allows the creation of a simultaneous covert channel from a single-threaded context that has low noise and is viable for use in transient execution attacks, and can leak data at a rate of 1.16-1.98 KB/s with a low error rate of 0.003% across a variety of modern commodity processors.

2 Background

In this section, we provide necessary background on out-of-order cores, transient execution attacks, and simultaneous multithreading (SMT) hardware.

2.1 Out-of-order Processors

Figure 1: Simplified out-of-order execution section of a processor. The ReOrder Buffer holds and retires ops in logical program order, while ops are issued to the execution units out of order.

Modern high performance microprocessors utilize out-of-order execution to execute multiple independent instructions in parallel—taking advantage of instruction level parallelism—allowing for higher throughput, while also reducing the penalty of a stall caused by independent instructions.

Figure 1 shows a simplified example of an out-of-order processor. In this example, instructions have first been translated into micro-operations (ops). These ops are first placed into the ReOrder Buffer (ROB) in logical program order. They are then passed to the scheduler where, once their operands and the necessary resources become available, they are issued to an appropriate functional unit. In this example, the functional units are clustered into two execution units. Each execution unit contains a single issue port, which can only issue a single op to one of the enclosed functional units every clock cycle, but once issued, the functional units run independently of each other. Once an op has been executed by a functional unit, the scheduler is notified so that it can forward the results to the dependent ops that follow. The op then waits in the ROB until it reaches the head, where it may be retired. Only at this point do the changes made by the op become architecturally visible, giving the illusion, from the architecture's point of view, that the instructions are executed in order.

To further reduce branch-related stalls, modern processors implement complex branch predictors to predict which instructions should be executed. As the results of execution are not made architecturally visible until the instructions retire, it is architecturally safe to create a checkpoint of the processor's state before the prediction, store the predicted instruction stream in the ROB, and execute those instructions speculatively, which further improves performance if the prediction was correct. If the prediction was wrong, these instructions are squashed, returning the processor state to the checkpoint, from which the processor can begin executing the correct instructions. Instructions that were executed but then squashed, and that therefore never become architecturally visible, are known as transient instructions.

2.2 Transient execution attacks

As transient instructions were not supposed to have executed, it follows that they can perform tasks—such as accessing secret data—that should not have been accessible during proper program execution. While they do not retire—and do not become architecturally visible—they still can contend for shared resources with instructions that will retire, creating a microarchitectural side-channel that can leak the secrets.

       if (x < array1_size) {
               secret = array1[x];
               y = array2[secret * 4096];
       }
Figure 2: A Spectre gadget. Adapted from [13]

Figure 2 shows an example of a speculative execution attack, Spectre variant 1 [13]. In this example, the if statement of line 1 has been trained by the attacker such that the body (lines 2 and 3) is executed transiently even though x is out of bounds. This allows the attacker to choose the value of x such that it points to the secret. When line 2 is executed, the value of the secret, which would have been inaccessible during normal program flow, is unintentionally loaded from memory. The secret value is then encoded in line 3, leaking its value into the cache state. Later, after the transient instructions have been squashed, the attacker can read the cache state from architecturally visible instructions and decode it to complete the side-channel.
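The decode step at the end of this traditional attack can be sketched as follows. This is a minimal illustration, assuming the attacker has already measured one access time per candidate byte value (for example, to array2[v * 4096]); the names recover_byte, probe_offset, timings, and hit_threshold are hypothetical and do not come from [13].

```c
#include <stddef.h>
#include <stdint.h>

#define CANDIDATES 256   /* one probe slot per possible byte value */
#define STRIDE     4096  /* slot spacing used by the gadget in Figure 2 */

/* Offset into array2 that the transient load touches for a given byte. */
static size_t probe_offset(uint8_t value) {
    return (size_t)value * STRIDE;
}

/*
 * Return the candidate whose probe access was fastest, provided it is
 * below hit_threshold (assumed to separate cache hits from misses).
 * Returns -1 if no candidate looks like a cache hit.
 */
static int recover_byte(const uint32_t timings[CANDIDATES],
                        uint32_t hit_threshold) {
    int best = -1;
    uint32_t best_time = hit_threshold;
    for (int v = 0; v < CANDIDATES; v++) {
        if (timings[v] < best_time) {
            best_time = timings[v];
            best = v;
        }
    }
    return best;
}
```

Because this read-out happens after the transient instructions have been squashed, it is exactly the step that defenses which discard or revert cache changes are designed to defeat.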

Transient execution attacks can be broken down into three categories. Spectre-type attacks utilize control- and data-flow mis-speculation to force a victim to access secrets from their own address space and leak them into the covert channel, where they can be accessed by the attacker. Each Spectre variant (1 [13], 1.1 [12], 2 [13], 4 [9], and ret2spec [16, 14]) is distinguished by the microarchitectural component that is responsible for causing the mis-speculation, namely the Branch History Buffer (BHB), Branch Target Buffer (BTB), Memory Disambiguator, or Return Stack Buffer (RSB). Meltdown-type attacks take advantage of the fact that processor exceptions are deferred until the instruction that caused the exception is retired (becomes architecturally visible). Instructions that occur logically after the faulting instruction should not execute according to logical program order, but they can be executed out of order, potentially bypassing the security that the exception was intended to provide. These attacks can be run from within the attacker's own address space while allowing them to access secrets from other processes and privilege levels. Each Meltdown variant (1.2 [12], 3 [15], 3a [5, 10], Lazy FP [20], and L1TF [23, 27]) corresponds to the exception that caused the fault. Finally, Microarchitectural Data Sampling (MDS) [17, 24, 19] is a form of transient execution attack, technically Meltdown-type, that targets speculative loads that have incorrectly loaded data from internal buffers (Store Buffer, Load Port, Line Fill Buffer) and leaks the data into covert channels before the processor realizes the fault. The data that was incorrectly loaded could have come from other SMT threads on the same processor executing at any privilege level.

2.3 Simultaneous Multithreading (SMT)

To improve hardware utilization, manufacturers often employ a technique called Simultaneous Multithreading (SMT) [22], where a single core is allowed to execute multiple hardware threads (instruction streams) simultaneously. These hardware threads share the underutilized structures, improving utilization, while appearing, from the architectural point of view, to be independent processing cores. Because the hardware threads in an SMT-capable core share various hardware structures, e.g. issue ports and functional units, these structures can be used to create covert channels and microarchitectural side channels between processes running on the multiple threads within a single core.

3 Threat Model

    // attacker controlled code
    secret = transient_execution_attack();
    // attacker controlled code
Figure 3: Attacker controlled code located around the transient execution attack

We assume that there exists a secret that the attacker would like to leak either through one of the currently known transient execution attacks listed in Section 2.2 or through a yet to be discovered transient execution attack. We do not make assumptions about the location of the secret, only that it can be targeted by the chosen attack. We assume that the attacker can successfully train and execute the transient execution attack in a situation as shown in Figure 3, where the attacker has the ability to control some code that executes both logically before and logically after the transient execution attack in program order. We assume that the attacker would like to construct code that will transmit the secret over a covert channel such that they may read the value of the secret at the architectural level. We assume that the system is defended by proposed hardware mitigation techniques [8, 11, 28, 18] and that these defenses successfully remove the secret from the covert channel once the transient execution phase of the attack completes. We assume that the attacker can neither create nor coordinate with other attacker-controlled threads or processes that are running on the same system. We also assume that the program in which the secret resides is correct, i.e., it does not contain any other vulnerabilities, from either an architectural or a micro-architectural perspective, that could allow the attacker to leak the secret.

4 SpectreRewind Framework

In this section, we present the SpectreRewind framework.

SpectreRewind is an attack framework that allows the attacker to both transmit into and receive from a covert channel before the transient execution phase of the attack is completed.

Figure 4: Simplified timing diagram comparing traditional Spectre attack framework to SpectreRewind framework

Figure 4 illustrates the basic concept of SpectreRewind and compares it to that of a typical transient execution attack framework. In this figure, we break up involved ops into three distinct categories: the ops that come logically before, during, and after the transient execution attack. In both frameworks, we assume that we send only a single bit over a covert channel at a time. For each framework, we depict two timing diagrams: transmitting ‘0’ and ‘1’ over the covert channel.

In the case of the traditional transient execution attack framework, the attacker will use a covert channel that causes a lasting state change in the micro-architecture, and read from the covert channel with ops that occur logically after the transient execution. Data can be read from the channel by measuring the timing differences of these ops (t3-t4 for a value ‘0’ and t3-t5 for a value ‘1’). Hardware defenses (e.g., [28, 11]) that remove the secret from the covert channel after transient execution (t2) will be able to stop this attack by disrupting the transmission of the secret.

In case of the SpectreRewind framework, however, transient instructions will contend for resources with the ops that come logically before the transient instructions. Because the covert channel will be read from before transient execution completes (t2), the aforementioned hardware defense mechanisms which attempt to remove the secret from the covert channel at that time will be ineffective. In our approach, the attacker measures the entire execution time of the attack to detect the timing differences. Since the covert channel must be read from before transient execution completes, this gives the added challenge of needing to fit the entire attack in the ROB at the same time.

SpectreRewind assumes that younger transient ops can contend, on certain micro-architectural resources, with older ops that began before the transient ops. In the following, we discuss the kinds of micro-architectural resources that are viable covert channels in SpectreRewind.

4.1 Non-Pipelined Functional Units

(a) Ready victim, Pipelined functional unit
(b) Waiting victim, Pipelined functional unit
(c) Waiting victim, Non-pipelined functional unit
Figure 5: Multiple attempts by attacker to delay the execution of the victim, causing measurable timing differences. If the attacker is younger than the victim, an age-ordered scheduler will prevent most contention.

Since the attacker's transient instructions attempt to contend with instructions that are logically older than themselves, our covert channel options are limited. We will not be able to cause port contention or contention on fully pipelined functional units (as demonstrated in SMT attacks). This restriction arises on age-ordered schedulers and is described in this section, but we also show that it is possible to cause contention on functional units that contain at least one non-pipelined stage.

Figure 5 shows visual examples of this problem. In Figure 5(a), we see an example of an attacker op trying to cause slowdown on a victim op that is trying to use a shared integer multiplier. Unfortunately, because both the attacker and victim are ready to issue, the scheduler will choose the older victim, preventing any contention.

Figure 5(b) shows the situation where the attacker becomes ready the cycle before the victim. The attacker is issued into the multiplier, but still cannot create contention on the victim, as the victim is issued on the next cycle, when it becomes ready, just as if the attacker was not there.

Finally, Figure 5(c) shows an attack on a non-pipelined shared functional unit (stage 1 takes 3 clock cycles to complete). As the victim is not initially ready, the attacker is scheduled on the unit. As the unit is not pipelined, the victim cannot be issued on the unit until the attacker completes, which affects the execution time of the victim, making a covert channel possible. Thus, for our attack we will only focus on functional units that have at least one stage that is not fully pipelined.
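The three cases of Figure 5 can be reproduced with a small toy model. This is a sketch under simplifying assumptions, not a description of any real scheduler: there is a single shared unit, the attacker issues as soon as it is ready, ties go to the older victim, and occupancy is the number of cycles the first stage stays busy after an issue (1 for a fully pipelined unit, 3 for the non-pipelined unit of Figure 5(c)). The function names are hypothetical.

```c
/*
 * Cycle at which the older victim op issues, given when each op becomes
 * ready and how long the unit's first stage stays occupied per issue.
 */
static int victim_issue_cycle(int attacker_ready, int victim_ready,
                              int occupancy) {
    /* Age-ordered scheduling: the younger attacker only gets the unit
       first if it is ready strictly before the victim. */
    if (attacker_ready >= victim_ready)
        return victim_ready;

    /* The attacker issued first; the victim waits until the first
       stage is free again. */
    int unit_free = attacker_ready + occupancy;
    return unit_free > victim_ready ? unit_free : victim_ready;
}

/* Extra cycles the victim loses because of the attacker. */
static int victim_delay(int attacker_ready, int victim_ready,
                        int occupancy) {
    return victim_issue_cycle(attacker_ready, victim_ready, occupancy)
           - victim_ready;
}
```

In this model, victim_delay(0, 0, 1) and victim_delay(0, 1, 1) are both zero, matching cases (a) and (b), while victim_delay(0, 1, 3) is two cycles, matching case (c): only the non-pipelined unit lets the younger op slow down the older one.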

5 Floating Point Covert Channel

In this section, we utilize our SpectreRewind framework to create a covert channel on real commodity hardware that can transmit data from transient execution without using lasting-footprint covert channels or SMT co-scheduled processes. We do this by causing contention on the floating point division unit.

1    double probe, div;
2    double send1, send2, …, send24;
3    int message; // secret
5    start = rdtscp(); // start timer
7    // begin receiver (12 dependent FP divisions)
8    probe /= div;
9    probe /= div;
10    …
11    probe /= div;
12    // end of receiver
14    if (probe == 1) { // begin speculative execution
15        m_bit = bit(message, k);
16        if (m_bit) { // secret dependent branch
17            // begin sender (24 independent FP divisions)
18            send1 /= div;
19            send2 /= div;
20            …
21            send24 /= div;
22            // end of sender
23        }
24    }
26    end = rdtscp(); // end timer
Figure 6: Pseudo code of our floating point division unit contention based covert channel.

Our covert channel utilizes contention on a functional unit, namely the floating point division unit (see Figure 1), to transmit data from transient instructions to non-transient instructions, which will retire and become architecturally visible. The floating point division unit was chosen as it is not fully pipelined (see Section 4.1) in any of the Intel and AMD microarchitectures we tested. Table 1 shows the tested microarchitectures and the latency and throughput characteristics of their DIVSD instruction, which are obtained from [1]. (As defined in [1], latency refers to the clock cycles needed from the time the op is issued to the time the result becomes available to dependent ops, while throughput refers to the clock cycles needed from the time the op is issued until the time the functional unit becomes available again.) Note that in all tested microarchitectures, the throughput of the DIVSD instruction is 4 or 8 cycles, meaning that while a DIVSD instruction is being executed, a pending DIVSD instruction has to wait 4 or 8 cycles before entering the floating point division unit. This delay makes the floating point division unit an ideal candidate for us to create a covert channel.

Figure 6 shows the code used to form the ideal covert channel. (1) A timer is started (Line 5). (2) A chain of dependent floating point division instructions begins execution (Line 8). Because the instructions are dependent, each instruction suffers the full round-trip latency of the floating point division unit (see Table 1). This chain of division instructions acts as the receiver. (3) The result of the receiver instruction chain is compared in the if statement (Line 14). The if statement has been previously trained to be true, so the body will execute speculatively while the result of the receiver chain is being calculated. (4) A single bit of the (secret) message to transmit is calculated (Line 15) and the inner if statement branches depending on the value of that bit (Line 16). (5) The inner if statement is trained to be false. Thus, if the secret bit was ‘1’, the processor backtracks and begins to speculatively execute a set of independent floating point division instructions (Lines 18-21), which act as the sender. The “sender” instructions are independent of each other so that they can be issued concurrently and maximally contend with the “receiver” instructions on the floating point division unit of the processor. (6) When the “receiver” instructions are completed, the processor will realize the mis-speculation (the probe in Line 14 was 0) and squash the speculative instructions from the “sender”. We then stop the timer (Line 26) and measure the time difference.

Note that if the secret bit was ‘1’, the observed time difference will be longer, due to the contention in the floating point division unit with the mis-speculated “sender” instructions, compared to the case when the secret bit was ‘0’ where there was no contention. This secret-dependent difference creates a covert channel.
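The origin of this timing difference can be illustrated with a cycle-level toy model of a single non-pipelined divider. This sketch is not the evaluation code and makes several simplifying assumptions that are not from the measurements above: the divider has one fixed latency and one fixed throughput value (Table 1 reports a latency range), all sender divisions are ready from cycle 0 when the bit is ‘1’, a ready receiver division always wins over a sender division (age order), and the sender is squashed when the receiver chain finishes. The name receiver_cycles and its parameters are hypothetical.

```c
/*
 * Toy model: cycles until the receiver chain of n_recv dependent
 * divisions completes, with up to n_send independent sender divisions
 * competing for the same divider when bit is non-zero.
 *   latency        - cycles from issue until the result is available
 *   issue_interval - cycles from issue until the divider accepts a new op
 */
static int receiver_cycles(int latency, int issue_interval,
                           int n_recv, int n_send, int bit) {
    int unit_free_at = 0;   /* first cycle the divider can accept an op  */
    int recv_ready_at = 0;  /* cycle the next receiver division is ready */
    int recv_issued = 0;
    int sends_left = bit ? n_send : 0;

    for (int cycle = 0; recv_issued < n_recv; cycle++) {
        if (cycle < unit_free_at)
            continue;                         /* divider still busy */
        if (cycle >= recv_ready_at) {
            /* The older receiver op is ready and gets the divider. */
            unit_free_at = cycle + issue_interval;
            recv_ready_at = cycle + latency;  /* next one depends on it */
            recv_issued++;
        } else if (sends_left > 0) {
            /* Divider idle, receiver waiting on its operand: a younger
               sender op slips in and occupies the divider. */
            unit_free_at = cycle + issue_interval;
            sends_left--;
        }
    }
    return recv_ready_at;  /* completion time of the last receiver op */
}
```

With a latency of 13 cycles, a throughput of 4 cycles, 12 receiver divisions, and 24 sender divisions, this model yields 156 cycles for a ‘0’ and 180 cycles for a ‘1’. The absolute numbers are only a property of the model, but they show the mechanism: each sender division that is still occupying the divider when the next receiver division becomes ready adds a few cycles to the receiver chain.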

5.1 Covert Channel Properties

CPU | Microarch. | Latency (cycles) | Throughput (cycles) | Transfer Rate (KB/s) | Error Rate (%)
Intel(R) Core(TM) i5-8250U | Kaby Lake R | 13–15 | 4 | 2.347 | < 0.001
Intel(R) Xeon(R) CPU E5-2650 v4 | Broadwell | 10–14 | 4 | 1.697 | 0
Intel(R) Xeon(R) CPU E5-2658 v3 | Haswell | 10–20 | 8 | 1.988 | 0
AMD Ryzen 3 2200G | Zen | 8–13 | 4 | 1.163 | 0.003
Table 1: Evaluation platforms; DIVSD (SSE floating point division) instruction latency and throughput [1]; performance (transfer/error rates) of the proposed floating point unit covert channel.
Figure 7: Floating point division unit based covert channel (Figure 6) timing characteristics; 3,000,000 timing measurement samples of transmitting 0 (purple) and 1 (green). Panels: (a) Kaby Lake, (b) Broadwell, (c) Haswell, (d) Zen.

We use the code from the previous section to experimentally evaluate the characteristics of the covert channel on four commodity Intel and AMD systems, as listed in Table 1. Each system runs Linux (Ubuntu 16.04), and rdtscp instructions were used for cycle-accurate timing measurements. We repeatedly send 0 and 1 values over the covert channel, 3,000,000 times each, and measure the timing results. We disable Turbo Boost to improve the reliability of the measurements.

Figure 7 shows the timing differences of sending a ‘0’ and a ‘1’ over the covert channel. These graphs were created by taking 3 million timing samples each for sending values ‘0’ and ‘1’ over the channel. The X-axis displays the number of cycles taken to transmit, while the Y-axis displays the probability that a measurement takes that many cycles. For each microarchitecture, we can detect noticeable timing differences between ‘0’ and ‘1’ values with relatively low amounts of noise.

5.2 Sending Data Over the Covert Channel

Finally, we utilize the code in Figure 6 to test the transfer rate of the covert channel. Due to the timing variations in the implementations of the floating point division units, it is difficult to create a program that works across all of the platforms. To overcome this challenge, we first allow the attacker to send a known ‘0’ and a known ‘1’ value across the channel to calibrate thresholds. We do this for each bit sent, drastically decreasing the throughput of the channel, but also decreasing the error rate and allowing the attack to be portable. The next problem can be seen in Figure 7(d), where the timing distributions of ‘0’ and ‘1’ overlap. To overcome this, we require the transmitter to send the same bit over the channel multiple times, until either the same bit value has been received 6 times in a row, or after 6 transmissions one of the values has been received twice as often as the other. We create a randomly generated string of bits that is 2MB long, and send it over the channel. The results are shown in Table 1 (see the ‘Transfer Rate’ and ‘Error Rate’ columns). Here we see that each of the microarchitectures can transmit over the channel at rates above 1 KB/s with no or negligible error rates.
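The per-bit calibration and the repetition rule described above can be sketched as follows. This is one possible reading rather than the evaluation code: the threshold is assumed to be the midpoint of the two calibration timings, “after 6 transmissions” is read as “once at least 6 transmissions have been received”, and “twice as often” is read as “at least twice as often”. All names (calibrate_threshold, classify, vote_t, vote_add) are hypothetical.

```c
#include <stdint.h>

/* Midpoint between the timings of a known '0' and a known '1'. */
static uint64_t calibrate_threshold(uint64_t t_known0, uint64_t t_known1) {
    uint64_t lo = t_known0 < t_known1 ? t_known0 : t_known1;
    uint64_t hi = t_known0 < t_known1 ? t_known1 : t_known0;
    return lo + (hi - lo) / 2;
}

/* A transmission slower than the threshold is read as '1'. */
static int classify(uint64_t cycles, uint64_t threshold) {
    return cycles > threshold ? 1 : 0;
}

/* Running state while the same bit is being retransmitted. */
typedef struct {
    int count[2];  /* how often 0 and 1 have been received */
    int last;      /* most recently received value, -1 if none */
    int run;       /* length of the current run of equal values */
} vote_t;

static void vote_init(vote_t *v) {
    v->count[0] = v->count[1] = 0;
    v->last = -1;
    v->run = 0;
}

/* Feed one received value; returns the decided bit, or -1 to continue. */
static int vote_add(vote_t *v, int bit) {
    bit = bit ? 1 : 0;
    v->run = (bit == v->last) ? v->run + 1 : 1;
    v->last = bit;
    v->count[bit]++;

    if (v->run >= 6)
        return bit;                      /* six identical values in a row */
    if (v->count[0] + v->count[1] >= 6) {
        if (v->count[1] >= 2 * v->count[0]) return 1;
        if (v->count[0] >= 2 * v->count[1]) return 0;
    }
    return -1;                           /* not yet confident, resend */
}
```

A receiver would call calibrate_threshold once per bit from the two known transmissions, then classify each repeated transmission and pass the result to vote_add until it returns 0 or 1. The repetition is what trades transfer rate for the low error rates reported in Table 1.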

6 Transient Execution Attacks using SpectreRewind

In this section, we explore the viability of using SpectreRewind in different transient execution attack scenarios. Figure 8 shows a code snippet of a Spectre variant 1 attack using the floating point unit covert channel (adapted from the code in Figure 6). Note that other transient attacks can similarly be modified to utilize the proposed covert channel.

    if (probe == 1) { // begin speculative execution
        if (x < array1_size) {
            secret = array1[x];
            m_bit = bit(secret, k);
            if (m_bit) { // secret dependent branch
                // begin sender
                send1 /= div;
                send2 /= div;
                …
                send24 /= div;
                // end of sender
            }
        }
    }
Figure 8: Modified Spectre variant 1 attack using the floating point covert channel (based on the code in Figure 6)

6.1 Native Code Attack

In the first scenario that we consider, the attacker has complete control over the assembly instructions that are executed during the attack. This includes a malicious user on a system who can craft a binary that contains a transient execution attack, and then run that binary in an attempt to directly access a secret that resides either in the kernel or in another user process on the same system. This scenario is consistent with Meltdown and MDS type attacks. Spectre attacks, however, only allow the attacker to read secrets from the same address space. If the attacker can choose what assembly instructions to execute, then the attacker could just access the secret directly, which would make this scenario harmless for Spectre attacks. Recently though, ExSpectre [25] provided an example where this was not the case, by utilizing Spectre variant 2 gadgets in a native code scenario to hide the malicious intent of a software Trojan from security scanning algorithms, showing that Spectre type attacks are dangerous in native code execution scenarios. We have tested our framework in native code, integrated with a Spectre variant 1 vulnerability, and were able to leak data at speeds similar to those of the ideal covert channel presented in the previous section.

6.2 Sandboxed Code Attack

In this scenario, the attacker has the ability to execute code on the target machine, but is restricted to operations allowed by the sandbox, e.g. JavaScript. While the attacker may still attempt to utilize Meltdown and MDS type attacks to cross security domains, the attacker may also target secrets within the same address space (Spectre type attacks) that they cannot normally access due to limitations in the sandbox environment. The attacker will again simply need to add the transient execution attack to the provided framework, but in this scenario, creating the framework in the sandboxed environment may prove to be challenging. Traditional transient execution attacks allow the transmission into the covert channel and the subsequent read from the covert channel to happen at different times. Our framework, on the other hand, requires the entire attack, both the transmission and the reception of the secret, to fit into the ROB simultaneously. While the ROB can be large (192-224 ops on common high performance microarchitectures), a single division instruction could potentially translate into many instructions in the sandboxed environment, quickly filling up the ROB and preventing the attack from fitting. Thus, whether the covert channel can be utilized to leak secrets depends on the implementation of the sandbox.
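The ROB constraint can be made concrete with a back-of-the-envelope check. This is a rough sketch only: it assumes every division expands to the same number of ops and lumps everything else (timer reads, branches, the secret load) into a single other_ops figure. The function name fits_in_rob and the example values for other_ops and ops_per_div are hypothetical; only the 12 receiver divisions, the 24 sender divisions, and the 192-224 ROB sizes come from the text.

```c
/*
 * Returns 1 if the receiver chain, the sender block, and the remaining
 * ops of the gadget can all be resident in the ROB at the same time.
 */
static int fits_in_rob(int rob_entries, int recv_divs, int send_divs,
                       int other_ops, int ops_per_div) {
    int needed = (recv_divs + send_divs) * ops_per_div + other_ops;
    return needed <= rob_entries;
}
```

For example, fits_in_rob(192, 12, 24, 10, 1) holds for native code that issues one op per division, whereas a sandbox that expands each division into 8 ops needs (12 + 24) * 8 + 10 = 298 entries and no longer fits even in a 224-entry ROB.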

6.3 Cross Process/Context Switch Attack

Many dangerous versions of Spectre, such as Variant 2 and RSB, allow the attacker to identify vulnerable code in a victim process, or in the operating system, that simply shares time on the same processor as the attacker process. The attacker process first trains the predictor units of the processor, such that when the victim is later scheduled on the processor, it erroneously jumps into code that causes it to access and leak secrets from its own address space into a covert channel. The attacker is then later rescheduled on the same core and reads the secret from the covert channel. SpectreRewind cannot be used in such an attack scenario, as we require the covert channel to be both written to and read from simultaneously.

7 Security Evaluation

SpectreRewind is not a new transient execution attack. It is only a framework that allows the attacker to create covert channels that transmit the secret before the transient execution has been completed. This technique allows for the creation of a covert channel that uses contention on shared functional units during transient execution attacks, but to our knowledge, it does not hold any advantages over past use of functional units as a covert channel in other scenarios. Our approach does not circumvent the security of currently used software solutions, or commodity hardware that is already immune to vulnerabilities such as Meltdown or variants of Spectre, as these defenses hold regardless of when the secret is transferred over the covert channel. For this reason, we only focus on the security concerns relating to recently proposed hardware defense mechanisms.

7.1 Existing Hardware Defense Mechanisms

InvisiSpec [28] and SafeSpec [11] are both recently proposed hardware solutions that defer updating microarchitectural states of caches and TLBs until such changes are considered to be safe. Until that point, microarchitectural changes are instead stored within additional hardware structures, known as “shadow buffers”, and are simply discarded if the instruction that would cause the change is squashed. While these two approaches were implemented on hardware simulators, more recent work by Gonzalez et al. [8] actually implemented such a defense on an out-of-order open source RISC-V processor core (again only protecting the data cache), resulting in very low overhead. Recently, CleanupSpec [18] also achieved high performance by instead letting microarchitectural changes from transient instructions occur in the microarchitecture and later, if the instruction was squashed, undoing those changes.

SpectreRewind can bypass the security provided by these defense mechanisms because the complete transmission of the secret over the covert channel, observed as an increase in the execution time of the receiver, is accomplished before the transient instructions complete. Discarding or reverting microarchitectural changes afterwards therefore comes too late, which renders the aforementioned defense mechanisms ineffective.
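
The argument can be made concrete with a deliberately simplified model. The sketch below is an illustration only, not a simulation of InvisiSpec, SafeSpec, or CleanupSpec: the defense names `"shadow"` and `"undo"`, the latency constants, and the function `run_round` are all assumptions introduced here. It models a sender that transiently touches a secret-dependent cache line and, for a secret bit of 1, also occupies a non-pipelined divider, while older receiver divisions queue behind it.

```python
DIV_LATENCY = 10    # hypothetical cycles the divider is occupied per division
RECEIVER_DIVS = 4   # hypothetical number of timed (older) divisions
SENDER_DIVS = 4     # hypothetical number of transient (younger) divisions


def run_round(secret_bit, defense):
    """Simulate one attack round in a toy model.

    defense: "none", "shadow" (defer cache updates), or "undo" (revert them).
    Returns (cache_after_squash, receiver_cycles).
    """
    if defense not in ("none", "shadow", "undo"):
        raise ValueError(f"unknown defense: {defense}")
    cache, shadow = set(), set()

    # Transient sender: touches a secret-dependent line and, when the bit
    # is 1, also keeps the divider busy.
    line = ("probe", secret_bit)
    (shadow if defense == "shadow" else cache).add(line)
    divider_busy = SENDER_DIVS * DIV_LATENCY if secret_bit else 0

    # Receiver: older divisions that execute concurrently and queue behind
    # the sender. Their timing is fixed before the squash happens.
    receiver_cycles = RECEIVER_DIVS * DIV_LATENCY + divider_busy

    # Squash: both modeled defenses clean up cache state only.
    shadow.clear()
    if defense == "undo":
        cache.discard(line)
    return cache, receiver_cycles


if __name__ == "__main__":
    for defense in ("none", "shadow", "undo"):
        for bit in (0, 1):
            print(defense, bit, run_round(bit, defense))
```

In this model, both defenses leave the cache identical for either secret bit, yet the receiver's cycle count still differs, because the contention was observed before any cleanup could take place.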

7.2 Mitigation Strategies

We now describe several strategies for mitigating SpectreRewind based attacks.

Covert channels that utilize concurrent contention on functional units, such as the floating point division unit channel introduced in Section 5, can be mitigated by fully pipelining the functional unit in question or by strictly enforcing program order when executing instructions on it. The same strategy would also need to be applied to buffers and non-pipelined units within the limited resources of the memory system (e.g., MSHRs). Functional units that cannot be redesigned in a fully pipelined fashion could instead implement a scheduler that blocks an instruction bound for the functional unit until all older instructions bound for the same unit have been issued, effectively making those functional units in-order while maintaining the out-of-order nature of the rest of the processor. Finally, functional units could be made preemptive, allowing older instructions to preempt younger instructions that are currently using the unit. The scheduler would need to be informed of the preemption and be redesigned to re-issue the preempted instructions when the unit becomes available again. However, such changes may not always be possible due to various design constraints.
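
The effect of enforcing program order on a single functional unit can be sketched with a toy scheduler. Everything in the block below is an assumption made for illustration: the divider latency, the ready times, and the function names `finish_times` and `receiver_latency` are hypothetical and do not describe any real processor. The older receiver division becomes ready late (as if it were waiting on an operand), while the younger transient sender divisions are ready immediately.

```python
DIV_LATENCY = 10  # hypothetical occupancy of the non-pipelined divider, in cycles


def finish_times(instrs, in_order):
    """Schedule divisions on one non-pipelined unit.

    instrs: list of (name, ready_cycle) in program order, oldest first.
    in_order: if True, only the oldest pending instruction may issue.
    Returns {name: completion_cycle}.
    """
    pending = list(instrs)
    now = 0
    done = {}
    while pending:
        # Mitigation: restrict the candidates to the oldest pending instruction.
        candidates = pending[:1] if in_order else pending
        ready = [ins for ins in candidates if ins[1] <= now]
        if not ready:
            # The unit idles until the next candidate becomes ready.
            now = min(ins[1] for ins in candidates)
            continue
        chosen = ready[0]  # oldest ready instruction wins the unit
        pending.remove(chosen)
        now += DIV_LATENCY
        done[chosen[0]] = now
    return done


def receiver_latency(secret_bit, in_order):
    """Completion cycle of the older receiver division."""
    instrs = [("recv", 5)]  # older, but its operand arrives at cycle 5
    if secret_bit:
        # Younger transient divisions, ready at once.
        instrs += [("send1", 0), ("send2", 0)]
    return finish_times(instrs, in_order)["recv"]


if __name__ == "__main__":
    for in_order in (False, True):
        print(in_order, receiver_latency(0, in_order), receiver_latency(1, in_order))
```

With out-of-order issue, the receiver finishes later when the sender is active, which is the timing difference the channel relies on. With the in-order restriction on the unit, the receiver's completion time no longer depends on the secret bit in this model, at the cost of delaying the younger divisions.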

8 Related Work

Cache based covert channels generally utilize the property that timing differences occur when accessing different levels of the memory hierarchy, and these timing differences can be large: e.g., a few nanoseconds to access the L1 cache versus hundreds of nanoseconds to access main memory on common high performance systems. Changes to the cache can also be long lasting (remaining across context switches), as cache state changes generally persist until they are replaced by other memory accesses. Prime+Probe [21] takes advantage of the fact that caches are generally partitioned into sets that can each hold a limited number of cache lines. Once the attacker finds a group of addresses that exactly fills a set, measuring the time needed to access the entire group allows the attacker to monitor other memory accesses across the system that map to the same set, which can be used as a side channel to spy on a victim. Flush+Reload [29] is a cache based technique that uses special hardware instructions provided by the architecture to flush cache lines from the caches. If the attacker shares memory with the victim, they need only flush a shared memory location, let the victim execute, and then time a reload of that memory location to determine if it was accessed by the victim. Both techniques are commonly used in transient execution attacks to monitor cache activity performed by the transient instructions.
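
Both techniques can be sketched against a toy cache model. The block below is illustrative only: the `ToyCache` class, its geometry (4 sets, 2 ways, 64-byte lines), the LRU policy, and the `HIT`/`MISS` latencies are assumptions introduced here, and real attacks measure wall-clock or cycle-counter time rather than simulated latencies.

```python
from collections import OrderedDict

HIT, MISS = 4, 200  # hypothetical access latencies in cycles


class ToyCache:
    """Set-associative cache with LRU replacement (toy model)."""

    def __init__(self, num_sets=4, ways=2, line=64):
        self.num_sets, self.ways, self.line = num_sets, ways, line
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        tag = addr // self.line
        return self.sets[tag % self.num_sets], tag

    def access(self, addr):
        """Touch addr and return the simulated latency."""
        lines, tag = self._locate(addr)
        if tag in lines:
            lines.move_to_end(tag)      # refresh the LRU position
            return HIT
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return MISS

    def flush(self, addr):
        """Remove addr's line from the cache if it is present."""
        lines, tag = self._locate(addr)
        lines.pop(tag, None)


def prime_probe(cache, eviction_set, victim):
    """Return the probe time; a slow probe means the victim used the set."""
    for addr in eviction_set:           # prime: fill the monitored set
        cache.access(addr)
    victim(cache)
    return sum(cache.access(addr) for addr in eviction_set)


def flush_reload(cache, shared_addr, victim):
    """Return the reload time; a fast reload means the victim touched it."""
    cache.flush(shared_addr)
    victim(cache)
    return cache.access(shared_addr)


if __name__ == "__main__":
    same_set = lambda c: c.access(512)   # maps to set 0, like 0 and 256
    other_set = lambda c: c.access(64)   # maps to set 1
    print(prime_probe(ToyCache(), [0, 256], same_set))
    print(prime_probe(ToyCache(), [0, 256], other_set))
    print(flush_reload(ToyCache(), 128, lambda c: c.access(128)))
    print(flush_reload(ToyCache(), 128, lambda c: None))
```

Note the asymmetry between the two: Prime+Probe needs no shared memory but only reveals which set was used, whereas Flush+Reload identifies a specific shared line.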

Systems that implement simultaneous multi-threading (SMT) are particularly open to the creation of covert channels, as the multiple threads that run on a single core can potentially share, and thus compete for, the resources of the core. Wang and Lee explored functional unit sharing in SMT processors to create a covert channel [26]. In their work, they created a covert channel on a Pentium 4 processor by utilizing contention on the shared integer multiplication unit. Concurrently, Acıiçmez and Seifert utilized contention on the same Intel processor, again using the shared integer multiplication unit, to create a microarchitectural side channel [2]. Utilizing this channel, the attacker could spy on another process that was running a square-and-multiply cryptographic function concurrently on a separate hardware thread of the same core. Our work differs from these papers in that we explore the unique challenges of both creating similar functional unit covert channels from a non-SMT context (a single hardware thread) and utilizing such contention in Spectre attacks.

In 2016, Fogh introduced a technique called Covert Shotgun [7], in which two processes running on threads of the same SMT core run through an iterative set of instruction groupings and time the results to determine if those instructions can cause measurable contention on the shared resources. Recently, two works have concurrently implemented such an approach to test the viability of port contention as a covert channel between two such processes. This approach utilizes the fact that a port may only issue one instruction per cycle to its underlying functional units; thus, if both processes attempt to issue instructions that require functional units on the same port, one of the processes must stall for that clock cycle, causing a measurable timing difference. PortSmash [4] utilized port contention to create a microarchitectural side channel to leak the secret key from a vulnerable version of OpenSSL. SMoTherSpectre [3] utilized port contention as the covert channel in a Branch Target Injection (BTI [13]) Spectre attack. Using BTI allowed this attack to run attacker-chosen code that transiently accesses a secret in the victim and then executes specific instructions, dependent on the secret value, that could be easily detected by the attacker’s process. Our work differs from all three approaches as we focus on non-SMT contention channels, which face unique challenges, and from the latter two as we focus on functional unit contention rather than port contention.
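
The one-instruction-per-port-per-cycle property that these works rely on can be sketched as follows. This is a toy model and not the method of PortSmash or SMoTherSpectre: the function `attacker_cycles`, the alternating arbitration policy, and the instruction counts are assumptions made for illustration, and real port arbitration between SMT threads is more complex.

```python
def attacker_cycles(attacker_port, victim_ports, n_attacker=8):
    """Cycles the attacker thread needs to issue n_attacker ops on one port.

    victim_ports: the port each victim op requires, in order; the victim
    issues at most one op per cycle. When both threads want the same port
    in a cycle, they take turns (an assumed arbitration policy).
    """
    victim = list(victim_ports)
    issued = 0
    cycle = 0
    victim_priority = True
    while issued < n_attacker:
        cycle += 1
        conflict = bool(victim) and victim[0] == attacker_port
        if conflict and victim_priority:
            victim.pop(0)       # victim wins the port, attacker stalls
        else:
            issued += 1         # attacker issues this cycle
            if victim and not conflict:
                victim.pop(0)   # victim issues on a different port
        if conflict:
            victim_priority = not victim_priority
    return cycle


if __name__ == "__main__":
    print(attacker_cycles(0, [1, 1, 1, 1]))  # victim uses another port
    print(attacker_cycles(0, [0, 0, 0, 0]))  # victim competes for port 0
```

In this model, the attacker's issue time grows by one cycle for every victim instruction that competes for the same port, which is the timing difference such a channel measures.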

9 Conclusion and Future Work

In this paper, we showed that it is possible to create a covert channel utilizing concurrent contention on functional units from a single hardware thread. We introduced a new covert channel, which utilizes contention on the floating point division unit in commodity Intel and AMD processors. We then showed how the covert channel can be used in conjunction with common transient execution attacks. As future work, we would like to implement our framework within a sandboxed environment, such as a web browser’s JavaScript engine, to test its viability for transient execution attacks in those environments. We also plan to investigate whether other microarchitectural structures can be used to create concurrent contention based covert channels in the context of the SpectreRewind framework.


  • [1] A. Abel and J. Reineke (2019) Uops.info: characterizing latency, throughput, and port usage of instructions on intel microarchitectures. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, New York, NY, USA, pp. 673–686. External Links: ISBN 978-1-4503-6240-5, Link, Document Cited by: Table 1, §5, footnote 1.
  • [2] O. Aciicmez and J. Seifert (2007-09) Cheap hardware parallelism implies cheap security. In Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC 2007), pp. 80–91. External Links: Document Cited by: §8.
  • [3] A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi, M. Payer, and A. Kurmus (2019) SMoTherSpectre: exploiting speculative execution through port contention. arXiv preprint arXiv:1903.01843. Cited by: §1, §8.
  • [4] A. Cabrera Aldaya, B. Bob Brumley, S. ul Hassan, C. Pereida García, and N. Tuveri (2019-05) Port contention for fun and profit. In 2019 IEEE Symposium on Security and Privacy (SP). External Links: Document Cited by: §8.
  • [5] (2018) Cache speculation side-channels. ARM White paper. Cited by: §1, §2.2.
  • [6] C. Canella, J. V. Bulck, M. Schwarz, M. Lipp, B. von Berg, P. Ortner, F. Piessens, D. Evtyushkin, and D. Gruss (2018) A systematic evaluation of transient execution attacks and defenses. CoRR abs/1811.05441. External Links: Link, 1811.05441 Cited by: §1.
  • [7] A. Fogh (2016) Covert Shotgun. https://cyber.wtf/2016/09/27/covertshotgun/. Cited by: §8.
  • [8] A. Gonzalez, B. Korpan, J. Zhao, E. Younis, and K. Asanović (2019) Replicating and mitigating spectre attacks on an open source risc-v microarchitecture. In Third Workshop on Computer Architecture Research with RISC-V (CARRV), Cited by: §1, §3, §7.1.
  • [9] J. Horn (2018) Speculative execution, variant 4: speculative store bypass. Note: https://bugs.chromium.org/p/project-zero/issues/detail?id=1528 Cited by: §1, §2.2.
  • [10] Intel (2018-07) Intel Analysis of Speculative Execution Side Channels (Rev. 4.0). Technical report External Links: Link Cited by: §1, §2.2.
  • [11] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh (2019) SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation. In 56th Annual Design Automation Conference (ACM DAC), Cited by: §1, §3, §4, §7.1.
  • [12] V. Kiriansky and C. Waldspurger (2018) Speculative buffer overflows: attacks and defenses. arXiv preprint arXiv:1807.03757. Cited by: §1, §2.2.
  • [13] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom (2019-05) Spectre attacks: exploiting speculative execution. In 2019 IEEE Symposium on Security and Privacy (SP), Los Alamitos, CA, USA. External Links: ISSN CFP19020-ART, Document, Link Cited by: §1, Figure 2, §2.2, §2.2, §8.
  • [14] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh (2018) Spectre returns! speculation attacks using the return stack buffer. In WOOT, Cited by: §1, §2.2.
  • [15] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg (2018) Meltdown: reading kernel memory from user space. In USENIX Security, Cited by: §1, §2.2.
  • [16] G. Maisuradze and C. Rossow (2018) ret2spec: Speculative execution using return stack buffers. In ACM CCS, pp. 2109–2122. Cited by: §1, §2.2.
  • [17] M. Minkin, D. Moghimi, M. Lipp, M. Schwarz, D. Genkin, D. Gruss, B. Sunar, F. Piessens, and Y. Yarom (2019) Fallout: reading kernel writes from user space. Cited by: §1, §2.2.
  • [18] G. Saileshwar and M. K. Qureshi (2019) CleanupSpec: an “undo” approach to safe speculation. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’52, New York, NY, USA, pp. 73–86. External Links: ISBN 9781450369381, Link, Document Cited by: §1, §3, §7.1.
  • [19] M. Schwarz, M. Lipp, D. Moghimi, J. Van Bulck, J. Stecklina, T. Prescher, and D. Gruss (2019) ZombieLoad: cross-privilege-boundary data sampling. In CCS, Cited by: §1, §2.2.
  • [20] J. Stecklina and T. Prescher (2018) LazyFP: leaking fpu register state using microarchitectural side-channels. arXiv preprint arXiv:1806.07480. Cited by: §1, §2.2.
  • [21] E. Tromer, D. A. Osvik, and A. Shamir (2010-07) Efficient cache attacks on aes, and countermeasures. J. Cryptology 23, pp. 37–71. External Links: Document Cited by: §8.
  • [22] D. M. Tullsen, S. J. Eggers, and H. M. Levy (1995) Simultaneous multithreading: maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA ’95, New York, NY, USA, pp. 392–403. External Links: ISBN 0-89791-698-0, Link, Document Cited by: §2.3.
  • [23] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx (2018-08) Foreshadow: extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In Proceedings of the 27th USENIX Security Symposium, Note: See also technical report Foreshadow-NG [27] Cited by: §1, §2.2, 27.
  • [24] S. van Schaik, A. Milburn, S. Österlund, P. Frigo, G. Maisuradze, K. Razavi, H. Bos, and C. Giuffrida (2019-05) RIDL: rogue in-flight data load. In S&P, Cited by: §1, §2.2.
  • [25] J. Wampler, I. Martiny, and E. Wustrow (2019) ExSpectre: hiding malware in speculative execution. In NDSS, Cited by: §6.1.
  • [26] Z. Wang and R. B. Lee (2006-12) Covert and side channels due to processor architecture. In 2006 22nd Annual Computer Security Applications Conference (ACSAC’06), pp. 473–482. External Links: Document, ISSN 1063-9527 Cited by: §8.
  • [27] O. Weisse, J. Van Bulck, M. Minkin, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, R. Strackx, T. F. Wenisch, and Y. Yarom (2018) Foreshadow-NG: breaking the virtual memory abstraction with transient out-of-order execution. Technical report. Note: See also USENIX Security paper Foreshadow [23] Cited by: §1, §2.2, 23.
  • [28] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. W. Fletcher, and J. Torrellas (2018) InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy. In International Symposium on Microarchitecture (MICRO), Cited by: §1, §3, §4, §7.1.
  • [29] Y. Yarom and K. Falkner (2014-08) FLUSH+reload: a high resolution, low noise, l3 cache side-channel attack. In 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, pp. 719–732. External Links: ISBN 978-1-931971-15-7, Link Cited by: §8.