SpecFuzz: Bringing Spectre-type vulnerabilities to the surface

05/24/2019 · Oleksii Oleksenko, et al.

SpecFuzz is the first tool that enables dynamic testing for speculative execution vulnerabilities (e.g., Spectre). The key is the concept of speculation exposure: The program is instrumented to simulate speculative execution in software by forcefully executing the code paths that could be triggered due to mispredictions, thereby making the speculative memory accesses visible to integrity checkers. Combined with conventional fuzzing techniques, speculation exposure enables more precise identification of potential vulnerabilities than state-of-the-art static analyzers. Our prototype for detecting Spectre V1 vulnerabilities successfully identifies all known variations of Spectre V1 and dramatically reduces the overheads compared to the deployed Speculative Load Hardening mitigation across the evaluated applications, eliminating instrumentation from 99% of the branches in some of them.

1 Introduction

Spectre [24, 25, 18, 35] is a class of speculative execution attacks that poses a significant threat to system security. These attacks allow an adversary to extract secrets from bug-free programs by exploiting security flaws in the underlying CPU hardware. Spectre-type attacks exploit hardware optimizations that allow the CPU to execute code out of order. For example, if an array access is guarded by a bounds check, the CPU may predict that the check will pass and start the access speculatively, before knowing whether it is allowed. If the prediction later turns out to be wrong, the CPU will cancel the architectural changes caused by the speculation, such as updates to CPU register values. However, it will not cleanse some of the microarchitectural changes, such as the cached data. Speculative attacks use this property to deduce the values loaded during the speculation and, thus, bypass software defences.

Unlike other speculative execution attacks (Meltdown [27] and Foreshadow [14]), some variants of Spectre are not expected to be fixed by hardware vendors [11, 12]. Therefore, the burden of protecting programs from Spectre lies entirely on software developers.

Unfortunately, the software tools for mitigating these attacks suffer from either a high performance penalty or low precision. Conservative mitigation techniques [2, 11, 17, 38] pessimistically instrument every instruction sequence prone to speculation such that the speculation is either prevented or becomes provably benign. Some of these techniques, however, incur a significant performance overhead, making the application up to several times slower than the native version [33].

Another approach is to use static analysis tools [30, 15, 21] to find Spectre gadgets, i.e., code patterns that may serve to mount the attack. Even though they cause much lower overheads, prior work [23] has shown that such tools may overlook gadgets that deviate even slightly from those expected by the analyzer. This renders such mitigation tools ineffective because even a single missed Spectre gadget may, in the worst case, allow leakage of the entire application memory space.

In this work, our goal was to attain the performance benefits of the analysis techniques without confining ourselves to any specific vulnerability fingerprints. To this end, we harness the state-of-the-art dynamic bug testing tools, such as fuzzing, to detect Spectre-type vulnerabilities.

Fuzzing [45] is a well-established technique broadly used for dynamic testing. The basic idea of fuzzing is simple: We feed the program with randomized or diversified inputs to find test cases that trigger a bug. It is commonly used, for example, to detect memory safety violations by combining it with memory safety techniques such as Intel MPX [13] or AddressSanitizer [36].

Fuzzing, however, cannot be applied directly. The primary challenge we tackle is that speculative execution vulnerabilities are normally undetectable through fuzzing because the side effects of a misspeculation are discarded by the CPU without ever being exposed to the software. Yet, exactly these side effects (e.g., a speculative access outside of the array bounds) make the attacks possible. Therefore, all Spectre-type memory accesses fly under the radar of memory safety checkers and, consequently, of fuzzing.

To overcome this problem, we introduce the concept of speculation exposure, the first technique to enable dynamic testing for Spectre-type vulnerabilities. It leverages software simulation of the speculative behavior to turn speculative vulnerabilities into conventional ones and, thus, make them detectable by integrity checks. The concept is generic and can be universally applied to different Spectre attacks with only small modifications.

Speculation exposure consists of three phases executed every time we encounter a potentially vulnerable instruction sequence: (1) take a checkpoint of the process state, (2) simulate the optimization and execute the speculative path, and (3) roll back the process to the checkpoint and continue normal execution. This way, we force the application into taking the control-flow paths that could be executed due to mispredictions, and all invalid accesses on these paths become visible to software integrity checkers.

To showcase the method, we implement SpecFuzz, a tool for detecting Bounds Check Bypass (BCB) vulnerabilities. SpecFuzz simulates conditional branch mispredictions by inverting branch conditions while executing speculative paths. It detects invalid behavior during the simulation by applying AddressSanitizer [36].

Our evaluation shows that SpecFuzz is considerably more effective at finding the vulnerabilities than the existing static analysis tools. In total, it detected 2055 unique invalid speculative accesses across the six libraries we tested, while Spectre V1 Scanner [15], a static analysis tool, found only 38 BCB instances. Furthermore, the additional information we obtained from the dynamic testing allowed us to prune most of the found vulnerabilities, as they turned out not to be realistically exploitable. This left us with only 10 branches that actually required a patch (none of them was detected by the scanner).

Finding so few instances was surprising to us, and it demonstrates that BCB vulnerabilities are much less widespread than presently believed. It also shows how superfluous the conservative defences against BCB are. For example, Speculative Load Hardening (SLH) [2] instrumented 4382 branches across the tested libraries, while SpecFuzz found only 10 vulnerabilities, meaning that 99% of the SLH instrumentation was probably not necessary (although we cannot guarantee it; see below). Because of that, the patches produced based on the fuzzing results have a much lower overhead than full-program instrumentation: In our experiments, SLH caused an 80% slowdown on average, whereas SpecFuzz-based patches caused at most 12%.

These performance benefits, however, come at the cost of relaxed security guarantees compared to conservative protection. First, reaching complete coverage with fuzzing is virtually impossible and, although we strove for high coverage, it is still possible that some speculative out-of-bounds accesses were not triggered during the experiments. Second, our current implementation of SpecFuzz does not simulate branch mispredictions precisely: It does not support nested mispredictions, and it cannot proceed with the speculation after encountering an error. These are, however, implementation issues, and they can be fixed in the future without redesigning the tool.

Our contributions include:

  • Speculation exposure, a generic simulation method for Spectre-type vulnerabilities that makes them detectable through conventional dynamic testing techniques.

  • SpecFuzz, an implementation of the method applied to detection of Bounds Check Bypass vulnerabilities.

  • An analysis technique for processing and ranking the results of dynamic testing with SpecFuzz.

  • Evaluation of SpecFuzz on a set of popular libraries and a comparison with two existing BCB mitigation mechanisms: a conservative technique (Speculative Load Hardening) and a static analysis tool (Spectre V1 Scanner).

To show how SpecFuzz works in practice, we additionally provide a demo containing 15 BCB variants [23] compiled with SpecFuzz. The demo is available at: https://cloud.docker.com/repository/docker/tudinfse/specfuzz_demo

2 Background

2.1 Speculative Execution

In modern processors, execution of a single instruction requires several stages, such as fetching, decoding, and reading. To improve performance, nearly all modern CPUs execute them in a pipelined fashion: When one instruction passes a stage, the next instruction can enter the stage without waiting for the first one to pass all the following stages. This allows for much higher levels of instruction parallelism and for better utilization of the hardware resources.

However, in certain situations (called hazards) it is not possible to start executing the next instruction immediately. A hazard may happen in three cases: a structural hazard appears when there are no available execution units; a data hazard, when there is a data dependency between the instructions; and a control hazard, when the first instruction modifies the control flow (e.g., at a conditional branch) and the CPU does not know which instruction will run next. As hazards stall the CPU, they can significantly reduce its performance.

To deal with control hazards (and sometimes data hazards), modern CPUs try to predict the outcome of the situation and start speculatively executing the instructions assumed to come next. For example, when the CPU encounters an indirect jump, it predicts the jump target based on the history of recently used targets and redirects the control flow to it. While the CPU does not yet know whether the prediction was correct, it keeps track of the speculative instructions in the so-called Reorder Buffer (ROB). The results of these instructions are temporarily kept in internal buffers or registers and are architecturally invisible (i.e., software does not have access to them). Eventually, the CPU resolves the hazard and, depending on the outcome, either commits the results to the architectural state or discards them.

2.2 Speculative Attacks

In a speculative attack, the attacker intentionally forces the CPU into making a wrong prediction and speculatively executing a wrong control-flow path. Because taking this path violates the application semantics, it may bypass security checks within the application. Moreover, any exceptions that appear on the speculative path will be handled only when the corresponding instruction retires.

For a long time, this behavior was considered safe because the CPU never commits the results of a wrong speculation. However, as the Spectre [24] and Meltdown [27] attacks discovered, traces of speculative execution are visible on the microarchitectural level. For example, the data loaded on the speculative path will not show up in the CPU registers, but it will leave traces in the CPU caches. The attacker can later launch a side-channel attack [40, 44] to read these traces and, based on them, retrieve the data used on the speculative path.

1 i = input[0];
2 if (i < size) {
3     secret = foo[i];
4     baz = bar[secret]; }

Example of a potential Bounds Check Bypass vulnerability.

2.3 Bounds Check Bypass

In this paper, we showcase our dynamic testing approach on one of the speculative attacks, namely Bounds Check Bypass (BCB, also called Spectre V1) [24]. In essence, Bounds Check Bypass is a conventional out-of-bounds memory access (e.g., a buffer overflow) that happens during speculative execution because of a mispredicted conditional branch.

Consider the code snippet above: without the bounds check on line 2, an adversary with control over the input could force the load on line 3 to read from any address, including those beyond the bounds of the array foo. Normally, vulnerabilities of this type are prevented by adding a bounds check before the memory access, such as the one on line 2. However, the adversary can train the branch predictor to anticipate that the check will pass. Then, the CPU will speculatively execute lines 3 and 4 even if the index i is out of the array bounds. Later, it will find out that the prediction was wrong and discard the speculated load, but its cache traces will stay. The adversary can read the traces by launching a side-channel attack and, hence, learn the secret value.

3 Speculation Exposure

Speculative vulnerabilities are notoriously hard to find because hardware strives to hide the effects of speculative execution from software, making it impossible to detect such vulnerabilities with conventional testing methods. In this paper, we approach the problem by simulating the unsafe hardware optimization in software to uncover the speculative vulnerabilities.

To understand how we construct the simulation, first consider how speculative execution is implemented in hardware (see 2.1). When a hazard appears (e.g., at a conditional or an indirect jump), the CPU (1) makes a prediction of its outcome, (2) executes the predicted path while keeping the results in temporary storage, and (3) eventually eliminates the hazard and either commits the results (correct prediction) or discards them (wrong prediction), proceeding with the correct path.

For example, in Figure 1, the CPU might make a wrong prediction that BB1 (Basic Block 1) will proceed into BB3. Then, it will start executing BB3, BB4, and further. When the hazard is eliminated, the CPU determines that the prediction was wrong and discards all changes made by the speculated instructions. Afterward, it proceeds to execute the correct path starting from BB2.

Figure 1: Example of speculative execution. Due to a misprediction, the program executes basic blocks BB3 and BB4, then detects the mistake, discards the results, and continues execution starting from BB2.

We can simulate this behavior with a "checkpoint-mispredict-rollback" scheme: At a potential hazard, we take a checkpoint of the current process state. Then, we divert the control flow into a wrong (mispredicted) path and start executing it. When the maximum possible length of speculative execution is reached or we encounter a serializing instruction, we roll back to the checkpoint and proceed with normal execution. The same pattern can be applied to data hazards too: Instead of diverting the control flow, we would replace a memory or register value with a mispredicted one.

This basic mechanism simulates the worst-case scenario: a CPU that always makes a wrong prediction and always speculates to the longest possible depth. Such a pessimistic approach makes the testing results universally applicable to any CPU model and any execution conditions. Moreover, it also covers all possible combinations of correct and incorrect predictions that could happen at runtime.
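For illustration, here is a minimal C sketch of the scheme, using setjmp/longjmp as a stand-in for the checkpointing mechanism (the actual SpecFuzz implementation, described in 4.2, snapshots registers and logs memory writes instead; simulate_branch is a name we introduce here):

#include <setjmp.h>

static jmp_buf checkpoint;  // stand-in for the process-state snapshot

// Instrumentation around a conditional branch: first force the mispredicted
// outcome, roll back, then take the architecturally correct one. Memory
// writes made on the simulated path are not reverted in this sketch;
// the real tool logs and undoes them (see 4.2).
static void simulate_branch(int condition,
                            void (*taken)(void), void (*not_taken)(void)) {
    if (setjmp(checkpoint) == 0) {
        // Simulation phase: execute the inverse of the real outcome,
        // as the CPU would on a misprediction.
        if (!condition) taken(); else not_taken();
        longjmp(checkpoint, 1);  // rollback (in SpecFuzz: after ~250 instructions)
    }
    // Normal execution: the correct path.
    if (condition) taken(); else not_taken();
}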

Checkpointing. For storing the process state, we could use any of the existing checkpointing mechanisms, ranging from a full-process checkpoint (e.g., CRIU [1]) to transactional memory techniques (e.g., Intel TSX [13]), although it is preferable to use lightweight, low-overhead mechanisms to reduce the testing time. We describe the checkpointing mechanism used in our implementation in 4.2.

Simulating misprediction. To simulate misprediction, we instrument the basic blocks in a way that will force control flow to enter the paths that the CPU would otherwise take speculatively. The nature of instrumentation would depend on the exact type of speculative execution attack being simulated.

Terminating simulation. The final question is, when do we terminate the simulation? In real hardware, the termination happens in the following cases:

  • At serializing instructions (e.g., LFENCE, CPUID).

  • When the maximum speculation depth is reached.

The first case is straightforward: We execute a rollback every time we encounter a serializing instruction.

For the second case, we assume that the speculation depth is limited by the size of the Reorder Buffer (ROB), as the speculation can proceed only as long as there is empty space in the ROB. In modern CPUs, ROBs can fit no more than 250 microoperations (µops); the largest we know of has 224 entries, in the Intel Skylake architecture. Because software does not have access to µop counters, we fall back to estimating the depth as 250 machine instructions, which are easier to count. This is an overestimation because normally one instruction maps to at least one µop. (The only exception is µop fusion, when the CPU merges several instructions into one; however, it is a rare event that should not compromise the security guarantees.)

3.1 Applying Simulation to Spectre Attacks

The simulation mechanism depends on the specific vulnerability that we want to simulate. While we explain the instrumentation we implemented for Bounds Check Bypass in the later sections, below we give an overview of the instrumentation that could be used for other Spectre-type attacks.

Branch Target Injection [24] is a Spectre variant targeting speculation at indirect jumps. When an indirect jump instruction is executed, the CPU predicts the jump target using the branch predictor, without waiting for the actual target address computation to finish. The attacker can exploit this behavior by training the branch predictor to redirect the speculative jump to a code snippet that would leak program data via a side channel.

SpecFuzz can be modified to simulate BTI by maintaining a software history buffer for every indirect branch in the application. Then, at an indirect branch, SpecFuzz would 1) record the current branch target into the history buffer and 2) run a simulation for every previously recorded target. This approach works, however, under the assumption that the attacker can train the branch predictor only by providing data to the application and cannot inject arbitrary targets into the branch target buffer from another application on the same core.

Return Address Misprediction [29, 25] is a variant of Branch Target Injection. The CPU maintains a small number of the most recently used return addresses in a dedicated cache, pushing the return address onto this cache at each call instruction and popping it at each return instruction. When this cache is empty, the CPU speculates the return address using the indirect Branch Target Buffer. To simulate this vulnerability, SpecFuzz can instrument call and return instructions to, correspondingly, increment and decrement a counter and, at returns where the counter value is zero or negative, run a simulation that jumps to an address from the history buffer (see the sketch below).
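A sketch of this call/return counter (history_buffer_pop and start_simulation_at are hypothetical helpers, not part of the current SpecFuzz implementation):

static int ras_depth;  // models the depth of the return-address cache

extern void *history_buffer_pop(void);      // hypothetical: a previously seen target
extern void start_simulation_at(void *pc);  // hypothetical: begin a simulation

// Inserted at every call instruction.
static void on_call(void) { ras_depth++; }

// Inserted at every return instruction.
static void on_return(void) {
    ras_depth--;
    // An empty return-address cache means the CPU would fall back to the
    // BTB and may mispredict the return target: simulate such a jump.
    if (ras_depth <= 0)
        start_simulation_at(history_buffer_pop());
}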

Speculative Store Bypass [18] is a microarchitectural vulnerability caused by the CPU ignoring potential dependencies between load and store instructions during speculation. When a store operation is delayed, a subsequent load from the same address may speculatively use the old, stale value. To simulate this attack, SpecFuzz could be extended to start a simulation before every basic block that contains a write to memory. Then, we would skip the store during the simulation but execute it after the rollback. If a basic block contains several stores, we split it and run the simulation for every store.

if x < array_size:
    result = array[x]
    ...

(a) Native version

1 checkpoint()
2 if x >= array_size:
3     goto skip_branch
4 if x < array_size:
5 skip_branch:
6     result = array[x]
7     ...
8 if max_depth_reached():
9     rollback() // to line 4

(b) Simulation of conditional branch misprediction

Figure 2: Example of the SpecFuzz instrumentation.
Figure 3: Simulation of conditional branch mispredictions: The terminator condition is replaced by an inverted one. When the simulation ends, the program returns to the normal flow.

4 SpecFuzz: Exposure of Bounds Check Bypass

To showcase our approach on a specific vulnerability class, we develop a tool for simulating and detecting Bounds Check Bypass (BCB) [24]. We call the tool SpecFuzz.

At its core, BCB is a speculative out-of-bounds access caused by a misprediction of a conditional jump (see 2.3). To detect such accesses, we have to implement two components: a simulation of the misprediction (3) and a mechanism for detecting invalid memory accesses during the speculation. The latter is relatively straightforward, as we can reuse one of the many existing memory safety techniques; in SpecFuzz, we use AddressSanitizer [36]. The former, however, requires a custom technique.

To simulate conditional branch mispredictions, we create a modified (instrumented) version of the application that executes not only the normal control flow but also the paths that could be taken as a result of mispredictions. Consider the example in Figure 2. Before the conditional branch (line 4), we insert a call to a checkpointing function (line 1) that stores the current process state and initializes the simulation. Then, we simulate a misprediction by inserting a branch statement with an inverted condition (line 2) and a jump into the body of the conditional block, thus skipping the original branch (line 3). We proceed with the execution until reaching a terminating condition: either the maximum speculation depth (line 8) or a serializing instruction (not present in the example). After that, we restore the process state from the previous checkpoint (line 9) and redirect the execution back to the original branch statement (line 4).

We implement this design as a combination of an LLVM [26] backend pass for the x86 architecture and a runtime library.

Figure 4: The workflow of testing an application with SpecFuzz.

4.1 Simulating Branch Misprediction

SpecFuzz simulates mispredictions by forcing the application into taking a wrong branch at every conditional jump. We implement this behavior by replacing all conditional terminators in the program with ones that have an inverted condition (see Figure 3). Now, where the original basic block (BB) would proceed into one successor, the modified terminator diverges the control flow into the other. The original terminator is moved into a separate BB, and the control flow returns to normal execution by rolling back into this BB after the simulation.

As a result, every time the program reaches this BB, it first executes the simulated path, then rolls back to the BB and continues with normal execution. We apply this instrumentation to all conditional branches.

4.2 Saving and Restoring Process State

The main requirement for the rollback mechanism used in SpecFuzz is a low performance impact, so that the testing time is kept short. To this end, we implemented a lightweight in-application mechanism that snapshots the CPU state before the simulation and records the memory changes during the simulation.

To store the CPU state, we add a call to the checkpointing function before every conditional jump. The function takes a snapshot of the register values (including GPRs, flags, SIMD, floating-point registers, etc.) and stores it into memory. During the rollback, we restore the register values based on the snapshot. The function also stores the address of the original conditional jump (i.e., original terminator) that we later use as a rollback address.

We could apply a similar mechanism to save the memory state, but this would have an unacceptable performance cost, especially considering that we would have to dump memory contents at every conditional jump. Instead, we log all writes to memory during the simulation. Before every instruction that modifies memory (e.g., mov, push, call), we store the address it modifies and its previous value. Then, to do a rollback, we go through the changes in the reverse order and restore the old values.
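A minimal sketch of such an undo log (the fixed-size array and the names are illustrative, not SpecFuzz's actual data structures):

#include <stdint.h>
#include <stddef.h>

#define MAX_UNDO 4096  // illustrative capacity

typedef struct { uint64_t *addr; uint64_t prev; } undo_entry_t;

static undo_entry_t undo_log[MAX_UNDO];
static size_t undo_len;

// Inserted before every memory-modifying instruction:
// remember the target address and its previous value.
static void log_store(uint64_t *addr) {
    undo_log[undo_len].addr = addr;
    undo_log[undo_len].prev = *addr;
    undo_len++;
}

// On rollback: revert the logged writes in reverse order.
static void revert_stores(void) {
    while (undo_len > 0) {
        undo_len--;
        *undo_log[undo_len].addr = undo_log[undo_len].prev;
    }
}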

4.3 Terminating Simulation

As discussed in 3, we terminate the simulation either if we encounter a serializing event or when the maximum depth of speculation is reached.

To implement the first case, we simply invoke the rollback function before every serializing instruction. We consider the following instructions serializing:

  • The instructions listed as serializing in the Intel documentation [13], such as LFENCE or CPUID.

  • System calls. We assume that executing any system call takes longer than the maximum possible duration of speculative execution.

  • External function calls. By virtue of being implemented as a compiler pass, SpecFuzz cannot correctly run the simulation beyond the instrumented code. Therefore, we have to treat all calls to external functions as serialization points, even though this is not necessarily correct behavior. In 7, we discuss a potential solution to this problem.

For the second case, we count instructions at runtime. We keep a global instruction counter and set it to zero when a simulation begins. At the beginning of every basic block, we add its length to the counter. (We know the length at compile time because SpecFuzz is a backend pass.) When the counter reaches 250 (the maximum possible speculation depth, see 3), we invoke the rollback function, as sketched below.
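The counter logic amounts to a few lines (a sketch with illustrative names; rollback stands for the routine from 4.2):

#define MAX_SPEC_DEPTH 250  // ROB-size estimate from Section 3

extern void rollback(void);  // restores the last checkpoint (see 4.2)

static long spec_instr_count;  // reset to zero when a simulation begins

// Inserted at the beginning of every basic block; bb_len is a
// compile-time constant computed by the backend pass.
static void count_basic_block(long bb_len) {
    spec_instr_count += bb_len;
    if (spec_instr_count >= MAX_SPEC_DEPTH)
        rollback();
}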

4.4 Handling errors

Finally, with the simulation mechanism at hand, we have to correctly respond to detections of out-of-bounds accesses or to other error conditions that appear during the simulation. In contrast to normal, nonspeculative execution, the process does not crash if an error happens during the speculation. Instead, the CPU silences the error by discarding its effects when the misprediction is detected.

To simulate this behavior in SpecFuzz, we had to adapt the error response mechanism of AddressSanitizer (we rely on it for detecting out-of-bounds accesses). Normally, upon detecting a bounds violation, AddressSanitizer terminates the application and reports the error. We modified it to instead record the violation in a log and roll back to the previous checkpoint. (A more correct response would be to log the error and continue the simulation, but the current implementation of SpecFuzz does not yet support it.) Accordingly, one test run might detect several errors; in practice, we observed up to hundreds of errors per single invocation.

Similarly, we have to recover from runtime faults. We register a custom signal handler that logs the fault and rolls back after signals that could be caused by an out-of-bounds access, such as SIGSEGV and SIGBUS. We also roll back after other faults (e.g., division by zero), but we do not record them in the log as they are irrelevant to BCB. A sketch of this handler follows.
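A sketch of the handler registration (record_violation is a hypothetical logging helper; rollback restores the checkpoint and therefore does not return to the faulting instruction):

#include <signal.h>

extern void record_violation(void *addr);  // hypothetical logging helper
extern void rollback(void);                // restores the last checkpoint

static void spec_signal_handler(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    // SIGSEGV/SIGBUS may stem from a speculative out-of-bounds access: log it.
    if (sig == SIGSEGV || sig == SIGBUS)
        record_violation(info->si_addr);
    // In all cases, abort the simulation and restore the checkpoint.
    rollback();
}

static void install_handlers(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = spec_signal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);  // e.g., division by zero: roll back, not logged
}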

5 Fuzzing with SpecFuzz

Given the simulation technique described in the previous section (4), we can test applications with conventional dynamic testing methods, such as fuzzing. In our experiments, we used the workflow in Figure 4.

First, we compile the application under test with Clang and apply the SpecFuzz pass, thus producing an instrumented binary that simulates branch mispredictions. Second, we fuzz the binary. In our experiments, we used HonggFuzz [9], an evolutionary coverage-driven fuzzer, and we relied on Intel Processor Trace [22] for measuring code coverage. After fuzzing, we aggregate the traces and analyze the detected vulnerabilities.

5.1 Aggregation of results

As a result of fuzzing, we get a trace of detected speculative out-of-bounds accesses. Each entry in the trace is a tuple:

  (Accessed address; Offending instruction; Mispredicted branch)

Usually, the trace is long and may contain hundreds of detections per test run. This happens because the simulation forces the application into performing wrong actions, which frequently leads to errors.

To make the trace usable, we aggregate the results per run and per instruction. That is, for every test run, we collect all the addresses that each unique offending instruction accessed, as well as the addresses of the branches whose mispredictions triggered the execution of this instruction. Appendix A shows an example of the aggregation output.

5.2 Vulnerability analysis

After the aggregation, we have a list of vulnerabilities together with an approximate range of addresses that each of them can access. As we will see in 6.2, the list may be rather long and contain thousands of vulnerabilities. Yet, we argue that most of them are not realistically exploitable.

In many cases, the attacker does not have any control over the accessed address. This can happen, for example, when the application speculatively dereferences a field of an uninitialized structure: No matter what input we provide to the application, the speculative dereference will always go to the same address. This lack of control shifts the vulnerability into the category of conventional side-channel attacks [40, 44], which are less critical than BCB because they provide less information to the attacker and are harder to launch. Moreover, the defence strategies for these attacks also differ from those for BCB.

We identify these cases by analyzing the aggregated traces. We estimate the presence of the attacker's control by comparing the accessed addresses across test runs: If a given offending instruction always accesses the same set of addresses, we assume that the attacker does not have control over it. Note, however, that this heuristic is valid only after a large enough number of test runs. A sketch of the comparison follows.
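The comparison itself is straightforward; here is a sketch with illustrative types (the real analysis runs over the aggregated traces):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 64

typedef struct {
    uint64_t addrs[MAX_ADDRS];  // sorted set of addresses seen in one run
    size_t count;
} addr_set_t;

// True if two runs observed identical address sets for an instruction.
static bool same_addresses(const addr_set_t *a, const addr_set_t *b) {
    return a->count == b->count &&
           memcmp(a->addrs, b->addrs, a->count * sizeof(uint64_t)) == 0;
}

// An offending instruction is flagged as attacker-controlled if any two
// runs disagree on the addresses it accessed.
static bool is_controlled(const addr_set_t *runs, size_t n_runs) {
    for (size_t i = 1; i < n_runs; i++)
        if (!same_addresses(&runs[0], &runs[i]))
            return true;
    return false;
}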

6 Evaluation

In this section, we try to answer the following questions:

  • How effective is SpecFuzz at detecting BCB vulnerabilities?

  • Is it better at finding the vulnerabilities than the existing static analysis tools?

  • Does patching based on SpecFuzz results give us a performance improvement compared to full-application protection?

To put the results into context, we compare SpecFuzz to two existing open-source projects: Spectre V1 Scanner (RH Scanner) [15], a static analysis tool from RedHat, and Speculative Load Hardening (SLH) [2], an LLVM pass that masks all speculative loads, thus providing a conservative defence against BCB.

Instead of RH Scanner, we could have compared SpecFuzz with the more advanced static analysis tools Respectre [21] and oo7 [43], but they are not freely available: Respectre is a commercial product, and oo7 is provided only upon request. We did not manage to get access to oo7, and the comparison with Respectre is still being arranged with the authors.

Applications. For the evaluation, we tested six commonly used libraries: two cryptographic functions (AES from libTomCrypt [10] and RSA from BearSSL [4]), a compression algorithm (Brotli [5]), and three parsers, for JSON (JSMN [6]), HTTP (libHTP [7]), and YAML (libyaml [8]). We picked these specific libraries because they may directly process unsanitized user input from the network, giving the attacker better chances of controlling memory accesses within them.

Testbed. We ran all the experiments on a 4-core (8 hyper-threads) Intel Core i7 CPU operating at 3.4 GHz (Skylake microarchitecture) with 32 KB L1 and 256 KB L2 private caches, an 8 MB L3 shared cache, and 32 GB of RAM. The machine was running Linux kernel 4.16.

6.1 Detection of BCB Gadgets

              MSVC   RH Scanner   SLH   SpecFuzz   Total variants
Detected:        2           12    15         15               15
Table 1: The number of basic BCB variants detected by different mitigation tools.

With the first experiment, we want to show that SpecFuzz is effective at detecting different variations of BCB. To this end, we tested the 15 sample BCB variants created by Paul Kocher [23], which represent 15 different ways BCB may occur in C code. The variants were originally designed to illustrate the shortcomings of the BCB mitigation mechanism in MSVC [30], but they serve as a good starting point for evaluating the basic detection capabilities of any BCB detection tool. Note, however, that the suite is not exhaustive and does not represent all possible variants of BCB.

The testing results are presented in Table 1. As expected, the simulation in SpecFuzz works correctly and surfaces all speculative out-of-bounds accesses, which are then detected by AddressSanitizer. As for the other tools, the original article [23] reported that the MSVC pass detects only 2 variants. (This result may be outdated: we did not test newer versions of the pass, and it may have improved since the article was published.) This happens because the pass relies on simple pattern matching, that is, it searches for specific code patterns in the binary. Accordingly, if the vulnerability takes a form not envisioned by the developers, the compiler will not protect it. The same goes for RH Scanner, although it relies on more generic patterns and thus detects more variants. SLH does not attempt to find vulnerabilities and instead protects all conditional branches in the application, which guarantees complete coverage.

6.2 Fuzzing results

                             AES   RSA          Brotli   JSMN   HTTP   YAML   Total
SpecFuzz     Total             3    16            1909     17     15     95    2055
             Controlled        0     2             980      2      3     25    1012
             In C              0     2              52      2      3     19      78
             Verified          0     0           6 (3)      2      2      0      10
RH Scanner   Total             0    13 (3 new)       5      1      1     18      38
             True positive     0     8               4      1      1      4      18
             Controlled        0     0               0      0      0      0       0
SLH          Instrumented     21   175             253     83    452   3398    4382
             branches
Table 2: Vulnerabilities found by SpecFuzz, RH Scanner, and the total number of branches instrumented by SLH.

Next, we wanted to see how effective SpecFuzz is at detecting vulnerabilities in the wild. To this end, we fuzzed six real-world libraries, each for 24 hours. Before running each experiment, we fuzzed a native (i.e., not instrumented) version of the application to create an initial input corpus. This way, we ensured large coverage (see 7 for a discussion of the coverage issue).

The results are presented in Table 2. The first row (Total) shows the total number of vulnerabilities detected by SpecFuzz. There is a vast difference between the results, ranging from almost 2000 vulnerabilities found in Brotli to only 3 in the AES code. One reason for this is the type of code: AES and RSA are cryptographic functions written with side-channel attacks in mind. They strive to avoid branching on the input, which reduces the opportunities for BCB. Another factor is sheer code size: Brotli has ~9000 LOC while JSMN has fewer than 400.

For most of the vulnerabilities, however, we did not observe any correlation between the input and the accessed address, which puts them into the low-risk category (see 5.2). The second row (Controlled) shows what is left after filtering these out. The number becomes even lower when we map the detections to locations in the C code (the third row, In C): as SpecFuzz works on binaries, a single vulnerability in C may be reported several times due to compiler optimizations such as loop unrolling.

Finally, we manually checked the results. The fourth row (Verified) lists the realistic vulnerabilities that have to be patched. The difference between rows 3 and 4 is caused by the fact that AddressSanitizer does not provide precise information about the overflows, so SpecFuzz sometimes falsely marks them as controlled (see 7). In a few cases (shown in brackets), we were not able to determine whether the attacker has control over the accessed address.

For comparison, we also tested the applications with RH Scanner (rows 5 and 6). It detected far fewer vulnerabilities than SpecFuzz, and in none of them does the attacker have control over the accessed address.

Interestingly, RH Scanner found 3 low-risk vulnerabilities in RSA that were not triggered by fuzzing. After manual inspection, we found out that SpecFuzz did not detect them because they require nested misprediction to be triggered, which our implementation does not yet support.

6.3 Performance impact of the patches

In this experiment, we wanted to see whether patching based on the fuzzing results gives us better performance compared to full-application instrumentation, such as the one implemented by SLH. To evaluate it, we manually patched the vulnerabilities found in the previous experiment. We patched only the controlled vulnerabilities because we assume that a vulnerability is not realistically exploitable if the attacker does not have any control over the accessed address (see 5.2). To prevent the vulnerabilities, we added an LFENCE instruction after every risky branch. Then, we measured the runtimes of the patched applications and compared them to the overheads of SLH protection. Every measurement was repeated 100 times and then averaged.
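For illustration, a patched bounds check could look as follows (a sketch, not code from the tested libraries; _mm_lfence is the compiler intrinsic that emits LFENCE):

#include <immintrin.h>
#include <stddef.h>

// The LFENCE after the branch stops execution until the branch resolves,
// so the load below cannot run speculatively with an out-of-bounds index.
int get_element(const int *array, size_t size, size_t i) {
    if (i >= size)
        return -1;
    _mm_lfence();     // serializing: wait for the bounds check to resolve
    return array[i];  // reached only non-speculatively
}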

The results are presented in Figure 5. As we can see, the overhead is significantly lower than the one introduced by SLH. In three cases—AES, RSA, and the YAML parser—no patch was necessary, as we did not find any controllable vulnerabilities. In the other cases, the patches introduced at most 12% overhead.

Such a drastic difference stems from the fact that we patch only a small subset of the branches in the program. Looking at the totals (Table 2), SLH instrumented 4382 branches across all the libraries, while SpecFuzz allowed us to reduce the number to only 10 realistic vulnerabilities.

One interesting outlier is JSMN, which experienced a ~500% slowdown with SLH. The outlier is caused by an extremely high density of branches in the application (approximately one branch executed every cycle) and, thus, a high reliance on branch prediction for exploiting instruction parallelism. SLH effectively disables this optimization and makes the execution much more sequential. At the same time, SpecFuzz found only two high-risk vulnerabilities in JSMN, and patching them introduced only a ~1% slowdown.

Note, however, that the results of this experiment mainly have an illustrative purpose. Because SpecFuzz and SLH represent different classes of defence tools, they have different security guarantees and their cost cannot be compared directly. While SLH is a conservative technique that guarantees complete absence of unsafe speculative loads, SpecFuzz is a testing tool that may miss certain vulnerabilities (see 7). Accordingly, they should be applied in different situations: SLH—when security is critical, and SpecFuzz—when performance is the first priority.

Figure 5: Performance overheads of the patches based on SpecFuzz compared to the slowdown introduced by SLH.

6.4 Case Study: libHTP

To showcase the fuzzing process on a specific example, this section presents the procedure of fuzzing the HTTP parsing library libHTP [7] and the vulnerabilities we found in it. We also examine one realistic and potentially exploitable vulnerability that SpecFuzz discovered in the library.

We picked libHTP as a case study because it is a security-critical library that could be used in a wide spectrum of web applications and a flaw in libHTP can undermine the security of any project that relies on it. Moreover, it is a security-aware parser, regularly tested and fuzzed for traditional bugs (e.g., memory safety violations), which makes it unlikely to be exploitable through conventional methods.

In this experiment, we built the library with our modified compiler and ran the fuzzers available in the project’s repository. As previously, we used HonggFuzz [9] and ran the experiment for 24 hours. Thanks to the sample inputs shipped with the project, the coverage was large from the very beginning.

Fuzzing results. In total, SpecFuzz detected 3322 unique speculative out-of-bounds accesses. Most of them are low-risk bounds violations in which the attacker cannot control the accessed address. One such example is presented in Figure 6. Here, the CPU might mispredict the branch on line 5 and execute the dereference on line 1 (i.e., (*(X)).realptr) even though the pointer b is NULL. The invalid access always goes to the same address, which is why we consider this vulnerability benign.

1 #define bstr_ptr(X) ( ((*(X)).realptr == NULL) \
2        ? ((unsigned char *)(X) + sizeof(bstr)) \
3        : (unsigned char *)(*(X)).realptr )
4 ...
5 if (b == NULL) return NULL;
6 return bstr_util_memdup_to_c(bstr_ptr(b), bstr_len(b));

Figure 6: Example of a low-risk speculative out-of-bounds access in libHTP.
Using the analysis mechanism presented in 5.2, we automatically filtered these cases out. This reduced the number to 148 locations in the binary, or 99 in C. The number of C locations is smaller because one line in C can be compiled into several locations in the binary (e.g., due to loop unrolling), so a single vulnerability may appear in several places.

Yet, even 99 is not the final number. As described in 7, the automatic analysis sometimes mistakenly marks a vulnerability as controllable. For example, in Figure 7, SpecFuzz detects an out-of-bounds access on line 3 (data[i]). A misprediction of the while condition (line 2) may cause speculative execution of a few more loop iterations than necessary, leading to accesses beyond the array's end. The size of the array may differ from one input to another, so the overflow addresses will also differ. This leads the analysis tool into falsely marking the vulnerability as controllable.
1 size_t i = 0;
2 while (i < len) {
3     data[i] = tolower(data[i]);
4     i++;
5 }

Figure 7: Example of a vulnerability mistakenly labeled as controllable.
Found vulnerabilities. After we manually inspected the code (approximately 2 hours of work) and removed these false labelings, we were left with only 3 controllable vulnerabilities. (The vulnerabilities were submitted to the library developers and are currently under review.) One of them (see Figure 8) is realistically exploitable if the attacker can accurately monitor the cache state. Here, the function base64_decode_single decodes a Base64-encoded symbol by looking it up in a table of precomputed values (the array decoding, lines 2 and 3). Before fetching the decoded symbol, the function checks the value for over- and underflows. The attacker can bypass the check by training the branch predictor and, thus, trigger a speculative overread on line 7.

There are two properties of the code snippet that make the vulnerability realistically exploitable. First, the attacker has complete control over the accessed address because the array index (value_in) is a part of the HTTP request, that is, a part of the program input. Second, the fetched value is further used to define the control flow of the program (see the comparison on line 16), which allows the attacker to infer a part of the value (specifically, its sign) by observing the cache state.

The attacker could execute the attack as follows. She begins by sending a probing message to find out which cache line the first element of the array decoding uses. Then, she sends a valid message to train the branch predictor on predicting the bounds check (line 5) as true. Finally, she resets the cache state (e.g., flushes the cache) and sends a message containing a symbol that triggers an overread, followed by a symbol that triggers a read from the first array element. If the read value is negative, the loop will do one more iteration, execute the second read, and the attacker will see a change in the state of the corresponding cache line. Otherwise, the loop terminates and the state does not change.

In summary, this vulnerability is a speculative buffer overread that allows reading data within a range of 175 bytes beyond the bounds of the array decoding and leaks the sign of the byte it reads.

The other two vulnerabilities are less realistic, as they require the application to misuse the library's interface. Still, it is better to patch them, especially considering the low runtime cost of the patch (see next).

Patch. To patch the vulnerabilities, we added an LFENCE instruction before every potentially unsafe load. According to our measurements, the patch changed the request processing time by 6.5%, from 920 milliseconds to 980 milliseconds (averaged over 10M requests). For comparison, applying SLH changed the time by 9.7%, to 1010 milliseconds.
1  int base64_decode_single(signed char value_in) {
2    static signed char decoding[] =
3      {62, -1, ...}; // 80 elements
4    value_in -= 43;
5    if ((value_in < 0) || (value_in > decoding_size - 1))
6      return -1;
7    return decoding[(int) value_in];
8  }
9  ...
10 int htp_base64_decode(const void *code_in, ...) {
11   signed char fragment;
12   ...
13   do {
14     ...
15     fragment = htp_base64_decode_single(*code_in++);
16   } while (fragment < 0);
17 ...}

Figure 8: A realistic BCB vulnerability in the Base64 decoding function, found by SpecFuzz.

7 Limitations

In this section, we discuss the limitations of SpecFuzz that we did not envision (or underestimated) while designing the tool. All of them, however, are only implementation flaws and can be fixed in the future without redesigning the system.

Nested misprediction. One important feature that our implementation does not yet support is the simulation of nested mispredictions, that is, of situations when a branch is mispredicted while speculative execution is already running. As the evaluation has shown (6.2), some vulnerabilities indeed require misprediction of multiple branches in a row to be triggered, although this appears infrequently.

To implement the nested simulation, we would have to maintain a stack of checkpoints instead of a single checkpoint: Every time we start a new simulation, we push the checkpoint on the stack, and every rollback restores the topmost checkpoint. We plan to implement this feature in the future.

Mislabeling due to incomplete information. While doing the evaluation, we discovered that our vulnerability analysis technique (see 5.2) sometimes gives a false result and mistakenly labels an uncontrolled vulnerability as a controlled one. It is not a big problem because the cost of patching is generally low, but it may introduce unnecessary overhead.

The reason for the mislabeling is as follows. At a detected bounds violation, AddressSanitizer reports only the accessed address and not the distance between the address and the referent object bounds (e.g., buffer overflow size). Therefore, if the object size differs among the test runs, the accessed address will also be different, even if the distance is the same. And because now the same instruction accesses different addresses for different inputs, the analysis falsely labels it as a controllable vulnerability.

For example, one common case of mislabeling that we encountered is off-by-one accesses. If an array is read in a loop, our simulation will force the loop to take a few additional iterations and read a few elements beyond the array’s bound. Here, the attacker has no control over the accessed addresses, but if the array size differs from one test run to another, the addresses accessed in these extra iterations will also be different. The analysis would mark this vulnerability as controllable.

So far, we handle these cases by manually analyzing the code. A better solution would be to use a more complete memory safety technique (e.g., Intel MPX [13]) that maintains metadata about referent objects. That would allow us to filter by changes in the distance to the object bound instead of changes in the accessed address. Unfortunately, none of these techniques is supported by Clang out of the box. To resolve this issue, we would have to implement the support or migrate SpecFuzz to another compiler.

Mislabeling due to masking. Mislabeling may also happen if two or more transient out-of-bounds accesses happen in a row. In the current implementation, SpecFuzz rolls back immediately upon detection of an out-of-bounds access or after receiving a fault. Real hardware, however, does not behave this way and instead proceeds with the speculative execution.

This discrepancy between the simulation and the real behavior may lead to a wrong classification of a vulnerability if an uncontrolled access is followed by a controlled one. For example, if a null dereference precedes a controlled buffer overflow, the simulation will always roll back at the dereference, and the overflow will never be triggered. Accordingly, the traces will show that misprediction of this branch always causes an access to the same address (i.e., to 0), and the vulnerability will be mistakenly labeled as uncontrolled.

This issue can be fixed by continuing the simulation (instead of rolling back) after recording the error. This, however, would often cause recursive faults, and our current implementation cannot yet handle them properly.

Fuzzing coverage. In our evaluation, we used IPT [22] for measuring coverage. The simulation of mispredictions, however, artificially inflates the measurement because it adds speculative paths that do not belong to normal program execution. Currently, to compensate for this effect, we run a preliminary fuzzing round with a noninstrumented binary, which creates an initial extensive input corpus. In the future, we could implement a custom coverage mechanism that ignores the simulation.

Binary instrumentation. Instead of creating a compiler pass, we could have implemented SpecFuzz as a binary instrumentation (e.g., with PIN [28]). This would allow us to patch binaries directly, without having to trace a vulnerability back to the source code, thus making the patching process simpler; in some cases, the patches would also introduce less overhead. Moreover, with this approach we would correctly support dynamic linking and would not require access to the source code. However, binary instrumentation tools are normally heavy-weight, which would considerably increase the time required for testing.

8 Related Work

Finally, let us take a look at the existing alternatives to SpecFuzz. Essentially, BCB is a combination of three vulnerabilities: a conditional branch misprediction, a memory safety violation, and a side-channel leakage. Accordingly, there are three types of defences that eliminate at least one of the vulnerabilities.

8.1 Preventing unsafe speculation

The most radical solution is to disable branch prediction entirely [3], although not all processors support it. Alternatively, speculation can be disabled on a per-branch basis by adding serializing instructions, such as LFENCE on Intel CPUs or DSB SY on ARM. This approach, however, severely reduces CPU utilization and causes a large slowdown, over 400% in certain cases [33].

A more effective conservative defence technique is to add a data dependency between the conditional branch and the potentially invalid memory access. This way, most of the instructions can benefit from branch prediction and only the memory accesses are delayed. This approach is used, for example, in Speculative Load Hardening [2]. Although it provides a performance improvement over full serialization, the performance overhead is still considerable.

To avoid the high performance cost, we could patch only those vulnerabilities that seem to be exploitable. This is the idea behind static analysis tools like Spectre V1 Scanner from RedHat [15] and the MSVC Spectre 1 pass [30]. They analyze the binary and search for instruction sequences that resemble a BCB vulnerability, usually a variation of the pattern branch->load->load/write. However, using this pattern in its generic form would mark all branches as vulnerable, which is counter to the goal of the analysis. Therefore, most of the tools rely on specific variants of the pattern. This makes them inherently incomplete and restricted to the variants envisioned by the tool developers.

A more advanced analysis technique is used in oo7 [43]. It relies on static taint analysis to detect memory accesses that depend on the program input. (This is the same criterion that we used to identify low-risk vulnerabilities.) Specifically, oo7 searches for the following pattern: a conditional branch whose condition depends on the input (i.e., is tainted), followed by a load dependent on the condition, followed by a memory access dependent on the load. Even though this approach is more reliable and generic than the simple pattern-matching techniques, it inherits the problems of static taint analysis: the limited depth of the analysis may cause false negatives, and overtainting causes false positives.

Respectre [21] is another analysis tool that claims to have better coverage than the existing alternatives. However, it is a commercial product and we can neither verify the claims nor compare it to SpecFuzz.

8.2 Preventing memory safety violations

Classical memory safety techniques (e.g., Intel MPX [13], SoftBound [31]) do not protect from BCB as the bounds checks they add can be mispredicted. Yet, they can be retrofitted to disable unsafe accesses even in the speculated paths.

A variant of this approach—pointer clipping—is now used in JavaScript engines [42] where, before accessing an array element, the index is masked with the array size. Because masking is an arithmetic operation, it does not create a control hazard and is not predicted by the CPU. However, this defence is vulnerable to the attacks where the data type is mispredicted and a wrong mask is used [20].
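A sketch of such masking (assuming a power-of-two table size; the names are ours):

#include <stddef.h>

#define TABLE_SIZE 256  // must be a power of two for the mask to be exact

// The AND is an arithmetic operation, so it is not predicted: even on a
// mispredicted path, the access stays within the table bounds.
int load_clipped(const int *table, size_t i) {
    i &= (TABLE_SIZE - 1);  // clip the index instead of relying on a branch
    return table[i];
}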

8.3 Preventing side channels

Another approach is to allow the transient accesses, but eliminate the possibility of leaking their results through a side channel.

In practice, web browsers achieve this by reducing the resolution of timers [42], disabling shared memory, or using site isolation [34]. However, these techniques prevent only cross-site attacks and do not work in the presence of a local attacker.

The most common target of side-channel attacks is the CPU caches, and many practical attacks can be prevented by defending this target. There is an extensive body of research on preventing such attacks, ranging from cache isolation [39] to attack detection [19], enforcing non-interrupted execution [41, 32], and cache coloring [37]. Yet, these provide only a partial defence, as transient execution attacks may use other side channels too [35].

Finally, an isolated execution environment can be achieved with a specialized microkernel [16], but it requires a complete redesign of the system.

9 Conclusion

We presented a technique to make speculative execution vulnerabilities visible by simulating them in software. We demonstrated the technique by implementing a Bounds Check Bypass detection tool called SpecFuzz. During the evaluation, the tool has proven to be more effective at finding vulnerabilities than the available static analysis tools (in total, 2055 found by SpecFuzz, whereas RH Scanner found 38).

At the same time, the additional dynamic information we get from SpecFuzz has shown us that realistic BCB vulnerabilities are much less common than we initially anticipated. Among the tested libraries, SpecFuzz found only 10 realistically exploitable BCB vulnerabilities. To make use of this observation, we developed a simple analysis heuristic that allowed us to automatically filter out the low-risk vulnerabilities. Thanks to it, the performance cost of SpecFuzz-based patches was considerably lower (at most 12%) compared to the slowdown caused by conservative defences such as Speculative Load Hardening (80% on average).

References

  • [1] Checkpoint/Restore In Userspace. http://criu.org/. Accessed: May, 2019.
  • [2] Speculative Load Hardening: A Spectre Variant 1 Mitigation Technique. https://docs.google.com/document/d/1wwcfv3UV9ZnZVcGiGuoITT_61e_Ko3TmoCS3uXLcJR0/edit#heading=h.phdehs44eom6, 2018. Accessed: May, 2019.
  • [3] SUSE Security update for kernel-firmware. https://www.suse.com/de-de/support/update/announcement/2018/suse-su-20180008-1/, 2018. Accessed: May, 2019.
  • [4] BearSSL. https://bearssl.org/, 2019. Accessed: May, 2019.
  • [5] Brotli. https://brotli.org/, 2019. Accessed: May, 2019.
  • [6] JSMN. https://github.com/zserge/jsmn, 2019. Accessed: May, 2019.
  • [7] LibHTP. https://github.com/OISF/libhtp, 2019. Accessed: May, 2019.
  • [8] libyaml. https://pyyaml.org/wiki/LibYAML, 2019. Accessed: May, 2019.
  • [9] Honggfuzz. http://honggfuzz.com/, 2019. Accessed: May, 2019.
  • [10] libtomcrypt. https://www.libtom.net/, 2019. Accessed: May, 2019.
  • [11] Intel Corp. Analysis of Speculative Execution Side Channels. White Paper, 2018.
  • [12] Intel Corp. Speculative Execution Side Channel Mitigations. White Paper, 2018.
  • [13] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual. 2018.
  • [14] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In USENIX Security, 2018.
  • [15] Nick Clifton. SPECTRE Variant 1 scanning tool. https://access.redhat.com/blogs/766093/posts/3510331, 2018. Accessed: May, 2019.
  • [16] Qian Ge, Yuval Yarom, Tom Chothia, and Gernot Heiser. Time protection: The missing OS abstraction. In Proceedings of the EuroSys Conference, 2019.
  • [17] Google. More details about mitigations for the CPU speculative execution issue. https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html, 2018. Accessed: May, 2019.
  • [18] Google Project Zero. Speculative Execution, Variant 4: Speculative Store Bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018. Accessed: May, 2019.
  • [19] Daniel Gruss, Julian Lettner, Felix Schuster, Olga Ohrimenko, Istvan Haller, and Manuel Costa. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security, 2017.
  • [20] Noam Hadad and Jonathan Afek. Overcoming (some) Spectre browser mitigations. https://alephsecurity.com/2018/06/26/spectre-browser-query-cache/, 2018. Accessed: May, 2019.
  • [21] Open Source Security Inc. Respectre: The State of the Art in Spectre Defenses. https://www.grsecurity.net/respectre_announce.php, 2018. Accessed: May, 2019.
  • [22] James Reinders. Intel Processor Trace. https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing, 2013. Accessed: May, 2019.
  • [23] Paul Kocher. Spectre Mitigations in Microsoft's C/C++ Compiler. https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html, 2018. Accessed: May, 2019.
  • [24] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative Execution. arXiv preprint arXiv:1801.01203v1, 2018.
  • [25] Esmaeil Mohammadian Koruyeh, Khaled Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre Returns! Speculation Attacks using the Return Stack Buffer. 2018.
  • [26] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2004.
  • [27] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. arXiv preprint arXiv:1801.01207, 2018.
  • [28] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. PIN : building customized program analysis tools with dynamic instrumentation. In ACM Sigplan Notices , 2005.
  • [29] Giorgi Maisuradze and Christian Rossow. ret2spec: Speculative Execution Using Return Stack Buffers . In CCS, 2018.
  • [30] Microsoft. Msvc compiler reference: /qspectre. https://docs.microsoft.com/en-us/cpp/build/reference/qspectre?view=vs-2019, 2018. Accessed: May, 2019.
  • [31] Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. SoftBound : Highly compatible and complete spatial memory safety for C . In Proceedings of the 30th Conference on Programming Language Design and Implementation (PLDI), 2009.
  • [32] Oleksii Oleksenko, Bohdan Trach, Robert Krahn, Andre Martin, Mark Silberstein, and Christof Fetzer. Varys: Protecting SGX enclaves from practical side-channel attacks. In USENIX Annual Technical Conference ( USENIX ATC ), 2018.
  • [33] Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof Fetzer. You Shall Not Bypass : Employing data dependencies to prevent bounds check bypass. arXiv preprint arXiv:1805.08506, 2018.
  • [34] The Chromium Projects. Site Isolation . http://www.chromium.org/Home/chromium-security/site-isolation, 2018. Accessed: May, 2019.
  • [35] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. Netspectre: Read arbitrary memory over network. 2018.
  • [36] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 Usenix ATC’12, 2012.
  • [37] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring. In International Conference on Dependable Systems and Networks Workshops (DSN-W), 2011.
  • [38] Mark Silberstein, Oleksii Oleksenko, and Christof Fetzer. Speculating about speculation: on the (lack of) security guarantees of spectre-v1 mitigations. https://www.sigarch.org/speculating-about-speculation-on-the-lack-of-security-guarantees-of-spectre-v1-mitigations/, 2018. Accessed: May, 2019.
  • [39] Read Sprabery, Konstantin Evchenko, Abhilash Raj, Rakesh B. Bobba, Sibin Mohan, and Roy Campbell. Scheduling, Isolation, and Cache Allocation: A Side-channel Defense . In IEEE International Conference on Cloud Engineering, 2018.
  • [40] E. Tromer, D.A. Osvik, and A. Shamir. Efficient cache attacks on AES , and countermeasures. Journal of Cryptology, 2010.
  • [41] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based Defenses against Cross-VM Side-channels . In USENIX Security Symposium , 2014.
  • [42] Luke Wagner. Mozilla Security Blog: Mitigations landing for new class of timing attack | . https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack/, 2018. Accessed: May, 2018.
  • [43] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead Defense against Spectre Attacks . 2018.
  • [44] Y. Yarom and K. Falkner. Flush+Reload : A high resolution, low noise, L3 cache side-channel attack. In USENIX Security Symposium, 2014.
  • [45] Andreas Zeller, Rahul Gopinath, Marcel B  ö hme, Gordon Fraser, and Christian Holler. Generating software tests. In Generating Software Tests. Saarland University, 2019. Accessed: May, 2019.

Appendix A Example of an Aggregated Trace

// Format: { Offending instruction :
                 [Mispredicted branches],
                 [Accessed addresses] }
{ Run1: {
    0x440006 :
        [0x531f14, 0x532b08],
        [107820859003904,107820859004032,10782085900...],
    0x470006 :
        [0x531f14, 0x532074],
        [107614700568644,107614700571668,10782085900...],
    0x532836:
        [0x53251f, 0x5326b5, 0x532ba9],
        [5839584],
    0x5424c1:
        [0x5423a7, 0x542408],
        [105690555219985,105690555219986,10569055521...],
    0x542864:
        [0x542683, 0x542743, 0x5427a9],
        [105690555219985,105690555219986,10569055521...],
    0x550007:
        [0x531f14, 0x532074],
        [107614700571612,107820859003940,10788957847...],
    ...},
Run2: {
    0x440006 :
        [0x531f14, 0x532074, 0x532b08],
        [107820859003904,107820859004032,10782085900...]
    0x532969:
        [0x5326b5, 0x532774, 0x5327b0],
        [107889578574388],
    0x5329c1:
        [0x5322b3, 0x5326b5, 0x532774],
        [5839584],
    0x5424c1:
        [0x5423a7, 0x542408],
        [105690555219986,105690555219987...],
    0x542864:
        [0x542683, 0x542743, 0x5427a9],
        [105690555219985,105690555219987,105690555219988],
    0x550007:
        [0x531f14, 0x532074],
        [107614700571612,107820859003940,107889578475616],
    ...},
...
}

3 Speculation Exposure

Speculative vulnerabilities are notoriously hard to find because hardware strives to hide the effects of speculative execution from software, making it impossible to detect such vulnerabilities with conventional testing methods. In this paper, we approach the problem by simulating the unsafe hardware optimization in software to uncover the speculative vulnerabilities.

To understand how we construct the simulation, first consider how speculative execution is implemented in hardware (2.1). When a hazard appears (e.g., at a conditional or an indirect jump), the CPU (1) makes a prediction on its outcome, (2) executes the predicted path while keeping the results in temporary storage, (3) eventually eliminates the hazard and either commits the results (correct prediction) or discards them (wrong prediction), and then proceeds with the correct path.

For example, in Figure 1, the CPU might make a wrong prediction that BB1 (Basic Block 1) will proceed into BB3. Then, it will start executing BB3, BB4, and further. When the hazard is eliminated, the CPU determines that the prediction was wrong and discards all changes made by the speculated instructions. Afterward, it proceeds to execute the correct path starting from BB2.

Figure 1: Example of speculative execution. Due to a misprediction, the program executes basic blocks BB3 and BB4, then detects the mistake, discards the results, and continues execution starting from BB2.

We can simulate this behavior with a "checkpoint-mispredict-rollback" scheme: At a potential hazard, we take a checkpoint of the current process state. Then, we diverge the control flow into a wrong (mispredicted) path and start executing it. When the maximal possible length of speculative execution is reached or when we encounter a serializing instruction, we roll back to the checkpoint and proceed with normal execution. The pattern can be applied to data hazards too: Instead of diverging the control flow, we would replace a memory/register value with a mispredicted one.
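
To make the scheme concrete, here is a minimal, self-contained C sketch of a single instrumented branch, with setjmp/longjmp standing in for the register checkpoint. It only illustrates the concept and is not SpecFuzz's actual instrumentation, which, among other things, also undoes memory writes (see 4.2):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf chkpt;   /* models the register checkpoint */

int main(void) {
    int x = 300, array_size = 256;
    int simulating;

    /* Checkpoint: setjmp returns 0 on the first pass (simulation);
     * the later longjmp returns here with 1 (normal execution).    */
    if (setjmp(chkpt) == 0)
        simulating = 1;
    else
        simulating = 0;

    /* Instrumented branch: during the simulation, the condition is
     * inverted, forcing execution down the mispredicted path.      */
    if (simulating ? !(x < array_size) : (x < array_size)) {
        /* Mispredicted path: the bounds check was "bypassed", so a
         * memory-safety checker would flag array[x] at this point. */
        printf("speculative access to index %d\n", x);
        if (simulating)
            longjmp(chkpt, 1);  /* rollback: discard the wrong path */
    }
    printf("architectural execution continues\n");
    return 0;
}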

This basic mechanism simulates the worst case scenario with a CPU that always makes a wrong prediction and always speculates to the longest possible depth. Such a pessimistic approach makes the testing results universally applicable to any CPU model and any execution conditions. Moreover, it also covers all possible combinations of correct and incorrect predictions that could happen at runtime.

Checkpointing. For storing the process state, we could use any of the existing checkpointing mechanisms, ranging from full-process checkpoint (e.g., CRIU [1]) to transactional memory techniques (e.g., Intel TSX [13]), although it is preferable to use light-weight, low-overhead mechanisms to reduce the testing time. We describe the checkpointing mechanism used in our implementation in 4.2.

Simulating misprediction. To simulate misprediction, we instrument the basic blocks in a way that will force control flow to enter the paths that the CPU would otherwise take speculatively. The nature of instrumentation would depend on the exact type of speculative execution attack being simulated.

Terminating simulation. The final question is, when do we terminate the simulation? In real hardware, the termination happens in the following cases:

  • At serializing instructions (e.g., LFENCE, CPUID).

  • When the maximum speculation depth is reached.

The first case is straightforward: We execute a rollback every time we encounter a serializing instruction.

For the second case, we assume that the speculation depth is limited by the size of the Reorder Buffer (ROB), as the speculation can proceed only as long as there is empty space in the ROB. In modern CPUs, the ROB can fit no more than 250 microoperations (µops); the largest we know of is 224 entries, in the Intel Skylake architecture. Because software does not have access to µop counters, we fall back to estimating the depth as 250 machine instructions, which are easier to count. It is an overestimation because one instruction normally maps to at least one µop. (The only exception is µop fusion, when the CPU merges several instructions into one; however, it is a rare event that should not compromise the security guarantees.)

3.1 Applying Simulation to Spectre Attacks

The simulation mechanism depends on the specific vulnerability that we want to simulate. While we explain the instrumentation we have implemented for Bounds Check Bypass in the later sections, below we give an overview of the instrumentation that could be used for other Spectre-type attacks.

Branch Target Injection [24] is a Spectre variant targeting speculation at indirect jumps. When an indirect jump instruction is executed, the CPU predicts the jump target using the branch predictor, without waiting for the actual target address computation to finish. The attacker can exploit this behavior by training the branch predictor to jump to a code snippet that would leak program data via a side channel.

SpecFuzz can be modified to simulate BTI by maintaining a software history buffer for every indirect branch in the application. Then, at an indirect branch, SpecFuzz would 1) record the current branch target into the history buffer and 2) run a simulation for every previously recorded target. This approach works, however, under the assumption that the attacker can train the branch predictor only by providing data to the application and cannot inject arbitrary targets into the IBTB from another application on the same core.
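
As a rough sketch of this bookkeeping (illustrative names; simulate_speculative_jump is a stand-in for the checkpoint-run-rollback cycle, not SpecFuzz's actual API):

#include <stdio.h>

#define HISTORY_SIZE 8

/* Per-branch software history of observed indirect-jump targets. */
typedef struct {
    void *targets[HISTORY_SIZE];
    int count;
} branch_history;

/* Stub: in the real tool, this would checkpoint, jump to the
 * target, and roll back afterwards.                              */
static void simulate_speculative_jump(void *target) {
    printf("simulating a speculative jump to %p\n", target);
}

/* Would be invoked before every indirect branch: simulate all
 * previously recorded targets, then record the current one.      */
static void record_and_simulate(branch_history *h, void *target) {
    for (int i = 0; i < h->count; i++)
        simulate_speculative_jump(h->targets[i]);
    if (h->count < HISTORY_SIZE)
        h->targets[h->count++] = target;
}

int main(void) {
    branch_history h = { { 0 }, 0 };
    record_and_simulate(&h, (void *)0x1000); /* history empty: nothing to simulate */
    record_and_simulate(&h, (void *)0x2000); /* simulates the 0x1000 target        */
    return 0;
}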

Return Address Misprediction [29, 25] is a variant of Branch Target Injection. The CPU maintains a small number of the most recently used return addresses in a dedicated cache: it pushes the return address into this cache on each call instruction and pops it on each return instruction. When this cache is empty, the CPU speculates the return address using the indirect Branch Target Buffer. To simulate this vulnerability, SpecFuzz can instrument call and return instructions to increment and decrement a counter, respectively, and, on return instructions executed with a zero or negative counter value, jump to an address from the history buffer.
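
The call/return bookkeeping could then look roughly like this (a sketch; take_target_from_history stands for starting a simulation at a previously recorded target):

/* Sketch of the return-address misprediction simulation. */
static int ras_depth;                /* modeled return-address cache */

static void take_target_from_history(void) {
    /* checkpoint and jump to a previously recorded target (stub) */
}

static void on_call(void) { ras_depth++; }    /* push */

static void on_return(void) {                 /* pop  */
    ras_depth--;
    if (ras_depth <= 0)              /* modeled cache is empty:     */
        take_target_from_history();  /* the return target would be
                                        taken from the BTB instead  */
}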

Speculative Store Bypass [18] is a microarchitectural vulnerability caused by the CPU ignoring potential dependencies between load and store instructions during speculation. When a store operation is delayed, a subsequent load from the same address may speculatively reuse the old value from the cache. To simulate this attack, SpecFuzz could be extended to start a simulation before every basic block that contains a write to memory. Then, we would skip the store during the simulation, but execute it after the rollback. If a basic block contains several stores, we would split it and run a simulation for every store.
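
As a rough model of the effect (not the actual pass), skipping a delayed store makes a subsequent load observe the stale value:

#include <stdio.h>

static int simulating;            /* toggled by the modeled runtime */
static int slot = 42;             /* stale value before the store   */

static void store_then_load(int new_val) {
    if (!simulating)
        slot = new_val;           /* the store is executed only on the
                                     architectural (post-rollback) pass */
    printf("%s load sees %d\n",
           simulating ? "speculative" : "architectural", slot);
}

int main(void) {
    simulating = 1; store_then_load(7);  /* prints: speculative load sees 42 */
    simulating = 0; store_then_load(7);  /* prints: architectural load sees 7 */
    return 0;
}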

1
2
3
4  if x < array_size:
5
6      result = array[x]
7      ...

(a) Native version

1  checkpoint()
2  if x >= array_size:
3      goto skip_branch
4  if x < array_size:
5  skip_branch:
6      result = array[x]
7      ...
8  if max_depth_reached():
9      rollback() // to line 4

(b) Simulation of conditional branch misprediction

Figure 2: Example of the SpecFuzz instrumentation
Figure 3: Simulation of conditional branch mispredictions: The terminator condition is replaced by an inverse one. When the simulation ends, the program returns to the normal flow.

4 SpecFuzz: Exposure of Bounds Check Bypass

To showcase our approach on a specific vulnerability class, we develop a tool for simulating and detecting Bounds Check Bypass (BCB) [24]. We call the tool SpecFuzz.

At its core, BCB is a speculative out-of-bounds access caused by a misprediction of a conditional jump (see 2.3). To detect the access, we have to implement two components: a simulation of the misprediction (3) and a mechanism for detecting invalid memory accesses during the speculation. The latter is relatively straightforward, as we can use one of many existing memory safety techniques; in SpecFuzz, we use AddressSanitizer [36]. To implement the former, though, we need a custom technique.

To simulate conditional branch mispredictions, we create a modified (instrumented) version of the application that executes not only the normal control flow but also the paths that could be taken as a result of mispredictions. Consider the example in Figure 2. Before the conditional branch (line 4), we insert a call to a checkpointing function (line 1) that stores the current process state and initializes the simulation. Then, we simulate a misprediction by inserting a branch statement with an inverted condition (line 2) and a jump into the body of the conditional block, thus skipping the original branch (line 3). We proceed with the execution until reaching a terminating condition: either the maximum speculation depth (line 8) or a serializing instruction (not present in the example). After that, we restore the process state from the previous checkpoint (line 9) and redirect the execution to the original branch statement.

We implement this design as a combination of an LLVM [26] backend pass for the x86 architecture and a runtime library.

Figure 4: The workflow of testing an application with SpecFuzz.

4.1 Simulating Branch Misprediction

SpecFuzz simulates mispredictions by forcing the application into taking a wrong branch at every conditional jump. We implement this behavior by replacing all conditional terminators in the program with ones that have an inverted condition (see Figure 3). Now, where the original basic block (BB) would proceed into one successor, the modified terminator diverges the control flow into the other one. The original terminator is moved into a separate BB, and the control flow returns to normal execution by rolling back into this BB after the simulation.

As a result, every time the program reaches this BB, it first executes the simulated path, then rolls back to the BB and continues with normal execution. We apply this instrumentation to all conditional branches.

4.2 Saving and Restoring Process State

The main requirement for the rollback mechanism used in SpecFuzz was a low performance impact, so that the testing time is kept short. To this end, we implemented a light-weight in-application mechanism that snapshots the CPU state before the simulation and records the memory changes during the simulation.

To store the CPU state, we add a call to the checkpointing function before every conditional jump. The function takes a snapshot of the register values (including GPRs, flags, SIMD, floating-point registers, etc.) and stores it into memory. During the rollback, we restore the register values based on the snapshot. The function also stores the address of the original conditional jump (i.e., original terminator) that we later use as a rollback address.

We could apply a similar mechanism to save the memory state, but this would have an unacceptable performance cost, especially considering that we would have to dump memory contents at every conditional jump. Instead, we log all writes to memory during the simulation. Before every instruction that modifies memory (e.g., mov, push, call), we store the address it modifies and its previous value. Then, to do a rollback, we go through the changes in the reverse order and restore the old values.
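
A simplified version of this undo log could look as follows (illustrative; the actual runtime instruments machine instructions and works at machine-word granularity, and capacity checks are omitted here):

#include <stdint.h>
#include <stddef.h>

#define LOG_CAPACITY 4096

typedef struct {
    uint64_t *addr;   /* modified location  */
    uint64_t old;     /* its previous value */
} write_record;

static write_record write_log[LOG_CAPACITY];
static size_t log_len;

/* Emitted before every instrumented store: save the old contents,
 * then perform the store.                                          */
static void logged_store(uint64_t *addr, uint64_t new_val) {
    write_log[log_len].addr = addr;
    write_log[log_len].old = *addr;
    log_len++;
    *addr = new_val;
}

/* Rollback: replay the log in reverse order. */
static void rollback_memory(void) {
    while (log_len > 0) {
        log_len--;
        *write_log[log_len].addr = write_log[log_len].old;
    }
}

int main(void) {
    uint64_t x = 1;
    logged_store(&x, 99);   /* "speculative" store         */
    rollback_memory();      /* x is 1 again after rollback */
    return (int)x;          /* returns 1                   */
}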

4.3 Terminating Simulation

As discussed in 3, we terminate the simulation either if we encounter a serializing event or when the maximum depth of speculation is reached.

To implement the first case, we simply invoke the rollback function before every serializing instruction. We consider the following instructions as serializing:

  • The instructions listed as serializing in the Intel documentation [13], such as LFENCE or CPUID.

  • System calls. We assume that executing any system call takes longer than the maximum possible duration of speculative execution.

  • External function calls. By virtue of being implemented as a compiler pass, SpecFuzz cannot correctly run the simulation beyond the instrumented code. Therefore, we have to consider all calls to external functions as serialization points, even though this is not necessarily correct behavior. In 7, we discuss a potential solution to this problem.

For the second case, we count instructions at runtime. For this, we keep a global instruction counter and set it to zero when a simulation begins. At the beginning of every basic block, we add its length to the counter. (We know the length at compile time because SpecFuzz is a backend pass.) When the counter value reaches 250 (the maximum possible speculation depth, see 3), we invoke the rollback function.
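
In instrumentation terms, every basic block's entry executes something like the following (a sketch; rollback() stands for the runtime's rollback function):

#define MAX_SPECULATION_DEPTH 250

static unsigned instr_count;  /* reset to zero when a simulation starts */

static void rollback(void) {
    /* restore the checkpoint and resume normal execution (stub) */
}

/* bb_length is a compile-time constant known to the backend pass. */
static void on_bb_entry(unsigned bb_length) {
    instr_count += bb_length;
    if (instr_count >= MAX_SPECULATION_DEPTH)
        rollback();
}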

4.4 Handling errors

Finally, with the simulation mechanism at hand, we have to correctly respond to detections of out-of-bounds accesses or to other error conditions that appear during the simulation. In contrast to normal, nonspeculative execution, the process does not crash if an error happens during the speculation. Instead, the CPU silences the error by discarding its effects when the misprediction is detected.

To simulate this behavior in SpecFuzz, we had to adapt the error response mechanism in AddressSanitizer (we rely on it for detecting out-of-bounds accesses). Normally, upon detecting a bounds violation, AddressSanitizer terminates the application and reports the error. We modified it to instead record the violation in a log and roll back to the previous checkpoint. (A more correct response would be to log the error and continue the simulation, but the current implementation of SpecFuzz does not yet support it.) Accordingly, one test run might detect several errors. In practice, we observed up to hundreds of errors per single invocation.

Similarly, we have to recover from runtime faults. We register a custom signal handler that logs and rolls back after the signals that could be caused by an out-of-bounds access, such as SIGSEGV and SIGBUS. We also roll back after other faults (e.g., division by zero), but we do not record them in the log, as they are irrelevant to the BCB vulnerability.
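
A sketch of such a handler (log_violation and rollback_to_checkpoint are stand-ins for the runtime's internals; async-signal-safety is ignored for brevity):

#include <signal.h>
#include <string.h>
#include <stdio.h>

static void log_violation(void *addr) {
    printf("speculative out-of-bounds access at %p\n", addr);
}

static void rollback_to_checkpoint(void) {
    /* longjmp-style return to the last checkpoint (stub) */
}

static void fault_handler(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    if (sig == SIGSEGV || sig == SIGBUS)
        log_violation(info->si_addr);  /* possibly BCB: record it       */
    rollback_to_checkpoint();          /* any fault: end the simulation */
}

static void install_fault_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);      /* e.g., division by zero */
}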

5 Fuzzing with SpecFuzz

Given the simulation technique described in the previous section (4), we can test applications with conventional dynamic testing methods, such as fuzzing. In our experiments, we used the workflow in Figure 4.

First, we compile the application under test with Clang and apply the SpecFuzz pass, thus producing an instrumented binary that simulates branch mispredictions. Second, we fuzz the binary. In our experiments, we used HonggFuzz [9], an evolutionary coverage-driven fuzzer, and we relied on Intel Processor Trace [22] for measuring code coverage. After fuzzing, we aggregate the traces and analyze the detected vulnerabilities.

5.1 Aggregation of results

As a result of fuzzing, we get a trace of detected speculative out-of-bounds accesses. Each entry in the trace has the form of a tuple:

  (Accessed address; Offending instruction; Mispredicted branch)

Usually, the trace is long and may contain up to hundreds of detections per test run. This happens because the simulation forces the application into taking wrong paths, which frequently leads to errors.

To make the trace usable, we aggregate the results per run and per instruction. That is, for every test run, we collect all the addresses that every unique offending instruction accessed, as well as the addresses of the branches whose mispredictions triggered the execution of this instruction. Appendix A shows an example of what we get from the aggregation.

5.2 Vulnerability analysis

After the aggregation, we have a list of vulnerabilities with an approximate range of addresses that each of them can access. As we will see in 6.2, the list may be rather verbose and contain up to multiple thousands of vulnerabilities. Yet, we argue that most of them are not realistically exploitable.

In many cases, the attacker does not have any control over the accessed address. This could happen, for example, when the application tries to speculatively dereference a field of an uninitialized structure. In this case, no matter what input we provide to the application, the speculative dereference will always go to the same address. This lack of control shifts the vulnerability into the category of conventional side-channel attacks [40, 44] that are less critical than BCB because they provide less information to the attacker and are harder to launch. Moreover, the defence strategies for these attacks are also different from BCB.

We identify these cases by analyzing the aggregated traces. We estimate the presence of the attacker’s control by comparing the accessed addresses in every run. If a given offending instruction always accessed the same set of addresses, we assume that the attacker does not have control over it. Note, however, that the heuristic is valid only after a large enough number of test runs.
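
At its core, the heuristic is a per-instruction set comparison across runs (a sketch with illustrative types; traces are assumed sorted and deduplicated):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An offending instruction is considered attacker-controlled only if
 * the set of addresses it accessed differs between runs that were
 * given different inputs.                                            */
static bool same_address_set(const uint64_t *a, size_t na,
                             const uint64_t *b, size_t nb) {
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++)
        if (a[i] != b[i])
            return false;
    return true;
}

/* Usage: controlled = !same_address_set(run1, n1, run2, n2); */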

6 Evaluation

In this section, we try to answer the following questions:

  • How effective is SpecFuzz at detecting BCB vulnerabilities?

  • Is it better at finding the vulnerabilities than the existing static analysis tools?

  • Does patching based on SpecFuzz results give us a performance improvement compared to full-application protection?

To put the results into a context, we compare SpecFuzz to two existing open-source projects: Spectre V1 Scanner (RH Scanner) [15]—a static analysis tool from RedHat, and Speculative Load Hardening (SLH) [2]—an LLVM pass that masks all speculative loads thus providing a conservative defence against BCB.

Instead of RH Scanner, we could have compared SpecFuzz with the more advanced static analysis tools Respectre [21] and oo7 [43], but they are not freely available: Respectre is a commercial product, and oo7 is provided only upon request. We did not manage to get access to oo7, and the comparison with Respectre is still being arranged with its authors.

Applications. For the evaluation, we tested six commonly used libraries. They include two cryptographic functions (AES from libTomCrypt [10] and RSA from BearSSL [4]), a compression algorithm (Brotli [5]), and three parsers for JSON (JSMN [6]), HTTP (LibHTP [7]), and YAML (libyaml [8]). We picked these specific libraries because they may directly process unsanitized user input from the network, giving the attacker better chances of controlling memory accesses within the libraries.

Testbed. We ran all the experiments on a 4-core (8 hyper-threads) Intel Core i7 CPU operating at 3.4 GHz (Skylake microarchitecture) with 32 KB L1 and 256 KB L2 private caches, an 8 MB L3 shared cache, and 32 GB of RAM. The machine was running Linux kernel 4.16.

6.1 Detection of BCB Gadgets

         MSVC   RH Scanner   SLH   SpecFuzz   Total
          2         12        15      15       15

Table 1: The number of basic BCB variants detected by different mitigation tools.

With the first experiment, we want to show that SpecFuzz is effective at detecting different variations of BCB. To this end, we tested the 15 sample BCB variants created by Paul Kocher [23], which represent 15 different ways BCB may occur in C code. The variants were originally designed to illustrate the shortcomings of the BCB mitigation mechanism in MSVC [30], but they can serve as a good starting point for evaluating the effectiveness of any BCB detection tool. Note, however, that the suite is not exhaustive and does not represent all possible variants of BCB; rather, it evaluates the basic detection capabilities of a tool.

The testing results are presented in Table 1. As expected, the simulation in SpecFuzz works correctly and surfaces all speculative out-of-bounds accesses, which are then detected by AddressSanitizer. As for the other tools, the original article [23] reported that the MSVC pass detects only 2 variants. (The result may be outdated: we did not test the newer versions of the pass, and it may have improved since the publication of the article.) This happens because the pass relies on simple pattern matching, that is, it searches for specific patterns in the binary. Accordingly, if a vulnerability happens to take a form not envisioned by the developers, the compiler will not protect it. The same goes for RH Scanner, although it relies on more generic patterns and thus detects more variants. SLH does not attempt to find the vulnerabilities and instead protects all conditional branches in the application, which guarantees complete coverage.

6.2 Fuzzing results

                            AES   RSA           Brotli   JSMN   HTTP   YAML   Total
SpecFuzz: Total               3    16             1909     17     15     95    2055
          Controlled          0     2              980      2      3     25    1012
          In C                0     2               52      2      3     19      78
          Verified            0     0            6 (3)      2      2      0      10

RH Scanner: Total             0    13 (3 new)        5      1      1     18      38
            True positive     0     8                4      1      1      4      18
            Controlled        0     0                0      0      0      0       0

SLH: instrumented branches   21   175              253     83    452   3398    4382

Table 2: Vulnerabilities found by SpecFuzz, RH Scanner, and the total number of branches instrumented by SLH.

Next, we wanted to see how effective SpecFuzz is at detecting vulnerabilities in the wild. To this end, we fuzzed six real-world libraries. The fuzzing of each application lasted for 24 hours. Before running each experiment, we fuzzed a native (i.e., not instrumented) version of the application to create an initial input corpus. This way, we ensured large coverage (see 7 for discussion of the coverage issue).

The results are presented in Table 2. The first row (Total) represents the total number of vulnerabilities detected by SpecFuzz. There is a vast difference between the results, ranging from almost 2000 vulnerabilities found in Brotli to only 3 found in the AES code. One reason is the type of code: AES and RSA are cryptographic functions that are written with side-channel attacks in mind. They strive to avoid branching on the input, which reduces the opportunities for BCB. Another factor is the sheer code size: Brotli has ~9000 LOC while JSMN has fewer than 400.

For most of the vulnerabilities, however, we did not observe any correlation between the input and the accessed address, which puts them into the low-risk category (see 5.2). The second row (Controlled) is what is left after filtering them out. The number becomes even lower if we convert them into locations in the C code (the third row, In C). As SpecFuzz works with binaries, a single vulnerability in C may be reported several times due to compiler optimizations, such as loop unrolling.

Finally, we manually checked the results. The fourth row (Verified) lists the realistic vulnerabilities that have to be patched. The difference between rows 3 and 4 is caused by the fact that AddressSanitizer does not provide precise information about the overflows, so SpecFuzz sometimes falsely marks them as Controlled (see 7). In a few cases (in brackets), we were not able to find out whether the attacker has control over the accessed address.

For comparison, we also tested the applications with RH Scanner (rows 5 and 6). It detected far fewer vulnerabilities than SpecFuzz, and in none of them does the attacker have control over the accessed address.

Interestingly, RH Scanner found 3 low-risk vulnerabilities in RSA that were not triggered by fuzzing. After manual inspection, we found out that SpecFuzz did not detect them because they require nested misprediction to be triggered, which our implementation does not yet support.

6.3 Performance impact of the patches

In this experiment, we wanted to see whether patching based on the fuzzing results gives us better performance compared to full-application instrumentation, such as the one implemented by SLH. To evaluate it, we manually patched the vulnerabilities found in the previous experiment. We patched only the controlled vulnerabilities because we assume that a vulnerability is not realistically exploitable if the attacker does not have any control over the accessed address (see 5.2). To prevent the vulnerabilities, we added an LFENCE instruction after every risky branch. Then, we measured the runtimes of the patched applications and compared them to the overheads of SLH protection. Every measurement was repeated 100 times and then averaged.
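
For illustration, a patched bounds check could look like this (a sketch with illustrative names; _mm_lfence is the compiler intrinsic that emits LFENCE):

#include <emmintrin.h>   /* _mm_lfence */

int checked_read(const int *array, unsigned x, unsigned array_size) {
    if (x < array_size) {
        _mm_lfence();    /* serialize: the load below can no longer
                            be executed speculatively              */
        return array[x];
    }
    return -1;
}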

The results are presented in Figure 5. As we can see, the overhead is significantly lower compared to the one introduced by SLH. In three cases—AES, RSA, and the YAML parser—no patch was necessary, as we did not find any controllable vulnerabilities. In the other cases, the patches introduced only up to 12% overhead.

Such a drastic difference stems from the fact that we patch only a small subset of the branches in the program. If we look at the total numbers (Table 2), SLH instrumented 4382 branches across all the libraries, while SpecFuzz allowed us to reduce this number to only 10 realistic vulnerabilities.

One interesting outlier is JSMN, which experienced a ~500% slowdown with SLH. It is caused by an extremely high density of branches in the application (approximately one branch executed every cycle) and, thus, a high reliance on branch prediction to efficiently utilize instruction-level parallelism. SLH effectively disables this optimization and makes the execution much more sequential. At the same time, SpecFuzz found only two high-risk vulnerabilities in JSMN, and patching them introduced only a ~1% slowdown.

Note, however, that the results of this experiment mainly have an illustrative purpose. Because SpecFuzz and SLH represent different classes of defence tools, they have different security guarantees and their cost cannot be compared directly. While SLH is a conservative technique that guarantees complete absence of unsafe speculative loads, SpecFuzz is a testing tool that may miss certain vulnerabilities (see 7). Accordingly, they should be applied in different situations: SLH—when security is critical, and SpecFuzz—when performance is the first priority.

Figure 5: Performance overheads of the patches based on SpecFuzz compared to the slowdown introduced by SLH.

6.4 Case Study: libHTP

To showcase the fuzzing process on a specific example, this section presents the procedure of fuzzing the HTTP parsing library libHTP [7] and the vulnerabilities that we found in it. We also examine one realistic and potentially exploitable vulnerability that SpecFuzz discovered in the library.

We picked libHTP as a case study because it is a security-critical library that could be used in a wide spectrum of web applications and a flaw in libHTP can undermine the security of any project that relies on it. Moreover, it is a security-aware parser, regularly tested and fuzzed for traditional bugs (e.g., memory safety violations), which makes it unlikely to be exploitable through conventional methods.

In this experiment, we built the library with our modified compiler and ran the fuzzers available in the project’s repository. As previously, we used HonggFuzz [9] and ran the experiment for 24 hours. Thanks to the sample inputs shipped with the project, the coverage was large from the very beginning.

Fuzzing results. In total, SpecFuzz detected 3322 unique speculative out-of-bounds accesses. Most of them are low-risk bounds violations in which the attacker cannot control the accessed address. One such example is presented in Figure 6. Here, the CPU might mispredict the branch on line 5 and execute the dereference on line 1 (i.e., (*(X)).realptr) even though the pointer b is NULL. The access will always go to the same invalid address, which is why we consider this vulnerability benign.

1  #define bstr_ptr(X) ( ((*(X)).realptr == NULL) \
2         ? ((unsigned char *)(X) + sizeof(bstr)) \
3         : (unsigned char *)(*(X)).realptr )
4  ...
5  if (b == NULL) return NULL;
6  return bstr_util_memdup_to_c(bstr_ptr(b), bstr_len(b));

Figure 6: Example of a low-risk speculative out-of-bounds access in libHTP.
Using the analysis mechanism presented in 5.2, we automatically filtered these cases out. It reduced the number to 148 locations in the binary, or 99 in C. The number of code locations is smaller because one line of C can be compiled into several locations in the binary (e.g., due to loop unrolling), so a single vulnerability will appear in several places.

Yet, even 99 is not the final number. As described in 7, the automatic analysis sometimes mistakenly marks a vulnerability as controllable. For example, in Figure 7, SpecFuzz detects an out-of-bounds access on line 3 (data[i]). A misprediction of the while condition (line 2) may cause speculative execution of a few more loop iterations than necessary, leading to accesses beyond the array's end. The size of the array may differ from one input to another, and the overflow addresses will then also differ. This leads the analysis tool into falsely marking the vulnerability as controllable.
1  size_t i = 0;
2  while (i < len) {
3      data[i] = tolower(data[i]);
4      i++;
5  }

Figure 7: Example of a vulnerability mistakenly labeled as controllable.
Found vulnerabilities. After we manually inspected the code (approximately 2 hours of work) and removed these false labels, we were left with only 3 controllable vulnerabilities. (The vulnerabilities were submitted to the library developers and are currently under review.) One of them (see Figure 8) is realistically exploitable if the attacker can accurately monitor the cache state. Here, the function base64_decode_single decodes a Base64-encoded symbol by looking it up in a table of precomputed values (the array decoding, lines 2-3). Before fetching the decoded symbol, the function checks the value for over- and underflows. The attacker can bypass this check by training the branch predictor, thus triggering a speculative overread on line 7.

There are two properties of the code snippet that make the vulnerability realistically exploitable. First, the attacker has complete control over the accessed address because the array index (value_in) is a part of the HTTP request, that is, a part of the program input. Second, the fetched value is further used to define the control flow of the program (see the comparison on line 16), which allows the attacker to infer a part of the value (specifically, its sign) by observing the cache state.

The attacker could execute the attack as follows. She begins by sending a probing message to find out which cache line the first element of the array decoding uses. Then, she sends a valid message to train the branch predictor on predicting the bounds check (line 5) as true. Finally, she resets the cache state (e.g., flushes the cache) and sends a message that contains a symbol that triggers an overread, followed by a symbol that triggers a read from the first array element. If the read value is negative, the loop will do one more iteration, execute the second read, and the attacker will see a change in the state of the corresponding cache line. Otherwise, the loop will terminate and the state will not change.

In summary, this vulnerability is a speculative buffer overread that allows reading data within a range of 175 bytes beyond the bounds of the array decoding and leaks the sign of the byte it reads.

The other two vulnerabilities are less realistic, as they require the application to misuse the library's interface. Still, it is better to patch them, especially considering the low runtime cost of the patch (see below).

Patch. To patch the vulnerabilities, we added an LFENCE instruction before every potentially unsafe load. According to our measurements, the patch changed the request processing time by 6.5%, from 920 milliseconds to 980 milliseconds (averaged over 10M requests). For comparison, applying SLH changed the processing time by 9.7%, to 1010 milliseconds.
 1  int base64_decode_single(signed char value_in) {
 2    static signed char decoding[] =
 3      {62, -1, ...}; // 80 elements
 4    value_in -= 43;
 5    if ((value_in < 0) || (value_in > decoding_size - 1))
 6      return -1;
 7    return decoding[(int) value_in];
 8  }
 9  ...
10  int htp_base64_decode(const void *code_in, ...) {
11    signed char fragment;
12    ...
13    do {
14      ...
15      fragment = htp_base64_decode_single(*code_in++);
16    } while (fragment < 0);
17  ...}

Figure 8: A realistic BCB vulnerability in the Base64 decoding function that was found by SpecFuzz.

7 Limitations

In this section, we discuss the limitations of SpecFuzz that we did not envision (or underestimated) while designing the tool. All of them, however, are only implementation flaws and can be fixed in the future without redesigning the system.

Nested misprediction. One important feature that our implementation does not yet support is the simulation of nested mispredictions, that is, of situations when a branch is mispredicted while speculative execution is already running. As the evaluation has shown (6.2), some vulnerabilities indeed require misprediction of multiple branches in a row to get triggered, although this happens infrequently.

To implement the nested simulation, we would have to maintain a stack of checkpoints instead of a single checkpoint: Every time we start a new simulation, we push a checkpoint onto the stack, and every rollback restores the topmost one. We plan to implement this feature in the future.

Mislabeling due to incomplete information. While doing the evaluation, we discovered that our vulnerability analysis technique (see 5.2) sometimes gives a false result and mistakenly labels an uncontrolled vulnerability as a controlled one. It is not a big problem because the cost of patching is generally low, but it may introduce unnecessary overhead.

The reason for the mislabeling is as follows. At a detected bounds violation, AddressSanitizer reports only the accessed address and not the distance between the address and the referent object bounds (e.g., buffer overflow size). Therefore, if the object size differs among the test runs, the accessed address will also be different, even if the distance is the same. And because now the same instruction accesses different addresses for different inputs, the analysis falsely labels it as a controllable vulnerability.

For example, one common case of mislabeling that we encountered is off-by-one accesses. If an array is read in a loop, our simulation will force the loop to take a few additional iterations and read a few elements beyond the array’s bound. Here, the attacker has no control over the accessed addresses, but if the array size differs from one test run to another, the addresses accessed in these extra iterations will also be different. The analysis would mark this vulnerability as controllable.

So far, we handle these cases by manually analyzing the code. A better solution would be to use a more complete memory safety technique (e.g., Intel MPX [13]) that maintains metadata about referent objects. That would allow us to filter by changes in the distance to the object bound instead of changes in the accessed address. Unfortunately, no such technique is supported by Clang out of the box. To resolve this issue, we would have to implement the support ourselves or migrate SpecFuzz to another compiler.

Mislabeling due to masking. Mislabeling may also happen if two or more transient out-of-bounds accesses happen in a row. In the current implementation, SpecFuzz rolls back immediately upon detection of an out-of-bounds access or after receiving a fault. Real hardware, however, does not behave this way and instead proceeds with the speculative execution.

This discrepancy between the simulation and the real behavior may lead to a wrong classification of a vulnerability if an uncontrolled access is followed by a controlled one. For example, if a null dereference precedes a controlled buffer overflow, the simulation will always roll back at the dereference, and the overflow will never be triggered. Accordingly, the traces will show that a misprediction of this branch always causes an access to the same address (i.e., to 0), and the vulnerability will be mistakenly labeled as uncontrolled.

This issue can be fixed by continuing the simulation (instead of rolling back) after recording the error. This, however, would often cause recursive faults, and our current implementation cannot yet handle them properly.

Fuzzing coverage. In our evaluation, we used IPT [22] for measuring coverage. The simulation of mispredictions, however, makes the measurement artificially inflated because it adds speculative paths that do not belong to normal program execution. Currently, to compensate for this effect, we run a preliminary fuzzing round with a noninstrumented binary, which creates an initial extensive input corpus. In the future, we could implement a custom coverage mechanism that ignores the simulation.

Binary instrumentation. Instead of creating a compiler pass, we could have implemented SpecFuzz via binary instrumentation (e.g., with Pin [28]). This would allow us to patch binaries directly, without having to trace the vulnerability back to the source code, thus making the patching process simpler; in some cases, the patches would also introduce less overhead. Moreover, with this approach we would be able to correctly support dynamic linking and would not require access to the source code. However, binary instrumentation tools are normally heavy-weight, which would considerably increase the time required for testing.

8 Related Work

Finally, let us take a look at the existing alternatives to SpecFuzz. Essentially, BCB is a combination of three vulnerabilities: a conditional branch misprediction, a memory safety violation, and a side-channel leakage. Accordingly, there are three types of defences that eliminate at least one of the vulnerabilities.

8.1 Preventing unsafe speculation

The most radical solution is to disable branch prediction entirely [3], although not all processors support it. Alternatively, speculation can be disabled on a per-branch basis by adding serializing instructions, such as LFENCE on Intel CPUs or DSB SY on ARM. This approach, however, severely reduces the efficiency of CPU utilization and causes a large slowdown, over 400% in certain cases [33].

A more effective conservative defence technique is to add a data dependency between the conditional branch and the potentially invalid memory access. This way, most of the instructions can benefit from branch prediction and only the memory accesses are delayed. This approach is used, for example, in Speculative Load Hardening [2]. Although it provides a performance improvement over full serialization, the performance overhead is still considerable.

To avoid the high performance cost, we could patch only those vulnerabilities that appear to be exploitable. This is the idea behind static analysis tools like the Spectre V1 Scanner from RedHat [15] and the MSVC Spectre 1 pass [30]. They analyze the binary and search for instruction sequences that resemble a BCB vulnerability, usually a variation of the pattern branch->load->load/write. However, using this pattern in a generic form would lead to marking all branches as vulnerable, which is counter to the goal of the analysis. Therefore, most of the tools rely on specific variants of the pattern. This makes them inherently incomplete and restricted to the variants envisioned by the tool developers.

A more advanced analysis technique is used in oo7 [43]. It relies on static taint analysis to detect memory accesses that depend on the program input. (This is the same criterion that we used to identify low-risk vulnerabilities.) Specifically, oo7 searches for the following pattern: a conditional branch with a condition dependent on the input (i.e., tainted), followed by a load dependent on the condition, followed by a memory access dependent on the load. Even though this approach is more reliable and generic than the simple pattern-matching techniques, it is affected by the inherent problems of static taint analysis. Namely, limited analysis depth may cause false negatives, and overtainting causes false positives.

Respectre [21] is another analysis tool that claims to have better coverage than the existing alternatives. However, it is a commercial product and we can neither verify the claims nor compare it to SpecFuzz.

8.2 Preventing memory safety violations

Classical memory safety techniques (e.g., Intel MPX [13], SoftBound [31]) do not protect against BCB, as the bounds checks they add can themselves be mispredicted. Yet, they can be retrofitted to disable unsafe accesses even on speculated paths.

A variant of this approach—pointer clipping—is now used in JavaScript engines [42]: before accessing an array element, the index is masked with the array size. Because masking is an arithmetic operation, it does not create a control hazard and is not predicted by the CPU. However, this defence is vulnerable to attacks in which the data type is mispredicted and a wrong mask is used [20].

8.3 Preventing side channels

Another approach is to allow the transient accesses, but eliminate the possibility of leaking their results through a side channel.

In practice, web browsers achieve this by reducing the resolution of timers [42], disabling shared memory, or using site isolation [34]. However, these techniques prevent only cross-site attacks and do not work in the presence of a local attacker.

The most common targets of side-channel attacks are CPU caches, and many practical attacks can be prevented by defending this target. There is an extensive body of research on preventing side-channel attacks, ranging from cache isolation [39] to attack detection [19], enforcing non-interrupted execution [41, 32], and cache coloring [37]. Yet, these provide only a partial defence, as transient execution attacks may use other side channels too [35].

Finally, an isolated execution environment can be achieved with a specialized microkernel [16], but it requires a complete redesign of the system.

9 Conclusion

We presented a technique to make speculative execution vulnerabilities visible by simulating them in software. We demonstrated the technique by implementing a Bounds Check Bypass detection tool called SpecFuzz. During the evaluation, the tool has proven to be more effective at finding vulnerabilities than the available static analysis tools (in total, 2055 found by SpecFuzz, whereas RH Scanner found 38).

At the same time, the additional dynamic information we get from SpecFuzz has shown us that realistic BCB vulnerabilities are much less common than we initially anticipated. Among the tested libraries, SpecFuzz found only 10 realistically exploitable BCB vulnerabilities. To make use of this observation, we developed a simple analysis heuristic that allowed us to automatically filter out the low-risk vulnerabilities. Thanks to it, the performance cost of SpecFuzz-based patches was considerably lower (at most 12%) compared to the slowdown caused by conservative defences such as Speculative Load Hardening (80% on average).

References

  • [1] Checkpoint/Restore In Userspace. http://criu.org/. Accessed: May, 2019.
  • [2] Speculative Load Hardening: A Spectre Variant 1 Mitigation Technique. https://docs.google.com/document/d/1wwcfv3UV9ZnZVcGiGuoITT_61e_Ko3TmoCS3uXLcJR0/edit#heading=h.phdehs44eom6, 2018. Accessed: May, 2019.
  • [3] SUSE Security update for kernel-firmware. https://www.suse.com/de-de/support/update/announcement/2018/suse-su-20180008-1/, 2018. Accessed: May, 2019.
  • [4] BearSSL. https://bearssl.org/, 2019. Accessed: May, 2019.
  • [5] Brotli. https://brotli.org/, 2019. Accessed: May, 2019.
  • [6] JSMN. https://github.com/zserge/jsmn, 2019. Accessed: May, 2019.
  • [7] LibHTP. https://github.com/OISF/libhtp, 2019. Accessed: May, 2019.
  • [8] libyaml. https://pyyaml.org/wiki/LibYAML, 2019. Accessed: May, 2019.
  • [9] Honggfuzz. http://honggfuzz.com/, 2019. Accessed: May, 2019.
  • [10] libtomcrypt. https://www.libtom.net/, 2019. Accessed: May, 2019.
  • [11] Intel Corp. Analysis of Speculative Execution Side Channels. White Paper, 2018.
  • [12] Intel Corp. Speculative Execution Side Channel Mitigations. White Paper, 2018.
  • [13] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual. 2018.
  • [14] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In USENIX Security Symposium, 2018.
  • [15] Nick Clifton. SPECTRE Variant 1 scanning tool. https://access.redhat.com/blogs/766093/posts/3510331, 2018. Accessed: May, 2019.
  • [16] Qian Ge, Yuval Yarom, Tom Chothia, and Gernot Heiser. Time protection: the missing OS abstraction. In Proceedings of the EuroSys Conference, 2019.
  • [17] Google. More details about mitigations for the CPU speculative execution issue. https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html, 2018. Accessed: May, 2019.
  • [18] Google Project Zero. Speculative Execution, Variant 4: Speculative Store Bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018. Accessed: May, 2019.
  • [19] Daniel Gruss, Julian Lettner, Felix Schuster, Olga Ohrimenko, Istvan Haller, and Manuel Costa. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium, 2017.
  • [20] Noam Hadad and Jonathan Afek. Overcoming (some) Spectre browser mitigations. https://alephsecurity.com/2018/06/26/spectre-browser-query-cache/, 2018. Accessed: May, 2019.
  • [21] Open Source Security Inc. Respectre: The State of the Art in Spectre Defenses. https://www.grsecurity.net/respectre_announce.php, 2018. Accessed: May, 2019.
  • [22] James Reinders. Intel Processor Trace. https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing, 2013. Accessed: May, 2019.
  • [23] Paul Kocher. Spectre Mitigations in Microsoft's C/C++ Compiler. https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html, 2018. Accessed: May, 2019.
  • [24] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative Execution. arXiv preprint arXiv:1801.01203v1, 2018.
  • [25] Esmaeil Mohammadian Koruyeh, Khaled Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre Returns! Speculation Attacks using the Return Stack Buffer. 2018.
  • [26] Chris Lattner and Vikram Adve. LLVM: a compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2004.
  • [27] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. arXiv preprint arXiv:1801.01207, 2018.
  • [28] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In ACM SIGPLAN Notices, 2005.
  • [29] Giorgi Maisuradze and Christian Rossow. ret2spec: Speculative Execution Using Return Stack Buffers. In CCS, 2018.
  • [30] Microsoft. MSVC compiler reference: /Qspectre. https://docs.microsoft.com/en-us/cpp/build/reference/qspectre?view=vs-2019, 2018. Accessed: May, 2019.
  • [31] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. In Proceedings of the 30th Conference on Programming Language Design and Implementation (PLDI), 2009.
  • [32] Oleksii Oleksenko, Bohdan Trach, Robert Krahn, Andre Martin, Mark Silberstein, and Christof Fetzer. Varys: Protecting SGX enclaves from practical side-channel attacks. In USENIX Annual Technical Conference (USENIX ATC), 2018.
  • [33] Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof Fetzer. You Shall Not Bypass: Employing data dependencies to prevent bounds check bypass. arXiv preprint arXiv:1805.08506, 2018.
  • [34] The Chromium Projects. Site Isolation. http://www.chromium.org/Home/chromium-security/site-isolation, 2018. Accessed: May, 2019.
  • [35] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. NetSpectre: Read arbitrary memory over network. 2018.
  • [36] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. AddressSanitizer: a fast address sanity checker. In USENIX Annual Technical Conference (USENIX ATC), 2012.
  • [37] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring. In International Conference on Dependable Systems and Networks Workshops (DSN-W), 2011.
  • [38] Mark Silberstein, Oleksii Oleksenko, and Christof Fetzer. Speculating about speculation: on the (lack of) security guarantees of Spectre-V1 mitigations. https://www.sigarch.org/speculating-about-speculation-on-the-lack-of-security-guarantees-of-spectre-v1-mitigations/, 2018. Accessed: May, 2019.
  • [39] Read Sprabery, Konstantin Evchenko, Abhilash Raj, Rakesh B. Bobba, Sibin Mohan, and Roy Campbell. Scheduling, Isolation, and Cache Allocation: A Side-channel Defense. In IEEE International Conference on Cloud Engineering, 2018.
  • [40] E. Tromer, D. A. Osvik, and A. Shamir. Efficient cache attacks on AES, and countermeasures. Journal of Cryptology, 2010.
  • [41] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based Defenses against Cross-VM Side-channels. In USENIX Security Symposium, 2014.
  • [42] Luke Wagner. Mozilla Security Blog: Mitigations landing for new class of timing attack. https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack/, 2018. Accessed: May, 2018.
  • [43] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead Defense against Spectre Attacks. 2018.
  • [44] Y. Yarom and K. Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In USENIX Security Symposium, 2014.
  • [45] Andreas Zeller, Rahul Gopinath, Marcel Böhme, Gordon Fraser, and Christian Holler. Generating software tests. Saarland University, 2019. Accessed: May, 2019.

Appendix A Example of an Aggregated Trace

// Format: { Offending instruction :
                 [Mispredicted branches],
                 [Accessed addresses] }
{ Run1: {
    0x440006:
        [0x531f14, 0x532b08],
        [107820859003904,107820859004032,10782085900...],
    0x470006:
        [0x531f14, 0x532074],
        [107614700568644,107614700571668,10782085900...],
    0x532836:
        [0x53251f, 0x5326b5, 0x532ba9],
        [5839584],
    0x5424c1:
        [0x5423a7, 0x542408],
        [105690555219985,105690555219986,10569055521...],
    0x542864:
        [0x542683, 0x542743, 0x5427a9],
        [105690555219985,105690555219986,10569055521...],
    0x550007:
        [0x531f14, 0x532074],
        [107614700571612,107820859003940,10788957847...],
    ...},
Run2: {
    0x440006:
        [0x531f14, 0x532074, 0x532b08],
        [107820859003904,107820859004032,10782085900...],
    0x532969:
        [0x5326b5, 0x532774, 0x5327b0],
        [107889578574388],
    0x5329c1:
        [0x5322b3, 0x5326b5, 0x532774],
        [5839584],
    0x5424c1:
        [0x5423a7, 0x542408],
        [105690555219986,105690555219987...],
    0x542864:
        [0x542683, 0x542743, 0x5427a9],
        [105690555219985,105690555219987,105690555219988],
    0x550007:
        [0x531f14, 0x532074],
        [107614700571612,107820859003940,107889578475616],
    ...},
...
}

4 SpecFuzz: Exposure of Bounds Check Bypass

To showcase our approach on a specific vulnerability class, we develop a tool for simulating and detecting Bounds Check Bypass (BCB) [24]. We call the tool SpecFuzz.

At its core, BCB contains a speculative out-of-bounds access caused by a misprediction of a conditional jump (see Section 2.3). To detect the access, we have to implement two components: a simulation of the misprediction (Section 3) and a mechanism for detecting invalid memory accesses during the speculation. The latter is relatively straightforward, as we can use one of many existing memory safety techniques; in SpecFuzz, we used AddressSanitizer [36]. To implement the former, though, we need a custom technique.

To simulate conditional branch mispredictions, we create a modified (instrumented) version of the application which executes not only the normal control flow but also the paths that could be taken as a result of mispredictions. Consider the example in Figure 2. Before the conditional branch (line 4), we insert a call to a checkpointing function (line 1) that stores the current process state and initializes the simulation. Then, we simulate a misprediction by inserting a branch statement with an inverted condition (line 2) and a jump into the body of the conditional block, thus skipping the original branch (line 3). We proceed with the execution until reaching a terminating condition: either the maximum speculation depth (line 8) or a serializing instruction (not present in the example). After that, we restore the process state from the previous checkpoint (line 9) and redirect the execution to the original branch statement.
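As a rough illustration, the following C sketch mimics the effect of this transformation on a single bounds check. It is our approximation, not the code SpecFuzz generates: setjmp/longjmp stand in for the register checkpoint and rollback (the memory log from Section 4.2 is omitted), and only the direction where the check fails but the body executes is simulated.

    #include <setjmp.h>

    static jmp_buf checkpoint;

    long read_element(long *array, long size, long i) {
        volatile long value = 0;
        if (setjmp(checkpoint) == 0) {   /* line 1: checkpoint the state     */
            if (!(i < size)) {           /* line 2: inverted condition       */
                /* Simulated misprediction: the body executes even though
                   the bounds check fails; a checker such as AddressSanitizer
                   flags this access as out of bounds. */
                value = array[i];
            }
            longjmp(checkpoint, 1);      /* lines 8-9: terminate, roll back  */
        }
        if (i < size)                    /* line 4: the original branch      */
            value = array[i];
        return value;
    }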

We implement this design as a combination of an LLVM [26] backend pass for the x86 architecture and a runtime library.

Figure 4: The workflow of testing an application with SpecFuzz.

4.1 Simulating Branch Misprediction

SpecFuzz simulates mispredictions by forcing the application into taking the wrong branch at every conditional jump. We implement this behavior by replacing all conditional terminators in the program with ones that have an inverted condition (see Figure 3). Now, where the original basic block (BB) would proceed into one successor, the modified terminator diverts the control flow into the other. The original terminator is moved into a separate BB, and the control flow returns to normal execution by rolling back into this BB after the simulation.

As a result, every time the program reaches this BB, it first executes the simulated path, then rolls back to the BB and continues with normal execution. We apply this instrumentation to all conditional branches.

4.2 Saving and Restoring Process State

The main requirement for the rollback mechanism used in SpecFuzz was a low performance impact, so that the testing time is kept short. To this end, we implemented a lightweight in-application mechanism that snapshots the CPU state before the simulation and records the memory changes during the simulation.

To store the CPU state, we add a call to the checkpointing function before every conditional jump. The function takes a snapshot of the register values (including GPRs, flags, SIMD, floating-point registers, etc.) and stores it into memory. During the rollback, we restore the register values based on the snapshot. The function also stores the address of the original conditional jump (i.e., original terminator) that we later use as a rollback address.

We could apply a similar mechanism to save the memory state, but this would have an unacceptable performance cost, especially considering that we would have to dump memory contents at every conditional jump. Instead, we log all writes to memory during the simulation. Before every instruction that modifies memory (e.g., mov, push, call), we store the address it modifies and its previous value. Then, to do a rollback, we go through the changes in the reverse order and restore the old values.
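A minimal sketch of such a write log follows; the names are ours, and for simplicity it assumes fixed 8-byte stores:

    #include <stdint.h>

    typedef struct { uint64_t *addr; uint64_t old_value; } store_record;

    static store_record store_log[4096];
    static int store_log_len;

    /* Inserted before every 8-byte store executed during a simulation */
    static void log_store(uint64_t *addr) {
        store_log[store_log_len].addr = addr;
        store_log[store_log_len].old_value = *addr;  /* save previous value */
        store_log_len++;
    }

    /* Called during rollback: undo the logged stores in reverse order */
    static void undo_stores(void) {
        while (store_log_len > 0) {
            store_log_len--;
            *store_log[store_log_len].addr = store_log[store_log_len].old_value;
        }
    }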

4.3 Terminating Simulation

As discussed in Section 3, we terminate the simulation either when we encounter a serializing event or when the maximum depth of speculation is reached.

To implement the first case, we simply invoke the rollback function before every serializing instruction. We consider the following instructions as serializing:

  • The instructions listed as serializing in the Intel documentation [13], such as LFENCE or CPUID.

  • System calls. We assume that executing any system call takes longer than the maximum possible duration of speculative execution.

  • External function calls. By virtue of being implemented as a compiler pass, SpecFuzz cannot correctly run the simulation beyond the instrumented code. Therefore, we have to treat all calls to external functions as serialization points, even though this is not necessarily correct behavior. In Section 7, we discuss a potential solution to this problem.

For the second case, we count instructions at runtime. For this, we keep a global instruction counter and set it to zero when a simulation begins. At the beginning of every basic block, we add its length to the counter. (We know the length at compile time because SpecFuzz is a backend pass.) When the counter reaches 250 (the maximum possible speculation depth, see Section 3), we invoke the rollback function.
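Conceptually, the inserted check could look like the sketch below, where rollback() stands for the runtime-library function that restores the last checkpoint:

    #define MAX_SPECULATION_DEPTH 250

    extern void rollback(void);       /* restores the last checkpoint */
    static int instruction_counter;   /* zeroed when a simulation starts */

    /* Inserted at the entry of every basic block; the block length is a
       compile-time constant computed by the pass. */
    static inline void check_depth(int basic_block_length) {
        instruction_counter += basic_block_length;
        if (instruction_counter >= MAX_SPECULATION_DEPTH)
            rollback();
    }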

4.4 Handling Errors

Finally, with the simulation mechanism at hand, we have to correctly respond to detected out-of-bounds accesses and other error conditions that appear during the simulation. In contrast to normal, nonspeculative execution, the process does not crash if an error happens during speculation: the CPU silences the error by discarding its effects when the misprediction is detected.

To simulate this behavior in SpecFuzz, we had to adapt the error response mechanism of AddressSanitizer, which we rely on for detecting out-of-bounds accesses. Normally, upon detecting a bounds violation, AddressSanitizer terminates the application and reports the error. We modified it to instead record the violation in a log and roll back to the previous checkpoint. (A more correct response would be to log the violation and continue the simulation, but the current implementation of SpecFuzz does not yet support it.) Accordingly, one test run might detect several errors; in practice, we observed up to hundreds of errors per single invocation.

Similarly, we have to recover from runtime faults. We register a custom signal handler that logs the fault and rolls back after signals that could be caused by an out-of-bounds access, such as SIGSEGV and SIGBUS. We also roll back after other faults (e.g., division by zero), but we do not record them in the log, as they are irrelevant to the BCB vulnerability.
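A sketch of such a handler is shown below; it uses sigsetjmp/siglongjmp as the checkpoint mechanism, and log_detection is a placeholder for the actual logging routine:

    #include <signal.h>
    #include <setjmp.h>
    #include <string.h>

    static sigjmp_buf checkpoint;            /* set before each simulation */
    extern void log_detection(void *addr);   /* placeholder: record the access */

    static void fault_handler(int sig, siginfo_t *info, void *ctx) {
        (void)ctx;
        if (sig == SIGSEGV || sig == SIGBUS)
            log_detection(info->si_addr);    /* only memory faults are logged */
        siglongjmp(checkpoint, 1);           /* roll back and keep fuzzing */
    }

    static void install_fault_handlers(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);
        sigaction(SIGFPE, &sa, NULL);        /* e.g., division by zero: rollback only */
    }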

5 Fuzzing with SpecFuzz

Given the simulation technique described in Section 4, we can test applications with conventional dynamic testing methods, such as fuzzing. In our experiments, we used the workflow shown in Figure 4.

First, we compile the application under test with Clang and apply the SpecFuzz pass, thus producing an instrumented binary that simulates branch mispredictions. Second, we fuzz the binary. In our experiments, we used HonggFuzz [9], an evolutionary coverage-driven fuzzer, and we relied on Intel Processor Trace [22] for measuring code coverage. After fuzzing, we aggregate the traces and analyze the detected vulnerabilities.

5.1 Aggregation of Results

As a result of fuzzing, we get a trace of detected speculative out-of-bounds accesses. Each entry in the trace has the form of a tuple:

  (Accessed address; Offending instruction; Mispredicted branch)

Usually, the trace is long and may contain up to hundreds of detections per test run. This happens because the simulation forces the application into taking wrong execution paths, which frequently leads to errors.

To make the trace usable, we aggregate the results per run and per instruction. That is, for every test run, we collect all the addresses that every unique offending instruction accessed, as well as the addresses of the branches whose mispredictions triggered the execution of this instruction. Appendix A shows an example of the aggregation output.

5.2 Vulnerability Analysis

After the aggregation, we have a list of vulnerabilities, each with an approximate range of addresses that it can access. As we will see in Section 6.2, the list may be rather long and contain up to several thousand vulnerabilities. Yet, we argue that most of them are not realistically exploitable.

In many cases, the attacker does not have any control over the accessed address. This could happen, for example, when the application tries to speculatively dereference a field of an uninitialized structure. In this case, no matter what input we provide to the application, the speculative dereference will always go to the same address. This lack of control shifts the vulnerability into the category of conventional side-channel attacks [40, 44], which are less critical than BCB because they provide less information to the attacker and are harder to launch. Moreover, the defence strategies for these attacks also differ from those for BCB.

We identify these cases by analyzing the aggregated traces. We estimate the presence of the attacker's control by comparing the accessed addresses across runs: If a given offending instruction always accessed the same set of addresses, we assume that the attacker does not have control over it. Note, however, that this heuristic is valid only after a large enough number of test runs.
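In pseudocode, the heuristic reduces to a set comparison across runs. This is a sketch with our own names, assuming the per-run address sets are already aggregated and sorted:

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    /* Returns 1 if the instruction's accessed addresses differ between any
       two runs (i.e., the attacker may control the address), 0 otherwise. */
    int possibly_controlled(const uint64_t *const *addr_sets,
                            const size_t *set_sizes, size_t n_runs) {
        for (size_t run = 1; run < n_runs; run++) {
            if (set_sizes[run] != set_sizes[0] ||
                memcmp(addr_sets[run], addr_sets[0],
                       set_sizes[0] * sizeof(uint64_t)) != 0)
                return 1;   /* address set changed with the input */
        }
        return 0;           /* identical in every run: assume uncontrolled */
    }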

6 Evaluation

In this section, we try to answer the following questions:

  • How effective is SpecFuzz at detecting BCB vulnerabilities?

  • Is it better at finding the vulnerabilities than the existing static analysis tools?

  • Does patching based on SpecFuzz results give us a performance improvement compared to full-application protection?

To put the results into context, we compare SpecFuzz to two existing open-source projects: Spectre V1 Scanner (RH Scanner) [15], a static analysis tool from Red Hat, and Speculative Load Hardening (SLH) [2], an LLVM pass that masks all speculative loads, thus providing a conservative defence against BCB.

Instead of RH Scanner, we could have compared SpecFuzz with the more advanced static analysis tools Respectre [21] and oo7 [43], but they are not freely available: Respectre is a commercial product, and oo7 is provided only upon request. We did not manage to get access to oo7, and the comparison with Respectre is still being arranged with the authors.

Applications. For the evaluation, we tested six commonly used libraries: two cryptographic functions (AES from libTomCrypt [10] and RSA from BearSSL [4]), a compression algorithm (Brotli [5]), and three parsers, for JSON (JSMN [6]), HTTP (libHTP [7]), and YAML (libyaml [8]). We picked these specific libraries because they may directly process unsanitized user input from the network, giving the attacker better chances of controlling memory accesses within the libraries.

Testbed. We ran all the experiments on a 4-core (8 hyper-threads) Intel Core i7 CPU operating at 3.4 GHz (Skylake microarchitecture) with 32 KB L1 and 256 KB L2 private caches, an 8 MB L3 shared cache, and 32 GB of RAM. The machine was running Linux kernel 4.16.

6.1 Detection of BCB Gadgets

                    MSVC   RH Scanner   SLH   SpecFuzz   Total
Variants detected     2        12        15      15        15

Table 1: The number of basic BCB variants detected by different mitigation tools.

With the first experiment, we want to show that SpecFuzz is effective at detecting different variations of BCB. To this end, we tested the 15 sample BCB variants created by Paul Kocher [23], which represent 15 different ways BCB may occur in C code. The variants were originally designed to illustrate the shortcomings of the BCB mitigation mechanism in MSVC [30], but they can serve as a good starting point for evaluating the effectiveness of any BCB detection tool. Note, however, that the suite is not exhaustive and does not represent all possible variants of BCB; rather, it evaluates the basic detection capabilities of a tool.

The testing results are presented in Table 1. As expected, the simulation in SpecFuzz works correctly and surfaces all speculative out-of-bounds accesses, which are then detected by AddressSanitizer. As for the other tools, the original article [23] reported that the MSVC pass detects only 2 variants. (This result may be outdated: we did not test newer versions of the pass, and it may have improved since the publication of the article.) The pass misses the rest because it relies on simple pattern matching, that is, it searches for specific patterns in the binary. Accordingly, if a vulnerability happens to take a form not envisioned by the developers, the compiler will not protect it. The same goes for RH Scanner, although it relies on more generic patterns and thus detects more variants. SLH does not attempt to find the vulnerabilities and instead protects all conditional branches in the application, which guarantees complete coverage.

6.2 Fuzzing Results

                            AES   RSA          Brotli   JSMN   HTTP   YAML   Total
SpecFuzz     Total            3    16            1909     17     15     95    2055
             Controlled       0     2             980      2      3     25    1012
             In C             0     2              52      2      3     19      78
             Verified         0     0           6 (3)      2      2      0      10

RH Scanner   Total            0    13 (3 new)       5      1      1     18      38
             True positive    0     8               4      1      1      4      18
             Controlled       0     0               0      0      0      0       0

SLH          Instrumented
             branches        21   175             253     83    452   3398    4382

Table 2: Vulnerabilities found by SpecFuzz, RH Scanner, and the total number of branches instrumented by SLH.

Next, we wanted to see how effective SpecFuzz is at detecting vulnerabilities in the wild. To this end, we fuzzed six real-world libraries, each for 24 hours. Before running each experiment, we fuzzed a native (i.e., not instrumented) version of the application to create an initial input corpus. This way, we ensured large coverage (see Section 7 for a discussion of the coverage issue).

The results are presented in Table 2. The first row (Total) shows the total number of vulnerabilities detected by SpecFuzz. There is a vast difference between the results, ranging from almost 2000 vulnerabilities found in Brotli to only 3 found in the AES code. One reason is the type of code: AES and RSA are cryptographic functions written with side-channel attacks in mind; they strive to avoid branching on the input, which reduces the opportunities for BCB. Another factor is sheer code size: Brotli has ~9000 LOC, while JSMN has less than 400.

For most of the vulnerabilities, however, we did not observe any correlation between the input and the accessed address, which puts them into the low-risk category (see Section 5.2). The second row (Controlled) is what is left after filtering them out. The number becomes even lower if we convert the results into locations in the C code (the third row, In C): As SpecFuzz works on binaries, a single vulnerability in C may be reported several times due to compiler optimizations, such as loop unrolling.

Finally, we manually checked the results. The fourth row (Verified) lists the realistic vulnerabilities that have to be patched. The difference between rows 3 and 4 is caused by the fact that AddressSanitizer does not provide precise information about the overflows, so SpecFuzz sometimes falsely marks vulnerabilities as Controlled (see Section 7). In a few cases (shown in brackets), we were not able to determine whether the attacker has control over the accessed address.

For comparison, we also tested the applications with RH Scanner (rows 5 and 6). It detected far fewer vulnerabilities than SpecFuzz, and in none of them does the attacker have control over the accessed address.

Interestingly, RH Scanner found 3 low-risk vulnerabilities in RSA that were not triggered by fuzzing. After manual inspection, we found that SpecFuzz did not detect them because they require a nested misprediction to be triggered, which our implementation does not yet support.

6.3 Performance Impact of the Patches

In this experiment, we wanted to see whether patching based on the fuzzing results gives us better performance compared to full-application instrumentation, such as the one implemented by SLH. To evaluate it, we manually patched the vulnerabilities found in the previous experiment. We patched only the controlled vulnerabilities because we assume that a vulnerability is not realistically exploitable if the attacker does not have any control over the accessed address (see Section 5.2). To prevent the vulnerabilities, we added an LFENCE instruction after every risky branch. Then, we measured the runtimes of the patched applications and compared them to the overheads of SLH protection. Every measurement was repeated 100 times and averaged.
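For illustration, a patched bounds check looks roughly as follows; this is our own example rather than code from the evaluated libraries, with the LFENCE keeping the load from executing before the branch is resolved:

    #include <immintrin.h>

    long get_element(const long *array, long size, long i) {
        if (i >= 0 && i < size) {
            _mm_lfence();      /* speculation barrier: the load below cannot
                                  execute until the bounds check retires */
            return array[i];
        }
        return -1;
    }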

The results are presented in Figure 5. As we can see, the overhead is significantly lower than the one introduced by SLH. In three cases (AES, RSA, and the YAML parser), no patch was necessary, as we did not find any controllable vulnerabilities. In the other cases, the patches introduced at most 12% overhead.

Such a drastic difference stems from the fact that we patch only a small subset of the branches in the program. If we look at the totals (Table 2), SLH instrumented 4382 branches across all the libraries, while SpecFuzz allowed us to reduce the number of patch sites to only 10 realistic vulnerabilities.

One interesting outlier is JSMN, which experienced a ~500% slowdown with SLH. The slowdown is caused by an extremely high density of branches in the application (approximately one branch executed every cycle) and, thus, a high reliance on branch prediction for exploiting instruction-level parallelism. SLH effectively disables this optimization and makes the execution much more sequential. At the same time, SpecFuzz found only two high-risk vulnerabilities in JSMN, and patching them introduced only a ~1% slowdown.

Note, however, that the results of this experiment serve mainly an illustrative purpose. Because SpecFuzz and SLH represent different classes of defence tools, they provide different security guarantees, and their costs cannot be compared directly. While SLH is a conservative technique that guarantees the complete absence of unsafe speculative loads, SpecFuzz is a testing tool that may miss certain vulnerabilities (see Section 7). Accordingly, they should be applied in different situations: SLH when security is critical, and SpecFuzz when performance is the first priority.

Figure 5: Performance overheads of the patches based on SpecFuzz compared to the slowdown introduced by SLH.

6.4 Case Study: libHTP

To showcase the fuzzing process on a specific example, this section presents the procedure of fuzzing the HTTP parsing library libHTP [7] and the vulnerabilities that we found in it. We also examine one realistic and potentially exploitable vulnerability that SpecFuzz discovered in the library.

We picked libHTP as a case study because it is a security-critical library: it could be used in a wide spectrum of web applications, and a flaw in libHTP can undermine the security of any project that relies on it. Moreover, it is a security-aware parser, regularly tested and fuzzed for traditional bugs (e.g., memory safety violations), which makes it unlikely to be exploitable through conventional methods.

In this experiment, we built the library with our modified compiler and ran the fuzzers available in the project's repository. As before, we used HonggFuzz [9] and ran the experiment for 24 hours. Thanks to the sample inputs shipped with the project, the coverage was large from the very beginning.

Fuzzing results. In total, SpecFuzz detected 3322 unique speculative out-of-bounds accesses. Most of them are low-risk bounds violations in which the attacker cannot control the accessed address. One such example is presented in Figure 6. Here, the CPU might mispredict the branch on line 5 and execute the dereference on line 1 (i.e., (*(X)).realptr) even though the pointer b is NULL. The invalid access will always go to the same address, which is why we consider this vulnerability benign.

1  #define bstr_ptr(X) ( ((*(X)).realptr == NULL) \
2         ? ((unsigned char *)(X) + sizeof(bstr)) \
3         : (unsigned char *)(*(X)).realptr )
4  ...
5  if (b == NULL) return NULL;
6  return bstr_util_memdup_to_c(bstr_ptr(b), bstr_len(b));

Figure 6: Example of a low-risk speculative out-of-bounds access in libHTP.
Using the analysis mechanism presented in Section 5.2, we automatically filtered these cases out. It reduced the number to 148 locations in the binary, or 99 in C. The number of code locations is smaller because one line in C can be compiled into several locations in the binary (e.g., due to loop unrolling), so a single vulnerability may appear in several places.

Yet, even 99 is not the final number. As described in Section 7, the automatic analysis sometimes mistakenly marks a vulnerability as controllable. For example, in Figure 7, SpecFuzz detects an out-of-bounds access on line 3 (data[i]). A misprediction of the while condition (line 2) may cause speculative execution of a few more loop iterations than necessary, leading to accesses beyond the array's end. The size of the array may differ from one input to another, and the overflow addresses will then differ as well. This leads the analysis tool into falsely marking the vulnerability as controllable.

1  size_t i = 0;
2  while (i < len) {
3      data[i] = tolower(data[i]);
4      i++;
5  }

Figure 7: Example of a vulnerability mistakenly labeled as controllable.

Found vulnerabilities. After we manually inspected the code (approximately two hours of work) and removed these false labels, we were left with only 3 controllable vulnerabilities. (They were submitted to the library developers and are currently under review.) One of them (see Figure 8) is realistically exploitable if the attacker can accurately monitor the cache state. Here, the function base64_decode_single decodes a Base64-encoded symbol by looking it up in a table of precomputed values (array decoding, lines 2-3). Before fetching the decoded symbol, the function checks the value for over- and underflows. The attacker can bypass the check by training the branch predictor and thus trigger a speculative overread on line 7.

There are two properties of this code that make the vulnerability realistically exploitable. First, the attacker has complete control over the accessed address because the array index (value_in) is a part of the HTTP request, that is, a part of the program input. Second, the fetched value is further used to define the control flow of the program (see the comparison on line 16), which allows the attacker to infer a part of the value (specifically, its sign) by observing the cache state.

The attacker could execute the attack as follows. She begins by sending a probing message to find out which cache line the first element of the array decoding uses. Then, she sends a valid message to train the branch predictor on predicting the bounds check (line 5) as true. Finally, she resets the cache state (e.g., flushes the cache) and sends a message that contains a symbol that triggers an overread, followed by a symbol that triggers a read from the first array element. If the read value is negative, the loop will do one more iteration, execute the second read, and the attacker will see a change in the state of the corresponding cache line. Otherwise, the loop will terminate and the state will not change.

In summary, this vulnerability is a speculative buffer overread that allows reading data within a range of 175 bytes beyond the bounds of the array decoding and leaks the sign of the byte it reads.

The other two vulnerabilities are less realistic, as they require the application to misuse the library's interface. Still, it is better to patch them, especially considering the low runtime cost of the patch (see below).

Patch. To patch the vulnerabilities, we added an LFENCE instruction before every potentially unsafe load. According to our measurements, the patch changed the request processing time by 6.5%, from 920 to 980 milliseconds (averaged over 10M requests). For comparison, applying SLH changed the time by 9.7%, to 1010 milliseconds.

1   int base64_decode_single(signed char value_in) {
2     static signed char decoding[] =
3       {62, -1, ...}; // 80 elements
4     value_in -= 43;
5     if ((value_in < 0) || (value_in > decoding_size - 1))
6       return -1;
7     return decoding[(int) value_in];
8   }
9   ...
10  int htp_base64_decode(const void *code_in, ...) {
11    signed char fragment;
12    ...
13    do {
14      ...
15      fragment = htp_base64_decode_single(*code_in++);
16    } while (fragment < 0);
17  ...}

Figure 8: A realistic BCB vulnerability in the Base64 decoding function, found by SpecFuzz.

7 Limitations

In this section, we discuss the limitations of SpecFuzz that we did not envision (or underestimated) while designing the tool. All of them, however, are implementation flaws that can be fixed in the future without redesigning the system.

Nested misprediction. One important feature that our implementation does not yet support is the simulation of nested mispredictions, that is, of situations where a branch is mispredicted while another simulation is already running. As the evaluation has shown (Section 6.1), some vulnerabilities indeed require mispredictions of multiple branches in a row to be triggered, although this appears infrequently.

To implement the nested simulation, we would have to maintain a stack of checkpoints instead of a single checkpoint: Every time we start a new simulation, we push the checkpoint on the stack, and every rollback restores the topmost checkpoint. We plan to implement this feature in the future.
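A sketch of this planned mechanism follows, again using setjmp-style checkpoints purely for illustration:

    #include <setjmp.h>

    #define MAX_NESTING 16

    static jmp_buf checkpoint_stack[MAX_NESTING];
    static int top = -1;   /* index of the innermost active simulation */

    /* Used at the simulation entry: push a checkpoint; returns nonzero on
       the first pass (take the mispredicted path), zero after a rollback. */
    #define BEGIN_SIMULATION() (setjmp(checkpoint_stack[++top]) == 0)

    /* Rollback: restore the innermost checkpoint and pop it */
    static void rollback_nested(void) {
        longjmp(checkpoint_stack[top--], 1);
    }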

Mislabeling due to incomplete information. While doing the evaluation, we discovered that our vulnerability analysis technique (see Section 5.2) sometimes gives a false result and mistakenly labels an uncontrolled vulnerability as a controlled one. This is not a big problem because the cost of patching is generally low, but it may introduce unnecessary overhead.

The reason for the mislabeling is as follows. At a detected bounds violation, AddressSanitizer reports only the accessed address and not the distance between the address and the referent object bounds (e.g., buffer overflow size). Therefore, if the object size differs among the test runs, the accessed address will also be different, even if the distance is the same. And because now the same instruction accesses different addresses for different inputs, the analysis falsely labels it as a controllable vulnerability.

For example, one common case of mislabeling that we encountered is off-by-one accesses. If an array is read in a loop, our simulation will force the loop to take a few additional iterations and read a few elements beyond the array’s bound. Here, the attacker has no control over the accessed addresses, but if the array size differs from one test run to another, the addresses accessed in these extra iterations will also be different. The analysis would mark this vulnerability as controllable.

So far, we handle these cases by manually analyzing the code. A better solution would be to use a more complete memory safety technique (e.g., Intel MPX [13]) that maintains metadata about referent objects. That would allow us to filter by changes in the distance to the object bound instead of changes in the accessed address. Unfortunately, none of these techniques is supported by Clang out of the box; to resolve this issue, we would have to implement the support or migrate SpecFuzz to another compiler.

Mislabeling due to masking. Mislabeling may also happen when two or more transient out-of-bounds accesses occur in a row. In the current implementation, SpecFuzz rolls back immediately upon detecting an out-of-bounds access or receiving a fault. Real hardware, however, does not behave this way and instead proceeds with the speculative execution.

This discrepancy between the simulation and the real behavior may lead to a wrong classification of a vulnerability if an uncontrolled access is followed by a controlled one. For example, if a null dereference precedes a controlled buffer overflow, the simulation will always roll back at the dereference, and the overflow will never be triggered. Accordingly, the traces will show that a misprediction of this branch always causes an access to the same address (i.e., to 0), and the vulnerability will be mistakenly labeled as uncontrolled.

This issue can be fixed by continuing the simulation (instead of rolling back) after recording the error. This, however, would often cause recursive faults, and our current implementation cannot yet handle them properly.

Fuzzing coverage. In our evaluation, we used Intel PT [22] for measuring coverage. The simulation of mispredictions, however, artificially inflates the measurement because it adds speculative paths that do not belong to normal program execution. Currently, to compensate for this effect, we run a preliminary fuzzing round with a noninstrumented binary to create an extensive initial input corpus. In the future, we could implement a custom coverage mechanism that ignores the simulation.

Binary instrumentation. Instead of creating a compiler pass, we could have implemented SpecFuzz via binary instrumentation (e.g., with Pin [28]). This would allow us to patch binaries directly, without having to trace the vulnerability back to the source code, thus making the patching process simpler and, in some cases, the patches cheaper. Moreover, with this approach we would correctly support dynamic linking and would not require access to the source code. However, binary instrumentation tools are normally heavyweight, which would considerably increase the time required for testing.

8 Related Work

Finally, let us take a look at the existing alternatives to SpecFuzz. Essentially, BCB is a combination of three vulnerabilities: a conditional branch misprediction, a memory safety violation, and a side-channel leakage. Accordingly, there are three types of defences, each eliminating at least one of these components.

8.1 Preventing Unsafe Speculation

The most radical solution is to disable branch prediction entirely [3], although not all processors support it. Alternatively, speculation can be disabled on a per-branch basis by adding serializing instructions, such as LFENCE on Intel CPUs or DSB SY on ARM. This approach, however, severely reduces CPU utilization and causes a large slowdown, over 400% in certain cases [33].

A more efficient conservative defence technique is to add a data dependency between the conditional branch and the potentially invalid memory access. This way, most instructions still benefit from branch prediction, and only the memory accesses are delayed. This approach is used, for example, in Speculative Load Hardening [2]. Although it improves performance over full serialization, the overhead is still considerable.

To avoid the high performance cost, we could patch only those vulnerabilities that seem to be exploitable. This is the idea behind static analysis tools like the Spectre V1 scanner from Red Hat [15] and the MSVC Spectre 1 pass [30]. They analyze the binary and search for instruction sequences that resemble a BCB vulnerability, usually a variation of the pattern branch->load->load/write. However, using this pattern in its generic form would lead to marking all branches as vulnerable, which is counter to the goal of the analysis. Therefore, most of the tools rely on specific variants of the pattern, which makes them inherently incomplete and restricted to the variants envisioned by the tool developers.
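In its simplest form, the pattern corresponds to the classic BCB gadget from the original Spectre paper [24], shown here in simplified form:

    /* branch -> load -> dependent load: the gadget shape these tools look for */
    extern unsigned long array1_size;
    extern unsigned char array1[], array2[];

    unsigned char gadget(unsigned long x) {
        if (x < array1_size)                  /* mispredicted branch       */
            return array2[array1[x] * 512];   /* load, then dependent load */
        return 0;
    }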

A more advanced analysis technique is used in oo7 [43]. It relies on static taint analysis to detect memory accesses that depend on the program input. (This is the same criterion we used to identify low-risk vulnerabilities.) Specifically, oo7 searches for the following pattern: a conditional branch whose condition depends on the input (i.e., is tainted), followed by a load dependent on the condition, followed by a memory access dependent on the load. Even though this approach is more reliable and generic than the simple pattern-matching techniques, it suffers from the inherent problems of static taint analysis: limited analysis depth may cause false negatives, while overtainting causes false positives.

Respectre [21] is another analysis tool that claims to have better coverage than the existing alternatives. However, it is a commercial product and we can neither verify the claims nor compare it to SpecFuzz.

8.2 Preventing Memory Safety Violations

Classical memory safety techniques (e.g., Intel MPX [13], SoftBound [31]) do not protect from BCB, as the bounds checks they add can themselves be mispredicted. Yet, they can be retrofitted to disable unsafe accesses even on speculated paths.

A variant of this approach, pointer clipping, is now used in JavaScript engines [42]: before accessing an array element, the index is masked with the array size. Because masking is an arithmetic operation, it does not create a control hazard and is not predicted by the CPU. However, this defence is vulnerable to attacks where the data type is mispredicted and a wrong mask is used [20].
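For a power-of-two array size, the clipping is a single AND, as in this sketch of ours:

    /* Branchless bounds enforcement: out-of-range indices are wrapped into
       the array instead of being rejected, so there is no branch for the
       CPU to mispredict. Assumes ARRAY_SIZE is a power of two. */
    #define ARRAY_SIZE 256

    static int array[ARRAY_SIZE];

    int read_clipped(unsigned long index) {
        return array[index & (ARRAY_SIZE - 1)];
    }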

8.3 Preventing Side Channels

Another approach is to allow the transient accesses, but eliminate the possibility of leaking their results through a side channel.

In practice, web browsers achieve this by reducing the resolution of timers [42], disabling shared memory, or using site isolation [34]. However, these techniques prevent only cross-site attacks and do not work in the presence of a local attacker.

The most common target of side-channel attacks is the CPU cache, and many practical attacks can be prevented by defending it. There is an extensive body of research on preventing cache side channels, ranging from cache isolation [39] to attack detection [19], enforcement of non-interrupted execution [41, 32], and cache coloring [37]. Yet, these provide only a partial defence, as transient execution attacks may use other side channels too [35].

Finally, an isolated execution environment can be achieved with a specialized microkernel [16], but it requires a complete redesign of the system.

9 Conclusion

We presented a technique that makes speculative execution vulnerabilities visible by simulating them in software. We demonstrated the technique by implementing SpecFuzz, a Bounds Check Bypass detection tool. In our evaluation, the tool proved to be more effective at finding vulnerabilities than the available static analysis tools: in total, SpecFuzz found 2055, whereas RH Scanner found 38.

At the same time, the additional dynamic information provided by SpecFuzz has shown us that realistic BCB vulnerabilities are much less common than we initially anticipated: among the tested libraries, SpecFuzz found only 10 realistically exploitable BCB vulnerabilities. To make use of this observation, we developed a simple analysis heuristic that allowed us to automatically filter out the low-risk vulnerabilities. Thanks to this filtering, the performance cost of SpecFuzz-based patches was considerably lower (at most 12%) than the slowdown caused by conservative defences such as Speculative Load Hardening (80% on average).

References

  • [1] Checkpoint/Restore In Userspace. http://criu.org/. Accessed: May, 2019.
  • [2] Speculative Load Hardening: A Spectre Variant 1 Mitigation Technique. https://docs.google.com/document/d/1wwcfv3UV9ZnZVcGiGuoITT_61e_Ko3TmoCS3uXLcJR0/edit#heading=h.phdehs44eom6, 2018. Accessed: May, 2019.
  • [3] SUSE security update for kernel-firmware. https://www.suse.com/de-de/support/update/announcement/2018/suse-su-20180008-1/, 2018. Accessed: May, 2019.
  • [4] BearSSL. https://bearssl.org/, 2019. Accessed: May, 2019.
  • [5] Brotli. https://brotli.org/, 2019. Accessed: May, 2019.
  • [6] JSMN. https://github.com/zserge/jsmn, 2019. Accessed: May, 2019.
  • [7] LibHTP. https://github.com/OISF/libhtp, 2019. Accessed: May, 2019.
  • [8] libyaml. https://pyyaml.org/wiki/LibYAML, 2019. Accessed: May, 2019.
  • [9] Honggfuzz. http://honggfuzz.com/, 2019. Accessed: May, 2019.
  • [10] libtomcrypt. https://www.libtom.net/, 2019. Accessed: May, 2019.
  • [11] Intel Corp. Analysis of Speculative Execution Side Channels. White Paper, 2018.
  • [12] Intel Corp. Speculative Execution Side Channel Mitigations. White Paper, 2018.
  • [13] Intel Corp. Intel 64 and IA-32 Architectures Software Developer's Manual. 2018.
  • [14] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In USENIX Security Symposium, 2018.
  • [15] Nick Clifton. SPECTRE Variant 1 scanning tool. https://access.redhat.com/blogs/766093/posts/3510331, 2018. Accessed: May, 2019.
  • [16] Qian Ge, Yuval Yarom, Tom Chothia, and Gernot Heiser. Time protection: The missing OS abstraction. In Proceedings of the EuroSys Conference, 2019.
  • [17] Google. More details about mitigations for the CPU speculative execution issue. https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html, 2018. Accessed: May, 2019.
  • [18] Google Project Zero. Speculative Execution, Variant 4: Speculative Store Bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018. Accessed: May, 2019.
  • [19] Daniel Gruss, Julian Lettner, Felix Schuster, Olga Ohrimenko, Istvan Haller, and Manuel Costa. Strong and Efficient Cache Side-Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium, 2017.
  • [20] Noam Hadad and Jonathan Afek. Overcoming (some) Spectre browser mitigations. https://alephsecurity.com/2018/06/26/spectre-browser-query-cache/, 2018. Accessed: May, 2019.
  • [21] Open Source Security Inc. Respectre: The State of the Art in Spectre Defenses. https://www.grsecurity.net/respectre_announce.php, 2018. Accessed: May, 2019.
  • [22] James Reinders. Intel Processor Trace. https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing, 2013. Accessed: May, 2019.
  • [23] Paul Kocher. Spectre Mitigations in Microsoft's C/C++ Compiler. https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html, 2018. Accessed: May, 2019.
  • [24] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative Execution. arXiv preprint arXiv:1801.01203v1, 2018.
  • [25] Esmaeil Mohammadian Koruyeh, Khaled Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre Returns! Speculation Attacks using the Return Stack Buffer. 2018.
  • [26] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis and transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2004.
  • [27] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. arXiv preprint arXiv:1801.01207, 2018.
  • [28] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In ACM SIGPLAN Notices, 2005.
  • [29] Giorgi Maisuradze and Christian Rossow. ret2spec: Speculative Execution Using Return Stack Buffers. In CCS, 2018.
  • [30] Microsoft. MSVC compiler reference: /Qspectre. https://docs.microsoft.com/en-us/cpp/build/reference/qspectre?view=vs-2019, 2018. Accessed: May, 2019.
  • [31] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. In Proceedings of the 30th Conference on Programming Language Design and Implementation (PLDI), 2009.
  • [32] Oleksii Oleksenko, Bohdan Trach, Robert Krahn, Andre Martin, Mark Silberstein, and Christof Fetzer. Varys: Protecting SGX enclaves from practical side-channel attacks. In USENIX Annual Technical Conference (USENIX ATC), 2018.
  • [33] Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof Fetzer. You Shall Not Bypass: Employing data dependencies to prevent bounds check bypass. arXiv preprint arXiv:1805.08506, 2018.
  • [34] The Chromium Projects. Site Isolation. http://www.chromium.org/Home/chromium-security/site-isolation, 2018. Accessed: May, 2019.
  • [35] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. NetSpectre: Read arbitrary memory over network. 2018.
  • [36] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. AddressSanitizer: A fast address sanity checker. In USENIX Annual Technical Conference (USENIX ATC), 2012.
  • [37] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring. In International Conference on Dependable Systems and Networks Workshops (DSN-W), 2011.
  • [38] Mark Silberstein, Oleksii Oleksenko, and Christof Fetzer. Speculating about speculation: On the (lack of) security guarantees of Spectre-V1 mitigations. https://www.sigarch.org/speculating-about-speculation-on-the-lack-of-security-guarantees-of-spectre-v1-mitigations/, 2018. Accessed: May, 2019.
  • [39] Read Sprabery, Konstantin Evchenko, Abhilash Raj, Rakesh B. Bobba, Sibin Mohan, and Roy Campbell. Scheduling, Isolation, and Cache Allocation: A Side-channel Defense. In IEEE International Conference on Cloud Engineering, 2018.
  • [40] E. Tromer, D. A. Osvik, and A. Shamir. Efficient cache attacks on AES, and countermeasures. Journal of Cryptology, 2010.
  • [41] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. Scheduler-based Defenses against Cross-VM Side-channels. In USENIX Security Symposium, 2014.
  • [42] Luke Wagner. Mozilla Security Blog: Mitigations landing for new class of timing attack. https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack/, 2018. Accessed: May, 2018.
  • [43] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead Defense against Spectre Attacks. 2018.
  • [44] Y. Yarom and K. Falkner. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In USENIX Security Symposium, 2014.
  • [45] Andreas Zeller, Rahul Gopinath, Marcel Böhme, Gordon Fraser, and Christian Holler. Generating Software Tests. Saarland University, 2019. Accessed: May, 2019.