In the last years, we have seen a lot of research progress in the field of fuzzing—both from academia (weizz, ; angora, ; qsym, ; tfuzz, ; drillerfuzzer, ; pangolin, ; ijon, ; nyx, ; stochfuzz, ; winnie, ) as well as industry (afl, ; aflpp, ; libfuzzer, ; lafintel, ). The majority of improvements made in the last years have focused on improving fuzzing algorithms themselves. However, it is slowly becoming apparent that improvements for fuzzing on the algorithmic level have less impact in practice compared to improvements to the ability to fuzz new targets. Any application that is fuzzed for the first time is likely to result in many security-relevant findings (exponentialcost, ). Hence, we are currently observing a shift towards making it feasible (and even easy) to target new applications or systems.
The two most common approaches to achieve this goal are generating function-style harnesses for persistent mode fuzzers such as libFuzzer (libfuzzer, ; fuzzgen, ; fudge, ; winnie, ) or fuzzing based on snapshots taken at the start of the test case (nyx, ; agamotto, ; cmilk, ). While these approaches address some of the problems of harnessing targets, none of them allows proper harnessing of systems with many messages that are being passed back and forth (e.g., network services). Such services pose a unique set of challenges distinct from “parse a binary blob” fuzzing targets: the target applications are typically much more stateful, more complex (and hence slower), and the message formats are often more complex than individual file formats.
Such applications make up a significant fraction of interesting attack surface. Yet, little has been done to allow effectively fuzzing and harnessing such targets with coverage-guided fuzzers. The notable exception is AFLnet (aflnet, ), which runs the target and creates new connections for each test input. While this approach has long been used to fuzz network services in blind fuzzers, its drawbacks has made coverage-guided network fuzzing exceedingly difficult. First, using network connections is significantly slower than reading from a file. Second, as the service is running persistently, the fuzzer has no clear point at which the service is ready to receive a test case; such fuzzers require waiting for manually-specified, fixed periods of time during startup, and use similar timeouts when handling each test case. Third, reusing the same process (without fully restarting the server) is also “noisy”: for example, background threads in the service can randomly get scheduled independently of the test cases the fuzzer sent. These seemingly random code paths still affect the fuzzer’s coverage and introduce pointless inputs into the queue. Similarly, great care has to be taken to ensure that the target is not operating based on state introduced by previous test cases. To this end, AFLnet requires a user to write a “cleanup” script that is solely responsible for ensuring that all changes to the file system, databases, etc. are rolled back after each test case. Building such a script based on spurious, non-reproducible inputs can require a significant effort.
In this paper, we present the design and implementation of Nyx-Net, a fuzzing method that is able to test complex, stateful message-passing systems such as network services. Our approach is based on the following design principles: first, we use hypervisor-based snapshot fuzzing (kafl, ; nyx, ; agamotto, ) to ensure noise-free fuzzing and to speed up resetting to a clean slate. Second, we propose selective emulation of network functionality to avoid the heavy cost of real network traffic. This also allows Nyx-Net to control exactly when packets are consumed in order to avoid guessing the right timeouts.
Our prototype implementation of Nyx-Net is built on top of Nyx (nyx, ), which it enhances with a set of new capabilities. Most importantly, we enable Nyx to target network connections and add support for handling the network stack. We also extend Nyx’s snapshot capabilities by introducing incremental, whole-VM snapshots. The snapshot mechanism is agnostic of the OS running in the VM. Using such snapshots increases performance and ensures that all state is properly reset between each test case. Furthermore, using such snapshots allows us to emulate a significant fraction of the network APIs: we run the target until a snapshot is taken without major interference (i.e., close to native speed). When the hooks detect that the target is about to receive the first bytes of fuzzer-supplied input, we take a whole-system snapshot. Following this snapshot, we now emulate the network interactions of the target connection to further increase speed. As our evaluation shows, precisely emulating all network interactions is a difficult task. By only emulating the (few) operations on the target connection, we reduce the need to emulate all I/O functionality faithfully. While we tested our prototype implementation only on two different Linux VM setups (busybox and Ubuntu) as well as targets running inside a Docker container inside of a Ubuntu VM, Nyx-Net’s network emulation layer should be compatible with any POSIX-compliant system.
Note that Nyx-Net is not limited to network interface fuzzing. Instead, our approach can fuzz any complex, stateful message-based target such as Inter-Process Communication (IPC) interfaces. For example, Firefox splits safety-critical parts of its codebase into isolated sandbox processes that communicate via various IPC methods such as Unix domain sockets and shared memory. We show that Nyx-Net can efficiently and effectively test these IPC interfaces. In fact, after seeing that Mozilla was looking for a tool like Nyx-Net, we reached out to them and they decided to integrate Nyx-Net into their testing pipeline after finding multiple security issues using. An excerpt from Mozilla’s blogpost on the challenges of IPC fuzzing (mozilla-ipc, ) is shown in Figure 1.
As our evaluation shows, Nyx-Net drastically improves upon the state-of-the-art: compared to AFLnet on their own benchmark ProFuzzBench (profuzzbench, ), we are able to improve test throughput by up to 300x and coverage found by up to 70%. Compared to Agamotto (agamotto, ), the state-of-the-art in snapshot fuzzing kernel modules, Nyx-Net is able to perform both snapshot reload and creation operations almost 10x faster. Additionally, Nyx-Net is able to find crashes in two of ProFuzzBench’s targets that no other fuzzer could detect previously. In an evaluation with the game Super Mario Bros., we show that Nyx-Net is able to solve most levels about 10-30x faster than the state-of-the-art (AFL + Ijon (ijon, )) — demonstrating it’s ability to improve the performance of message-based targets unrelated to networking (see Figure 2 for a visualization). In fact, when running in parallel, Nyx-Net is able to solve some levels “faster than light”: solving the level takes less wall-clock time than playing the level perfectly even once. Nyx-Net is even able to exploit a glitch to solve a level that the authors of Ijon believed to be unsolvable. Lastly, Nyx-Net is not only able to find previously unknown bugs in servers such as Lighttpd, but also network clients such as MySQL client, and even Firefox’s IPC.
In summary, we make the following key contributions:
We introduce Nyx-Net, an efficient fuzzing method that uses hypervisor-based snapshot fuzzing and selective emulation of network functionality to avoid the heavy cost of handling the full network traffic.
We study the concept of incremental snapshots in fuzzing for complex and slow targets, and show that our approach is able to efficiently test different types of client and server systems.
In our evaluation, we show that Nyx-Net outperforms state-of-the-art fuzzing tools by more than an order of magnitude in many benchmarks. Furthermore, our prototype implementation found multiple unknown bugs in complex, real-world software.
2. Technical Background
We now discuss the technical difficulties of fuzzing real-world network services. We specifically focus on the approach used by existing methods to perform network fuzzing. Given that Nyx-Net is based on Nyx, we also elaborate on the technical aspects of Nyx relevant for this paper.
2.1. Network Service Fuzzing
Many widely used network services and servers are still written in memory unsafe languages for performance reasons. This puts such software at significant risk: Most of the complexity is part of some very public (often Internet-wide) attack surface. To make the matter worse, getting memory safety right in languages that do not enforce memory safety is notoriously difficult. Over the last years, fuzzing has become one of the primary tools for finding complex memory safety bugs in an automated way. Of course, this includes network service fuzzing. For example, AFLnet (aflnet, ) removed AFL’s focus on single files and enabled it to send packets to target sockets. There have also been various efforts to remap socket-based I/O to file-based I/O (libpreeny, ). When successful, this approach allows to use pre-existing file-based fuzzers such as AFL++ (aflpp, ) to fuzz network services.
However, effectively fuzzing network services remains challenging: AFL and most of its derivatives assume that the target is fast and spawns only a single process that runs until the input is consumed. Afterwards, the target is supposed to terminate immediately. Also, two sub-sequential executions should be (mostly) independent of each other. Unfortunately, none of these assumptions hold for most network services. They are often designed with little regard for startup time, persist across connections (often by spawning threads or sub-processes), and maintain a significant state.
AFLnet forces a user to write clean-up scripts that need to reset the environment to avoid contaminating results of later tests. It also employs fixed sleep times to ensure servers are online and in a good state. Overall, this makes it difficult to test new software and drastically reduces the performance. We found that it is not uncommon for AFLnet to only be able to achieve single digit test executions per second.
To make matters worse, network APIs provided by current operating systems (OSs) are notoriously slow. Establishing a connection and reading data from it is far slower than reading from a file. A common workaround is to avoid network interfaces entirely: libpreeny (libpreeny, ) introduced a “de-sock” hook that returns the file descriptor of stdin instead of network sockets when a new connection is established (this idea is now also part of AFL++). This massively improves the performances, but fidelity is low: the vast majority of operations possible on sockets are not supported by stdin. As such, it will simply not work with most real-world software. Note that libpreeny also contains a more advanced stdin-to-socket connection that uses a real socket and a new thread to move data from stdin to the network socket used by the target. This makes fuzzing of more complex network targets possible, but also losses the performance gains coming from network emulation.
2.2. Protocol Fuzzing
Fuzzing network services is further complicated by the fact that they are much more interactive than software that processes static file formats. Network services also often incorporate features such as compression, encryption, sequence numbers, and checksums that greatly hinder fuzzing efforts.
Historically, this problem has been addressed by blind generator-style fuzzers: the user simply writes a program that connects to the target and sends random, but (almost) valid protocol runs. This makes it easy to fix the aforementioned problems, but requires expert knowledge of the specific protocol and significant efforts.
The main success criterion of AFL was that only a superficial or even no understanding of the format being fuzzed is necessary to use the tool to find bugs. Using AFL requires a set of start inputs (so-called seeds). While AFL is often able to work even with empty seeds, it usually is more effective if sensible seeds are provided. This is mostly due to the mutation-based fuzzing of AFL and its coverage feedback. However, to use these advantages, in addition to writing a good generator, the user would also have to write a good mutator—an additional hurdle to jump that reduces usage of fuzzing. Nonetheless, there quite a few fuzzers exist that allow the user to specify detailed formats used for coverage-guided fuzzing. Usually, these fuzzers allow the user to provide a grammar or format specification (aflsmart, ; nautilus, ).
Commonly, the inputs all have to be valid for parsing. As a consequence, AFLsmart (aflsmart, ) (which uses Peach’s (peach, ) pit file format to specify inputs) has to take great care to handle broken or otherwise unparseable inputs. Hence, AFLsmart only parses the seed inputs as it is computationally infeasible to parse new inputs found during fuzzing.
Some fuzzers avoid this problem by simply making it impossible to use seed inputs. For example, Nautilus (nautilus, ) uses context-free grammars, but does not allow to provide seeds. As such, arbitrary grammars including those that are hard to parse can be used. Similarly, Nyx and syzkaller (syzkaller, ) follow the purely generative approach and allow to specify input formats as sequences of typed function calls (or opcodes), but forgo the option to provide seeds.
Lastly, in a far less principled, but just as effective approach, most fuzzers support registering custom mutators. They usually parse inputs on a best effort basis (e.g., by splitting the input at newlines or matching parenthesis), perform some mutations, and then recreate the input. This unloads all the work to the user, but can be highly effective, particularly for formats where most structural information can be easily inferred (e.g., line based formats). AFLnet follows a similar approach: it uses mutators based on a handful of rudimentary packet boundary parsers for the supported formats.
Nyx’s Affine Typed Bytecode
As mentioned before, Nyx follows the generative approach: the user specifies a set of opcodes that can be chained by Nyx. While the authors only used the tool to fuzz hypervisors, the opcode-based approach can potentially be used to fuzz a wide variety of interactive targets. All the user needs to do is to implement a set of different opcodes (with their respective inputs and outputs).
For example, a network specification handling multiple connections at the same time is shown in Listing 2.2. First, we define a data type (d_bytes) that contains the payload of actual packets. Then, we define a new opcode (or “node” in Nyx’s terminology) that creates a new connection. It takes no inputs and returns a new connection handle (e_con
). Lastly, we define an opcode that emits/sends a single packet via a given connection. To this end, we create a node that borrows a connection and contains a vector of bytes.
Note that the specification that we use for network targets in this paper is even simpler: we usually hook the first connection established via a given port and address. Our agent then delivers packets to each function call that attempts to read data from this connection (e.g. recv() or read()). Similarly, the agent signals readiness when functions such as epoll() or select() try to wait for more data on the given connection. All that remains is to fill out the two opcode handlers with actual C code that establishes a connection and sends the packet. The fuzzer auto-generates a bytecode format and a custom VM that executes the bytecode by calling the corresponding handlers, as well as custom mutators.
[t] [ frame=lines, framesep=2mm, baselinestretch=1.2, fontsize=, stripnl=false, highlightcolor=green!20, highlightlines=999, ]pylistings/network_spec.py
2.3. Hypervisor-Based Snapshot Fuzzing
As explained before, many network applications maintain state between individual test cases or have expensive startup routines. The former reduces reproducibility, while the latter reduces test throughput. It turns out that both problems can be largely avoided by a clever trick: by obtaining a snapshot of the system’s state directly before executing the test case, we can reset the system to a deterministic state after each test. The cost of this reset is independent of startup complexity and only determined by the size of the changes to the state of the system caused by executing the test. For example, while starting Firefox requires to load hundreds of megabyte of code into memory and to initialize all kinds of system APIs, handling a handful of IPC packets will typically only dirty a few hundred kilobytes of memory. AFL++ contains a Linux kernel module that is able to reset the memory and some limited kernel state of target ring-3 processes to increase performance. Agamotto (agamotto, ) and Nyx both implement such a mechanism to create a snapshot of a whole VM and to reset back to this snapshot after each test. This allows efficient and deterministic fuzzing of a whole OS and even hypervisors. As Nyx-Net is based on Nyx, we now give a short introduction into how Nyx captures and reapplies VM snapshots.
Nyx runs the fuzz target inside of a VM controlled by a modified version of QEMU, and executed by a modified build of KVM. QEMU sets up the VM state and emulates devices as needed, while KVM uses hardware virtualization extensions provided by modern CPUs to run the guest OS inside of the VM natively. This setup provides high performance virtualization. Nyx integrates with both QEMU and KVM to take control of the VM, and to reset the state to a given snapshot. The fuzzer uses an agent component within the VM to control the fuzzing cycle: the agent indicates that the target is ready to receive an input and to create a snapshot. Then, the agent passes the input to the target and lastly the agent notifies the fuzzer that the test case was finished. To achieve this, the agent uses so-called hypercalls. Hypercalls are like syscalls but for VMs: they leave the VM context and pass the control to the hypervisor. QEMU then reacts to those events and creates or restores a snapshot.
To take a snapshot, a copy of the physical memory and all device state is created. To revert back to a snapshot, all device state is overwritten by the old state. Similarly, the VM’s physical memory is overwritten by the original memory. Since the physical memory is often large (4GB and upwards), its prohibitively expensive to reset the whole memory. To accelerate this process, both Nyx and Agamotto use a variety of optimizations: both fuzzers track which pages in the VM’s memory have been altered since the start of the execution. This way they avoid overwriting the whole memory in favor of the (few) modified pages.
Modern CPUs provide hardware acceleration features to support efficient tracking of the set of modified pages, the CPU tracks when a page is dirtied during execution. Once a certain amount of pages have been dirtied (typically up to 512 pages), the CPU exits the VM context and informs the hypervisor of the pages that were affected. The hypervisor, in this case KVM, maintains a bitmap of pages that were written to. Both Nyx and Agamotto use this bitmap to selectively reset the VM’s memory. However, Nyx’s extension to KVM also maintains a stack of pages that need to be reset. This allows Nyx to avoid searching the bitmap for pages to be reset after the execution. For some reason, KVM uses 1 byte in the bitmap for each page in the physical memory. As a consequence, for a 4GB VM, Nyx’s stack of dirty pages saves approximately 1MB of memory bandwidth per test case over KVM’s approach. Additionally, Nyx implements a custom reset mechanism for the state of emulated devices that is much faster than QEMU’s native device serialization/deserialization routine.
In the following, we describe the design of Nyx-Net and the rationale behind each choice we made when designing and building it. First, we give a short threat model that details the attack scenarios and surfaces that we concern ourselves with. Then, we present the architecture of Nyx-Net (see Figure 3 for a high-level overview).
3.1. Threat Model
Any network interface is usually a clear security boundary. As such, we mostly target various socket style interfaces (e.g., TCP, UDP, and Unix domain sockets). However, in some context (such as sandboxes), this boundary sometimes also includes shared memory. We assume that the attacker has full control over all data that is being sent to the corresponding interfaces. With real networks, this is obvious: an attacker is usually able to send arbitrary data to any TCP/UDP server or client. In the case of Unix domain sockets and shared memory, we assume that the attacker has gained full control of the sandboxed process and attempts to exploit the higher privileged process controlling the sandbox. As such, the attacker is also able to send arbitrary data via Unix domain sockets or shared memory.
3.2. Towards Efficient Network Fuzzing
Our approach uses hypervisor-based snapshot fuzzing. Hence, the target is running in a customized VM. At the same time, the fuzzer is running outside of the VM to ensure that the fuzzer has full control over the environment. Additionally, this method allows to take a snapshot of the target. During fuzzing, the snapshot is used to quickly reset the whole virtual machine back to a pristine state after each individual test case. This is already an excellent base to fuzz complex targets that communicate via network interfaces: the snapshot ensures that all state of network connections inside the VM is correctly reset between test cases. Even common, complex patterns such as forking a new process for each incoming connection, writing incoming data to a file system, or even a database in another process, are correctly handled.
However, several distinct challenges remain: first, creating new network connections inside of the VM is still a slow process, usually involving dozens of context switches. This severely limits the fuzzing throughput. Second, many network protocols tend to be fundamentally slow: many messages need to be exchanged to reach “interesting” states. Often, a complex handshake has to be performed before data interacting with the actual application logic is exchanged. Last, the message formats are often complex and a precise understanding of each field or value involved is hard to obtain. As such, the purely generative approach used in Nyx is hard to use for network fuzzing. Writing a specification that is precise enough to model all corner cases is cumbersome and would often take significant manual effort. On the other hand, it is usually easy to obtain some traces of the communication with the target. To address these issues and to make fuzzing networking servers efficient and effective, we introduce the following techniques.
3.3. High Performance Networking
To ensure high performance networking and handle the complexity of shared memory based IPC mechanisms, we emulate significant fractions of the relevant functionality. To this end, we implement a library that adds a variety of hooks into existing libc networking functionality. This library is injected into the target (e.g., via LD_PRELOAD or by compiling it directly into the target) and intercepts all relevant calls. During startup of the target program, the hooks track various relevant interactions with the OS: we track which file descriptors are part of the external attack surface and various metadata associated with them. When the fuzzing starts, it runs the Nyx bytecode VM to generate data reaching the target process on each hooked function call. Functions such as read() or recv() on target file descriptors consume data from packets encoded in the test case. More complex APIs such as epoll() are emulated to indicate which file descriptor (fd) is ready to receive data (e.g., which fds are receiving packets next according to the bytecode). This library also ensures that packets are consumed correctly across multiple processes. Synchronizing the state of the bytecode stream is relevant if multiple processes are sharing file descriptors/sockets. For example, forking network servers will usually inherit a recently opened socket from the main process. Similarly, complex IPC protocols often contain the ability to share new file descriptors across existing socket connections.
By emulating network functionality, we gain the following advantages: first, we can precisely identify which data is attacker-controlled and inject our own data at the right places. Stemming from the same features, we can automatically infer the right place to create the initial snapshot. Nyx-Net automatically places the first snapshot after starting the process and directly before the first byte of input data is passed to the target. Second, and maybe even more importantly, we can often run a whole test case without hitting the slow operating system paths handling real network data. We can faithfully emulate network behaviors such as packets being received in discrete chunks. While TCP is a stream based protocol, at a first glance it often looks like TCP reads “the same packets” that were written by a single call to send(). While fundamentally broken, a frightening amount of servers assume that a single call to recv() will never return data from more than one “packet” (e.g., corresponding send() call in the client). The same ability is also needed to properly emulate UDP connections, where packet boundaries are indeed semantic information.
3.4. Fuzzing with Incremental Snapshots
After ensuring network traffic is emulated with high precision and performance, we still face the challenge that in many cases, protocols contain long sequences of messages (and hence complex states). For example, Firefox’s IPC traffic consists of hundreds and thousands of packets containing many kilobytes of relevant data. Even with fast network emulations, the number of such test cases that can be executed per second is strictly limited by the time needed to parse and consume these longs sequences. As an example, assume that we have a stream of 120 packets that we want to fuzz. Further, assume that we are currently fuzzing only the last 20 packets. Each test case will execute the same initial 100 packets over and over again. To overcome this hurdle for fuzzing performance, we add the ability to use incremental snapshots: Nyx starts each new execution from the root snapshot that represents a clean state. Nyx-Net adds the ability to quickly create and remove secondary snapshots after executing fragments of the input. This can be used to shave off common prefixes from the execution by taking an incremental snapshot after the common prefix was executed.
For example, Nyx-Net starts from the root snapshot and executes the first 100 packets of a 120 packet sequence. It then take a secondary “incremental” snapshot that represents the VM state after executing the first 100 packets. Now, Nyx-Net runs a handful (tens to hundreds) of test cases that mutate only the last 20 packets. After each test case, we reset the VM state to the incremental snapshot taken after already processing the initial 100 packets. This process is visualized both in Figure 2 and Figure 4. This saves the time required to handle those packets. As soon as Nyx-Net wants to schedule another input, the incremental snapshot is discarded, and Nyx-Net returns to the root snapshot for the next input.
In some cases (especially slow targets) we have seen test throughput increase by more than 10x. We took great care to ensure that the process of creating this secondary snapshot takes very little time. While this obviously introduced some engineering effort, it allows for great simplicity in other aspects of our design: we only ever keep one additional snapshot around. Creating incremental snapshots is so cheap that storing them would waste space and time. By recreating incremental snapshots on demand, we also avoid more complex structures such as trees of incremental snapshots building on top of each other (agamotto, ). Each time a new input is scheduled for fuzzing, we randomly decide whether to use incremental snapshots for this input (depending on the inputs performance and number of packets). Then, we pick a random packet in the input and create a snapshot after sending the given packet. Finally, we fuzz the remaining packets for a number of times before discarding the snapshot. In our experiments, we have seen that even for short state sequences reusing the snapshot as little as 50 times yields significant performance increases.
A snapshot placement policy determines how the fuzzer selects the point at which it takes an incremental snapshot. To place snapshots, the fuzzer introduces a special “snapshot” opcode that can be injected at arbitrary positions in the input bytecode. The VM then creates an incremental snapshot when this opcode is executed. Nyx-Net utilizes three strategies for snapshot scheduling.
As a baseline, running Nyx-Net without incremental snapshots is equivalent to a policy that always selects the root snapshot.
Taking snapshots towards the end of the input sequence allows the fuzzer to better exploit its incremental snapshot capabilities; in some cases though, taking snapshots earlier enables the fuzzer to backtrack and possibly explore branching paths in the sequence. To this end, we implemented two snapshot placement policies to explore this trade-off. The parameters were empirically determined via small-scale studies and for sequences smaller than four packets, both policies select the root snapshot.
On inputs with more than four packets, the balanced policy chooses the root snapshot in of the cases. Otherwise it selects a random index in the whole (), or only in the second half ().
This policy cycles all available indices for snapshots. The first time an input is scheduled, it creates the snapshot at the end of the input. Each time no new inputs have been found by fuzzing this snapshot for iterations, we place the snapshot one packet earlier. When Nyx-Net-aggressive reaches the smallest index, it starts again from the end of the input.
3.5. Generating Complex Inputs
As mentioned earlier, Nyx allows to express interactive protocols as input languages for fuzz targets by specifying opcodes for each possible interaction with the target. This features makes it easy to adapt Nyx to network fuzzing. Unfortunately, the tool does not support to load existing network traces as seed files. This poses a serious restriction for network fuzzing. For Nyx, the data exchanged with a hypervisor or emulated hardware is usually structurally rather simple (individual pointers, bitfields, etc.). However, the data passed between clients and servers is often deeply nested and precisely modelling all aspects in Nyx’s description mechanism is difficult. To ease the burden on the user, we introduce a way to convert raw packet dumps into Nyx bytecode inputs. This converter consists of a library that consumes Nyx’s format specifications. It uses meta programming to create Python functions for each opcode. When we call those functions, the builder object logs each invocation. Finally, the builder outputs raw bytecode inputs to be used by our fuzzer.
4. Implementation Details
To evaluate the performance of incremental snapshots and selective emulation, we implemented a prototype of Nyx-Net. In the following section, we describe the implementation details of our fuzzer. We begin by describing the intricacies of emulating the network APIs used by real-world software, then we present the challenges and solutions of taking incremental snapshots, and lastly, we discuss turning Nyx’s format specifications into a format that can be used to load complex existing network dumps as seeds.
4.1. Network Emulation
To speed up network targets for fuzzing purposes and to inject our own fuzzing data, we emulate most network APIs. To be able to intercept calls to network APIs, we use an LD_PRELOAD interceptor for common libc functions. Obviously, we intercept common networking APIs such as accept(), recv(), etc. to track network sockets and the data sent to each socket. However, we also emulate related APIs such as the select/poll/epoll interfaces to ensure compliant behavior. We also hook many APIs that operate on file descriptors in general, such as dup() and close() to keep track of aliasing file descriptors that are related to the targeted network connection. For example, the dup family of operations is commonly used to pass file descriptors to child processes. Overall, our code hooks a total of 30 libc functions and consists of roughly 2,000 lines of C code.
4.2. Creating Incremental Snapshots
Nyx only maintains a single root snapshot. Resetting the whole VM to this snapshot is very cheap: on small targets, Nyx is able to reset the VM about 12,000 times per second—about as fast as forking a similarly complex process once. While it is cheap to reset to the root snapshot, creating a root snapshot is expensive because it requires to copy the whole physical memory (often many gigabytes of data).
Nyx-Net extends Nyx’s capabilities by introducing a second level snapshot that is much cheaper to create. This second level snapshot can be used to increase the performance on slow targets by skipping a whole prefix of each test case. Nyx-Net makes taking an incremental snapshot about as cheap as resetting the snapshot once. As a consequence, we do not have to maintain complex data structures to store a set of snapshots like Agamotto. Instead, we simply recreate the snapshot for the current test case whenever needed.
To obtain such a performance, we make use of similar facilities as resetting the original root snapshot. More specifically, we use Nyx-Net’s ability to cheaply report the set of dirtied pages since the root snapshot was taken. The incremental snapshot can use this information to obtain a copy of all relevant memory. We also store another copy of QEMU’s device state. To speed up resetting the VM to the incremental snapshot, we maintain a complete second mirror image of the VM’s physical memory (see Figure 3). However, to avoid creating an expensive copy of all the physical memory, we simply remap the existing root snapshot to a second location as Copy-On-Write pages. This way, the incremental snapshot itself looks like a complete root snapshot without incurring anywhere near the full memory cost. As a consequence, we do not need to check whether to reset pages from the original root snapshot or the incremental one during the VM reset. To create the incremental snapshot, the pages that were dirtied by the execution since the root snapshot are overwritten with the content of the VM’s physical memory. The Copy-On-Write mapping ensures that the original root snapshot remains unchanged.
Before creating another incremental snapshot, these pages are overwritten with the content of the root snapshot. Note that this means we accumulate real copies of pages already present in the root snapshot. In most cases, the executions affect the same memory. In these cases, reusing the existing copies avoids more expensive changes to the page tables. However, in the worst case, this could lead to storing two identical copies of the root snapshot, causing twice the memory usage. To avoid this, we re-mirror the physical memory used in the incremental snapshot to a clean copy of the original root memory every 2,000 snapshots created.
To handle write accesses to emulated disks, Nyx-Net introduces a second caching layer to store dirtied sectors representing incremental snapshots. Like Nyx, we use a hashmap lookup to find sectors in the snapshot, otherwise we fall back to Nyx’s root snapshot.
4.3. Using Incremental Snapshots
One of the core features of Nyx is to allow giving specifications for interactive targets. Each possible interaction is implemented as a small opcode that takes a set of arguments. When the fuzzer emits the opcode, the “agent” component performs whatever actions are requested. The opcode can also produce another set of values that may be used as arguments for future opcodes. Even though we are not using most of the features available in Nyx (e.g., affine types or even arguments/return values), this model is fundamentally a good fit for network fuzzing: the fuzzer is aware of the time dimension of each interaction. That is, the fuzzer knows about individual packets being sent and most importantly knows that packets that were not sent yet have also not affected the program state at all. This is crucial for incremental snapshots: we introduce a special “snapshot” opcode that the fuzzer injects at arbitrary positions in the input stream. When the agent encounters this packet, it requests a snapshot to be taken by a specific hypercall. Afterwards, the fuzzer continues fuzzing starting from the next packet only.
4.4. Creating Seed Files
Since most hypercalls or MMIO accesses in emulated devices follow reasonably simple patterns, Nyx only supported specifying the interaction fully. No support for loading seed inputs exists in Nyx and the fuzzer needs to find all sequences of interactions on its own. This is not viable for network based fuzzing: protocols tend to be much more complex and it becomes prohibitively expensive to model them down to the last byte in Nyx’s specification format. On the other hand, dumping network traffic is easy. As such, loading seed inputs adds tremendous value to fuzzing campaigns. To enable using PCAPs as seed inputs, we extended Nyx’s specification engine with a Python library that allows to create inputs directly from Python code. The library consumes a specification and dynamically creates all function for each node. Each function logs the arguments and returns tracking objects that know which function call returned them. Later, calls that use those tracking objects as input can track where the values they use, were created. This way, the script builds a graph of function calls as well as their arguments and return values. Finally, when calling build(), the graph is serialized into the flat bytecode that Nyx uses. An example seed file for the specification shown in Listing 2.2 can be seen in Listing 4.4.
We use this library in combination with pyshark to turn PCAP network dumps into seed files. To fragment TCP streams into logical packets, we use the same logic that AFLnet uses. While this is some protocol-specific code, the dissectors are usually very simple. For example, one of the more common packet boundary dissector uses the CRLF newline sequence to split the data stream into logical packets.
[t] [ frame=lines, framesep=2mm, baselinestretch=1.2, fontsize=, stripnl=false, highlightcolor=green!20, highlightlines=999, ]pylistings/network_seed.py
4.5. Compile-Time Coverage
Nyx supports only Intel PT to obtain coverage feedback. Yet, if available, compile-time instrumentation as introduced by AFL can be faster and more robust. To ease the use of Nyx-Net on platforms that do not support Intel PT and to improve performance on open-source targets, we enable compatibility with AFL’s compile time instrumentations. The shared memory that contains the coverage bitmap is optionally exposed to the agent. The agent can then redirect AFL’s coverage data to the shared memory that is used by QEMU.
To evaluate the consequences of our design choices, we compare the prototype implementation of Nyx-Net both against baseline performance and other state-of-the-art network fuzzing tools.
As we will see, Nyx-Net outperforms the state-of-the-art in network service fuzzing on almost all targets. On the ProFuzzBench benchmark for network fuzzing, Nyx-Net uncovers more coverage (usually between 10% and 70%). Additionally, it usually reaches the same coverage between 10x to 100x, sometimes even 1000x faster (more details in the Appendix Table 5). In fact, on around half of the targets, Nyx-Net finds more coverage in the first five minutes than AFLnet in 24 hours. Additionally, Nyx-Net managed to find bugs in two targets of ProFuzzBench that no other fuzzers is able to uncover. Lastly, we see that even for simple (in the case of AFL++ with libpreeny) to moderately complex (in the case of AFLnet) targets, the approaches used by existing methods begin to fail in practice. In contrast, Nyx-Net is not only able to handle all targets in the ProFuzzBench suite, it even works for significantly more complex targets such as Firefox’s IPC.
5.1. Evaluation Setup
All experiments were performed on Intel Xeon Gold 6230 CPUs. Each machine had 52 physical cores and 192GB of memory as well as an SSD. When running experiments in parallel, each one was pinned to it’s own physical core. We also disabled hyper-threading do reduce variance in performance. In experiments onProFuzzBench, we used Nyx-Net’s ability to use AFL’s compile-time instrumentation. This way, we can use the same target binary across all fuzzers. Coverage experiments where repeated ten times and checked for statistical significance as recommended by Klees et al. (klees:fuzzEval, ). We compared against AFLnet and AFLnwe in the most recent commits supported by ProFuzzBench (0f51f9e and 6ba3a25). Likewise, we used a recent release of AFL++ (2dac4e7).
libpreeny contains two approaches used to turn network servers into targets suitable for fuzzing with AFL++. The simple one only replaces sockets with stdin by hooking accept(). As this approach is unable to handle most real-world targets, libpreeny also ships a more complex de-sockifying emulator. We found that it was able to handle more of the ProFuzzBench targets. Hence, we chose to use the second, better performing emulation layer (desock.c).
In the first experiment, we compare Nyx-Net against AFLnet and AFL++ in combination with libpreeny’s socket emulation layer. Each individual fuzzing campaign was ran for 24h. We use the public ProFuzzBench test suite. It contains a total of 13 different network services for various types of protocols (from FTP file transfer over VoIP to media streaming). Notably, ProFuzzBench is published and maintained by the authors of AFLnet, and is used to showcase AFLnet’s strength in fuzzing stateful network targets. We used the coverage measurement and reporting features that are part of ProFuzzBench. A full set of final coverage results for all fuzzers is presented in Table 2 and coverage over time is also shown in Figure 5. Note that on some targets AFL++ with libpreeny is unable to even start the service. On most other targets, it makes some initial progress, but as coverage is only measured in five minute intervals, most or all coverage was found within the first five minutes and hence it seems that no coverage is found at all. Additionally, AFLnwe significantly under-performs compared to AFLnet. We therefore excluded both from Figure 5. This results demonstrate how libpreeny is far less powerful than our emulation layer. Similarly, AFLnet and AFLnet-no-state perform almost identical, and we excluded AFLnet-no-state from the plots. A complete figure containing plots of all fuzzers can be found in the Appendix (Figure 7). Overall, Nyx-Net is outperforming AFLnet on all but two targets that show no statistically significant difference.
We also investigated each tool’s ability to find crashes in the targets contained in ProFuzzBench. AFLnet, AFLnet-no-state and AFLnwe all find crashes in the exact same four targets. Similarly, Nyx-Net was able to crash the same four targets. Additionally, Nyx-Net also was able to crash two additional targets. A full list of the crashes uncovered can be found in Table 1.
5.3. Incremental Snapshots
The targets that are part of ProFuzzBench are configured in a way that AFL and its derivatives such as AFLnet and AFLnwe perform reasonably well. To this end, very short seeds with only a handful (e.g., usually less than five) of packets where chosen by the authors. In such a scenario, high performance emulation and snapshot fuzzing make up most of the impact. While incremental snapshots still increase the throughput, they can not add their full potential. To evaluate the impact of incremental snapshot, we hence picked a more complex target with longer runs. Specifically, we demonstrate how incremental snapshots greatly increase the fuzzing throughput when fuzzing the game Super Mario Bros. also used to showcase other fuzzing tools (ijon, ).
We recreate the Super Mario experiment presented in Ijon and demonstrate how Nyx-Net’s incremental snapshots lead to 10x-30x increases over Ijon in time to solve. On all levels, Ijon was the slowest fuzzer. Nyx-Net-None added a modest 4x average speedup (standard deviation 2.4x, min/max: 1x/9.4x). Nyx-Net-Balanced managed to achieve an 5.8x average speedup (standard deviation: 3x, min/max: 1.6x/12.7x), while Nyx-Net-Aggressive found solutions on average 11x faster (standard deviation: 6.8x, min/max: 1.8, 29.8x). The data of all levels can be found in the Appendix (Figure 4). In fact, when fuzzing some of the simple levels on 52 cores in parallel, Nyx-Net is able to find a solution faster than a flawless player optimizing for speed (commonly known as “speedrun”) is able to play the level even once. This unlikely feature is made possible by a combination of factors: most importantly, as can be seen in Figure 2, Nyx-Net’s incremental snapshot allowed the fuzzer to focus only on the difficult part of the current execution by using incremental snapshots right in front of the difficult jump, leading to solve the level 10x – 20x faster than Ijon. Additionally, Ijon’s experiment setup is skipping rendering and removes the framerate limit of 60 FPS.
Lastly, we parallelize fuzzing to 52 cores. All these speedups together allow us to perform tens of thousands of test cases per second. As a consequence, Nyx-Net is able to solve the first level in less than the 26 seconds wall-clock time needed to speedrun the level at normal framerates.
The original Ijon paper mentioned that Ijon was occasionally able to use wall jumps to escape from pits. However, Nyx-Net actually was able to exploit this ability to much greater results: Nyx-Net is routinely able to solve a level (2-1) by exploiting a wall jump glitch. Ijon was unable to find this glitch and the authors of Ijon believed 2-1 might be impossible to solve. Nyx-Net seems to be able to trigger this glitch somewhat regularly (it was found in two out of three of our configurations).
It is important to be able to scale to many cores for fuzzing purposes. Naively parallelizing the fuzzer like Agamotto or Nyx will consume prohibitive amounts of memory (e.g., many 100s or even 1,000s of GBs). We share the root snapshots between different instances. As a consequence, in our experiments, 80 instances of Nyx-Net only require about 2x the memory of a single instance.
To better understand the performance impact of incremental snapshots, we also perform detailed experiments evaluating the performance overhead introduced by our approach. To this end, we used three different policies of Nyx-Net for most experiments: None (only a root snapshot is used), Balanced (we are rather conservative about snapshots), and Aggressive (almost every execution is using snapshots, and we are mostly placing the snapshot close to the end of the input). This allows us to explore the impact of using incremental snapshots. Our experiments on ProFuzzBench (seen in Table 3) show that while snapshots are an additional cost, aggressive snapshot produces the highest execution throughput on all targets. Even the balanced strategy still usually increases throughput. While it also reduces throughput in some cases, the difference is usually smaller than the variance between the different runs. As mentioned before, ProFuzzBench mostly consists of short sequences of inputs. As discussed earlier, when using incremental snapshots on Super Mario, which has longer message sequences, more aggressive snapshots significantly improve the time to solve a target.
Lastly, we compare our implementation of incremental snapshots against Agamotto, another recent fuzzer that was developed to speed up syzkaller with incremental snapshots. We used both Agamotto’s implementation and ours to create and restore incremental snapshots using the base VM image from our network experiments. We varied the number of dirtied pages and measured the time needed both for creating as well as resetting an incremental snapshot. Note that in contrast to the other experiments, this experiment was performed on a Intel Core i7-6700HQ CPU @ 2.60GHz as the larger servers were blocked by more computationally expensive experiments. For creating snapshots, pages were dirtied, a temporary snapshot was created, and pages where dirtied again. Then, the old root snapshot was restored. This experiment was repeated 1,000 times each and the average times were measured. Note that the 500 MB VM was unable to dirty pages as not enough memory could be allocated. The results are show in Figure 6. A few notable results can be observed: first of all, Nyx-Net is almost an order of magnitude faster than Agamotto in the relevant range of dirtied pages. As expected, usually creating/restoring snapshots on smaller VMs is slightly faster. Surprisingly, for both Agamotto and Nyx-Net, restoring large numbers of dirty pages on the 512MB VM is slower than on the 4GB VM. This is due to the fact that allocating a significant fraction of the whole available memory is much more work than allocating the same number of pages if there is plenty of memory. Also, for large numbers of dirty pages, Agamotto becomes marginally faster than Nyx-Net. This is due to the fact that when the number of dirty pages approaches the number of pages available, the size of our dirty stack approaches (and eventually even exceeds) the size of the bitmap used. It should be noted that in contrast to Nyx-Net, Agamotto maintains a tree of snapshots. After one gigabyte is used to store snapshots, Agamotto begins to discard old snapshots causing it to slow down. This state is usually quickly reached during fuzzing (as well as in this experiment). We also performed offline experiments where we aborted before Agamotto would start using its LRU policy to evict snapshots. This increases Agamotto’s performance, as no cleanups are performed. However, the performance was still behind Nyx-Net’s throughput for typical workloads.
Overall, Nyx-Net is significantly faster across the relevant part of the spectrum of snapshot sizes. This observation is related to multiple factors. First, Nyx-Net uses a simpler mechanism: while Agamotto constructs trees of prefixed snapshots, Nyx-Net only uses a single snapshot. Second, Nyx-Net is not iterating the whole bitmap that tracks dirty pages, while Agamotto has to walk the whole bitmap of all pages present in the physical memory of the VM. Last, Nyx-Net also uses faster emulated device resets, reducing the fixed cost of resetting devices.
5.4. Case Study: MySQL Client
After we evaluated Nyx-Net on various server components, we now present a high-level view of using Nyx-Net for testing clients. For this case study, we fuzzed MySQL’s client software that is used to connect to and administrate MySQL databases. Running Nyx-Net requires five steps: (i) obtain the target binary, (ii) pick or create a protocol specification, (iii) obtain seed inputs (optionally), (iv) bundles all required data, and (v) finally run the fuzzer.
In a first step, we obtain a binary of MySQL client to fuzz by compiling the software with AFL’s compiler.
Next, we need to chose or create a format specification. As we do not want to spend the time to learn about this protocol, we simply pick the generic default specification that assumes raw packets.
To gather seed inputs in step three, we use Wireshark to obtain a set of PCAPs. As the capture was taken locally, TCP packets directly correspond to logical packets in the protocol. Hence, we use the generic script to split the PCAP into individual packets used as seed.
The fourth step is to bundle a share folder that contains all relevant data. We use the packer script that copies the target, all of its dependencies, and the seeds into the share folder. It also parses the specification and auto-generates the LD_PRELOAD library that is used as agent component during fuzzing.
In the last step, we run the fuzzer by passing the path to the share folder. The fuzzer automatically loads the VM image, which runs a script that downloads the share directory and runs the target.
Performing these steps yields an out-of-bound read on the current version of the client (as shipped by Ubuntu) after a few minutes of fuzzing on 52 cores. We are currently in the process of reporting this issue.
5.5. Case Study: Lighttpd
We also used Nyx-Net on Lighttpd’s development branch and found a memory corruption issue where a negative amount of memory could be allocated under specific circumstances. We reported the issue and the bug was fixed before being merged into master.
5.6. Case Study: Firefox
5.7. Handling of New Vulnerabilities
We worked closely with a security engineer at Mozilla to understand and mitigate the security impact of the bugs found with Nyx-Net. While our three bugs where only null pointer dereferences (which are still regarded as high severity), the additional two bugs found by Mozilla were exploitable. During the evaluation, we also found one crash in MySQL’s client that affects the version of MySQL currently shipped in Ubuntu. We have not yet had the time to full triage this bug, but will responsibly disclose it to the maintainers after we fully triaged the issue (and before publication of this paper). Lastly, when fuzzing Lighttpd, we also uncovered an integer underflow in malloc that was fixed before it was shipped. Additionally, Nyx-Net managed to find two new bugs in targets from the ProFuzzBench suite. While no other fuzzer in our evaluation found these bugs, it seems like these bugs have been fixed in the latest release.
6. Related Work
Following the publication of AFL (afl, ), its impact soon caused a wave of additional research. Almost every design choice was investigated: AFL’s input mutation algorithm where extended upon (aflsmart, ; nautilus, ; redqueen, ; codealchemist, ; zest, ; weizz, ) as was its ability to trigger and identify bugs (parmesan, ; hotfuzz, ; uaf-fuzz, ; memlock, ; fuzzsan, ; hotfuzz, ; slf, ). To improve the strength of AFL’s semi-random mutations, many researchers proposed to combine fuzzing with more elaborate program analysis techniques such as taint tracking (angora, ; vuzzer, ) and symbolic or concolic execution (drillerfuzzer, ; dynamicinteger, ; dart, ; tfuzz, ; sage, ; qsym, ; taintscope, ; grammarwhite, ; digfuzz, ; dowsing, ; pangolin, ).
Similarly, as the fast coverage guidance is one of the defining features of AFL, it was heavily scrutinized. Additional feedback mechanism where invented and tested for improvements (wangnot, ; retrowrite, ; collafl, ; instrim, ; steelix, ; untracer, ). The last big component of AFL, after mutating and obtaining coverage feedback, is picking which input is fuzzed next. Like the other two components, input scheduling has been investigated thoroughly (ijon, ; entropic, ; scheduling, ; optimizing_seed_selection, ; program_adaptiv, ; aflfast, ; aflgo, ). For a more in-depth overview of recent developments in fuzzing, please refer to Manès et al.’s SoK paper (art-science-of-fuzzing, ). To make feedback fuzzing more applicable in various scenarios, the harnessing was improved. Fuzzers such as syzkaller or kAFL (krace, ; triforce, ; kafl, ; syzkaller, ; usbfuzz, ; agamotto, ) adapted AFL’s fuzzing model to kernel fuzzing. Fuzzers like VDF and Nyx even target hypervisors (vdf, ; hypercube, ; nyx, ). Hypervisor-based fuzzing is also commonly used to fuzz firmware (frankenstein, ; basesafe, ; firmafl, ; halucinator, ) Snapshots were also used previously to speed up fuzzing. Besides Nyx, Falk proposed to use hypervisor-based snapshot fuzzing (cmilk, ). Similarly, snapshots were used to improve fuzzing of Android apps (timetraveling, ). Agamotto even uses incremental snapshots to accelerate kernel-level fuzzing.
In this paper, we present Nyx-Net, an approach to fuzz complex network services with high performance and fidelity. We believe that snapshot-based network fuzzing makes fuzzing significantly easier to use: the user does not have to ensure the absence of artifacts caused by vestigial state from earlier executions. At the same time, our app can also clearly outperform state-of-the-art approaches based on sending data via real network interfaces—often by orders of magnitude. We also found Nyx-Net very easy to use: using Nyx’s support for binary-only fuzzing, we can directly take targets from Ubuntu’s repositories and fuzz test them. Nonetheless, our network emulation is still not 100% accurate in more complex scenarios (e.g., when multiple connection are needed at the same time). As such, we still needed to perform some changes to the agent when fuzzing Firefox’s IPC. A more complete emulation would make Nyx-Net even easier to use, we leave this engineering challenge as future work.
- (1) Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. Nautilus: Fishing for Deep Bugs with Grammars. In Symposium on Network and Distributed System Security (NDSS), 2019.
- (2) Cornelius Aschermann, Sergej Schumilo, Ali Abbasi, and Thorsten Holz. Ijon: Exploring deep state spaces via fuzzing. In IEEE Symposium on Security and Privacy, 2020.
- (3) Cornelius Aschermann, Sergej Schumilo, Tim Blazytko, Robert Gawlik, and Thorsten Holz. REDQUEEN: Fuzzing with Input-to-State Correspondence. In Symposium on Network and Distributed System Security (NDSS), 2019.
- (4) Domagoj Babić, Stefan Bucur, Yaohui Chen, Franjo Ivančić, Tim King, Markus Kusano, Caroline Lemieux, László Szekeres, and Wei Wang. Fudge: fuzz driver generation at scale. In ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), 2019.
- (5) William Blair, Andrea Mambretti, Sajjad Arshad, Michael Weissbacher, William Robertson, Engin Kirda, and Manuel Egele. HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing. In Symposium on Network and Distributed System Security (NDSS), 2020.
- (6) Marcel Böhme and Brandon Falk. Fuzzing: On the exponential cost of vulnerability discovery. In ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), 2020.
- (7) Marcel Böhme, Valentin JM Manès, and Sang Kil Cha. Boosting Fuzzer Efficiency: An Information Theoretic Perspective. In Joint Meeting on Foundations of Software Engineering, 2020.
- (8) Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. Directed greybox fuzzing. In ACM Conference on Computer and Communications Security (CCS), 2017.
Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury.
Coverage-based greybox fuzzing as markov chain.In ACM Conference on Computer and Communications Security (CCS), 2016.
- (10) Sang Kil Cha, Maverick Woo, and David Brumley. Program-adaptive mutational fuzzing. In IEEE Symposium on Security and Privacy, 2015.
- (11) Peng Chen and Hao Chen. Angora: Efficient Fuzzing by Principled Search. In IEEE Symposium on Security and Privacy, 2018.
- (12) Abraham A Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. HALucinator: Firmware Re-hosting Through Abstraction Layer Emulation. In USENIX Security Symposium, 2020.
- (13) S. Dinesh S. Dinesh, Nathan Burow, Dongyan Xu, and Mathias Payer. RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization. In IEEE Symposium on Security and Privacy, 2020.
- (14) Zhen Dong, Marcel Böhme, Lucia Cojocaru, and Abhik Roychoudhury. Time-travel Testing of Android Apps. In International Conference on Software Engineering (ICSE), 2020.
- (15) Brandon Falk. Chocolate Milk. https://github.com/gamozolabs/chocolate_milk. Accessed: November 5, 2021.
- (16) Andrea Fioraldi, Daniele Cono D’Elia, and Emilio Coppa. WEIZZ: Automatic grey-box fuzzing for structured binary formats. In International Symposium on Software Testing and Analysis (ISSTA), 2020.
- (17) Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. AFL++: Combining incremental steps of fuzzing research. In usenix-woot, 2020.
- (18) Shuitao Gan, Chao Zhang, Xiaojun Qin, Xuwen Tu, Kang Li, Zhongyu Pei, and Zuoning Chen. CollAFL: Path Sensitive Fuzzing. In IEEE Symposium on Security and Privacy, 2018.
- (19) Patrice Godefroid, Adam Kiezun, and Michael Y Levin. Grammar-based whitebox fuzzing. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008.
- (20) Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed Automated Random Testing. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2005.
- (21) Patrice Godefroid, Michael Y Levin, David A Molnar, et al. Automated whitebox fuzz testing. In Symposium on Network and Distributed System Security (NDSS), 2008.
- (22) Istvan Haller, Asia Slowinska, Matthias Neugschwandtner, and Herbert Bos. Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations. In USENIX Security Symposium, 2013.
- (24) Andrew Henderson, Heng Yin, Guang Jin, Hao Han, and Hongmei Deng. VDF: Targeted Evolutionary Fuzz Testing of Virtual Devices. In Symposium on Recent Advances in Intrusion Detection (RAID), 2017.
- (25) Jesse Hertz and Tim Newsham. Project Triforce: Run AFL on Everything! https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2016/june/project-triforce-run-afl-on-everything/. Accessed: November 5, 2021.
- (26) Chin-Chia Hsu, Che-Yu Wu, Hsu-Chun Hsiao, and Shih-Kun Huang. INSTRIM: Lightweight Instrumentation for Coverage-guided Fuzzing. In Symposium on Network and Distributed System Security (NDSS), Workshop on Binary Analysis Research, 2018.
- (27) H. Huang, P. Yao, R. Wu, Q. Shi, and C. Zhang. Pangolin: Incremental hybrid fuzzing with polyhedral path abstraction. In IEEE Symposium on Security and Privacy, 2020.
- (28) Kyriakos Ispoglou, Daniel Austin, Vishwath Mohan, and Mathias Payer. Fuzzgen: Automatic fuzzer generation. In USENIX Security Symposium, 2020.
- (29) Yuseok Jeon, Wookhyun Han, Nathan Burow, and Mathias Payer. FuZZan: Efficient Sanitizer Metadata Design for Fuzzing. In USENIX Annual Technical Conference, 2020.
- (30) Jinho Jung, Stephen Tong, Hong Hu, Jungwon Lim, Yonghwi Jin, and Taesoo Kim. WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning. In Symposium on Network and Distributed System Security (NDSS), 2021.
- (31) Christoph Kerschbaumer and Christian Holler. Effectively Fuzzing the IPC Layer in Firefox. https://blog.mozilla.org/attack-and-defense/2021/01/27/effectively-fuzzing-the-ipc-layer-in-firefox/. Accessed: November 5, 2021.
- (32) George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. Evaluating Fuzz Testing. In ACM Conference on Computer and Communications Security (CCS), 2018.
- (33) Lafintel. Circumventing fuzzing roadblocks with compiler transformations. https://lafintel.wordpress.com/. November 5, 2021.
- (34) Li, Yuekang and Chen, Bihuan and Chandramohan, Mahinthan and Lin, Shang-Wei and Liu, Yang and Tiu, Alwen. Steelix: Program-state Based Binary Fuzzing. In Joint Meeting on Foundations of Software Engineering, 2017.
- (35) Dominik Maier, Lukas Seidel, and Shinjo Park. Basesafe: Baseband sanitized fuzzing through emulation. In ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2020.
- (36) Valentin Jean Marie Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J Schwartz, and Maverick Woo. The art, science, and engineering of fuzzing: A survey. In IEEE Transactions on Software Engineering, 2019.
- (37) David Molnar, Xue Cong Li, and David Wagner. Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs. In USENIX Security Symposium, 2009.
- (38) Stefan Nagy and Matthew Hicks. Full-speed fuzzing: Reducing fuzzing overhead through coverage-guided tracing. In IEEE Symposium on Security and Privacy, 2019.
- (39) Roberto Natella and Van-Thuan Pham. Profuzzbench: A benchmark for stateful protocol fuzzing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021.
- (40) Manh-Dung Nguyen, Sébastien Bardin, Richard Bonichon, Roland Groz, and Matthieu Lemerre. Binary-level Directed Fuzzing for Use-After-Free Vulnerabilities. In International Conference on Software Engineering (ICSE), 2020.
- (41) Sebastian Österlund, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. ParmeSan: Sanitizer-guided Greybox Fuzzing. In USENIX Security Symposium, 2020.
- (42) Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. Validity Fuzzing and Parametric Generators for Effective Random Testing. In International Conference on Software Engineering (ICSE), 2019.
- (43) Hui Peng and Mathias Payer. USBFuzz: A Framework for Fuzzing USB Drivers by Device Emulation. In USENIX Security Symposium, 2020.
- (44) Hui Peng, Yan Shoshitaishvili, and Mathias Payer. T-Fuzz: Fuzzing by Program Transformation. In IEEE Symposium on Security and Privacy, 2018.
- (45) V. Pham, M. Bohme, A. E. Santosa, A. Caciulescu, and A. Roychoudhury. Smart greybox fuzzing. IEEE Transactions on Software Engineering, 47(09), 2021.
- (46) Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. AFLNET: A Greybox Fuzzer for Network Protocols. In IEEE International Conference on Software Testing, 2020.
- (47) LLVM Project. libfuzzer. https://llvm.org/docs/LibFuzzer.html. November 5, 2021.
- (48) Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. VUzzer: Application-aware Evolutionary Fuzzing. In Symposium on Network and Distributed System Security (NDSS), 2017.
- (49) Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan M Foote, David Warren, Gustavo Grieco, and David Brumley. Optimizing seed selection for fuzzing. In USENIX Security Symposium, 2014.
- (50) Jan Ruge, Jiska Classen, Francesco Gringoli, and Matthias Hollick. Frankenstein: Advanced wireless fuzzing to exploit new bluetooth escalation targets. In USENIX Security Symposium, 2020.
- (51) Sergej Schumilo, Cornelius Aschermann, Ali Abbasi, Simon Wörner, and Thorsten Holz. HYPER-CUBE: High-Dimensional Hypervisor Fuzzing. In Symposium on Network and Distributed System Security (NDSS), 2020.
- (52) Sergej Schumilo, Cornelius Aschermann, Ali Abbasi, Simon Wörner, and Thorsten Holz. Nyx: Greybox hypervisor fuzzing using fast snapshots and affine types. In USENIX Security Symposium, 2021.
- (53) Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels. In USENIX Security Symposium, 2017.
- (54) Yan Shoshitaishvili. Preeny. https://github.com/zardus/preeny. November 5, 2021.
- (55) Dokyung Song, Felicitas Hetzelt, Jonghwan Kim, Brent Byunghoon Kang, Jean-Pierre Seifert, and Michael Franz. Agamotto: Accelerating kernel driver fuzzing with lightweight virtual machine checkpoints. In USENIX Security Symposium, 2020.
- (56) Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. Driller: Augmenting fuzzing through selective symbolic execution. In Symposium on Network and Distributed System Security (NDSS), 2016.
- (57) Peach Tech. Peach. http://www.peachfuzzer.com/. Accessed: November 5, 2021.
- (58) Dmitry Vyukov. syzkaller: Linux syscall fuzzer. https://github.com/google/syzkaller. Accessed: November 5, 2021.
- (59) Tielei Wang, Tao Wei, Guofei Gu, and Wei Zou. TaintScope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection. In IEEE Symposium on Security and Privacy, 2010.
- (60) Yanhao Wang, Xiangkun Jia, Yuwei Liu, Kyle Zeng, Tiffany Bao, Dinghao Wu, and Purui Su. Not All Coverage Measurements Are Equal: Fuzzing by Coverage Accounting for Input Prioritization. In Symposium on Network and Distributed System Security (NDSS), 2020.
- (61) Cheng Wen, Haijun Wang, Yuekang Li, Shengchao Qin, Yang Liu, Zhiwu Xu, Hongxu Chen, Xiaofei Xie, Geguang Pu, and Ting Liu. Memlock: Memory usage guided fuzzing. In International Conference on Software Engineering (ICSE), 2020.
- (62) Maverick Woo, Sang Kil Cha, Samantha Gottlieb, and David Brumley. Scheduling black-box mutational fuzzing. In ACM Conference on Computer and Communications Security (CCS), 2013.
- (63) Meng Xu, Sanidhya Kashyap, Hanqing Zhao, and Taesoo Kim. Krace: Data Race Fuzzing for Kernel File Systems. In IEEE Symposium on Security and Privacy, 2020.
- (64) Wei You, Xuwei Liu, Shiqing Ma, David Perry, Xiangyu Zhang, and Bin Liang. Slf: fuzzing without valid seed inputs. In International Conference on Software Engineering (ICSE), 2019.
- (65) Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang, and Taesoo Kim. QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In USENIX Security Symposium, 2018.
- (66) Michał Zalewski. american fuzzy lop. http://lcamtuf.coredump.cx/afl/. Accessed: November 5, 2021.
- (67) Z. Zhang, W. You, G. Tao, Y. Aafer, X. Liu, and X. Zhang. STOCHFUZZ: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting. In IEEE Symposium on Security and Privacy, 2021.
- (68) Lei Zhao, Yue Duan, Heng Yin, and Jifeng Xuan. Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing. In Symposium on Network and Distributed System Security (NDSS), 2019.
- (69) Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and Limin Sun. Firm-afl: High-throughput greybox fuzzing of iot firmware via augmented process emulation. In USENIX Security Symposium, 2019.
Appendix A Super Mario Experiment
Appendix B Time to Equal Coverage
|Target||AFLnet time to Final Coverage||Nyx-Net||Nyx-Net-balanced||Nyx-Net-aggressive|