An Architecture for Exploiting Native User-Land Checkpoint-Restart to Improve Fuzzing

Fuzzing is one of the most popular and widely used techniques for finding vulnerabilities in applications. Fuzzers are fast, but they still spend a good portion of their time restarting a crashed application and then fuzzing it from the beginning. Fuzzing an application from a point deeper in the execution is also important. To do this, a user needs to take a snapshot of the program while fuzzing it on top of an emulator or virtual machine, or by utilizing a special kernel module to enable checkpointing. Even with this ability, it can be difficult to attach a fuzzer after restoring a checkpoint. As a result, most fuzzers leverage a form of fork-server design. We propose a novel testing architecture that allows users to attach a fuzzer after the program has started running. We do this by natively checkpointing the target application at a point of interest, and attaching the fuzzer after restoring the checkpoint. A fork-server may even be engaged at the point of restoration. This not only improves the throughput of the fuzzing campaign by minimizing startup time, but also opens up a new way to fuzz applications. With this architecture, a user can take a series of checkpoints at points of interest, and run parallel tests to reduce the overall state-complexity of an individual test. Checkpoints allow us to begin fuzzing from a deeper point in the execution path, omitting prior execution from the required coverage path. This and other checkpointing techniques for improving fuzzing are described in the paper.

1 Introduction

As software grows, we need high-quality, flexible tools to analyze applications and find bugs before deployment. Various testing techniques are available. These include SMT solving [4], symbolic execution [3], and software fuzzing.

A widely used technique in practice is fuzzing. In this paper, we use the AFL fuzzer [11] (American Fuzzy Lop) for all testing purposes. AFL is a widely adopted fuzzing framework that has been leveraged to find many bugs in large commercial applications. In practice, it has helped attackers and defenders find exploits in software at least as commonly as the other techniques mentioned above. One of the reasons for its popularity is its simple architecture. The program is (optionally) recompiled with instrumentation, a set of seed inputs is provided, and AFL randomly manipulates the seeds before providing them to the software under test. The instrumentation provides AFL with feedback by way of branch-coverage data. This has made it attractive for testing any software that operates on common user input (such as file contents).

Fuzzers use various tricks and techniques to increase overall test throughput and speed up the testing of applications. They rapidly generate different randomized inputs and execute them on the software under test, and they leverage mechanisms such as fork/clone to rapidly produce new copies of the program and bypass program startup costs.

While fuzzers are fast, they can still be improved. A problem that almost all fuzzers face is that of re-initializing the target software after each successful test or after a crash. This means that a good amount of time is spent in an initialization phase. If we can eliminate this start-up time, we can ultimately speed up the fuzzing process. In addition, some tests require a multitude of interactions before user inputs are truly tested. In some cases, it can be difficult or impossible to reproduce these interactions in a programmatic manner without significant overhead.

One of the ways to improve fuzzing is to checkpoint the target after it has been initialized. In each test run, we can restart or fork/clone it from the checkpointed state. This not only provides a significant improvement over the conventional way of fuzzing, but also lets us begin tests deeper into the execution of the program. Additionally, by taking a series of checkpoints and fuzzing each one of them separately, we can increase the overall coverage of a program in a parallel manner, rather than spending significant time modeling specialized input/test cases to reach that further point of execution.

Conventionally, checkpoint/restart has been done by utilizing specialized kernel modules or by running applications on top of an emulator/virtual machine. Utilizing specialized kernel modules limits the ability to checkpoint to specific environments and kernel versions, and may or may not support fuzzing of stateful programs (such as ones that use the network). Under emulation, one takes a complete snapshot of the virtual machine, and at each iteration, one restarts from that snapshot. Although this solves re-initialization, taking a snapshot and restoring the complete memory of the virtual machine is expensive. In addition, running an application on a virtual machine decreases its performance.

In this paper, we propose the use of DMTCP (Distributed Multi-Threaded Checkpoint), a software package for transparently checkpointing applications without modification of the target application. This is the basis for a checkpoint-restart framework for fuzzing. AFL is used as an example in this paper. However, DMTCP could be used with any fuzzer to provide quick checkpoint-restart.

Moreover, DMTCP has a plugin architecture [2], which makes it easy to modify the behavior of DMTCP. This can further help AFL by providing extra instrumentation for the target application, such as trapping on system/library function calls. By using this instrumentation, fuzzers can improve upon their feedback mechanism. Further, plugins not only provide instrumentation, they can also make clever decisions about when to take checkpoints. This keeps the checkpoint logic separate from the fuzzing logic for ease of maintenance, while still letting the two communicate with each other.

2 Background

In this section, we discuss the background of AFL and DMTCP.

2.1 AFL

Fuzzing is a mechanism for testing correctness of software [6]. It can also be used for finding exploits.

AFL [10] is one of the most popular fuzzing packages in use. It has had much success in general applications. It uses clever genetic algorithms to randomize the input to quickly find new interesting states in the software. AFL consists of various parts. Here, we concentrate on: input file; shared memory; forkserver; afl_maybe_log; and fuzz test child.

Figure 1: AFL

Input file

In a trivial setup, AFL replaces the stdin of the target program with a file, known as the input file. So, instead of reading input from stdin, the target program reads the input from the input file. When the program finishes executing, AFL modifies the contents of the input file before beginning a new test. This pattern is followed throughout the complete fuzzing process.
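The input-file pattern described above can be illustrated with a minimal sketch. It is not AFL's implementation: the helper names `run_once` and `fuzz_inputs`, the file name `cur_input`, and the stand-in `TARGET` command are all hypothetical, chosen only to show one file being overwritten before each run and handed to the target as stdin.

```python
import os
import subprocess
import sys
import tempfile

def run_once(argv, input_path, data):
    """Overwrite the input file with `data`, then run argv with that file as stdin."""
    with open(input_path, "wb") as f:
        f.write(data)
    with open(input_path, "rb") as stdin_file:
        proc = subprocess.run(argv, stdin=stdin_file, capture_output=True)
    return proc.returncode, proc.stdout

def fuzz_inputs(argv, inputs):
    """Feed each input to the target through the same, repeatedly rewritten file."""
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, "cur_input")  # hypothetical file name
        for data in inputs:
            results.append(run_once(argv, input_path, data))
    return results

# Hypothetical stand-in target: reports how many bytes it read from stdin.
TARGET = [sys.executable, "-c",
          "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))"]
```

A real fuzzer would derive each new input from coverage feedback rather than from a fixed list, and would avoid paying for a full process launch on every run.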

afl_maybe_log

This is a function that contains the implementation of the forkserver and all the logic for recording the code coverage. It is injected during recompilation after the call to main() and before each branch instruction. On first invocation, it initializes the forkserver, or disables instrumentation if a forkserver is not found (this enables the user to run the program without AFL attached). It then communicates with the fuzzer about the status of the forked child. On each branch hit, it updates the shared memory.

Forkserver

In a basic approach to fuzzing, a fuzzer would fork and exec the target application. This approach is simple, but it has a high overhead. Each exec() call incurs significant linking and library initialization overhead. To alleviate this, AFL utilizes a forkserver design. On the first call to afl_maybe_log(), the application creates a communication pipe with AFL. When AFL sends a “go” command to initiate a test, the instrumentation forks the program with copy-on-write, which is very quick and lightweight (some newer AFL versions may allow the use of clone() to reduce the overhead even further). The forkserver (parent application) now calls waitpid() on the child and sends control information back to AFL via the communication pipes. This complete setup helps reduce the high setup overhead. A more complex design is required to push distributed fuzzing jobs to large server environments.
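The fork-and-wait cycle at the heart of the forkserver can be sketched as follows. This is a simplified model under stated assumptions, not AFL's code: the control pipe and the "go" command are omitted, the fuzzer and forkserver are collapsed into one loop, and `fuzz_test_child` is a hypothetical target that aborts on inputs beginning with `BUG`. It relies on `os.fork`, so it runs on Unix-like systems only.

```python
import os
import signal

def fuzz_test_child(data):
    """Hypothetical target logic: 'crashes' on inputs that start with b'BUG'."""
    if data.startswith(b"BUG"):
        os.abort()  # delivers SIGABRT to the child process

def forkserver_loop(inputs):
    """Fork one child per input from an already-initialized parent process."""
    outcomes = []
    for data in inputs:
        pid = os.fork()  # copy-on-write clone; no exec(), so no re-initialization
        if pid == 0:
            try:
                fuzz_test_child(data)
            except BaseException:
                os._exit(1)
            os._exit(0)  # the child must never fall back into the parent's loop
        _, status = os.waitpid(pid, 0)  # the parent waits on the child
        if os.WIFSIGNALED(status):
            outcomes.append(("crash", os.WTERMSIG(status)))
        else:
            outcomes.append(("exit", os.WEXITSTATUS(status)))
    return outcomes
```

Calling `forkserver_loop([b"ok", b"BUGGY"])` reports a normal exit for the first input and a termination by `SIGABRT` for the second, which is the kind of status the forkserver relays to the fuzzer.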

Shared memory

AFL uses a shared memory buffer to collect branch coverage of the target application during tests. The forkserver is responsible for mapping this buffer, and children inherit a copy of this mapping for use with each subsequent afl_maybe_log() call. After each execution of the “fuzz test child”, AFL reads the shared memory buffer and tweaks the input file accordingly. The shared memory is then reset before the next test.
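To make the role of the coverage buffer concrete, the sketch below models it as a byte array updated with the edge-hashing scheme outlined in the AFL technical whitepaper [11] (XOR of the current and previous block identifiers, with the previous identifier shifted right by one). The function names and the use of a plain `bytearray` in place of real shared memory are assumptions of this sketch.

```python
MAP_SIZE = 1 << 16  # 64 KiB coverage map, the size used in the whitepaper's description

def record_trace(block_ids, bitmap=None):
    """Update the coverage map for one execution, given the visited block IDs."""
    if bitmap is None:
        bitmap = bytearray(MAP_SIZE)
    prev = 0
    for cur in block_ids:
        idx = (cur ^ prev) % MAP_SIZE              # one slot per (prev, cur) edge
        bitmap[idx] = (bitmap[idx] + 1) & 0xFF     # 8-bit hit counter, wraps around
        prev = cur >> 1                            # shift keeps A->B distinct from B->A
    return bitmap

def hit_edges(bitmap):
    """Return the set of map indices touched during the execution."""
    return {i for i, count in enumerate(bitmap) if count}

forward = hit_edges(record_trace([0x1234, 0x5678]))
backward = hit_edges(record_trace([0x5678, 0x1234]))
```

Here `forward` and `backward` differ even though both traces visit the same two blocks, which is what lets the fuzzer tell the two execution orders apart.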

Fuzz Test Child

The fuzz test child is the forked child of the forkserver. The “fuzz test child” executes from where the program left off (main() in the trivial setup), and all the branches executed by it are logged by afl_maybe_log, which updates the shared memory. AFL reads the shared memory to see the coverage. Based on the coverage, AFL modifies the current input file for the next “fuzz test child”. The program utilizes the input file as its stdin for all future reads from stdin. This is what enables fuzzing the program.

2.2 Distributed Multi-Threaded Checkpoint (DMTCP)

DMTCP [1] is a software package that has traditionally been used to checkpoint and restart HPC applications [5]. DMTCP can transparently checkpoint applications, with no modification to the target application or the operating system. It works for centralized and distributed computations. It does not require any special user privileges. It operates entirely within user space. It can be used with MPI, Matlab, Python, Perl, and a series of other packages.

DMTCP also includes a plugin architecture [2], which can be used in two ways. First, it provides event hooks on the events provided by DMTCP. Second, it provides wrappers around functions via the LD_PRELOAD functionality. A DMTCP plugin takes the form of a shared library, so it can be dynamically loaded.

Figure 2: Architecture of DMTCP

Using DMTCP allows us to wrap or replace library and system calls. This can be used to provide extra instrumentation or to direct fuzzed input to certain function calls. In this work, we use this for detecting patterns of system calls or library calls made by the application. Wrappers are placed around read, write, and seek in the examples in this work.

DMTCP also provides event hooks. Using the event hooks, we can do additional or specialized setup during testing. For example, we can reset kernel state (replaying a set of system calls) or reset global variables before restart so that they appear the same as prior to checkpoint.

Using event hooks and function wrappers together in DMTCP provides process virtualization [2]. Process virtualization is needed after the program has been restarted, when certain system resources will differ (for example, each child will have a different process id). Wrapper functions and event hooks can be (and are) used to pass a virtual process id to the target application instead of the real process id. The application might have cached the earlier process ids, and it may want to use those cached process ids again, for example to send a signal. DMTCP hides this by creating wrappers around all library or system calls that use a process id. See [2] for details.
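The idea of process-id virtualization can be illustrated with a small translation table. This is a conceptual sketch only and not DMTCP's data structure: the class `PidVirtualizer`, its method names, and the convention that a virtual pid initially equals the real pid are all assumptions made for illustration.

```python
class PidVirtualizer:
    """Toy translation table between virtual pids (seen by the app) and real pids."""

    def __init__(self):
        self._virt_to_real = {}
        self._real_to_virt = {}

    def register(self, real_pid):
        """First sighting of a process: in this sketch, virtual pid == real pid."""
        self._virt_to_real[real_pid] = real_pid
        self._real_to_virt[real_pid] = real_pid
        return real_pid

    def on_restart(self, virt_pid, new_real_pid):
        """After restart the kernel assigns a new real pid; the virtual pid is kept."""
        old_real = self._virt_to_real[virt_pid]
        del self._real_to_virt[old_real]
        self._virt_to_real[virt_pid] = new_real_pid
        self._real_to_virt[new_real_pid] = virt_pid

    def getpid_wrapper(self, real_pid):
        """What a wrapped getpid() would return to the application."""
        return self._real_to_virt[real_pid]

    def kill_wrapper(self, virt_pid, sig):
        """Translate a cached virtual pid before the real kill() would be issued."""
        return (self._virt_to_real[virt_pid], sig)

table = PidVirtualizer()
cached = table.register(1000)   # the application caches pid 1000 before checkpoint
table.on_restart(cached, 2000)  # after restart, the real pid is 2000
```

After the restart, `table.getpid_wrapper(2000)` still yields 1000, and a signal sent to the cached pid 1000 is redirected to the real pid 2000.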

This functionality can be extended to wrap any dynamically loaded library via the plugin system. For example, it could be utilized to recreate a specific network state should the program require an open session to resume from that point in time.

3 Design

In this section, we describe the technical design details of how to marry AFL and DMTCP together. We do this in three steps:

  1. We first run the (instrumented) application without AFL under dmtcp_launch.

  2. When the program reaches the desired state, we request a checkpoint.

  3. Then we restart the process with AFL attached.

AFL is used only during the third phase. During the first two phases, the program can be interacted with, and DMTCP is free to collect administrative data (such as network or file session details) that is needed to successfully restart from a checkpoint.

Launching under DMTCP

Figure 3: Launch under DMTCP

Figure 3 describes how an application is launched under DMTCP. The target application is compiled using afl-gcc. afl-gcc is a wrapper around gcc that inserts the fork server and the branch coverage instrumentation. The target application is then run under dmtcp_launch. After executing the command, dmtcp_launch connects with the DMTCP coordinator. A DMTCP coordinator is created by dmtcp_launch if one is not already running. The DMTCP coordinator is used to enable external entities to control the checkpointing of a program.

Then dmtcp_launch does some setup and execs into the target application: the System Under Test (SUT). Since we compiled the SUT with afl-gcc, it contains instrumented branches. On the first instrumented branch, afl_maybe_log tries to run the AFL forkserver. If there is any kind of failure (in this case, AFL is not available), then afl_maybe_log sets a failure flag to ABORT, and aborts any further setup. Subsequent calls to afl_maybe_log do nothing and simply return. The program continues executing as if it were not instrumented.

The application can optionally make calls to dmtcp_checkpoint() to request a checkpoint. For our testing purposes, this call has been added directly to a target application, although the checkpoint can also be triggered externally. This sends a request to the DMTCP coordinator. The coordinator checkpoints the application and the checkpoint image is saved to a file.

Notably, to support AFL, we must launch DMTCP and the application with an AFL-aware DMTCP plugin, which virtualizes the effects of afl_maybe_log() and makes it possible to attempt re-initialization after restarting from a checkpoint.

Restarting with AFL attached

Figure 4: Restart with AFL

Figure 4 describes how an application is restarted with AFL attached. At the time of restart, we run AFL. However, instead of running AFL directly on the application, we substitute the following dmtcp_restart command for the application:

     dmtcp_restart ckpt_image.dmtcp

where ckpt_image.dmtcp is the checkpoint image that was saved. AFL first makes a call to shmget() and sets up the shared memory. Then, AFL forks and execs into the dmtcp_restart command.

Next, dmtcp_restart launches the DMTCP coordinator, and restores (restarts) the checkpointed process. The checkpoint image being restored by dmtcp_restart contains the special AFL plugin (from the command line of dmtcp_launch), which resets the afl_maybe_log() failure flag back to zero (its initial state).
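The interplay between the failure flag and the plugin's reset can be modeled as a small state machine. The sketch below is a behavioral model of the description in this section, not the instrumentation's actual code: the state names, their numeric values (other than zero for the initial state, which the text states), and the class `InstrumentationModel` are hypothetical.

```python
SETUP_PENDING = 0   # initial state; the plugin resets the flag back to this value
SETUP_DONE = 1      # hypothetical value: forkserver is running
SETUP_FAILED = 2    # hypothetical value standing in for the ABORT flag

class InstrumentationModel:
    """Models how each instrumented branch behaves depending on the setup flag."""

    def __init__(self):
        self.state = SETUP_PENDING
        self.hits = 0  # stands in for updates to the shared coverage map

    def maybe_log(self, fuzzer_attached):
        if self.state == SETUP_FAILED:
            return  # instrumentation disabled: do nothing and return
        if self.state == SETUP_PENDING:
            if not fuzzer_attached:
                self.state = SETUP_FAILED  # no fuzzer found: give up on setup
                return
            self.state = SETUP_DONE  # the forkserver would be started here
        self.hits += 1

    def reset_on_restart(self):
        """What the AFL-aware plugin does to the flag when the image is restored."""
        self.state = SETUP_PENDING

model = InstrumentationModel()
model.maybe_log(fuzzer_attached=False)   # launch under dmtcp_launch, no AFL
model.maybe_log(fuzzer_attached=False)
state_before_checkpoint = model.state
model.reset_on_restart()                 # restart under AFL
model.maybe_log(fuzzer_attached=True)
model.maybe_log(fuzzer_attached=True)
```

Without the reset, the restored process would keep the failed state from the first phase, and the forkserver would never start even though AFL is now attached.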

Finally, dmtcp_restart passes control to the SUT (the target application). The SUT contains multiple calls to afl_maybe_log. The next call to afl_maybe_log starts the forkserver, which sets up the shared memory and connects to AFL. After all of this is done, it forks a new “fuzz test child”, and monitors its return status to determine whether the process exits normally or crashes. The forkserver passes this information back to AFL, so that the fuzzer is aware of the status of the process being fuzzed.
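The three phases of this section can be summarized as the command lines a user might issue. The sketch only builds argument lists and does not execute anything. The flags `dmtcp_launch --with-plugin`, `dmtcp_command --checkpoint`, and `afl-fuzz -i/-o` are used as we understand them from the tools' documentation. The plugin path, directory names, and helper function names are hypothetical, and the text does not state whether stock afl-fuzz accepts dmtcp_restart as its target without further configuration.

```python
import shlex

def launch_cmd(target_argv, plugin=None):
    """Phase 1: run the instrumented target under DMTCP, without AFL."""
    cmd = ["dmtcp_launch"]
    if plugin:
        cmd += ["--with-plugin", plugin]  # e.g. an AFL-aware plugin (hypothetical path)
    return cmd + list(target_argv)

def checkpoint_cmd():
    """Phase 2: ask the coordinator for a checkpoint (external trigger)."""
    return ["dmtcp_command", "--checkpoint"]

def fuzz_cmd(seed_dir, out_dir, ckpt_image):
    """Phase 3: run AFL with dmtcp_restart substituted for the application."""
    return ["afl-fuzz", "-i", seed_dir, "-o", out_dir, "--",
            "dmtcp_restart", ckpt_image]

workflow = [
    launch_cmd(["./sut"], plugin="./libafl_reset_plugin.so"),  # hypothetical names
    checkpoint_cmd(),
    fuzz_cmd("seeds", "findings", "ckpt_image.dmtcp"),
]
printable = [shlex.join(cmd) for cmd in workflow]
```

Keeping the commands as data makes it straightforward to generate one phase-3 command per checkpoint image when several checkpoints are fuzzed in parallel.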

3.1 Strategy for DMTCP plugins to control the instrumented binary

The plugin architecture of DMTCP can be used to provide more interesting forms of instrumentation without directly modifying the system under test. This helps us get a better understanding of the program, and we can propagate this information to the fuzzer. Based on this, the fuzzer can generate a more informed input. Using the plugin architecture of DMTCP, we can support checkpoint fuzzing by implementing it separately from the mutation engine (in this case, AFL). This separation of concerns is important both for development and for maintainability.

In the current design, the SUT must be compiled into an instrumented binary. This involves injecting the AFL forkserver code into the binary, and instrumenting all the branch instructions of the binary to monitor the code coverage.

The compiled binary is then run under dmtcp_launch, and without AFL attached. Without the fuzzer, the binary runs normally as if there was no instrumentation. We then call dmtcp_checkpoint to checkpoint the binary.

We restart the checkpointed binary under AFL-fuzz, using dmtcp_restart. dmtcp_restart includes a plugin that resets the forkserver state, and attempts to re-initialize the fork-server.

Next, our restarted binary operates as if DMTCP isn’t there. The logging and coverage information is sent back to AFL from the restarted binary, as if it were running normally without any restart.

This shows that the restarted application can run successfully under dmtcp_restart with AFL attached. This opens up many possibilities, including the example of the next section.

3.2 Example: Three plugins to monitor the system under test

Here, we propose how three custom plugins can be created for monitoring a specific system under test. These plugins checkpoint the application, analyze it, and reset the restarted application.

We first describe a plugin for checkpointing. As described for the DMTCP plugin architecture in Section 2.2, we can checkpoint the application according to a predefined pattern, and we can express that pattern in the context of a DMTCP plugin. For example, consider an application that initializes itself with 5 calls to read(), which are known to contain no bugs. Checkpointing after the first 5 reads eliminates this overhead from every subsequent test. This does not require modifying the original binary, so it can even be used with dumb-fuzzing mode (no instrumentation).
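The "checkpoint after the first 5 reads" pattern can be sketched as a counting wrapper. A real DMTCP plugin would be a shared library wrapping read() itself. The Python class below only mirrors that logic, and `ReadCountingWrapper` and its `request_checkpoint` callback are hypothetical names standing in for the wrapper and for the checkpoint request.

```python
import io

class ReadCountingWrapper:
    """Wraps a stream's read() and requests a checkpoint after the Nth call."""

    def __init__(self, stream, checkpoint_after, request_checkpoint):
        self._stream = stream
        self._checkpoint_after = checkpoint_after
        self._request_checkpoint = request_checkpoint
        self.reads = 0

    def read(self, n=-1):
        data = self._stream.read(n)  # forward to the real read first
        self.reads += 1
        if self.reads == self._checkpoint_after:
            self._request_checkpoint()  # fires exactly once, after the Nth read
        return data

events = []
wrapped = ReadCountingWrapper(io.BytesIO(b"x" * 8), 5,
                              lambda: events.append("checkpoint"))
for _ in range(7):
    wrapped.read(1)
```

After the loop, `events` holds a single checkpoint request, issued once initialization (the first five reads) is complete. Later reads are left for the fuzzer to exercise.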

We next describe an analysis plugin. An analysis plugin would hook a variety of system and library calls to determine when the program produces a previously unseen execution pattern. The plugin can then checkpoint at each such function call, providing a new system state from which to begin fuzzing.
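One possible way to decide that an execution pattern is "previously unseen" is to track short windows of recent calls. The text does not specify a detection method, so the sliding-window approach, the class `CallPatternMonitor`, and the window size below are assumptions of this sketch.

```python
from collections import deque

class CallPatternMonitor:
    """Flags call sequences (fixed-size windows) that have not been seen before."""

    def __init__(self, window=3):
        self._recent = deque(maxlen=window)
        self._seen = set()

    def on_call(self, name):
        """Record one hooked call; return True if the current window is new."""
        self._recent.append(name)
        if len(self._recent) < self._recent.maxlen:
            return False  # not enough history yet to form a full window
        pattern = tuple(self._recent)
        if pattern in self._seen:
            return False
        self._seen.add(pattern)
        return True  # a plugin could request a checkpoint at this point

monitor = CallPatternMonitor(window=2)
calls = ["open", "read", "read", "read", "write"]
novel = [monitor.on_call(c) for c in calls]
```

With a window of two, the repeated `read, read` pair is flagged only the first time it appears, so a checkpoint would be requested three times for this call sequence rather than on every call.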

Last, we describe a reset plugin. A reset plugin resets the AFL forkserver, along with any resources (such as file descriptors or network connections) that are needed at the time of attaching AFL to the restarted process.

3.3 Implications: a distributed fuzzer tree

Here, we describe how the DMTCP plugins can potentially be used to build a distributed fuzzer tree. First we start fuzzing with a random input seed. Then on each newly discovered pattern of library/system calls, we create a checkpoint using the checkpoint plugin. Each checkpoint image file can be restarted and fuzzed independently. Each checkpoint image represents a distinct node in an “execution state tree”.

We continue to take checkpoints based on newly detected patterns, using the analysis and checkpoint plugins. And each time we restart, the reset plugin resets the forkserver code in the binary to the appropriate state.

Importantly, since DMTCP can transparently virtualize resources (such as file descriptors and network sessions), fuzzing campaign developers needn’t worry about writing custom code to handle those cached resources. This would make automated execution state exploration significantly easier.
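The "execution state tree" can be represented with a simple recursive data structure in which each node records a checkpoint image and the call pattern that triggered it. The sketch is illustrative only: `CheckpointNode`, `fuzz_jobs`, and the image file names are hypothetical, and scheduling of the resulting jobs across machines is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class CheckpointNode:
    """One node of the execution state tree: a checkpoint image and its trigger."""
    image: str
    pattern: tuple
    children: list = field(default_factory=list)

    def add_child(self, image, pattern):
        """Attach a checkpoint taken while fuzzing from this node's image."""
        child = CheckpointNode(image, pattern)
        self.children.append(child)
        return child

def fuzz_jobs(root):
    """Yield every checkpoint image (depth-first); each is an independent job."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node.image
        stack.extend(reversed(node.children))  # preserve left-to-right order

root = CheckpointNode("root.dmtcp", ())
after_read = root.add_child("a.dmtcp", ("read",))
after_read.add_child("a1.dmtcp", ("read", "write"))
root.add_child("b.dmtcp", ("socket",))
jobs = list(fuzz_jobs(root))
```

Because every image can be restarted on its own, the entries of `jobs` could be handed to separate fuzzer instances without any ordering constraint between them.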

4 Related work

AFL-QEMU [9] is an extension of the original AFL fuzzer [11] for black-box fuzzing of closed-source binaries. It runs QEMU in user mode to minimize the overhead of running an application on a virtual machine. User-mode QEMU emulates only a CPU and translates all system calls to native OS syscalls, which avoids the overhead of emulating all the hardware resources and a full kernel. QEMU has built-in support for taking a snapshot of the complete virtualized machine and restoring it later. Even in user mode, QEMU has significantly higher overhead than native execution of the target program.

Another work provides LLVM-based instrumentation for afl-fuzz [7], in which the instrumentation is inserted by the compiler rather than at the assembly level. This helps optimize the code that is used at runtime. It also makes the instrumentation CPU-independent.

For our purposes, the most interesting feature of LLVM-based afl-fuzz is that AFL can be attached after AFL-init, that is, at a deferred initialization point. This helps in resetting the forkserver and running from there on.

Another approach to improving fuzzing is to use in-memory checkpoints [8]. The fuzz time is improved by taking a snapshot of the process and replicating it, instead of calling fork. This is similar to a checkpoint in terms of reading the complete memory and saving it, but it does it in RAM instead of saving to secondary storage. This improves the fuzz time of the application.

5 Conclusion

As described in the paper, one can easily checkpoint and restart an application after some initialization setup and still dependably execute library and system calls with virtualized resources. The software architecture presented here would improve the performance of the fuzzer by some constant, whose magnitude is to be determined in a future implementation. Also, it was shown that DMTCP’s plugin architecture can help extend the coverage capabilities of a fuzzer, and thus help improve code coverage or state exploration while decreasing the overall execution time. DMTCP’s distributed architecture can also allow us to extend this approach to distributed fuzzers in future work.

Hence, the new architecture provides a simple, yet powerful tool, which can transparently apply checkpoint fuzzing to target applications, once those applications have been compiled using afl-gcc.

References

  • [1] J. Ansel, K. Arya, and G. Cooperman (2009) DMTCP: transparent checkpointing for cluster computations and the desktop. In 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS’09), Rome, Italy, pp. 1–12.
  • [2] K. Arya, R. Garg, A. Y. Polyakov, and G. Cooperman (2016) Design and implementation for checkpointing of distributed resources using process-level virtualization. In IEEE Int. Conf. on Cluster Computing (CLUSTER’16), pp. 402–412.
  • [3] R. Baldoni, E. Coppa, D. C. D’elia, C. Demetrescu, and I. Finocchi (2018) A survey of symbolic execution techniques. ACM Computing Surveys (CSUR) 51 (3), pp. 1–39. Note: also see: https://arxiv.org/pdf/1610.00502.pdf
  • [4] C. Barrett (2008) SMT solvers: theory and practice. In Summer School 2008: Verification Technology, Systems & Applications. Note: https://resources.mpi-inf.mpg.de/departments/rg1/conferences/vtsa08/slides/barret2_smt.pdf
  • [5] J. Cao, K. Arya, R. Garg, S. Matott, D. K. Panda, H. Subramoni, J. Vienne, and G. Cooperman (2016) System-level scalable checkpoint-restart for petascale computing. In 22nd IEEE Int. Conf. on Parallel and Distributed Systems (ICPADS’16), pp. 932–941. Note: also, technical report available as: arXiv preprint arXiv:1607.07995
  • [6] V. J. M. Manès, H. Han, C. Han, S. K. Cha, M. Egele, E. J. Schwartz, and M. Woo (2019) The art, science, and engineering of fuzzing: a survey. IEEE Transactions on Software Engineering.
  • [7] L. Szekeres et al. Fast LLVM-based instrumentation for afl-fuzz. Note: https://github.com/google/AFL/tree/master/llvm_mode
  • [8] W. Xu, S. Kashyap, C. Min, and T. Kim (2017) Designing new operating primitives to improve fuzzing performance. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 2313–2328. Note: https://gts3.org/assets/papers/2017/xu:os-fuzz.pdf
  • [9] M. Zalewski et al. Google/AFL/qemu_mode (github). Note: https://github.com/google/AFL/tree/master/qemu_mode
  • [10] M. Zalewski. American fuzzy lop (2.52b). Note: https://lcamtuf.coredump.cx/afl/
  • [11] M. Zalewski. Technical whitepaper for afl-fuzz. Note: https://lcamtuf.coredump.cx/afl/technical_details.txt; also see: https://lcamtuf.coredump.cx/afl/