UniFuzz: Optimizing Distributed Fuzzing via Dynamic Centralized Task Scheduling

09/14/2020 ∙ by Xu Zhou, et al.

Fuzzing is one of the most efficient technologies for vulnerability detection. Since the fuzzing process is computing-intensive and the performance gains from algorithm optimization are limited, recent research seeks to improve fuzzing performance by utilizing parallel computing. However, parallel fuzzing has to overcome challenges such as task conflicts, scalability in a distributed environment, synchronization overhead, and workload imbalance. In this paper, we design and implement UniFuzz, a distributed fuzzing optimization based on dynamic centralized task scheduling. UniFuzz evaluates and distributes seeds in a centralized manner to avoid task conflicts. It uses a "request-response" scheme to dynamically distribute fuzzing tasks, which avoids workload imbalance. Besides, UniFuzz can adaptively switch the role of computing cores between evaluating and fuzzing, which avoids the potential bottleneck of seed evaluation. To improve synchronization efficiency, UniFuzz shares different kinds of fuzzing information in different ways according to their characteristics, and the average synchronization overhead is only about 0.4%. We evaluated UniFuzz with real-world programs, and the results show that UniFuzz outperforms state-of-the-art tools such as AFL, PAFL, and EnFuzz. Most importantly, the experiments reveal a counter-intuitive result: parallel fuzzing can achieve a super-linear acceleration over single-core fuzzing. We provide a detailed explanation and validate it with additional experiments. UniFuzz also discovered 16 real-world vulnerabilities.


I Introduction

Software vulnerabilities, which can cause serious consequences, are a significant threat to information systems [1, 42]. In recent years, the number of serious vulnerabilities in modern software and operating systems exploited by attackers has ramped up. Security analysts from large companies and research organizations devote a huge amount of resources to discovering vulnerabilities in their products, which is usually resource-hungry, time-consuming, and labor-intensive. Among various techniques, fuzzing is the most efficient and practical approach to detecting software vulnerabilities, and it is widely used to test multi-threaded programs [53, 37, 16], libraries [2], kernel code [32, 35, 18], protocols [46, 54], and smart contracts [17, 40].

Fuzzing usually generates massive numbers of random test cases to run the target program and monitors crashes to report vulnerabilities. To thoroughly test a program, we need to generate countless test cases, and for each test case, we have to execute the program at least once [22, 26]. This process is very computing-intensive, so fuzzing usually costs many hours of computation, even days or months. For programs that must be released to the public urgently, such delays are often unacceptable. Therefore, fuzzing requires noticeable performance improvement to meet the timeliness requirements of vulnerability detection.

Most research improves fuzzing performance by designing novel algorithms [19, 7, 51]. These algorithms gain performance by optimizing the core mechanisms of fuzzing, including seed generation [38, 39, 15, 44], mutation strategy [8, 21, 45, 44], seed prioritization [48, 5, 39], etc. However, the performance improvement from algorithms is limited. Referring to the evaluation of some prevalent AFL-based fuzzers (such as FairFuzz [19], AFLFast.new [7], and FidgetyAFL [51]), on average, they only achieve an efficiency improvement of around 15% [19] compared to the original AFL. This figure not only varies across fuzzers but also fluctuates sharply across target programs. For example, FairFuzz gets a promotion of 13.6% when exercising tcpdump, while FidgetyAFL merely gets 3.8% [19]. Moreover, FairFuzz outperforms AFL by up to 32.7% on xmllint, but by only 12.9% on objdump [19].

Another research direction for increasing fuzzing performance is utilizing parallel computing resources to process fuzzing workloads concurrently. Since fuzzing workloads do not have much data dependency, we can foresee a great performance increase from parallel fuzzing. For example, Google's OSS-Fuzz leverages more than 25,000 machines that process ten trillion test inputs a day on average, and it has found about 16,000 bugs in Chrome and about 11,000 bugs in over 160 open source projects during the past two years [4]. Ideally, we could double performance by doubling the computing resources. In reality, however, the performance of parallel fuzzing is limited by four major challenges.

(1) Task conflicts. The straightforward design of parallel fuzzing is to instantiate multiple fuzzing instances and share interesting seeds among them, as in the parallel mode of AFL (denoted AFL-P) and the inter-machine fuzzers Roving [30] and disfuzz [3]. However, in such an architecture, different fuzzing instances may mutate the same seeds and generate redundant (identical) test cases, causing task conflicts, which lead to a severe waste of computing resources. To alleviate task conflicts, some works, such as P-fuzz [34] and PAFL [24], try to synchronize fuzzing information among fuzzing instances, which introduces the second challenge.

(2) Synchronization overhead. In parallel fuzzing, various kinds of fuzzing information can be shared among fuzzing instances to increase fuzzing performance, such as seeds, coverage, hangs, and crashes. However, the runtime overhead of synchronization inevitably deducts from the overall performance, and this deduction gets more severe as the number of fuzzing instances increases. Thus, what information to share and how to share it is another challenge in parallel fuzzing, especially in a distributed environment. To alleviate the performance deduction, some works [30, 3, 10, 34] adopt periodical synchronization to achieve a tradeoff between effectiveness and efficiency; P-fuzz [34] uses a centralized architecture based on a database to facilitate synchronization. To some extent, the requirement of synchronization limits system scalability.

(3) Scalability in a distributed environment. Most state-of-the-art parallel fuzzers are limited to a single-machine mode, such as AFL-P, PAFL [24], and EnFuzz [10], which has poor scalability. This is because they rely on the file system of a single machine to share seeds among fuzzing instances. Another factor that limits scalability is the way fuzzing tasks are distributed. For example, PAFL [24] separates seeds by grouping branches in the bitmap and assigning the groups to different fuzzing instances, which limits the number of fuzzing instances. Besides, as the number of fuzzing instances increases, large batches of new seeds can over-burden the system, and seed evaluation (deduplication) can become a bottleneck, which also limits scalability.

(4) Workload imbalance. When dispatching fuzzing tasks to different fuzzing instances via a static strategy, workload imbalance is another challenge. For example, PAFL [24] assigns workloads by grouping branches according to the coverage bitmap. P-fuzz [34] marks seeds with flags and timestamps and assigns seeds to different fuzzer instances for testing until the instances get killed or exit unexpectedly. Such static distribution strategies can alleviate task conflicts by giving fuzzer instances different seeds. However, they are incapable of re-balancing seeds among fuzzing instances. Besides, such strategies do not take computing capacity into account. For example, AFL-P and PAFL [24] are designed only for multi-core parallel fuzzing within a single machine, which assumes the computing capability of all cores is equal and assigns equal workloads to each core. In a distributed fuzzing environment, however, each computing core might have a different computing capability, so a static strategy might aggravate the workload imbalance.

To overcome the above challenges, we present UniFuzz, an optimization for distributed fuzzing via dynamic centralized task scheduling. UniFuzz evaluates and distributes seeds in a centralized manner to avoid task conflicts and increase performance. It collects the new seeds from all the fuzzing instances, filters out the duplicates, and sorts the rest based on the evaluation. To overcome the scalability challenge, UniFuzz utilizes dynamic task distribution instead of dividing tasks with a fixed strategy. UniFuzz dynamically distributes tasks to fuzzing instances with a "request-response" scheme according to their computing capability, which avoids imbalance. To improve the performance of synchronization, UniFuzz adopts different schemes for different kinds of fuzzing information. It shares the light-weight fuzzing status directly via the database and maintains a local map in each fuzzing instance to cache the seeds, which avoids copying them every time. For the rapidly changing coverage information, UniFuzz uses a reconstruction scheme that dry-runs the corresponding seed, which avoids occupying the bandwidth of database synchronization. To overcome the evaluation bottleneck, UniFuzz classifies the computing cores into evaluating nodes and fuzzing nodes. Evaluating nodes filter out duplicates by hash value and coverage information, while fuzzing nodes run fuzzing instances to execute the tests. The number of nodes in each group changes dynamically according to the number of seeds to be evaluated. When new paths are triggered and many new seeds needing evaluation are constantly uploaded, the number of evaluating nodes increases; otherwise, the number of fuzzing nodes increases to reinforce testing. The adaptive coordination of the two node groups avoids the scalability problem and achieves the best fuzzing performance. In summary, we make the following contributions in this paper:

  • We propose a novel centralized dynamic task scheduling scheme for parallel fuzzing in a distributed environment. By separating scheduling from fuzzing, the scheduler can dynamically and efficiently distribute fuzzing tasks to fuzzing instances, which avoids task conflicts and workload imbalance.

  • We propose a dynamic evaluating instance expanding scheme to switch the role of computing cores between evaluating and fuzzing, which overcomes the evaluation bottleneck and improves the overall system performance.

  • To improve synchronization efficiency, we use three hierarchies of information sharing based on the characteristics of different kinds of fuzzing information.

  • We implement our design and evaluate it in large-scale experiments. We implement a tool called UniFuzz and conduct experiments on 12 real-world programs with computing cores ranging from 1 to 128. Results show that UniFuzz performs better than state-of-the-art tools such as AFL, PAFL, and EnFuzz. The overhead of synchronization is about 0.4%. UniFuzz also discovered 16 vulnerabilities in real-world programs and Google fuzzer-test-suite.

  • We achieve a counter-intuitive super-linear acceleration. Intuitively, parallel fuzzing should only achieve sub-linear acceleration. For example, parallel fuzzing with four nodes for one hour theoretically has at best the same performance as fuzzing with one node for four hours; in reality, the former should perform worse than the latter owing to the cost of synchronization and task distribution. However, our experiments show that optimized parallel fuzzing can overcome this limitation. We provide a detailed explanation and validate it with additional experiments.

The rest of the paper is organized as follows: Section 2 reviews the background of parallel fuzzing and AFL. Section 3 introduces the design of the centralized dynamic task scheduling and illustrates the key techniques we adopt. Section 4 describes how we implement UniFuzz. Section 5 presents the evaluation of UniFuzz. Section 6 discusses the performance of UniFuzz and explains the experimental results. Section 7 surveys related work, followed by conclusions.

II Background

Fuzzer            Sync method      Sync time
AFL-P [50]        shared memory    update after fuzz_one()
Roving [30]       master node      periodical update
disfuzz [3]       master node      periodical update
PAFL [24]         shared memory    periodical update
Li et al. [43]    master node      periodical update
ClusterFuzz [12]  cloud resource   —
EnFuzz [10]       shared memory    periodical update
P-fuzz [34]       database         periodical update
UniFuzz           database         live update
TABLE I: Comparison of parallel fuzzers by synchronization method and time (the full comparison also covers shared data types — seeds, crashes, hangs, fuzzer_stats, coverage — inter-machine support, and task conflicts)

II-A American Fuzzy Lop

AFL (American Fuzzy Lop) [50] is a state-of-the-art coverage-based greybox fuzzer, and many state-of-the-art greybox fuzzers [6, 25, 48, 47] are built on top of it. AFL uses lightweight instrumentation to capture basic block transitions and gain coverage information at run-time. It then selects a seed from the seed queue and mutates it to generate test cases. If a test case exercises a new path, it is added to the queue as a new seed. AFL favors seeds that triggered new paths and gives them preference over non-favored ones. Compared to other instrumented fuzzers, AFL has a modest performance overhead. The feedback-driven coverage mechanism of AFL is stated in Algorithm 1. The AFL testing process includes the following steps: (1) AFL selects a seed from the seed list based on its priority information; (2) AFL assigns energy (i.e., the number of times to mutate the seed) to the seed and mutates it to generate a batch of test cases (inputs); (3) AFL tests the program with the generated cases and collects the coverage information; (4) AFL adds a test case to the seed queue if it finds a new state.

Input: initial seeds set S
  Q ← S
  C ← ∅
  repeat
      choose s from Q
      E ← AssignEnergy(s)
      for i from 1 to E do
          t ← Mutate(s)
          res ← Execute(t)
          if res == CRASH then
              add t to C
          else if IsInteresting(res) then
              add t to Q
          end if
      end for
  until timeout reached or abort-signal
Output: crashing inputs C
Algorithm 1 AFL's feedback-driven coverage mechanism

Edge coverage. AFL obtains the execution trace and calculates edge coverage by instrumenting the program under test (PUT) at compile time. It inserts a random number for each branch jump at compile time and collects these numbers at run-time to identify basic block transitions. Edge coverage is finer-grained and more sensitive than basic block coverage, as it takes into account the transitions between basic blocks. It is also more scalable than path coverage, as it avoids path explosion.
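Conceptually, the injected instrumentation follows the scheme documented in AFL's technical whitepaper; the sketch below is simplified, with shared_mem denoting the 64 KB coverage bitmap shared with the fuzzer:

    /* Simplified sketch of AFL's per-branch instrumentation.
     * cur_location is a random constant chosen for this basic block
     * at compile time. */
    cur_location = COMPILE_TIME_RANDOM;
    shared_mem[cur_location ^ prev_location]++; /* hit count for edge A->B */
    prev_location = cur_location >> 1;          /* shift so A->B != B->A   */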

Seed prioritization. AFL leverages the edge-coverage information to select seeds. It maintains a seed queue and fuzzes the seeds in it one by one. It labels seeds as "favored" when they execute fast and are small in size. AFL uses a bitmap with edges as keys and top-rated seeds as values to maintain the best-performing seed for each edge. It selects favored seeds from the top_rated entries and gives them preference over non-favored ones by granting them more fuzzing chances.
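A minimal sketch of this bookkeeping, modeled on AFL's update_bitmap_score() (the struct here is a trimmed-down stand-in for AFL's queue entry, not the full definition):

    #define MAP_SIZE 65536          /* AFL's coverage map size */

    struct queue_entry {            /* trimmed-down AFL queue entry */
        unsigned exec_us;           /* average execution time (us) */
        unsigned len;               /* seed file size in bytes */
    };

    /* For every edge a seed exercises, keep the fastest & smallest seed
     * as that edge's champion; champions are later marked "favored". */
    void update_top_rated(struct queue_entry *q,
                          const unsigned char *trace_bits,
                          struct queue_entry **top_rated) {
        unsigned long long fav =
            (unsigned long long)q->exec_us * q->len;  /* lower is better */
        for (unsigned i = 0; i < MAP_SIZE; i++) {
            if (!trace_bits[i]) continue;             /* edge not hit by q */
            if (top_rated[i] &&
                fav >= (unsigned long long)top_rated[i]->exec_us
                           * top_rated[i]->len)
                continue;                             /* champion stays */
            top_rated[i] = q;                         /* q takes the edge */
        }
    }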

Mutation strategies. AFL has two categories of mutation strategies: deterministic and non-deterministic. First, the deterministic strategies apply mutators based on bit flips, arithmetic, tokens, dictionaries, and interesting values to mutate the seeds at different granularities sequentially. After the deterministic strategies, AFL applies non-deterministic strategies, consisting of the havoc stage and the splice stage. In the havoc stage, AFL mutates the seed by randomly choosing a sequence of mutation operators from the deterministic strategies and applying them to random locations in the seed file. As a result, the generated test case differs significantly from the original seed. Then, AFL uses the splice strategy to randomly choose another seed from the seed queue and recombine it with the current seed to generate a new seed, to which the havoc strategies are applied again.

Power schedule. In the deterministic stage, mutation strategies are applied sequentially, but in the non-deterministic stage, AFL assigns energy to each seed to decide its fuzzing chances. The energy is assigned according to the performance score of each seed, which is based on coverage (prioritize inputs that cover more paths), execution time (prioritize inputs that execute faster), and discovery time (prioritize inputs discovered later). In particular, if a test case exercises a new path, AFL doubles the energy assigned to the seed.
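The doubling rule can be summarized with a minimal sketch; the function name and the baseline formula are illustrative, not AFL's exact code (AFL's HAVOC_CYCLES constant is 256):

    #define HAVOC_CYCLES 256

    /* Havoc-stage energy: a baseline budget scaled by the seed's
     * performance score, doubled when fuzzing this seed has just
     * discovered new paths. */
    unsigned assign_energy(unsigned perf_score, int found_new_path) {
        unsigned energy = HAVOC_CYCLES * perf_score / 100; /* baseline */
        if (found_new_path)
            energy *= 2;   /* reward seeds that keep finding new paths */
        return energy;
    }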

AFL also supports a parallel mode to improve testing efficiency, in which AFL can utilize multiple cores to test a program on one machine. In the parallel mode, each AFL instance binds to a core and periodically re-scans the top-level sync directory for test cases found by other instances. In this way, AFL shares seeds between different cores. Considering these advantages, we decided to build our design on top of AFL.

II-B Parallel Fuzzing

Parallel fuzzing has been around for years. It evolved from a naive method, i.e., simply running multiple fuzzer instances against the same target simultaneously. Some approaches take this one step further and share seeds between the fuzzer instances; this is how the parallel mode of AFL, Roving [30], and disfuzz [3] work. Multi-core parallel fuzzing improves on the naive method by scheduling tasks between fuzzing instances to alleviate task conflicts. However, its parallelism is limited by the number of cores in a machine. Distributed fuzzing extends parallelism to multiple computing machines connected by a network, which can utilize more computing resources.

Details of some representative parallel fuzzers are shown in Table I. The main differences among these fuzzers lie in the shared data types, whether they support multiple machines, whether task conflicts exist, and how data is synchronized. All these works try to answer the following three questions:

What information to share? Naive approaches just share seeds among fuzzer instances. For example, AFL-P parallelizes the fuzzing job by simply initializing multiple instances with the -M or -S command-line arguments. Each instance checks the top-level sync directory for new interesting seeds discovered by others and keeps its seed queue synchronized with the main node. By merely sharing seeds, multiple fuzzing instances can work together. However, sharing seeds is not enough. Liang et al. [24] find that the advanced optimizations of AFL-based fuzzers in parallel mode do not perform as well as expected, though these fuzzers outperform the original AFL. This is because of task conflicts. Thus, fuzzing status information should also be shared to filter out duplicates. Based on this observation, Liang et al. develop a guiding-information synchronization mechanism to share information such as path frequency [28] and branch hit counts [19]. EnFuzz [10] shares seeds as well as fuzzing status. Differently, EnFuzz shares information among diverse fuzzing instances (e.g., AFL, AFLFast, FairFuzz, QSYM, and libFuzzer). Thus, the fuzzing status has to be common data (such as coverage information) that can be recognized by different fuzzers.

How to share the information? It depends on whether the parallelism is multi-core or distributed. For multi-core parallel fuzzing on a single machine, fuzzers can leverage the file system, shared memory, and semaphores for data sharing. In the parallel mode of AFL, a fuzzer instance periodically scans the other fuzzing instances' directories for newly discovered seeds and copies them to its local seed directory. PAFL [24] leverages shared memory to share fuzzing status. For distributed parallel fuzzing, data sharing is carried out by network transmission or a database. For example, P-fuzz [34] proposes a database-centric architecture, which uses a key-value database to share fuzzing information such as seeds and the coverage bitmap. Each fuzzing core reads and writes the database for communication. However, the runtime overhead of information sharing among different fuzzing instances inevitably deducts from the overall performance, and the way information is shared limits scalability.

How to distribute the workload? Naive approaches do not provide a task scheduling mechanism. Instead, by sharing seeds, a fuzzing instance can take over some work from other fuzzing instances. However, a seed being fuzzed by multiple instances means redundant work. PAFL and P-fuzz schedule tasks using the shared fuzzing status. For example, PAFL divides the coverage bitmap into several regions and assigns each fuzzing instance a region. By doing so, it can distribute seeds to different fuzzing instances according to their coverage region to avoid task conflicts [24]. However, this distribution is static and cannot be changed once fuzzing has started. As a result, it can cause workload imbalance. In P-fuzz, a seed is dynamically assigned to just one fuzzing instance, so it has the same load imbalance problem.

III Design

To overcome these challenges, we propose a distributed fuzzing optimization based on a centralized dynamic task scheduling strategy. In this section, we give an overview of our design and present the key techniques we adopted.

Fig. 1: The design of distributed fuzzing with a centralized dynamic task scheduling.

As Fig. 1 shows, the architecture of our design consists of four components: a scheduler, a database, fuzzing instances, and evaluating instances. As a distributed system, the computing nodes are divided into four kinds: the main node, which works as the scheduler; a node holding the database to share the fuzzing data; the nodes that evaluate new seeds and filter out duplicates; and the nodes that conduct the fuzzing executions. It is worth noting that the roles of evaluating nodes and fuzzing nodes can be switched dynamically when necessary to alleviate the over-burden of seed evaluation and achieve the best performance.

  • The Scheduler is built into the main node and is responsible for scheduling evaluating nodes, sorting the seeds in a queue, processing requests from the fuzzing instances, and dispatching fuzzing tasks. We introduce the scheduling scheme in Section III-B.

  • The Database is used to store and share fuzzing data (e.g., seeds and fuzzing status). Each fuzzing instance connects to the database to synchronize fuzzing information and seeds. We introduce the synchronization scheme in Section III-C.

  • The evaluating instances process new seeds from the database to filter out duplicates. Evaluating instances are dynamically switched from fuzzing instances based on the requirements of the evaluating tasks. We introduce this scheme in Section III-D.

  • The fuzzing instances are responsible for running test cases and mutating seeds. They download the task seeds assigned by the scheduler from the database and upload new seeds to the database. The majority of the computing cores are used to run fuzzing instances.

Both the evaluating instances and the fuzzing instances are connected to the scheduler and the database. The scheduler dispatches evaluating tasks to evaluating instances and fuzzing tasks to fuzzing instances. Meanwhile, the evaluating instances remove duplicate seeds from the database, and the fuzzing instances download unique seeds from the database. This centralized scheduling separates task scheduling from fuzzing, which alleviates the task conflict problem and extends single-machine parallel fuzzing to multiple machines in a distributed environment.

In the rest of this paper, we define the following concepts to facilitate the description of our approach.

  • Fuzzing task: the amount of work the scheduler assigns to a fuzzing instance. It contains two kinds of information: the index of the seed (i.e., its hash value) and an integer indicating how many times the seed should be mutated (i.e., the energy).

  • Fuzzing status: the information that determines the priority of a seed and the number of times it should be fuzzed. The fuzzing status is evaluated mainly by how many times the seed has been fuzzed, and includes the depth (the generation distance of the seed from the initial seeds), the handicap (the number of fuzzing cycles the queue had completed when the seed was found), and the bitmap_size (the number of bitmap bits the seed touches). The detailed calculation is based on calculate_score() in AFL. A minimal struct sketch of both concepts follows.
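To make these definitions concrete, the following C sketch shows one plausible encoding; the field names are illustrative assumptions, not UniFuzz's actual definitions:

    #include <stdint.h>

    /* A fuzzing task as dispatched by the scheduler: the seed is
     * referenced by its hash so the instance can fetch the body from
     * the database (or its local cache). */
    typedef struct {
        char     seed_hash[65];  /* hex digest indexing the seed in the DB */
        uint32_t energy;         /* how many times to mutate the seed */
    } fuzzing_task_t;

    /* Fuzzing status stored alongside each seed and used by the
     * scheduler for prioritization (cf. AFL's calculate_score()). */
    typedef struct {
        uint32_t depth;        /* generation distance from initial seeds */
        uint32_t handicap;     /* queue cycles completed when discovered */
        uint32_t bitmap_size;  /* number of bitmap bits the seed touches */
        uint32_t times_fuzzed; /* how often this seed has been scheduled */
    } fuzzing_status_t;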

III-A Workflow of Our Design

Fig. 2: Working procedure of centralized task scheduling.

To optimize the performance of parallel fuzzing, we propose a centralized dynamic task scheduling scheme that filters out duplicate seeds and extends single-machine parallel fuzzing to a multi-machine mode in a distributed environment. Specifically, the centralized dynamic task scheduling is realized by a scheduler and a high-throughput database. As Fig. 2 shows, the whole process starts with the scheduler downloading seeds from the database and prioritizing them in a seed queue. Then, the fuzzing tasks are distributed in a "request-response" manner. Each fuzzing node sends a request to the scheduler for a fuzzing task. The scheduler responds to the fuzzing node with a task specification (i.e., the fuzzing task), including the seed index and the number of mutations. With the seed index, the fuzzing instance downloads the seed from the database and conducts the fuzzing tests. Once a new seed is discovered during the tests, the fuzzing instance uploads the seed with its corresponding fuzzing status to the database. Meanwhile, the scheduler allocates evaluating instances according to the number of seeds to be evaluated. Evaluating instances download seeds from the database, identify duplicates, and remove them from the database.

III-B Centralized Task Scheduling

  
  
  request_queue ← ∅
  prioritized_seed_queue ← ∅
  while True do
      if receive new_seed from fuzzing instances then
          store new_seed and its fuzzing status in the database
          append new_seed to the to-be-evaluated set
      end if
      for seed in newly evaluated seeds do
          priority ← Prioritize(seed.fuzzing_status)
          insert seed into prioritized_seed_queue by priority
      end for
      if receive request from fuzzing instance then
          append request to request_queue
          seed ← head of prioritized_seed_queue
          energy ← AssignEnergy(seed)
          respond (seed.index, energy) to head of request_queue
          update seed.fuzzing_status
      end if
  end while
Algorithm 2 The working procedure of the scheduler

We use a centralized task scheduling scheme to dispatch fuzzing tasks to different fuzzing instances. The task dispatching is handled by a scheduler in the main node. From the perspective of the scheduler, each time a new seed is discovered by a fuzzing instance, the seed is stored in the database and evaluated by an evaluating instance. Then, the scheduler prioritizes seeds in a seed queue according to their fuzzing status (i.e., how important a seed is and how many times it has been fuzzed). On the side of a fuzzing instance, task dispatching is based on a dynamic contending scheme. Each fuzzing instance requests a fuzzing task as soon as it is free. The scheduler stores the requests in a queue, then selects the seed with the highest priority and responds with it to the next fuzzing request in the request queue. Under this scheme, important tasks are done first. The working procedure of the scheduler is shown in Algorithm 2.
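A compact C sketch of the dispatch step, under stated assumptions: requests wait in a FIFO queue, seeds sit in a priority queue keyed by fuzzing status, and req_pop(), pq_pop_max(), pq_push(), calc_energy(), and respond() are hypothetical helpers (the task reuses the illustrative fuzzing_task_t sketched above):

    #include <string.h>

    /* Pair the highest-priority seed with the longest-waiting request. */
    void dispatch(request_queue_t *reqs, seed_pq_t *seeds) {
        while (!req_empty(reqs) && !pq_empty(seeds)) {
            int client        = req_pop(reqs);     /* next idle instance */
            seed_meta_t *best = pq_pop_max(seeds); /* best seed by status */
            fuzzing_task_t task;
            memcpy(task.seed_hash, best->hash, sizeof task.seed_hash);
            task.energy = calc_energy(best);       /* cf. calculate_score() */
            respond(client, &task);  /* instance fetches seed body from DB */
            best->times_fuzzed++;    /* update status ... */
            pq_push(seeds, best);    /* ... and requeue with new priority */
        }
    }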

III-C Information Synchronization Scheme

For parallel fuzzing, information synchronization has always been a challenge. We have to take both effectiveness and efficiency into account when synchronizing information among fuzzing instances. In our design, we use a database instead of the file system to synchronize fuzzing information, because file-based synchronization does not scale well in a multi-machine mode [42].

We mainly synchronize three kinds of information among fuzzing instances: seeds, fuzzing status, and coverage information. According to the characteristics of each kind of information and how it is used, we propose three schemes to synchronize the information while keeping the performance deduction low. Experiments show that our tool can scale to at least 128 nodes with an average synchronization overhead of only 0.4%.

Fuzzing status direct sharing. The fuzzing status is used by the scheduler to prioritize seeds when scheduling fuzzing tasks. Although the fuzzing status is stored along with the seeds in the database, it differs from the seeds in several ways, so we separate its use from the corresponding seeds. The size of a seed is usually several KB or more, while the fuzzing status is much smaller. Seeds are used by fuzzing instances and can be synchronized incrementally, while only the scheduler uses the fuzzing status. Seeds are constant once written to the database, while the fuzzing status is updated each time the seed is fuzzed, which is much more frequent. Thus, for such light-weight but frequently used data, we share it directly via the database. The scheduler dispatches tasks based on the fuzzing status instead of the real seeds, maintaining only a light-weight queue to prioritize the seeds, which greatly reduces the scheduler's network pressure when dispatching fuzzing tasks.

Seed caching. For seeds, which are relatively heavy-weight and constant, we reduce the synchronization overhead via a local cache. We generate a hash value and use it as an index to identify the seed in the database. Then, we maintain a local map (a key-value hash map) as a cache of seeds in the memory of each fuzzing instance. Because each fuzzing instance is told explicitly which seed to fuzz when a task is scheduled, we use the local map and only retrieve missing seeds from the database, which avoids copying every seed every time. Specifically, we modify the functions that read and write seeds in AFL to redirect the seed accesses. A seed read refers to the local map first and only downloads the seed from the database when it is not in the local map. Similarly, a seed write uploads the seed to the database and keeps a copy in the local map at the same time. We abandon the local seed queue in each fuzzing instance and move the scheduling logic to the scheduler.
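The read path can be sketched as follows; map_get(), map_put(), db_fetch_seed(), and local_cache are hypothetical helpers standing in for the hash-map cache and the database accessors:

    #include <stddef.h>

    /* Return the seed body for a given hash, hitting the in-memory
     * cache first and the database only on a miss. */
    unsigned char *read_seed(const char *hash, size_t *len) {
        unsigned char *data = map_get(local_cache, hash, len); /* 1. cache */
        if (data) return data;                                 /* hit */
        data = db_fetch_seed(hash, len);      /* 2. miss: pull from the DB */
        map_put(local_cache, hash, data, *len); /* 3. remember for later */
        return data;
    }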

Coverage reconstruction. In AFL, coverage information is manifested as a bitmap of high-density raw data (e.g., the 64KB bitmap in AFL). We do not share coverage information directly because it changes rapidly and frequently; for example, AFL may alter the bitmap on each execution of a test case. Besides, conflicts arise easily when multiple fuzzing instances update coverage information simultaneously. Thus, we use a scheme called bitmap reconstruction to share coverage information among fuzzing instances. Reconstruction means dry-running the PUT with a seed once again. Each time a fuzzing instance gets a new seed, it re-executes the program with the seed to update its local coverage bitmap. In this way, the bitmap is reconstructed by each fuzzing node independently without conflicts, and the coverage information is easily acquired by each fuzzing instance. The overhead of reconstruction is trivial: one extra execution per new seed. More importantly, coverage reconstruction is scalable.
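A minimal sketch of the reconstruction step, assuming an AFL-style executor run_target() that fills a trace_bits buffer for one execution (the exact executor interface is an assumption):

    #include <stddef.h>

    #define MAP_SIZE 65536  /* AFL's 64 KB coverage bitmap */

    /* Dry-run the PUT on a newly obtained seed and fold the resulting
     * edge trace into this instance's local bitmap; no bitmap bytes
     * ever cross the network. */
    void reconstruct_coverage(unsigned char *seed, size_t len,
                              unsigned char *trace_bits,
                              unsigned char *local_bitmap) {
        run_target(seed, len);          /* one extra execution per seed */
        for (size_t i = 0; i < MAP_SIZE; i++)
            local_bitmap[i] |= trace_bits[i]; /* merge edges locally */
    }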

Instant synchronization. Unlike the periodical synchronization in other works [30, 3, 43, 10, 34], our approach achieves instant synchronization. Every time a new seed is discovered and uploaded to the database along with its fuzzing status, both are instantly accessible to all the fuzzing instances. Whenever coverage information is needed, it can be reconstructed from the seed without waiting for synchronization.

III-D Dynamic Seed Evaluation

A bottleneck of centralized task scheduling is the heavy burden of seed evaluation when many new seeds need evaluating. Our experiments show that seed evaluation pressure can depress performance by up to 50% when the number of paths in the PUT is beyond 40K. To alleviate this limitation, we propose shifting some of the seed evaluation work from the scheduler to dedicated evaluating nodes. Moreover, such evaluating nodes are dynamically allocated according to the number of seeds to be evaluated.

Dynamic evaluation expanding. Whenever a fuzzing instance uploads new seeds to the database, the scheduler receives an "update" signal and initiates an evaluating thread to deduplicate seeds, terminating the thread when no seeds remain to be evaluated. However, when too many new seeds flood into the scheduler, it alleviates the over-burden by shifting some fuzzing instances to temporarily evaluate these new seeds, namely dynamically converting fuzzing instances into evaluating instances. The invocation of the dynamic evaluation expanding scheme depends on the number of unevaluated seeds in the scheduler. The scheduler checks the number of unevaluated seeds and adjusts the number of evaluating instances at intervals. We use a threshold to control the interval. Intuitively, a lower threshold invokes the adjustment too easily and frequently, which wastes computing resources, whereas a higher threshold causes new seeds to pile up and depresses the overall performance. We empirically set this threshold to 1000 request cycles, which means the scheduler re-checks the number of seeds to be evaluated and adjusts the evaluating instances every time it has received 1000 "update" signals.

As Algorithm 3 shows, the number of evaluating instances to expand is estimated by dividing new_seeds_to_evaluate by evaluate_speed, where evaluate_speed is a dynamic statistical value based on the average number of seeds evaluated per second. We use unique_rate, an estimate of the previous deduplication performance, to predict how many duplicate seeds will be removed, which helps adjust the number of evaluating instances (a code sketch of this estimate follows Algorithm 3). For each instance that is expanded, its flag is changed from "fuzzing" to "evaluating", indicating that the instance will request seeds from new_seeds_queue to evaluate rather than from prioritized_seed_queue to fuzz. If the number of seeds in new_seeds_queue falls below the threshold, the flags are reset to "fuzzing" to shrink the set of evaluating instances.

  
  update_count ← 0
  while True do
      if receive "update" signal from fuzzing instances then
          update_count ← update_count + 1
      end if
      if update_count ≥ THRESHOLD then
          n_eval ← |new_seeds_queue| × unique_rate / evaluate_speed
          if n_eval > number of evaluating instances then
              switch fuzzing instances to "evaluating"
          end if
          if n_eval < number of evaluating instances then
              reset flags of evaluating instances to "fuzzing"
          end if
          update_count ← 0
      end if
      for each instance in evaluating instances do
          dispatch seeds from new_seeds_queue to evaluate
      end for
  end while
Algorithm 3 Dynamic evaluating instance expanding
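The estimate in Algorithm 3 can be sketched in C as follows; the names mirror the prose above (backlog for new_seeds_to_evaluate, evaluate_speed, unique_rate), and the exact scaling is an assumption, not UniFuzz's exact code:

    /* Estimate how many evaluating instances are needed: the seed
     * backlog, discounted by the fraction expected to survive
     * deduplication, divided by the measured per-instance speed. */
    unsigned evaluators_needed(unsigned backlog,
                               double evaluate_speed, /* seeds/s, measured */
                               double unique_rate) {  /* in [0, 1] */
        if (evaluate_speed <= 0.0) return 1;     /* no statistics yet */
        double expected = backlog * unique_rate; /* seeds likely to remain */
        unsigned n = (unsigned)(expected / evaluate_speed + 0.5);
        return n > 0 ? n : 1;                    /* keep at least one */
    }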

Filter out duplicate seeds. In a parallel fuzzing system, different fuzzing instances may mutate the same seed and generate redundant (identical) test cases, causing task conflicts and a severe waste of computing resources.

An effective way to avoid this situation and improve fuzzing performance is filtering out duplicate seeds. In our design, we conduct deduplication in two steps. First, we use the hash values of seeds to remove identical seeds. This method is simple and efficient; however, it cannot identify different seeds that exercise the same execution path. Therefore, in the second step, we use the bitmap to compare coverage. By reconstructing a seed's coverage information, we can determine whether it triggers a new path. We discard new seeds that do not extend coverage and keep the rest. The first step is cheap and suffices for most cases, while the second step is more expensive but refines the deduplication to a deeper level.
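Combining the two steps, a hedged sketch of the evaluating instance's check; sha256_hex(), hash_set_contains()/hash_set_add(), seen_hashes, run_target(), trace_bits, and global_bitmap are illustrative assumptions (MAP_SIZE as defined earlier):

    #include <stddef.h>

    /* Return 1 if the candidate seed is a duplicate and should be
     * removed from the database, 0 if it is worth keeping. */
    int is_duplicate(unsigned char *seed, size_t len) {
        char hex[65];
        sha256_hex(seed, len, hex);          /* step 1: identical bytes? */
        if (hash_set_contains(seen_hashes, hex)) return 1;
        hash_set_add(seen_hashes, hex);

        run_target(seed, len);               /* step 2: dry-run for trace */
        int new_bits = 0;
        for (size_t i = 0; i < MAP_SIZE; i++)
            if (trace_bits[i] & ~global_bitmap[i]) {
                global_bitmap[i] |= trace_bits[i];
                new_bits = 1;                /* seed extends coverage */
            }
        return !new_bits;                    /* no new edges: duplicate */
    }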

IV Implementation

Based on the design of the dynamic centralized scheduling, we implement a distributed fuzzing tool called UniFuzz. UniFuzz is built on top of AFL (version 2.52b) by adding 3,500 lines of C code. The scheduler, which controls instances alternating between fuzzing and evaluating, is implemented in C. It communicates with fuzzing instances over TCP sockets and directs instances to fuzz or evaluate. Since blocking TCP socket communication cannot handle the instances' concurrent requests, we opt for select (a system call interface) to handle requests from multiple instances, which performs better for concurrent processing in a non-blocking manner. Besides, we use pthread (a multi-threading library) to initiate new threads that evaluate the seed queue, and we use locks to protect the status information, avoiding data races among threads. The database is built on MongoDB and stores seeds as well as their fuzzing status. All the other parts, including the scheduler, fuzzing instances, and evaluating instances, use libmongoc to interact with the database. The fuzzing instances are implemented based on AFL; the evaluating instance is implemented by ourselves in C.
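For illustration, a minimal select()-based accept-and-serve loop of the kind described above; handle_request() is a hypothetical handler for task requests and "update" signals, and error checking is omitted for brevity:

    #include <sys/select.h>
    #include <sys/socket.h>

    void serve(int listen_fd) {
        int clients[FD_SETSIZE] = {0};
        for (;;) {
            fd_set fds;
            FD_ZERO(&fds);
            FD_SET(listen_fd, &fds);
            int maxfd = listen_fd;
            for (int i = 0; i < FD_SETSIZE; i++)
                if (clients[i]) {
                    FD_SET(clients[i], &fds);
                    if (clients[i] > maxfd) maxfd = clients[i];
                }
            select(maxfd + 1, &fds, NULL, NULL, NULL); /* wait for activity */
            if (FD_ISSET(listen_fd, &fds)) {           /* new instance */
                int c = accept(listen_fd, NULL, NULL);
                for (int i = 0; i < FD_SETSIZE; i++)
                    if (!clients[i]) { clients[i] = c; break; }
            }
            for (int i = 0; i < FD_SETSIZE; i++)       /* serve the ready */
                if (clients[i] && FD_ISSET(clients[i], &fds))
                    handle_request(clients[i]); /* task request or update */
        }
    }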

We will make the source code of UniFuzz and the raw data of our experiment publicly available online after the double-blind review process, hoping to foster future research on this topic.

V Evaluation

In this section, we evaluate UniFuzz in terms of path coverage, acceleration efficiency, crash exposure capability, and synchronization overhead.

Target Programs. We select 12 real-world programs from Google fuzzer-test-suite and other papers, as shown in Table II. The programs we select are relatively large, facilitating the comparison of different tools in large-scale parallel fuzzing before coverage saturation. The file formats handled by these programs are plentiful, including ELF, MP4, TIFF, and PDF.

Subjects                    Version          Format
boringssl @@                2016-02-12       lib
freetype @@                 2017             font
libcxx @@                   2017-01-27       lib
libxml @@                   libxml2-v2.9.2   xml
re2 @@                      2014-12-09       lib
size @@                     Binutils-2.34    elf
readelf -a @@               Binutils-2.34    elf
avconv -y -i @@ -f null     Libav-12.3       mp4
infotocap @@                ncurses-6.1      text
pdftotext @@ /dev/null      xpdf-4.02        pdf
tiff2bw @@ /dev/null        tiff-4.1         tiff
ffmpeg -i @@                ffmpeg-4.1.3     mp4
TABLE II: The configuration of target programs

Compared tools. Among the known parallel fuzzers listed in Table I, most cannot be compared against for various reasons. AFL-P [50], PAFL [24], and EnFuzz [10] are limited to single-machine mode. EnFuzz [10] and disfuzz [3] fail to work in our setting, and Roving [30] crashes during execution. PAFL [24] is closed-source. ClusterFuzz [12] cannot provide the status data (e.g., paths or seeds) needed for a comparison. To conduct a relatively fair comparison, we select AFL as a baseline and re-implement PAFL and EnFuzz ourselves. However, since PAFL and EnFuzz cannot support multi-machine mode, we propose a criterion called computation resource, which is the product of the number of computing cores and the testing hours. We define one unit of computation resource as one core multiplied by one hour. In this way, for tools that do not support multi-machine mode, when the number of testing cores exceeds a single machine, we extend the fuzzing time to compensate for the missing cores. For example, a 128-core test for one hour can be equally replaced by a 16-core machine running for 8 hours.

Configuration. We use UniFuzz, PAFL, and EnFuzz to fuzz each program for 4, 8, 16, 32, 64, and 128 units of computation resources, respectively. Each experiment is repeated three times to reduce the effects of randomness. In particular, we run the single mode of AFL for 4, 8, 16, 32, 64, and 128 hours as the baseline. Moreover, for the five programs from Google fuzzer-test-suite, we choose the seeds they provide as the initial seeds. For the other programs, we select the corresponding seeds from the testcases directory in AFL. Our experiments were conducted on machines with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz with 32 cores, running 64-bit CentOS Linux release 7.7.1908 (Core). Notably, the total time of our experiments is more than 1,512 CPU days.

V-A Path Coverage

Fig. 3: Comparison of path coverage reached by different tools with the same computation resources.
Program 4 units 8 units 16 units 32 units 64 units 128 units
boringssl 1,100(1.06x) 1,162(1.02x) 1,233(1.02x) 1,411(1.03x) 1,457(1.00x) 1,544(1.04x)
freetype 2,895(1.14x) 3,230(1.16x) 3,566(1.09x) 5,644(1.35x) 8,601(1.37x) 13,221(1.64x)
libcxx 3,985(0.96x) 5,260(1.16x) 5,840(1.21x) 8,740(1.59x) 10,458(1.79x) 11,230(1.81x)
libxml 3,031(1.82x) 3,589(1.90x) 4,314(1.97x) 6,042(2.13x) 6,567(1.81x) 7,599(1.89x)
re2 3,919(1.11x) 4,359(1.06x) 4,625(1.00x) 4,956(0.99x) 5,104(0.95x) 5,134(0.93x)
size 1,387(1.49x) 2,015(1.38x) 2,212(1.21x) 2,972(1.27x) 3,288(1.13x) 3,762(1.12x)
readelf 2,962(1.36x) 4,338(1.46x) 5,990(1.46x) 7,831(1.24x) 11,443(1.21x) 13,423(1.24x)
avconv 3,205(0.85x) 3,701(0.92x) 3,983(0.88x) 4,518(0.84x) 5,999(0.96x) 8,143(1.10x)
infotocap 1,158(0.82x) 1,644(0.85x) 2,293(0.91x) 2,871(0.95x) 3,887(1.21x) 4,544(1.30x)
pdftotext 1,321(1.34x) 1,414(1.09x) 1,455(1.01x) 1,621(1.06x) 1,695(1.06x) 1,826(1.11x)
tiff2bw 1,018(0.75x) 1,652(0.87x) 2,757(1.09x) 3,009(1.11x) 3,161(1.07x) 3,585(1.15x)
ffmpeg 2,766(1.00x) 3,082(0.96x) 5,647(1.44x) 6,126(0.91x) 7,445(0.89x) 10,457(0.95x)
Average increase 1.09x 1.13x 1.19x 1.19x 1.20x 1.28x
TABLE III: Path coverage increase of UniFuzz compared to the baseline with the same computation resources.

We use path coverage as the main criterion to measure the performance of these fuzzing tools. Fig. 3 plots the average number of paths discovered by these tools over 3 runs at different computation resources. As Fig. 3 shows, the path coverage reached by each tool rises as the computation resources increase. For example, the baseline AFL-single-core found 2,538, 2,786, 3,270, 4,173, 6,257 and 8,044 paths with 4, 8, 16, 32, 64 and 128 units of computation resources on freetype.

Among these tools, UniFuzz reaches the highest path coverage on 10 of the 12 programs (e.g., readelf, ffmpeg, and size), outperforming the other three tools. In particular, UniFuzz performs better than the baseline AFL-single-core on eight programs with the same computation resources, such as boringssl, freetype, libcxx, and readelf. In contrast, the other two tools do not reach higher path coverage than AFL-single-core; especially for PAFL, AFL-single-core discovers more paths on almost all programs with the same computation resources. In fact, PAFL and EnFuzz have two advantages over UniFuzz. First, PAFL and EnFuzz optimize the scheduling algorithm or mutation strategy, which makes them more efficient than UniFuzz in scheduling and mutation. Second, under an equal computation resource comparison, PAFL and EnFuzz actually run with fewer cores and more time than UniFuzz due to the single-machine limitation, which alleviates the overhead of large-scale parallelism. Thus, the reason that UniFuzz nevertheless outperforms PAFL and EnFuzz is its optimization of the parallel mechanism, particularly the low synchronization overhead.

Program 4 units 8 units 16 units 32 units 64 units 128 units
boringssl 12.99M(0.95x) 25.17M(0.98x) 35.90M(0.82x) 92.43M(1.05x) 123.10M(0.94x) 139.23M(0.78x)
freetype 11.99M(1.53x) 23.87M(1.48x) 35.28M(1.10x) 93.17M(1.44x) 180.53M(1.38x) 409.08M(2.05x)
libcxx 27.57M(1.37x) 47.22M(1.62x) 44.12M(1.06x) 108.76M(1.53x) 134.50M(1.19x) 115.62M(0.70x)
libxml 17.12M(4.20x) 30.55M(3.69x) 51.53M(3.15x) 105.75M(3.27x) 178.02M(3.24x) 289.90M(3.96x)
re2 16.53M(1.31x) 25.09M(1.02x) 34.61M(0.76x) 66.69M(0.70x) 79.08M(0.38x) 96.89M(0.28x)
size 17.68M(1.59x) 34.97M(1.60x) 42.50M(1.05x) 110.44M(1.37x) 213.81M(1.37x) 277.56M(1.15x)
readelf 19.55M(1.36x) 38.91M(1.38x) 92.73M(1.68x) 130.27M(1.21x) 248.50M(1.16x) 374.85M(1.23x)
avconv 1.17M(1.30x) 2.60M(1.49x) 4.80M(1.49x) 5.67M(0.92x) 14.84M(1.20x) 45.20M(1.97x)
infotocap 3.44M(0.90x) 11.72M(1.78x) 16.84M(1.45x) 31.61M(1.00x) 50.70M(1.01x) 138.80M(1.81x)
pdftotext 3.59M(1.07x) 6.53M(1.05x) 10.76M(1.12x) 14.12M(0.90x) 34.67M(1.27x) 79.30M(1.95x)
tiff2bw 7.56M(0.30x) 31.67M(0.69x) 91.99M(1.09x) 108.93M(0.90x) 180.48M(0.96x) 220.82M(0.83x)
ffmpeg 3.75M(0.70x) 7.81M(0.89x) 17.62M(0.96x) 20.96M(0.63x) 39.97M(0.64x) 104.70M(0.88x)
Average increase 1.16x 1.28x 1.19x 1.19x 1.10x 1.13x
TABLE IV: The number of test cases generated by UniFuzz compared to AFL-single-core with the same computation resources

Furthermore, to compare UniFuzz with AFL-single-core in detail, we list the detailed path coverage of UniFuzz in Table III. In particular, we calculate the ratio between the path coverage reached by UniFuzz and that reached by AFL-single-core. As Table III shows, UniFuzz reaches higher path coverage than AFL-single-core on most programs from 4 units to 128 units. Moreover, as the computation resources increase, the average speedup of UniFuzz over AFL-single-core grows from 1.09x to 1.28x, which demonstrates a super-linear performance acceleration in UniFuzz. We explain this in the next subsection.

V-B Super-linear Performance Acceleration

In this subsection, we try to explain the super-linear acceleration of UniFuzz over AFL-single-core. Generally speaking, if we only expand AFL to a parallel model without optimizing AFL's scheduling algorithm and mutation operators, the parallel mode theoretically should not perform as well as AFL-single-core with the same computation resources (i.e., extending the execution time). The reason is that parallelism always introduces additional overhead (e.g., seed synchronization and task distribution) regardless of how well the parallel mode is optimized. As a result, with the same testing resources, the computation resources spent on fuzzing seeds in the parallel mode are less than those in AFL-single-core. Therefore, the number of test cases produced by the parallel mode of AFL should be smaller than that produced by extending the execution time of AFL-single-core.

Based on this assumption, we analyze the average number of test cases generated by UniFuzz and AFL-single-core on all programs with different computation resources, as listed in Table IV. From Table IV, we can conclude that, with the same computation resources, UniFuzz on average generates at least as many test cases as AFL-single-core, and often more, which is not consistent with our inference.

Thus, we propose a hypothesis to explain the above result. In the design of AFL, when it fuzzes a seed with the random strategy and finds a new path, the energy (i.e., the number of mutation chances for the seed) assigned to this seed is doubled. Though UniFuzz does not optimize the scheduling algorithm or the mutation operators at the code level (i.e., the energy assignment and mutation strategy of UniFuzz are the same as those in AFL), the parallel mechanism of UniFuzz allows it to test different seeds at the same time. Since these seeds are selected and marked as favored by the scheduler, when UniFuzz finds new paths while fuzzing them, the energy assigned to these seeds is doubled, which promotes generating more test cases to reach new states. For example, consider two seeds s1 and s2 with assigned energy e1 and e2, where mutating either may discover the same path p. AFL selects seeds in order, fuzzing s2 after s1. Once s1 has discovered path p and doubled its energy, when s2 discovers path p again, the energy of s2 will not be doubled again. So the total energy assigned to these two seeds in AFL is 2*e1 + e2. In contrast, UniFuzz can test these two seeds concurrently. When UniFuzz mutates s1 and s2 in different fuzzing instances and the generated test cases both discover path p, the energy allocated to s1 and to s2 is doubled in both instances. Therefore, the total energy assigned to these two seeds under UniFuzz is 2*e1 + 2*e2, which is more than that of AFL. In other words, UniFuzz can give favored seeds more energy through the parallel mechanism.

Fig. 4: The distribution of energy assigned to each seed and the times of doubling energy in fuzzing readelf.

To verify this hypothesis, we analyze the energy allocation of UniFuzz and AFL-single-core while fuzzing readelf with 32 units of computation resources. We record the energy assigned to each seed and the number of energy-doubling events during the random strategy. Fig. 4 shows the energy distribution and the energy-doubling counts of UniFuzz and AFL-single-core. Since many seeds are never fuzzed (i.e., the energy assigned to them is 0), we only focus on the seeds that have been fuzzed and sort them by assigned energy. From Fig. 4(a), we observe that for seeds whose serial numbers are close to 0, AFL-single-core and UniFuzz both allocate high energy. The reason is that these seeds are regarded as favored and their performance scores are higher than those of other seeds, meaning they are more likely to discover new paths. For these seeds, particularly those with serial numbers below 500, UniFuzz allocates much more energy than AFL-single-core, which improves the efficiency of energy utilization. Therefore, UniFuzz reaches higher path coverage than AFL-single-core on readelf by giving more energy to high-quality seeds.

Moreover, from Fig. 4(b), the number of energy-doubling events on most seeds in UniFuzz is higher than in AFL-single-core. We also report the total numbers of energy-doubling events while UniFuzz and AFL-single-core fuzz readelf, which are 1,501 and 544, respectively. The total count for UniFuzz is almost 3x that of AFL-single-core, which is consistent with our hypothesis. This demonstrates that UniFuzz assigns more energy to the seeds with a high probability of finding new paths than AFL-single-core does. In summary, the parallel mechanism can optimize AFL's energy scheduling from a global perspective.

V-C Crash Exposure Capability

During the experiments, UniFuzz triggered plenty of unique crashes in the tested programs, which demonstrates its capability to expose bugs. We count the average number of unique crashes triggered by the different tools with 128 units of computation resources and list them in Table V. From Table V, UniFuzz outperforms the other three tools on boringssl, libcxx, size, avconv, tic, and pdftotext overall, which indicates that UniFuzz is more efficient than the other three tools at detecting crashes.

In more detail, we analyze these crashes and seeds with AddressSanitizer and Valgrind and discover several vulnerabilities in these programs. For the vulnerabilities found in the programs from Google fuzzer-test-suite, we compare them with the information provided by Google fuzzer-test-suite. Table VI lists these vulnerabilities; in the status column, "N" denotes that the vulnerability has not been exposed by Google fuzzer-test-suite. Accordingly, UniFuzz finds 7 and 1 such vulnerabilities in libcxx and boringssl, respectively, which are not reported by Google fuzzer-test-suite.

On the other programs, UniFuzz discovers 8 vulnerabilities, which are also listed in Table VI. For example, UniFuzz finds 3 vulnerabilities in size. One is a heap-use-after-free triggered when calling the bfd_section_from_shdr function. Another is triggered when calling the _bfd_coff_free_symbols function, which attempts to free an invalid pointer. The third is a segmentation fault in _objalloc_alloc. We have submitted these vulnerabilities to their vendors, and the three vulnerabilities found in size have been acknowledged. For the heap-buffer-overflow in avconv, since there is no way to register an account in the project's bugzilla system and the reply to our report email was uninformative, the vendor seems reluctant to fix this vulnerability. Moreover, we compare these vulnerabilities with existing CVEs. The global-buffer-overflow vulnerability in ncurses-6.1 has been confirmed as CVE-2019-17594 by others.

Program UniFuzz AFL-single-core PAFL EnFuzz
boringssl 15 4 2 13
freetype 0 0 0 0
libcxx 689 80 127 40
libxml 0 0 0 0
re2 3 10 9 1
size 11 7 5 6
readelf 0 0 0 0
avconv 47 4 2 1
tic 264 60 68 34
pdftotext 453 73 33 278
tiff2bw 0 0 0 0
ffmpeg 0 0 0 0
TABLE V: Unique crashes exposed by four tools in 128 units
Program Type Position Status
libcxx Stack-overflow src/cxa_demangle.cpp:1902 N
libcxx Stack-overflow src/cxa_demangle.cpp:1904 N
libcxx Stack-overflow src/cxa_demangle.cpp:4534 N
libcxx Stack-overflow src/cxa_demangle.cpp:4536 N
libcxx Stack-overflow src/cxa_demangle.cpp:378 N
libcxx Stack-overflow src/cxa_demangle.cpp:4211 N
libcxx Stack-overflow src/cxa_demangle.cpp:4214 N
boringssl Double-free boringssl/crypto/asn1/asn1_lib.c:460 N
size free invalid pointer binutils-2.34/bfd/coffgen.c:1782 A
size heap-use-after-free binutils-2.34/bfd/elf.c:2604 A
size SEGV binutils-2.34/bfd/opncls.c:978 A
avconv heap-buffer-overflow libavcodec/h264pred_template.c:632 W
infotocap global-buffer-overflow ncurses/tinfo/comp_hash.c:66:9 C
infotocap heap-buffer-overflow ncurses/tinfo/captoinfo.c:318:12 S
infotocap heap-buffer-overflow ncurses/tinfo/lib_tparm.c:139:20 S
pdftotext SEGV xpdf/Catalog.cc:295 S
  • In the status column of this table, "N" indicates that the vulnerability found in a program from Google fuzzer-test-suite has never been reported. "A" means the vulnerability has been acknowledged. "W" means the vendor won't fix or has not accepted it. "C" means it has been assigned a CVE number. "S" means the vulnerability has been submitted to the vendor and we are waiting for a reply.

TABLE VI: The vulnerabilities found by UniFuzz

V-D Synchronization Overhead

In this subsection, we analyze the synchronization overhead of UniFuzz when fuzzing different programs. To define the synchronization overhead, we classify the state of each node, except the main node and the database node, as either fuzzing or non-fuzzing, according to whether the node is testing a seed. In the non-fuzzing state, a node may be requesting a new task or evaluating new seeds. We define the overhead of a fuzzing node as the time it spends in the non-fuzzing state, from which we calculate the global overhead of UniFuzz. Generally speaking, as the parallel scale increases, the overhead of each node rises because the number of new seeds to be evaluated grows rapidly. Therefore, we record the time each node spends in each state during the entire fuzzing process with 128 units of computation resources and sum the time all nodes spend in the non-fuzzing state. The resulting synchronization overhead is reported in Table VII. (As a sanity check, 128 units correspond to 460,800 core-seconds, so the average of 1,902 s of non-fuzzing time yields 1,902 / 460,800 ≈ 0.41%.)

From Table VII, the overhead of UniFuzz on 11 of the 12 programs is under 1%. In particular, on freetype, avconv, infotocap, pdftotext, and ffmpeg, the overhead is no more than 0.1%. We conclude that the synchronization overhead of UniFuzz is low: almost all computation resources are used to mutate seeds and execute test cases on each node.

Program Time in non-fuzzing state(s) Overhead
boringssl 9,268 2.01%
freetype 166 0.04%
libcxx 4,458 0.97%
libxml 915 0.20%
re2 3,406 0.74%
size 1,295 0.28%
readelf 670 0.15%
avconv 48 0.01%
infotocap 375 0.08%
pdftotext 147 0.03%
tiff2bw 1,870 0.41%
ffmpeg 208 0.05%
Average 1,902 0.41%
TABLE VII: Synchronization overhead of UniFuzz in 128 units

VI Discussion

VI-A Decreasing the task conflicts to the single-node level

The dynamic scheduling strategy solves the duplicate-seed challenge of parallel fuzzing. Because fuzzing tasks are dispatched serially, each fuzzing instance performs different work, mutating different seeds, which has a low probability of generating conflicting test cases. Note that we do not avoid task conflicts completely. In fact, even single-node fuzzing can generate conflicting test cases by mutating different seeds, and also by randomly mutating the same seed multiple times. What we achieve is decreasing the task conflicts of parallel fuzzing to the same level as single-node fuzzing, namely avoiding the task conflicts introduced by parallelism. We have shown the effectiveness of reducing duplicate seeds by comparing our work with the single-node mode of AFL in the evaluation section.

VI-B Workload balance and fault tolerance

Our approach distributes fuzzing tasks in a request-response manner: a fuzzing instance requests a new fuzzing task once it has finished the previous one. Unlike static distribution strategies, which suffer from workload imbalance, poor scalability, and low fault tolerance, our dynamic dispatching method can easily expand to a large scale and balances the workload automatically, without needing to know the computing capacity of each fuzzing instance. Besides, removing a fuzzing instance does not affect the operation of the distribution; its workload is digested by the remaining instances automatically.

VI-C Super-linear acceleration

In 2020, Böhme and Falk [4] proposed an empirical law of fuzzing: finding the same bugs linearly faster requires linearly more machines, while finding linearly more bugs in the same time requires exponentially more machines. This law was observed under the assumption of no synchronization cost, so that concurrently running fuzzing instances perform the same as the sum of the instances running individually. In reality, however, our experiments show that even with the synchronization cost taken into account, an optimized parallel fuzzer can achieve super-linear acceleration, which may threaten the validity of this empirical law. We believe this interesting result can inspire future research on parallel fuzzing.
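One way to state the law and our observation symbolically; the notation below is ours, not Böhme and Falk's:

```latex
% Empirical law of [4], restated in our own notation:
% n machines find the same bugs at best linearly faster,
\[ T_{\text{same bugs}}(n) \;\approx\; \frac{T(1)}{n}, \]
% while finding k additional bugs within the same time budget requires
% exponentially more machines, for some constant c > 1:
\[ M(b + k) \;\approx\; M(b)\, c^{\,k}. \]
% Super-linear acceleration means the measured speedup exceeds n:
\[ S(n) \;=\; \frac{T(1)}{T(n)} \;>\; n. \]
```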

VI-D Testing environment influence

During the experiments, we encountered an unexpected phenomenon: the number of unoccupied computing cores can greatly affect the performance of parallel fuzzing. More specifically, running parallel fuzzing on a machine with spare computing cores performs much better than running on a fully loaded machine. For example, in certain cases, running with 128 fully occupied cores performs even worse than using only 64 working cores with 64 spare cores. The cause is resource contention over the CPU, memory, syscalls, file system access, etc. Some operations common in fuzzing, such as fork() and file read()/write(), are not designed for parallel computing; they scale poorly in a parallel environment and can cause dramatic performance degradation once the core count grows past a certain point [42]. Böhme and Falk also noticed this phenomenon and left 20% of cores unused to avoid interference in their experiments [4]. Nevertheless, this phenomenon also depends on the characteristics of the PUT, such as its memory access patterns, and is therefore nondeterministic. It reminds us that, in addition to optimizing the parallelism, the testing environment plays a significant role.
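As a rough illustration, the following hypothetical micro-benchmark (not part of UniFuzz) observes this contention by measuring the fork() rate while busy-loop processes occupy a varying number of cores:

```python
# Hypothetical micro-benchmark (Unix only): measure fork() throughput
# while busy-loop processes occupy a varying number of cores.
import os
import time
import multiprocessing as mp

def busy():
    while True:          # pin a core with useless work
        pass

def forks_per_second(duration=2.0):
    count, end = 0, time.time() + duration
    while time.time() < end:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately
        os.waitpid(pid, 0)
        count += 1
    return count / duration

if __name__ == "__main__":
    total = mp.cpu_count()
    for spare in (total // 2, 0):   # half-loaded, then fully loaded
        hogs = [mp.Process(target=busy) for _ in range(total - 1 - spare)]
        for h in hogs:
            h.start()
        print(f"{spare} spare cores: {forks_per_second():.0f} forks/s")
        for h in hogs:
            h.terminate()
```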

VI-E Extensible performance

UniFuzz is implemented on top of vanilla AFL, and we concentrate on improving fuzzing performance by optimizing the parallel scheme: avoiding task conflicts, balancing the workload, and accelerating synchronization. However, our tool can be further optimized along an orthogonal direction, namely the critical steps of coverage-guided fuzzing, for example, seed generation [38, 39, 15, 44], mutation strategies [8, 21, 45, 44], and seed prioritization [48, 5, 39]. We also consider improving the diversity of fuzzing instances in UniFuzz, as EnFuzz does, to take advantage of different fuzzers. By combining such techniques, the performance of UniFuzz can be improved further.

As a prototype, UniFuzz uses MongoDB as its database, which can become a bottleneck as the number of nodes grows. Although our experiments with 128 nodes did not hit this limit, it might emerge at a larger scale, for example, 1,024 nodes. To alleviate this, we can replace the database with Redis [31] or a distributed database, which should further improve performance. A hypothetical sketch of such a swap follows.
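For illustration, the sketch below expresses the same atomic "claim one seed" step against both back ends; the host, collection, and key names are made up, and UniFuzz's actual schema may differ.

```python
# Hypothetical sketch: claiming one seed from MongoDB (the current
# prototype's store) versus Redis (a lighter-weight alternative).
import pymongo
import redis

mongo = pymongo.MongoClient("mongodb://scheduler:27017")
seeds = mongo["unifuzz"]["seeds"]

def next_task_mongo():
    # Atomically claim one unclaimed seed; no two instances can
    # receive the same document.
    return seeds.find_one_and_update(
        {"claimed": False}, {"$set": {"claimed": True}})

r = redis.Redis(host="scheduler")

def next_task_redis():
    # BRPOP blocks until a seed is queued; the pop is atomic, so the
    # request-response semantics carry over with lower latency.
    _key, seed = r.brpop("unifuzz:task_queue")
    return seed
```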

In future work, we will consider improving UniFuzz along these directions.

VII Related Work

Algorithm Optimization. Most research improves fuzzing efficiency by designing novel algorithms along several directions, such as seed prioritization, mutation strategies, application scenarios, and combining fuzzing with other techniques. Both AFLFast [28] and FairFuzz [19] improve fuzzing efficiency by optimizing seed selection: the former prefers to mutate seeds exercising low-frequency paths, and the latter mutates seeds whose hit counts are relatively small. The authors of AFLFast also implemented AFLGo [5], a directed grey-box fuzzer targeting bug-prone locations. To mitigate the bitmap collision problem, Gan et al. introduced CollAFL [14], which provides more accurate coverage information. Moreover, some fuzzers focus on specific application scenarios: kAFL [32] was designed to detect vulnerabilities in operating systems, and PTfuzz [52] leverages Intel Processor Trace to collect edge coverage information and feed it back to the fuzzing process. Fuzzing efficiency is also enhanced by combining with other techniques: Driller [36], T-Fuzz [27], and QSYM [49] leverage symbolic execution, while VUzzer [29] and Angora [9] use taint analysis to gather dynamic and static information about target programs. Angora additionally uses gradient descent to solve path constraints, which brings a significant improvement on the LAVA-M benchmark [13].

Parallel Optimization. Another direction for improving fuzzing efficiency is adding computing resources and parallelizing fuzzing tasks. To improve the efficiency of symbolic execution, Cloud9 [11] divides the search scope into several pieces to spread the workload over the computing cores. Liang et al. [23] also address path explosion by sharing constraint-solving results among computing cores. Xie [41] proposed a parallel framework in 2010 that leverages grid computing for large-scale fuzzing; it stores fuzzing jobs on a server and schedules remote clients to download them. However, this kind of static scheduling leads to an unbalanced partition of the workload. These works focus on appropriately scheduling fuzzing tasks onto computing resources, but make few efforts to innovate on synchronization and sharing mechanisms.

To parallelize AFL, AFL-P [50] runs multiple fuzzer instances as separate processes and synchronizes seeds between them. However, its scalability is limited because it cannot utilize computing resources across machines. Roving [30] and disfuzz [3] address this problem with a client/server structure: they share new seeds with each computing core at a fixed time interval, which improves scalability but produces redundant work and task conflicts, severely wasting computing resources. Li et al. [20] designed a tree structure to store coverage information instead of a bitmap, and leverage a polling mechanism to reduce redundant work and avoid conflicts, but this mechanism incurs a large performance overhead.

Several recent works partition fuzzing tasks to avoid redundant work. In PAFL [24], local guiding information from each fuzzer instance is synchronized with global guiding information, according to which PAFL assigns different task segments, obtained by grouping branches, to different fuzzer instances. PAFL speeds up the fuzzing process, but it cannot run in a distributed system across multiple machines. Another work also named PAFL [43] collects dynamic execution information and dispatches weakly related parts of the target program to different instances, which reduces redundant work; its weakness is that accurately dividing the target program into such parts is difficult. This work also finds the “key bytes” of an input by mutation to reach deep paths, but doing so has high time complexity.

Different from the above works, EnFuzz [10] defines the diversity of fuzzers and chooses several fuzzers (AFL, AFLFast, FairFuzz, libFuzzer, Radamsa, QSYM) according to this diversity criterion. It synchronizes seeds and coverage information to global nodes, so that local nodes share interesting seeds with each other. However, EnFuzz only supports sharing guiding information through the file system, and the coverage information differs considerably across fuzzers, which limits its scalability.

For fuzzing on large-scale computing resources, Google proposed ClusterFuzz [12], a scalable fuzzing infrastructure. ClusterFuzz supports coverage-based grey-box fuzzers (e.g., libFuzzer and AFL) as well as black-box fuzzers, and, as the fuzzing backend of OSS-Fuzz [33], it has uncovered thousands of vulnerabilities. However, some useful guiding information is still not utilized by the system. To address scalability bottlenecks, Xu et al. [42] designed new operating primitives that speed up AFL by 6.1x to 28.9x and libFuzzer by 1.1x to 735.7x on 120 cores. Although this work improves fuzzing performance on underutilized CPU cores, there is still room for improvement by scheduling fuzzing tasks and extending multi-core parallelism to distributed parallelism. P-fuzz [34] leverages the computing resources of a distributed system to improve fuzzing efficiency and partially alleviates task conflicts by adopting a database-centric architecture. However, imbalance still exists: some seeds are overused while others sit idle. Moreover, the database becomes a bottleneck when too many nodes produce a large number of seeds to share.

VIII Conclusion

In this paper, we designed and implemented UniFuzz, a distributed fuzzing optimization based on dynamic centralized task scheduling. It achieves low task conflicts, low synchronization overhead, and a balanced workload, and it scales well in a distributed environment. Experiments on real-world programs show that UniFuzz outperforms state-of-the-art tools such as AFL, PAFL, and EnFuzz, and UniFuzz discovered 16 real-world vulnerabilities. Most importantly, the experiments reveal a counter-intuitive result: parallel fuzzing can achieve super-linear acceleration over single-core fuzzing. We explained this result in detail and validated it with additional experiments.

Acknowledgement

We would like to thank the anonymous reviewers for their valuable comments and helpful suggestions.

References

  • [1] A. Takanen, J. D. DeMott, and C. Miller (2008) Fuzzing for software security testing and quality assurance. Information Security and Privacy Series, Artech House. External Links: ISBN 978-1-59693-214-2 Cited by: §I.
  • [2] W. Blair, A. Mambretti, S. Arshad, M. Weissbacher, W. Robertson, E. Kirda, and M. Egele (2020) HotFuzz: discovering algorithmic denial-of-service vulnerabilities through guided micro-fuzzing. arXiv preprint arXiv:2002.03416. Cited by: §I.
  • [3] M. Bogaard (2015) Disfuzz-afl. Note: https://github.com/MartijnB/disfuzz-afl Cited by: §I, §I, §II-B, TABLE I, §III-C, §V, §VII.
  • [4] M. Böhme and B. Falk (2020) Fuzzing: on the exponential cost of vulnerability discovery. Cited by: §I, §VI-C, §VI-D.
  • [5] M. Böhme, V. Pham, M. Nguyen, and A. Roychoudhury (2017) Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 2329–2344. Cited by: §I, §VI-E, §VII.
  • [6] M. Böhme, V. Pham, and A. Roychoudhury (2017) Coverage-based greybox fuzzing as markov chain. IEEE Transactions on Software Engineering 45 (5), pp. 489–506. Cited by: §II-A.
  • [7] M. Böhme (2016) AFLFast.new. Note: https://groups.google.com/d/msg/afl-users/ 1PmKJC-EKZ0/lbzRb8AuAAAJ Cited by: §I.
  • [8] H. Chen, Y. Xue, Y. Li, B. Chen, X. Xie, X. Wu, and Y. Liu (2018) Hawkeye: towards a desired directed grey-box fuzzer. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 2095–2108. Cited by: §I, §VI-E.
  • [9] P. Chen and H. Chen (2018) Angora: efficient fuzzing by principled search. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 711–725. Cited by: §VII.
  • [10] Y. Chen, Y. Jiang, F. Ma, J. Liang, M. Wang, C. Zhou, Z. Su, and X. Jiao EnFuzz: ensemble fuzzing with seed synchronization among diverse fuzzers. Cited by: §I, §I, §II-B, TABLE I, §III-C, §V, §VII.
  • [11] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea Cloud9: a software testing service. Acm Sigops Operating Systems Review 43 (4), pp. 5–10. Cited by: §VII.
  • [12] (2016) Clusterfuzz. Note: https://google.github.io/clusterfuzz/ Cited by: TABLE I, §V, §VII.
  • [13] B. Dolan-Gavitt, P. Hulin, E. Kirda, T. Leek, A. Mambretti, W. Robertson, F. Ulrich, and R. Whelan (2016) Lava: large-scale automated vulnerability addition. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 110–121. Cited by: §VII.
  • [14] S. Gan, C. Zhang, X. Qin, X. Tu, K. Li, Z. Pei, and Z. Chen (2018-05) CollAFL: path sensitive fuzzing. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 679–696. External Links: Document Cited by: §VII.
  • [15] V. Jain, S. Rawat, C. Giuffrida, and H. Bos (2018) TIFF: using input type inference to improve fuzzing. In Proceedings of the 34th Annual Computer Security Applications Conference, pp. 505–517. Cited by: §I, §VI-E.
  • [16] D. R. Jeong, K. Kim, B. Shivakumar, B. Lee, and I. Shin (2019) Razzer: finding kernel race bugs through fuzzing. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 754–768. Cited by: §I.
  • [17] B. Jiang, Y. Liu, and W. Chan (2018) Contractfuzzer: fuzzing smart contracts for vulnerability detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 259–269. Cited by: §I.
  • [18] K. Kim, D. R. Jeong, C. H. Kim, Y. Jang, I. Shin, and B. Lee HFL: hybrid fuzzing on the linux kernel. Cited by: §I.
  • [19] C. Lemieux and K. Sen (2018) Fairfuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 475–485. Cited by: §I, §II-B, §VII.
  • [20] Y. Li, C. Feng, and C. Tang (2018) A large-scale parallel fuzzing system. In Proceedings of the 2nd International Conference on Advances in Image Processing, pp. 194–197. Cited by: §VII.
  • [21] Y. Li, S. Ji, C. Lv, Y. Chen, J. Chen, Q. Gu, and C. Wu (2019) V-fuzz: vulnerability-oriented evolutionary fuzzing. arXiv preprint arXiv:1901.01142. Cited by: §I, §VI-E.
  • [22] H. Liang, X. Pei, X. Jia, W. Shen, and J. Zhang (2018-Sep.) Fuzzing: state of the art. IEEE Transactions on Reliability 67 (3), pp. 1199–1218. External Links: Document, ISSN 1558-1721 Cited by: §I.
  • [23] H. Liang, Y. Xiaoyu, D. Yu, P. Zhang, and L. Shuchang (2015) Parallel smart fuzzing test. Journal of Tsinghua University (Science and Technology) 54 (1), pp. 14–19. Cited by: §VII.
  • [24] J. Liang, Y. Jiang, Y. Chen, M. Wang, C. Zhou, and J. Sun (2018) Pafl: extend fuzzing optimizations of single mode to industrial parallel mode. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 809–814. Cited by: §I, §I, §I, §II-B, §II-B, §II-B, TABLE I, §V, §VII.
  • [25] C. Lyu, S. Ji, C. Zhang, Y. Li, W. Lee, Y. Song, and R. Beyah (2019) mopt: Optimized mutation scheduling for fuzzers. In 28th USENIX Security Symposium (USENIX Security 19), pp. 1949–1966. Cited by: §II-A.
  • [26] V. Manes, H. Han, C. Han, S. Cha, M. Egele, E. Schwartz, and M. Woo (2018-11) Fuzzing: art, science, and engineering. Cited by: §I.
  • [27] H. Peng, Y. Shoshitaishvili, and M. Payer (2018) T-fuzz: fuzzing by program transformation. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 697–710. Cited by: §VII.
  • [28] V. T. Pham and A. Roychoudhury (2016) Coverage-based greybox fuzzing as markov chain. Cited by: §II-B, §VII.
  • [29] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos (2017) VUzzer: application-aware evolutionary fuzzing.. In NDSS, Vol. 17, pp. 1–14. Cited by: §VII.
  • [30] richö butts (2015) Roving. Note: https://github.com/richo/Roving Cited by: §I, §I, §II-B, TABLE I, §III-C, §V, §VII.
  • [31] S. Sanfilippo Redis. Note: https://redis.io/ Cited by: §VI-E.
  • [32] S. Schumilo, C. Aschermann, R. Gawlik, S. Schinzel, and T. Holz (2017) Kafl: hardware-assisted feedback fuzzing for os kernels. In 26th USENIX Security Symposium (USENIX Security 17), pp. 167–182. Cited by: §I, §VII.
  • [33] K. Serebryany (2017-08) OSS-fuzz - google’s continuous fuzzing service for open source software. Vancouver, BC. Cited by: §VII.
  • [34] C. Song, X. Zhou, Q. Yin, X. He, H. Zhang, and K. Lu (2019-11) P-fuzz: a parallel grey-box fuzzing framework. Applied Sciences 9, pp. 5100. External Links: Document Cited by: §I, §I, §I, §II-B, TABLE I, §III-C, §VII.
  • [35] D. Song, F. Hetzelt, D. Das, C. Spensky, Y. Na, S. Volckaert, G. Vigna, C. Kruegel, J. Seifert, and M. Franz (2019) PeriScope: an effective probing and fuzzing framework for the hardware-os boundary.. In NDSS, Cited by: §I.
  • [36] N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, and G. Vigna (2016) Driller: augmenting fuzzing through selective symbolic execution.. In NDSS, Vol. 16, pp. 1–16. Cited by: §VII.
  • [37] N. Vinesh, S. Rawat, H. Bos, C. Giuffrida, and M. Sethumadhavan (2020) ConFuzz—a concurrency fuzzer. In First International Conference on Sustainable Technologies for Computational Intelligence, pp. 667–691. Cited by: §I.
  • [38] W. Wang, H. Sun, and Q. Zeng (2016) Seededfuzz: selecting and generating seeds for directed fuzzing. In 2016 10th International Symposium on Theoretical Aspects of Software Engineering (TASE), pp. 49–56. Cited by: §I, §VI-E.
  • [39] Z. Wang, B. Liblit, and T. Reps (2020) TOFU: target-orienter fuzzer. arXiv preprint arXiv:2004.14375. Cited by: §I, §VI-E.
  • [40] V. Wüstholz and M. Christakis (2019) Targeted greybox fuzzing with static lookahead analysis. arXiv preprint arXiv:1905.07147. Cited by: §I.
  • [41] Y. Xie (2010) Using grid computing for large scale fuzzing. Ph.D. Thesis. Cited by: §VII.
  • [42] W. Xu, S. Kashyap, C. Min, and T. Kim (2017) Designing new operating primitives to improve fuzzing performance. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, New York, NY, USA, pp. 2313–2328. External Links: ISBN 9781450349468, Link, Document Cited by: §I, §III-C, §VI-D, §VII.
  • [43] J. Ye, B. Zhang, R. Li, C. Feng, and C. Tang (2019) Program state sensitive parallel fuzzing for real world software. IEEE Access 7, pp. 42557–42564. Cited by: TABLE I, §III-C, §VII.
  • [44] W. You, X. Wang, S. Ma, J. Huang, X. Zhang, X. Wang, and B. Liang (2019) Profuzzer: on-the-fly input type probing for better zero-day vulnerability discovery. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 769–786. Cited by: §I, §VI-E.
  • [45] W. You, P. Zong, K. Chen, X. Wang, X. Liao, P. Bian, and B. Liang (2017) Semfuzz: semantics-based automatic generation of proof-of-concept exploits. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 2139–2154. Cited by: §I, §VI-E.
  • [46] B. Yu, P. Wang, T. Yue, and Y. Tang (2019) Poster: fuzzing iot firmware via multi-stage message generation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 2525–2527. Cited by: §I.
  • [47] T. Yue, Y. Tang, B. Yu, P. Wang, and E. Wang (2019) LearnAFL: greybox fuzzing with knowledge enhancement. IEEE Access 7, pp. 117029–117043. Cited by: §II-A.
  • [48] T. Yue, P. Wang, Y. Tang, E. Wang, B. Yu, K. Lu, and X. Zhou (2020) EcoFuzz: adaptive energy-saving greybox fuzzing as a variant of the adversarial multi-armed bandit. In 29th USENIX Security Symposium (USENIX Security 20), Cited by: §I, §II-A, §VI-E.
  • [49] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim (2018) qsym: A practical concolic execution engine tailored for hybrid fuzzing. In 27th USENIX Security Symposium (USENIX Security 18), pp. 745–761. Cited by: §VII.
  • [50] M. Zalewski (2015) American fuzzy lop. Note: http://lcamtuf.coredump.cx/afl/ Cited by: §II-A, TABLE I, §V, §VII.
  • [51] M. Zalewski (2016) FidgetyAFL. Note: https://groups.google.com/d/msg/afl-users/fOPeb62FZUg/CES5lhznDgAJ/ Cited by: §I.
  • [52] G. Zhang, X. Zhou, Y. Luo, X. Wu, and E. Min (2018) PTfuzz: guided fuzzing with processor trace feedback. IEEE Access 6, pp. 37302–37313. Cited by: §VII.
  • [53] M. Xu, S. Kashyap, H. Zhao, and T. Kim (2020) Krace: data race fuzzing for kernel file systems. Cited by: §I.
  • [54] Y. Zheng, A. Davanian, H. Yin, C. Song, H. Zhu, and L. Sun (2019) FIRM-afl: high-throughput greybox fuzzing of iot firmware via augmented process emulation. In 28th USENIX Security Symposium (USENIX Security 19), pp. 1099–1114. Cited by: §I.