UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers

10/05/2020
by   Yuwei Li, et al.

A flurry of fuzzing tools (fuzzers) has been proposed in the literature, aiming to detect software vulnerabilities effectively and efficiently. To date, however, it remains challenging to compare fuzzers due to inconsistencies in the benchmarks, performance metrics, and/or environments used for evaluation, which obscures useful insights and thus impedes the discovery of promising fuzzing primitives. In this paper, we design and develop UNIFUZZ, an open-source, metrics-driven platform for assessing fuzzers in a comprehensive and quantitative manner. Specifically, UNIFUZZ to date has incorporated 35 usable fuzzers, a benchmark of 20 real-world programs, and six categories of performance metrics. We first systematically study the usability of existing fuzzers, find and fix a number of flaws, and integrate the fuzzers into UNIFUZZ. Based on this study, we propose a collection of pragmatic performance metrics that evaluate fuzzers from six complementary perspectives. Using UNIFUZZ, we conduct in-depth evaluations of several prominent fuzzers, including AFL [1], AFLFast [2], Angora [3], Honggfuzz [4], MOPT [5], QSYM [6], T-Fuzz [7], and VUzzer64 [8]. We find that none of them outperforms the others across all target programs, and that relying on a single metric to assess a fuzzer's performance may lead to one-sided conclusions, which demonstrates the importance of comprehensive metrics. Moreover, we identify and investigate previously overlooked factors that can significantly affect a fuzzer's performance, including instrumentation methods and crash analysis tools. Our empirical results show that these factors are critical to the evaluation of a fuzzer. We hope that our findings shed light on reliable fuzzing evaluation and help discover promising fuzzing primitives that effectively facilitate future fuzzer designs.
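The point about single-metric evaluation can be illustrated with a small sketch. The numbers, fuzzer names, and metric choices below are hypothetical (not UNIFUZZ results); they only show how each metric can crown a different "best" fuzzer, so a single-metric comparison reaches a different conclusion depending on which metric is picked.

```python
# Hypothetical per-fuzzer scores on three example metric categories
# (made-up data for illustration; higher is better for each metric).
scores = {
    "FuzzerA": {"unique_bugs": 12, "branch_coverage": 0.61, "execs_per_sec": 1200},
    "FuzzerB": {"unique_bugs": 15, "branch_coverage": 0.54, "execs_per_sec": 700},
    "FuzzerC": {"unique_bugs": 10, "branch_coverage": 0.66, "execs_per_sec": 950},
}

def best_by(metric):
    """Return the fuzzer that ranks first on a single metric."""
    return max(scores, key=lambda fuzzer: scores[fuzzer][metric])

winners = {m: best_by(m) for m in ("unique_bugs", "branch_coverage", "execs_per_sec")}
for metric, fuzzer in winners.items():
    print(f"best by {metric}: {fuzzer}")
```

Here each of the three metrics selects a different winner, which is exactly why a holistic evaluation needs complementary metric categories rather than any single one.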

