FMViz: Visualizing Tests Generated by AFL at the Byte-level

12/25/2021
by   Aftab Hussain, et al.
University of Houston

Software fuzzing is a strong testing technique that has become the de facto approach for automated software testing and software vulnerability detection in the industry. The random nature of fuzzing makes monitoring and understanding the behavior of fuzzers difficult. In this paper, we report the development of Fuzzer Mutation Visualizer (FMViz), a tool that focuses on visualizing byte-level mutations in fuzzers. In particular, FMViz extends American Fuzzy Lop (AFL) to visualize the generated test inputs and highlight changes between consecutively generated seeds as a fuzzing campaign progresses. The overarching goal of our tool is to help developers and students better comprehend the inner workings of the AFL fuzzer. In this paper, we present the architecture of FMViz, discuss a sample case study, and outline future work. FMViz is open-source and publicly available at https://github.com/AftabHussain/afl-test-viz.

1 Introduction

Fuzzing has become a widely popular tool for testing programs in the software industry. The overall simplicity of the design of coverage-guided fuzzers (e.g., AFL [21]) – generating test inputs by mutating other test inputs in a pseudo-random fashion while optimizing for code coverage in the test subject, and executing them on test subjects at scale – has been very effective in finding bugs and vulnerabilities in software systems. Large companies have integrated fuzzers in their testing ecosystem; for instance, Google is continuously running fuzzers on its Chrome browser to find vulnerabilities.

There has been significant research in building efficient fuzzers that can generate interesting test inputs faster in a fuzzing campaign: e.g., grammar-based fuzzing [16, 8], data-flow techniques [15, 14], stochastic scheduling methods [13], smarter test selection methods [10], etc. Nevertheless, there is one theme that all fuzzers have in common, albeit in varying degrees: randomness, which contributes to the “black-box” nature of their operation.

This randomness of fuzzers makes it challenging for developers to understand and interpret what operations are being carried out on which test inputs, and to reason about the behavior of the fuzzers. While fuzzers like AFL do offer high-level statistics on what operations are being performed, the information is shown in a hard-to-follow descriptive manner (the statistics are real-time and constantly changing). They also store data for coverage-increasing test inputs only, and provide no support for understanding how the other tests were generated (which may contain useful information). In addition, the real-time statistics do not portray which test inputs are being changed (i.e., mutated). Furthermore, the initial choice of test inputs in a fuzzing campaign can considerably influence its progress [10]. We thus believe it is necessary to have an approach for understanding how the test inputs are being mutated, and we address the problem from a visualization angle.

The importance of software visualization (SV) cannot be over-emphasized. The superiority of visual memory in cognition is discussed in Diehl’s seminal work [5], which mentions that 75% of all information from the real world is visually perceived, 13% is perceived through the auditory senses, and the rest is perceived through other senses. Despite its value, SV still has considerable unrealized potential in software engineering [4] and, to the best of our knowledge, even more so in fuzzing (we discuss the few existing works we found in Section 2). Towards the idea of bringing better visualization to fuzzing, we build a visualization approach for the integral component of almost all state-of-the-art fuzzers: mutation. Our tool, FMViz, helps us see which bytes of a test input undergo mutations during an AFL fuzzing campaign, and thereby makes mutation patterns in fuzzing more perceivable. The tool is light-weight and easy to extend to other fuzzers. We believe this work is a stepping stone towards inspecting the behavior of mutational fuzzers on various test inputs at a deeper level.

Contributions. The main contributions of this paper are as follows:

  • We provide a light-weight approach to visualize fuzzing mutation behavior in AFL by visualizing test inputs generated during fuzzing and highlighting the changes.

  • We instantiate the approach in FMViz by capturing and displaying mutation locations in test inputs that undergo mutation in the fuzzing process; observing a series of FMViz output images helps reveal the various mutation patterns that take place during fuzzing.

  • We present a short demonstration of FMViz with an AFL fuzzing process on libxml2, where we present some mutation patterns captured by the tool.

Paper Organization. In Section 2, we present some related literature. In Section 3, we provide details of the architecture of FMViz. In Section 4, we illustrate the results of a demo of FMViz in fuzzing libxml2. We conclude our paper in Section 5, providing future directions.

2 Related Work

Information visualization has been widely used in different realms of software engineering, including bug analysis [20], evolution [7], and refactoring [11, 1]; it makes it easier for developers to understand, analyze, and deploy various software engineering tasks. A few works have adopted visualization techniques in the fuzzing domain. For example, VisFuzz [22], an LLVM plugin that works on top of a modified version of AFL, is an interactive real-time tool that visualizes constraints in the fuzz subject by extracting a call graph and a control flow graph from the subject code. FuzzSplore [6] provides statistical visualizations such as a coverage plot, which shows the number of edges covered over time by test inputs, and a plot that shows the number of interesting test inputs generated over the campaign. Vainio [18] provides a fuzzing visualization framework that adopts information visualization techniques (e.g., circle packing) to view fuzzing performance data such as CPU and memory usage statistics and power consumption. In [3], the coverage of a test subject’s call graph is visualized when fuzzed by Honggfuzz [9] and AFL. Unlike our tool, FMViz, none of these works delve into visualizing mutations in fuzzers; FMViz generates visuals of how the mutations occur on the test inputs at the byte level.

3 Tool Architecture

In this section, we provide an overview of the architecture of FMViz. We also present information on its usage and performance. The implementation and documentation of FMViz are available on GitHub (https://github.com/AftabHussain/afl-test-viz).

Figure 1 depicts the architecture of FMViz and its main components, which are: Test Input Color Representation Generator and Test Input Image Generator.

Figure 1: Overview of FMViz.

Figure 2: Cropped sample image of a test input.

Figure 2 shows a sample visualization output of FMViz of a test input.

3.1 Extracting Color Representations of Test Inputs

FMViz extends AFL to capture the byte stream representations of new test inputs that are generated as AFL mutates original seeds (Figure 3(a)). FMViz saves these representations in a single file (a dump file in hexadecimal). Each byte’s hex code is chosen to represent a shade of red, depending on the value stored in the byte (we elaborate on the color representation in the following subsection). Each line of this file corresponds to the representation of a single test input.
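As an illustration of the dump file described above, the following minimal Python sketch writes one line of hexadecimal text per test input. FMViz performs this step inside AFL's C code; the function names (to_dump_line, from_dump_line, append_to_dump) and the exact line layout (two hex digits per byte, no separators) are assumptions made for illustration and may differ from the tool's actual format.

```python
def to_dump_line(test_input: bytes) -> str:
    """Encode one test input as a single line of hex text (assumed layout)."""
    return test_input.hex()


def from_dump_line(line: str) -> bytes:
    """Decode a dump line back into the raw bytes of the test input."""
    return bytes.fromhex(line.strip())


def append_to_dump(path: str, test_input: bytes) -> None:
    """Append one test input to the dump file, one input per line."""
    with open(path, "a", encoding="ascii") as dump:
        dump.write(to_dump_line(test_input) + "\n")
```

Because every test input is appended to the same file, the file grows by one line per generated test, which matches the storage concern raised in Section 3.3.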

3.2 Test Input Image Generator

This piece of our tool (Figure 3(b)), written in Python, reads the file generated by the color representation generator line by line and generates PNG image files (each line corresponds to a test input, as mentioned previously). In the image output, each box represents a byte of a test input. To obtain the box colors we use the six-digit hex triplet, a three-byte hexadecimal number that is commonly used to specify colors in computing applications such as HTML and SVG. The three bytes give the red, green, and blue components of the color, respectively [19]. The box color for each byte of the test input is evaluated as follows: a byte whose hex representation is xy, where x and y each belong to the set of 16 hexadecimal symbols (0–9, A–F), is translated to a hex color code that encodes a shade of red determined by xy. The box dimensions can be changed to vary the number of test input bytes displayed in the image. The PNG files can be used independently to represent individual test inputs, or can be used to generate a time-lapse video of the evolution of the test inputs using a linear image interpolator [17] or a screen recorder [2].
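The image generation step can be sketched in Python using only the standard library. This is not the code of viz_tests.py: the color mapping (byte value placed in the red channel, with green and blue set to zero), the white padding boxes, and all function names are assumptions chosen for illustration.

```python
import struct
import zlib


def byte_to_rgb(value: int) -> tuple:
    """Map a byte to a shade of red (assumed mapping: red=value, green=blue=0)."""
    return (value, 0, 0)


def color_grid(test_input: bytes, boxes_per_row: int) -> list:
    """Lay the bytes out row by row; pad the last row with white boxes."""
    colors = [byte_to_rgb(b) for b in test_input]
    while len(colors) % boxes_per_row:
        colors.append((255, 255, 255))
    return [colors[i:i + boxes_per_row]
            for i in range(0, len(colors), boxes_per_row)]


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag and data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def grid_to_png(grid: list, box_size: int = 8) -> bytes:
    """Encode the grid as an 8-bit RGB PNG, one square box per byte."""
    if not grid:
        raise ValueError("empty test input")
    width = len(grid[0]) * box_size
    height = len(grid) * box_size
    raw = bytearray()
    for row in grid:
        scanline = b"".join(bytes(rgb) * box_size for rgb in row)
        # Each scanline starts with filter type 0 (no filtering).
        raw += (b"\x00" + scanline) * box_size
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _chunk(b"IEND", b""))
```

Writing the returned bytes to a file opened in binary mode yields an image in which a smaller box_size, or more boxes per row, lets more test input bytes fit in the same area.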

Figure 3: Workflow of the FMViz Tool. (a) Test Input Color Representation Generation. (b) Test Input Image Generation

3.3 Usage and Performance Considerations

The present implementation is adapted for an AFL fuzzing campaign with a single test input. Although the overhead of writing to the color dump file during the fuzzing campaign is minimal, the file can get very large over long fuzzing periods because a single file is used. We are considering extending FMViz to reduce storage use by adopting a more compressed representation of the test inputs.

4 Demonstration: Fuzzing libxml2

In this section, we present a short demo of FMViz. The purpose of this demo is to show how the mutation locations in a test input are visually captured.

Figure 4: Three observed visual patterns of how an XML test input is mutated during fuzzing libxml2 with AFL. (a) 2-byte, shifting mutations (test inputs 7575 to 7581), (b) 4-byte, shifting mutations (test inputs 7851 to 7857), (c) single-byte fixed mutations (test inputs 9358 to 9364) – yellow arrow added for illustrating changing byte.

4.1 Steps Performed

We applied FMViz’s representation generator on top of AFL to fuzz the XML C parser library libxml2 [12] (we used commit 1fbcf40 of https://github.com/GNOME/libxml2.git) for a set number of seconds. This step produced test inputs and a dump file containing color representations of each of those test inputs. Next, we used FMViz’s image generator to parse the dump file and generate an image for each test. The image generation process took slightly over five minutes for all tests. Viewing a series of consecutive test input color matrix images (in PNG format) with the default system image viewer then revealed patterns. We also produced a time-lapse video from the sequence of images and saved it in a video file, which is available in the repository. For this experiment, we used a computer system with an Intel(R) Xeon(R) 1.90GHz CPU and 64 GB RAM, running Ubuntu 18.04.5 LTS.

4.2 Mutation Patterns Observed

Figure 4 depicts the mutation patterns that we observed. For visualizing each pattern, seven consecutively generated tests are shown.

2-byte, shifting mutation pattern (Figure 4(a)). Here, in every mutation iteration, the fuzzer mutates a pair of bytes of the test input. This pair-mutation operation progresses by shifting by one byte in the next iteration.

4-byte, shifting mutation pattern (Figure 4(b)). Here, in every mutation iteration, the fuzzer mutates a set of four bytes of the test input. The 4-byte-mutation operation progresses by shifting by one byte in the next iteration.

Single-byte, fixed mutation pattern (Figure 4(c)). Here, in every mutation iteration, the fuzzer mutates the same byte. The changing byte in the figure is shown with the yellow arrow.
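The three patterns above can also be checked programmatically by comparing each generated test with the original seed. The Python sketch below is an illustration only and is not part of FMViz; the function names and the flip helper are hypothetical, and the comparison can miss a mutation that happens to write back the same value a byte already had.

```python
def changed_offsets(seed: bytes, mutated: bytes) -> list:
    """Return the offsets at which the mutated input differs from the seed."""
    shared = min(len(seed), len(mutated))
    diffs = [i for i in range(shared) if seed[i] != mutated[i]]
    # Bytes beyond the shorter input count as changed.
    diffs.extend(range(shared, max(len(seed), len(mutated))))
    return diffs


def describe_pattern(seed: bytes, tests: list) -> str:
    """Label a run of consecutive tests as fixed, shifting, or irregular."""
    windows = [changed_offsets(seed, t) for t in tests]
    if not windows or any(not w for w in windows):
        return "irregular"
    contiguous = all(w == list(range(w[0], w[0] + len(w))) for w in windows)
    widths = {len(w) for w in windows}
    if not contiguous or len(widths) != 1:
        return "irregular"
    width = widths.pop()
    starts = [w[0] for w in windows]
    if all(s == starts[0] for s in starts):
        return f"{width}-byte, fixed"
    if all(b - a == 1 for a, b in zip(starts, starts[1:])):
        return f"{width}-byte, shifting"
    return "irregular"


def flip(seed: bytes, start: int, width: int) -> bytes:
    """Hypothetical helper: invert `width` bytes starting at `start`."""
    out = bytearray(seed)
    for i in range(start, start + width):
        out[i] ^= 0xFF
    return bytes(out)


seed = b"<a>text</a>"
two_byte = [flip(seed, i, 2) for i in range(3)]
four_byte = [flip(seed, i, 4) for i in range(3)]
single = [seed[:4] + bytes([v]) + seed[5:] for v in (1, 2, 3)]
print(describe_pattern(seed, two_byte))   # 2-byte, shifting
print(describe_pattern(seed, four_byte))  # 4-byte, shifting
print(describe_pattern(seed, single))     # 1-byte, fixed
```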

5 Conclusion and Future Work

In this work, building on the motivation of software visualization, we presented an easy-to-extend, light-weight visualization tool, FMViz, that helps us better perceive the mutation process in the AFL fuzzer. In particular, we visualize the bytes of a test input that undergo mutation during fuzzing. FMViz encodes bytes of test inputs as colors, and mutations are captured by changes in the colors. As next steps, we plan to augment the visual representation of test inputs with other information such as coverage. We also plan to explore more efficient ways to store representations of test inputs. Furthermore, we plan to evaluate the usefulness of FMViz and similar visualization tools in teaching software testing to undergraduate students.

References

  • [1] A. Alkhalid, M. Alshayeb, and S. Mahmoud (2010) Software refactoring at the function level using new adaptive k-nearest neighbor algorithm. Adv. Eng. Softw. 41 (10–11), pp. 1160–1178. Cited by: §2.
  • [2] M. Baert SimpleScreenRecorder. Note: visited on 2021-12-25 External Links: Link Cited by: Generating Images from Color Representations of Test Inputs, §3.2.
  • [3] M. Böhme Visualizing fuzzer progress. Note: visited on 2021-12-25 External Links: Link Cited by: §2.
  • [4] S. Diehl (2015) Past, present, and future in and of software visualization. In The International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 3–11. Cited by: §1.
  • [5] S. Diehl (2015) Past, present, and future of and in software visualization. In Computer Vision, Imaging and Computer Graphics - Theory and Applications, S. Battiato, S. Coquillart, J. Pettré, R. S. Laramee, A. Kerren, and J. Braz (Eds.), Cham, pp. 3–11. Cited by: §1.
  • [6] A. Fioraldi and L. P. Pileggi (2021) FuzzSplore: visualizing feedback-driven fuzzing techniques. arXiv preprint arXiv:2102.02527. Cited by: §2.
  • [7] H. C. Gall and M. Lanza (2006) Software evolution: analysis and visualization. In Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pp. 1055–1056. Cited by: §2.
  • [8] P. Godefroid, A. Kiezun, and M. Y. Levin (2008) Grammar-based whitebox fuzzing. SIGPLAN Not. 43 (6), pp. 206–215. External Links: ISSN 0362-1340 Cited by: §1.
  • [9] Google Honggfuzz. Note: visited on 2021-12-25 External Links: Link Cited by: §2.
  • [10] A. Herrera, H. Gunadi, S. Magrath, M. Norrish, M. Payer, and A. L. Hosking (2021) Seed selection for successful fuzzing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pp. 230–243. Cited by: §1, §1.
  • [11] A. Hussain and Md. S. Rahman (2013) A new hierarchical clustering technique for restructuring software at the function level. In Proceedings of the 6th India Software Engineering Conference, ISEC ’13, pp. 45–54. Cited by: §2.
  • [12] Libxml2. Note: visited on 2021-12-25 External Links: Link Cited by: §4.1.
  • [13] C. Lyu, S. Ji, C. Zhang, Y. Li, W. Lee, Y. Song, and R. Beyah (2019) MOPT: optimized mutation scheduling for fuzzers. In 28th USENIX Security Symposium (USENIX Security 19), pp. 1949–1966. Cited by: §1.
  • [14] B. Mathis, R. Gopinath, M. Mera, A. Kampmann, M. Höschele, and A. Zeller (2019) Parser-directed fuzzing. PLDI 2019. Cited by: §1.
  • [15] B. Mathis, R. Gopinath, and A. Zeller (2020) Learning input tokens for effective fuzzing. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2020, pp. 27–37. Cited by: §1.
  • [16] P. Srivastava and M. Payer (2021) Gramatron: effective grammar-aware fuzzing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pp. 244–256. Cited by: §1.
  • [17] Timelapse. Note: visited on 2021-12-25 External Links: Link Cited by: §3.2.
  • [18] J. Vainio (2014) The use of data visualization in fuzz test monitoring. Master’s Thesis, University of Oulu. Cited by: §2.
  • [19] Wikipedia Hex triplet. External Links: Link Cited by: §3.2.
  • [20] S. Yeasmin, C. K. Roy, and K. A. Schneider (2014) Interactive visualization of bug reports using topic evolution and extractive summaries. In 2014 IEEE International Conference on Software Maintenance and Evolution, Vol. , pp. 421–425. Cited by: §2.
  • [21] M. Zalewski (2017) American fuzzy lop. Note: visited on 2021-12-25 External Links: Link Cited by: §1.
  • [22] C. Zhou, M. Wang, J. Liang, Z. Liu, C. Sun, and Y. Jiang (2019) VisFuzz: understanding and intervening fuzzing with interactive visualization. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1078–1081. Cited by: §2.

A. Using FMViz

In this section, we describe the steps to run the first release of FMViz, version 1.0. We use libxml2 as our fuzzing test subject. All commands provided next are for the Linux environment.

Setting up the Environment

1. FMViz Setup

In any directory, we clone the FMViz repository as follows (currently, the name of the repository has been purposely kept different from FMViz):

git clone --recursive git@github.com:AftabHussain/afl-test-viz.git

Then we build and install the AFL fuzzer, patched with FMViz’s Test Input Color Representation Generator component, by performing the following command:

cd afl-test-viz/code/AFL-mut-viz/AFL && make -j32 && make install

2. libxml2 Setup

Once we have set up the fuzzer, we build the test subject (libxml2) with AFL’s compiler (afl-gcc), which prepares libxml2 binaries as fuzzing targets. We first obtain libxml2 as follows, in a folder outside the afl-test-viz directory:

git clone https://github.com/GNOME/libxml2.git && cd libxml2 && git checkout 1fbcf40

Finally, we configure and build libxml2 by performing the following command:

cd libxml2 && export CC=afl-gcc && ./autogen.sh && make -j32

Generating Color Representations of Test Inputs

We now invoke the first part of FMViz, the augmented AFL fuzzer, which produces color representations (in hex) of test inputs generated while fuzzing the test subject. In this demo, we fuzz the libxml2 binary, xmllint. We thus enter the libxml2 folder, create an input folder (input), and place in it any XML file as a test input (some sample inputs are available in the FMViz repository):

cd libxml2 && mkdir input && cp [path_to_xml_file] input/

Thereafter, we invoke the fuzzer as follows:

export AFL_SKIP_CPUFREQ=1 && export LD_LIBRARY_PATH=./.libs/ &&
afl-fuzz -i input/ -o output/ -- ./.libs/xmllint -o /dev/null @@

The fuzzing process can be terminated anytime using Ctrl+C – on termination all results are saved in the output folder, output. Inside this folder, the color dump file tests_generated contains color representations of all the tests created by the fuzzer.

Generating Images from Color Representations of Test Inputs

To generate test input images, we process the color dump file obtained in the previous phase. We place this file along with the Image Generation program (viz_tests.py) in a separate directory:

mkdir process_color_rep

cp libxml2/output/tests_generated process_color_rep/

cp afl-test-viz/code/viz_tests.py process_color_rep/

Finally we invoke the script:

cd process_color_rep/ && python viz_tests.py

The above command generates, in the process_color_rep directory, PNG images for all tests represented in the color dump file:

ls | xargs -n 1

.
.
.
file_000005564.png
file_000005565.png
file_000005566.png
file_000005567.png
file_000005568.png
file_000005569.png
file_000005570.png
file_000005571.png
file_000005572.png
.
.
Figure 5: Screenshot of a test input image on Image Viewer

A sample screenshot of a test input image, opened with Image Viewer, a default image viewer in Ubuntu, is shown in Figure 5. Since the image files for the input tests are named in the order in which they were produced during fuzzing, toggling over consecutive images in the image viewer application shows the trends in mutations. In order to produce a time-lapse video, we use Simple Screen Recorder [2], which once installed can be invoked by the command simplescreenrecorder on the terminal. Then by starting recording and toggling over multiple images on Image Viewer by holding the left/right arrow key, we are able to record the mutation transitions that take place.
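Since the image names carry a zero-padded sequence number, the generation order can be recovered directly from the names. The short Python sketch below shows one way to do this; the function name ordered_frames is hypothetical, and the name pattern is inferred from the file listing above.

```python
import re

# Name pattern inferred from the listing above, e.g. file_000005564.png.
FRAME_NAME = re.compile(r"file_(\d+)\.png$")


def ordered_frames(names):
    """Return (index, name) pairs for matching image names, sorted by index."""
    frames = []
    for name in names:
        match = FRAME_NAME.search(name)
        if match:
            frames.append((int(match.group(1)), name))
    return sorted(frames)
```

Passing the result of os.listdir on the process_color_rep directory to this function gives the images in the order in which the tests were produced, which is the order needed for a time-lapse video.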