Token-Level Fuzzing

04/04/2023
by Christopher Salls, et al.

Fuzzing has become a commonly used approach to identifying bugs in complex, real-world programs. However, interpreters are notoriously difficult to fuzz effectively, as they expect highly structured inputs, which are rarely produced by most fuzzing mutations. For this class of programs, grammar-based fuzzing has been shown to be effective. Tools based on this approach can find bugs in the code that is executed after parsing the interpreter inputs, by following language-specific rules when generating and mutating test cases. Unfortunately, grammar-based fuzzing is often unable to discover subtle bugs associated with the parsing and handling of the language syntax. Additionally, if the grammar provided to the fuzzer is incomplete, or does not match the implementation completely, the fuzzer will fail to exercise important parts of the available functionality. In this paper, we propose a new fuzzing technique, called Token-Level Fuzzing. Instead of applying mutations either at the byte level or at the grammar level, Token-Level Fuzzing applies mutations at the token level. Evolutionary fuzzers can leverage this technique to both generate inputs that are parsed successfully and generate inputs that do not conform strictly to the grammar. As a result, the proposed approach can find bugs that neither byte-level fuzzing nor grammar-based fuzzing can find. We evaluated Token-Level Fuzzing by modifying AFL and fuzzing four popular JavaScript engines, finding 29 previously unknown bugs, several of which could not be found with state-of-the-art byte-level and grammar-based fuzzers.
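
To make the idea of mutating at the token level concrete, here is a minimal Python sketch. It is an illustration only, not the authors' AFL modification: the tokenizer pattern, the function names (`tokenize`, `mutate`, `detokenize`), and the vocabulary are all assumptions introduced for this example, and the toy tokenizer does not treat string literals or comments as single tokens. The point it shows is that replacing, inserting, or deleting whole tokens keeps keywords and identifiers intact (unlike byte-level mutation) while still allowing sequences that a grammar would never generate.

```python
import random
import re

# Hypothetical, deliberately tiny token pattern for a JavaScript-like language:
# identifiers/keywords, integers, a few multi-character operators, then any
# other single non-space character.
TOKEN_RE = re.compile(
    r"\s*([A-Za-z_$][\w$]*|\d+|===|!==|==|!=|<=|>=|=>|&&|\|\||[^\sA-Za-z_$\d])"
)


def tokenize(source):
    """Split source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)


def detokenize(tokens):
    """Join tokens back into source text, separated by single spaces."""
    return " ".join(tokens)


def mutate(tokens, vocabulary, rng):
    """Apply one random token-level mutation and return a new token list.

    The three operations (replace, insert, delete) are an assumed, simplified
    mutation set chosen for illustration.
    """
    mutated = list(tokens)
    op = rng.choice(["replace", "insert", "delete"])
    if not mutated:
        op = "insert"  # nothing to replace or delete in an empty input
    if op == "replace":
        mutated[rng.randrange(len(mutated))] = rng.choice(vocabulary)
    elif op == "insert":
        mutated.insert(rng.randrange(len(mutated) + 1), rng.choice(vocabulary))
    else:
        del mutated[rng.randrange(len(mutated))]
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    seed = "var a = [1, 2]; a.length = 0;"
    tokens = tokenize(seed)
    # Vocabulary: tokens seen in the seed plus a few extra assumed tokens.
    vocabulary = sorted(set(tokens) | {"function", "(", ")", "{", "}", "new"})
    print(detokenize(mutate(tokens, vocabulary, rng)))
```

In an evolutionary fuzzer, a mutator of this kind would sit inside the usual coverage-guided loop: mutated token sequences that reach new code are kept as seeds, whether or not they conform to the grammar.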

research
12/04/2018

Superion: Grammar-Aware Greybox Fuzzing

In recent years, coverage-based greybox fuzzing has proven itself to be ...
research
11/18/2019

Building Fast Fuzzers

Fuzzing is one of the key techniques for evaluating the robustness of pr...
research
10/18/2018

Sample-Free Learning of Input Grammars for Comprehensive Software Fuzzing

Generating valid test inputs for a program is much easier if one knows t...
research
03/27/2018

Towards Zero-Overhead Disambiguation of Deep Priority Conflicts

**Context** Context-free grammars are widely used for language prototypi...
research
06/04/2020

SMIE: Weakness is Power!: Auto-indentation with incomplete information

Automatic indentation of source code is fundamentally a simple matter of...
research
11/02/2019

WEIZZ: Automatic Grey-box Fuzzing for Structured Binary Formats

Fuzzing technologies have evolved at a fast pace in recent years, reveal...
research
08/03/2020

Evolutionary Grammar-Based Fuzzing

A fuzzer provides randomly generated inputs to a targeted software to ex...
