Test Suites as a Source of Training Data for Static Analysis Alert Classifiers

05/07/2021
by Lori Flynn, et al.

Flaw-finding static analysis tools typically generate large volumes of code flaw alerts, many of which are false positives. To reduce the human effort needed to triage these alerts, a significant body of work uses machine learning to classify and prioritize them. In many contexts, however, identifying a useful set of training data remains a fundamental challenge in developing such classifiers. We propose using static analysis test suites (i.e., repositories of "benchmark" programs that are purpose-built to test the coverage and precision of static analysis tools) as a novel source of training data. In a case study, we generated a large quantity of alerts by running various static analyzers on the Juliet C/C++ test suite, and we automatically derived ground truth labels for these alerts from the Juliet test suite metadata. Finally, we used this data to train classifiers to predict whether an alert is a false positive. Our classifiers obtained high precision (90.2%) for a large number of code flaw types on a hold-out test set. This preliminary result suggests that pre-training classifiers on test suite data could help to jumpstart static analysis alert classification in data-limited contexts.
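
The labeling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the matching rule (an alert is a true positive if its file, line, and CWE match a flaw recorded in the test suite metadata, and a false positive if it falls in a function whose name contains "good") is an assumption made here for illustration, and the `Alert` type, the `label_alert` and `precision_recall` helpers, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Alert:
    """One static analysis alert (hypothetical, simplified schema)."""
    file: str
    line: int
    cwe: str
    function: str


def label_alert(alert: Alert, known_flaws: Set[Tuple[str, int, str]]) -> Optional[bool]:
    """Return True (true positive), False (false positive), or None (undetermined).

    Assumed rule: metadata lists (file, line, cwe) for each intentional flaw,
    and functions with "good" in their name contain no flaw of the tested kind.
    """
    if (alert.file, alert.line, alert.cwe) in known_flaws:
        return True
    if "good" in alert.function.lower():
        return False
    return None  # not enough information to label automatically


def precision_recall(y_true: List[bool], y_pred: List[bool], positive: bool) -> Tuple[float, float]:
    """Precision and recall for the class given by `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical metadata entry and alerts, for illustration only.
known_flaws = {("example_01.c", 30, "CWE-121")}
alerts = [
    Alert("example_01.c", 30, "CWE-121", "example_01_bad"),
    Alert("example_01.c", 55, "CWE-121", "goodG2B"),
    Alert("example_01.c", 12, "CWE-476", "helper"),
]
labels = [label_alert(a, known_flaws) for a in alerts]
```

Alerts labeled `None` would be left out of the training set in this sketch. A classifier trained on the remaining alerts can then be scored on a hold-out set with `precision_recall`, passing `positive=False` to measure how well it identifies false positives.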
