STADS: Software Testing as Species Discovery

03/06/2018
by Marcel Böhme, et al.

A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run a fuzzer for a week currently has no means (i) to estimate the total number of feasible program branches, given that only a fraction has been covered so far, (ii) to estimate the additional time required to cover 10% more branches, or (iii) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists, even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees. In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimation (i) of the total number of species, (ii) of the additional sampling effort required to discover 10% more species, and (iii) of the probability of discovering a new species are classical problems in ecology. The STADS framework draws on over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates good estimator performance even for a fuzzer with adaptive sampling bias: AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.
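To make the analogy concrete, the sketch below treats each covered branch as a "species" and each generated test input as a sampled "individual". It uses two classical estimators from ecological biostatistics, Chao1 for the total number of species and Good-Turing for the probability that the next sample reveals a new species. The abstract does not name the estimators the framework uses, so the choice of these two is an assumption made for illustration, and the function names and the branch labels are hypothetical.

```python
from collections import Counter


def abundance_frequencies(observations):
    """Return (n, s_obs, f1, f2) for a list of observed species labels.

    n     : total number of sampled individuals (e.g. test inputs)
    s_obs : number of distinct species seen (e.g. covered branches)
    f1    : species seen exactly once (singletons)
    f2    : species seen exactly twice (doubletons)
    """
    counts = Counter(observations)
    n = len(observations)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return n, s_obs, f1, f2


def chao1(observations):
    """Chao1 lower-bound estimate of the total number of species."""
    _, s_obs, f1, f2 = abundance_frequencies(observations)
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    # Bias-corrected form, used when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


def discovery_probability(observations):
    """Good-Turing estimate that the next sample is a new species."""
    n, _, f1, _ = abundance_frequencies(observations)
    # With no samples at all, every species is still undiscovered.
    return f1 / n if n else 1.0


if __name__ == "__main__":
    # Hypothetical campaign: one label per generated input,
    # naming the branch ("species") that input exercised.
    sample = ["b1", "b1", "b1", "b2", "b2", "b3", "b4"]
    print("estimated total species:", chao1(sample))
    print("discovery probability:", discovery_probability(sample))
```

In this toy sample, four branches were observed, two of them exactly once and one exactly twice, so the estimators suggest that some branches remain undiscovered. Real fuzzers such as AFL sample adaptively rather than uniformly, which is exactly the bias the abstract says the empirical study examines, so this sketch should not be read as the paper's evaluation setup.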

