When and Why Test Generators for Deep Learning Produce Invalid Inputs: an Empirical Study

12/21/2022
by Vincenzo Riccio, et al.

Testing Deep Learning (DL) based systems inherently requires large and representative test sets to evaluate whether DL systems generalise beyond their training datasets. Diverse Test Input Generators (TIGs) have been proposed to produce artificial inputs that expose issues of the DL systems by triggering misbehaviours. Unfortunately, such generated inputs may be invalid, i.e., not recognisable as part of the input domain, and therefore provide an unreliable quality assessment. Automated validators can relieve human testers of the burden of manually checking input validity, although input validity is a concept that is difficult to formalise and, thus, to automate. In this paper, we investigate to what extent TIGs can generate valid inputs, according to both automated and human validators. We conduct a large empirical study involving 2 different automated validators, 220 human assessors, 5 different TIGs and 3 classification tasks. Our results show that 84% of artificially generated inputs are valid according to automated validators, but their expected label is not always preserved. Automated validators reach a good consensus with humans (78 with feature-rich datasets.
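The abstract reports three kinds of quantities: the share of generated inputs an automated validator judges valid, whether the expected label is preserved, and how often automated and human validators agree. A minimal Python sketch can make these concrete. The `Assessment` record, its fields, and the exact metric definitions are hypothetical illustrations, not the paper's protocol. For simplicity, each record carries a single human verdict, whereas a real study may aggregate several assessors per input.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    # Hypothetical record for one TIG-generated input (not the paper's schema).
    auto_valid: bool      # verdict of an automated validator
    human_valid: bool     # verdict of a human assessor
    expected_label: int   # label the TIG intended the input to keep
    human_label: int      # label the human assessor actually assigned


def validity_rate(records):
    """Fraction of inputs that the automated validator deems valid."""
    if not records:
        return 0.0
    return sum(r.auto_valid for r in records) / len(records)


def label_preservation_rate(records):
    """Among automatically valid inputs, fraction whose human label matches the expected one."""
    valid = [r for r in records if r.auto_valid]
    if not valid:
        return 0.0
    return sum(r.human_label == r.expected_label for r in valid) / len(valid)


def agreement(records):
    """Fraction of inputs on which automated and human validity verdicts coincide."""
    if not records:
        return 0.0
    return sum(r.auto_valid == r.human_valid for r in records) / len(records)


# Made-up example data, for illustration only.
example = [
    Assessment(auto_valid=True, human_valid=True, expected_label=3, human_label=3),
    Assessment(auto_valid=True, human_valid=True, expected_label=7, human_label=1),
    Assessment(auto_valid=True, human_valid=False, expected_label=5, human_label=5),
    Assessment(auto_valid=False, human_valid=False, expected_label=2, human_label=2),
]

if __name__ == "__main__":
    print("validity rate:", validity_rate(example))
    print("label preservation:", label_preservation_rate(example))
    print("auto/human agreement:", agreement(example))
```

Raw agreement is used here because it is the simplest consensus measure. A chance-corrected statistic such as Cohen's kappa would be a reasonable alternative when one verdict dominates.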
