An Empirical Study of Deep Learning Models for Vulnerability Detection

12/15/2022
by Benjamin Steenhoek, et al.

Deep learning (DL) models of code have recently reported great progress in vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many models have been proposed, we do not yet have a good understanding of them. This limits further advances in model robustness, debugging, and deployment for vulnerability detection. In this paper, we surveyed and reproduced 9 state-of-the-art (SOTA) deep learning models on 2 widely used vulnerability detection datasets: Devign and MSR. We investigated 6 research questions in three areas, namely model capabilities, training data, and model interpretation. We experimentally demonstrated the variability between different runs of a model and the low agreement among different models' outputs. We compared models trained for specific types of vulnerabilities with a model trained on all vulnerabilities at once. We explored the types of programs that DL models may find "hard" to handle. We investigated how training data size and training data composition relate to model performance. Finally, we studied model interpretations and analyzed the important features that the models used to make predictions. We believe that our findings can help researchers better understand model results, provide guidance on preparing training data, and improve the robustness of the models. All of our datasets, code, and results are available at https://figshare.com/s/284abfba67dba448fdc2.


