Deep Learning based Vulnerability Detection: Are We There Yet?

09/03/2020
by Saikat Chakraborty, et al.

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques suffer from either high false positives or high false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results, achieving an accuracy of up to 95%. We ask: "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?" To our surprise, we find that their performance drops by more than 50%. Investigating this precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of vulnerable classes, etc.) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable/function names, etc.). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline: up to 33.57% improvement compared to the best performing model in the literature. Overall, this paper elucidates potential issues in existing DL-based vulnerability prediction systems and draws a roadmap for future DL-based vulnerability prediction research. In that spirit, we make available all the artifacts supporting our results: https://git.io/Jf6IA.

research
12/15/2022

An Empirical Study of Deep Learning Models for Vulnerability Detection

Deep learning (DL) models of code have recently reported great progress ...
research
03/22/2021

Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning

Deep learning (DL) techniques are on the rise in the software engineerin...
research
07/22/2022

Learning from what we know: How to perform vulnerability prediction using noisy historical data

Vulnerability prediction refers to the problem of identifying system com...
research
02/15/2021

Expected Exploitability: Predicting the Development of Functional Vulnerability Exploits

Assessing the exploitability of software vulnerabilities at the time of ...
research
06/08/2023

On the Security Blind Spots of Software Composition Analysis

Modern software heavily relies on the use of components. Those component...
research
08/22/2023

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Though many deep learning (DL)-based vulnerability detection approaches ...
research
12/15/2022

DeepDFA: Dataflow Analysis-Guided Efficient Graph Learning for Vulnerability Detection

Deep learning-based vulnerability detection models have recently been sh...
