Learning from what we know: How to perform vulnerability prediction using noisy historical data

07/22/2022
by Aayush Garg, et al.
Vulnerability prediction refers to the problem of identifying the system components that are most likely to be vulnerable. Typically, this problem is tackled by training binary classifiers on historical data. Unfortunately, recent research has shown that such approaches underperform for two reasons: a) the imbalanced nature of the problem, and b) the inherently noisy historical data, i.e., most vulnerabilities are discovered much later than they are introduced. This noise misleads classifiers, because they learn to recognize actually vulnerable components as non-vulnerable. To tackle these issues, we propose TROVON, a technique that learns from known vulnerable components rather than from both vulnerable and non-vulnerable components, as is typically done. We do this by contrasting the known vulnerable components with their respective fixed components. This way, TROVON learns from what we know, i.e., vulnerabilities, thereby reducing the effects of noisy and imbalanced data. We evaluate TROVON by comparing it with existing techniques on three security-critical open source systems, i.e., Linux Kernel, OpenSSL, and Wireshark, with historical vulnerabilities that have been reported in the National Vulnerability Database (NVD). Our evaluation demonstrates that the prediction capability of TROVON significantly outperforms existing vulnerability prediction techniques such as Software Metrics, Imports, Function Calls, Text Mining, Devign, LSTM, and LSTM-RF, with an improvement of 40.84 in Matthews Correlation Coefficient (MCC) score under Clean Training Data Settings, and an improvement of 35.52
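
For readers unfamiliar with the metric named above, the following is a minimal sketch of how the Matthews Correlation Coefficient (MCC) is computed from confusion-matrix counts, and why it is less forgiving than accuracy on imbalanced data. The function name `mcc` and the counts used are illustrative assumptions; they are not taken from the paper or its evaluation.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up example: 1,000 components, of which 20 are truly vulnerable.
# A classifier that labels everything "non-vulnerable" looks accurate
# but carries no predictive signal, which MCC reflects.
tp, fp, tn, fn = 0, 0, 980, 20
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(f"accuracy = {accuracy:.2f}, MCC = {mcc(tp, fp, tn, fn):.2f}")
```

In this made-up case the always-negative classifier reaches high accuracy while its MCC stays at zero, which is one reason MCC is a common choice when vulnerable components are rare.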


