An Information-Theoretic and Contrastive Learning-based Approach for Identifying Code Statements Causing Software Vulnerability

09/20/2022
by   Van Nguyen, et al.
0

Software vulnerabilities existing in a program or function of computer systems are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, there are only few statements causing the corresponding vulnerabilities. Vulnerability labeling is currently done on a function or program level by experts with the assistance of machine learning tools. Extending this approach to the code statement level is much more costly and time-consuming and remains an open problem. In this paper we propose a novel end-to-end deep learning-based approach to identify the vulnerability-relevant code statements of a specific function. Inspired by the specific structures observed in real world vulnerable code, we first leverage mutual information for learning a set of latent variables representing the relevance of the source code statements to the corresponding function's vulnerability. We then propose novel clustered spatial contrastive learning in order to further improve the representation learning and the robust selection process of vulnerability-relevant code statements. Experimental results on real-world datasets of 200k+ C/C++ functions show the superiority of our method over other state-of-the-art baselines. In general, our method obtains a higher performance in VCP, VCA, and Top-10 ACC measures of between 3\% to 14\% over the baselines when running on real-world datasets in an unsupervised setting. Our released source code samples are publicly available at \href{https://github.com/vannguyennd/livuitcl}{https://github.com/vannguyennd/livuitcl.}

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/19/2022

Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle

Software vulnerabilities (SVs) have become a common, serious and crucial...
research
05/26/2023

Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Deep learning (DL) models have become increasingly popular in identifyin...
research
09/07/2022

VulCurator: A Vulnerability-Fixing Commit Detector

Open-source software (OSS) vulnerability management process is important...
research
12/20/2021

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

Automatically locating vulnerable statements in source code is crucial t...
research
01/07/2022

Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach

This paper presents an approach for identification of vulnerable IoT app...
research
03/16/2022

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Many studies have developed Machine Learning (ML) approaches to detect S...
research
12/20/2022

Augmenting Diffs With Runtime Information

Source code diffs are used on a daily basis as part of code review, insp...

Please sign up or login with your details

Forgot password? Click here to reset