An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

03/29/2023
by Tamás Aladics, et al.

Software vulnerabilities are an ever-growing problem in software development. In most cases, it is desirable to detect vulnerabilities as early as possible, preferably just in time, when the vulnerable code is added to the code base. The industry struggles with this problem because manual inspection is costly, and traditional means such as rule-based bug detection are not robust enough to keep pace with the emergence of new vulnerabilities. Machine learning, an actively researched field, could help here because models can be trained to detect vulnerable patterns. However, machine learning models work well only if the data is appropriately represented. In our work, we propose a novel way of representing changes in source code (i.e., code commits): the Code Change Tree, a form designed to keep only the differences between two abstract syntax trees of Java source code. We compared its effectiveness in predicting whether a code change introduces a vulnerability against multiple other representation types, using a number of machine learning models as baselines. The evaluation was done on a novel dataset, generated with a 2-phase dataset generator method, that we published as part of our contributions. Based on our evaluation, we conclude that the Code Change Tree is a valid and effective choice for representing source code changes, as it improves performance.
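To make the idea of "keeping only the differences between two abstract syntax trees" concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a toy `Node` type, a naive position-by-position comparison of children, and hypothetical marker labels (`added`, `removed`, `replaced`); the abstract does not specify how the actual Code Change Tree matches or labels nodes, and a real pipeline would obtain the trees from a Java parser.

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import List, Optional


@dataclass
class Node:
    """Toy AST node (hypothetical; stands in for a real Java AST node)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def diff_tree(before: Optional[Node], after: Optional[Node]) -> Optional[Node]:
    """Return a pruned tree holding only the parts that differ, or None if equal.

    Assumption: children are compared position by position, which is far
    simpler than the node matching a production tree-differencing tool does.
    """
    if before == after:              # dataclass equality is structural
        return None
    if before is None:
        return Node("added", [after])
    if after is None:
        return Node("removed", [before])
    if before.label != after.label:
        return Node("replaced", [before, after])
    # Same label, different subtrees: keep only the children that changed.
    changed = [
        d
        for b, a in zip_longest(before.children, after.children)
        if (d := diff_tree(b, a)) is not None
    ]
    return Node(before.label, changed)


# Two versions of a small method body; only the call target changes.
before = Node("Method", [
    Node("Call:validate", [Node("Name:input")]),
    Node("Return", [Node("Name:input")]),
])
after = Node("Method", [
    Node("Call:exec", [Node("Name:input")]),
    Node("Return", [Node("Name:input")]),
])

delta = diff_tree(before, after)
```

In this example, `delta` keeps the `Method` root and a single `replaced` child holding the old and new call subtrees, while the unchanged `Return` statement is dropped. A model would then be trained on such pruned trees rather than on the full before and after trees.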


