Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

06/26/2023
by   Hieu Dinh Vo, et al.
0

Recent advances in automated vulnerability detection have achieved potential results in helping developers determine vulnerable components. However, after detecting vulnerabilities, investigating to fix vulnerable code is a non-trivial task. In fact, the types of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weaknesses and localize vulnerabilities for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as the multi-label classification task, which could be effectively addressed by "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of the well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that of the classical baseline approach with an old-fashioned bag-of-word, TF-IDF. Meanwhile, these deep neural network approaches cost much more resources and require GPU. We also introduce a lightweight independent component to refine the predictions of the baseline approach. Our idea is that the types of vulnerabilities could strongly correlate to certain code tokens (distinguishing tokens) in several crucial parts of programs. The distinguishing tokens for each vulnerability type are statistically identified based on their prevalence in the type versus the others. Our results show that the baseline approach enhanced by our component can outperform the state-of-the-art deep pre-trained approaches while retaining very high efficiency. Furthermore, the proposed component could also improve the neural network approaches by up to 92.8

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/23/2023

Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?

Software vulnerabilities bear enterprises significant costs. Despite ext...
research
08/24/2023

Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We?

Various approaches are proposed to help under-resourced security researc...
research
02/23/2023

Detecting software vulnerabilities using Language Models

Recently, deep learning techniques have garnered substantial attention f...
research
01/20/2022

VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python

Context: Identifying potential vulnerable code is important to improve t...
research
08/10/2022

Multi-View Pre-Trained Model for Code Vulnerability Identification

Vulnerability identification is crucial for cyber security in the softwa...
research
09/18/2023

Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding

Auto-completing code enables developers to speed up coding significantly...
research
11/23/2022

DeepVulSeeker: A Novel Vulnerability Identification Framework via Code Graph Structure and Pre-training Mechanism

Software vulnerabilities can pose severe harms to a computing system. Th...

Please sign up or login with your details

Forgot password? Click here to reset