VuLASTE: Long Sequence Model with Abstract Syntax Tree Embedding for Vulnerability Detection

02/05/2023
by Botong Zhu, et al.

In this paper, we build a model named VuLASTE, which treats vulnerability detection as a special text classification task. To address the vocabulary explosion problem, VuLASTE uses a byte-level BPE algorithm from natural language processing. VuLASTE also adds a new AST path embedding to represent the nesting information of source code. We use a combination of global and dilated window attention from Longformer to extract long-sequence semantics from source code. To address data imbalance, a common problem in vulnerability detection datasets, focal loss is used as the loss function so that the model focuses on poorly classified cases during training. To test our model's performance on real-world source code, we build a cross-language, multi-repository vulnerability dataset from the GitHub Security Advisory Database. On this dataset, VuLASTE achieved top-50, top-100, top-200, and top-500 hits of 29, 51, 86, and 228, respectively, which are higher than those of state-of-the-art approaches.
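As an illustration of how focal loss shifts training toward poorly classified cases, here is a minimal sketch of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the abstract does not state which variant or hyperparameters VuLASTE uses.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (illustrative sketch only).

    p     : predicted probability of the positive (vulnerable) class
    y     : true label, 1 for vulnerable, 0 for not vulnerable
    gamma : focusing parameter (assumed value, not from the paper)
    alpha : weight for the positive class (assumed value, not from the paper)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.9, 1)   # vulnerable sample, confidently correct
    hard = focal_loss(0.1, 1)   # vulnerable sample, confidently wrong
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy; larger values of `gamma` reduce the contribution of examples the model already classifies well.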


research
04/23/2021

Literature review on vulnerability detection using NLP technology

Vulnerability detection has always been the most important task in the f...
research
05/25/2022

VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

This paper presents VulBERTa, a deep learning approach to detect securit...
research
03/18/2020

Improving the Robustness to Data Inconsistency between Training and Testing for Code Completion by Hierarchical Language Model

In the field of software engineering, applying language models to the to...
research
12/13/2021

ROMEO: Exploring Juliet through the Lens of Assembly Language

Automatic vulnerability detection on C/C++ source code has benefitted fr...
research
05/07/2021

Code2Image: Intelligent Code Analysis by Computer Vision Techniques and Application to Vulnerability Prediction

Intelligent code analysis has received increasing attention in parallel ...
research
08/17/2022

ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection

Neural clone detection has attracted the attention of software engineeri...
research
08/28/2023

Using ChatGPT as a Static Application Security Testing Tool

In recent years, artificial intelligence has had a conspicuous growth in...