VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

05/25/2022
by Hazim Hanif, et al.

This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters.
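The abstract does not spell out the tokenisation pipeline, so the following is only a minimal sketch of what a code-aware pre-tokeniser for C might look like, not the authors' implementation. The keyword subset, the regular expression, and the function name `tokenize_c` are all assumptions made for illustration. The idea shown is that keywords, identifiers, operators, and string literals are kept as whole tokens before any subword model sees them.

```python
import re

# Illustrative subset of C keywords (an assumption, not the paper's vocabulary).
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}

# Order matters: comments and literals are tried before identifiers and operators.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<char>'(?:\\.|[^'\\])')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<op>->|\+\+|--|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%=<>!&|^~?:;,.(){}\[\]])
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize_c(source):
    """Split C source into (kind, text) pairs, dropping comments.

    Whitespace and any character the pattern does not recognise are skipped,
    because finditer simply moves on to the next match.
    """
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "comment":
            continue
        text = match.group()
        # Promote identifiers that are reserved words to their own kind.
        if kind == "ident" and text in C_KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize_c('int n = strlen(buf); // length of buf'):
        print(kind, text)
```

In a pre-training setup along the lines the abstract describes, tokens like these would typically be passed on to a subword vocabulary and then to the masked-language-model objective; that step is omitted here because the abstract gives no details about it.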

research
10/08/2021

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

Understanding the functional (dis)-similarity of source code is signific...
research
11/15/2019

Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits

Public vulnerability databases such as CVE and NVD account for only 60 s...
research
02/05/2023

VuLASTE: Long Sequence Model with Abstract Syntax Tree Embedding for vulnerability Detection

In this paper, we build a model named VuLASTE, which regards vulnerabili...
research
04/12/2023

Evaluation of ChatGPT Model for Vulnerability Detection

In this technical report, we evaluated the performance of the ChatGPT an...
research
06/01/2021

On using distributed representations of source code for the detection of C security vulnerabilities

This paper presents an evaluation of the code representation model Code2...
research
12/13/2021

ROMEO: Exploring Juliet through the Lens of Assembly Language

Automatic vulnerability detection on C/C++ source code has benefitted fr...
research
04/17/2023

Code-centric Learning-based Just-In-Time Vulnerability Detection

Attacks against computer systems exploiting software vulnerabilities can...
