On using distributed representations of source code for the detection of C security vulnerabilities

06/01/2021
by David Coimbra, et al.

This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is then trained on the resulting path-contexts to classify each function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec on this task is comparable to that of simple transformer-based methods such as pre-trained RoBERTa, and that it outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.
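
To make the notion of a path-context concrete, the following is a minimal sketch in Python, not the astminer implementation. It assumes a hand-built toy tree for the statement `x = y + 1`; the `Node` class, the node labels, and the `^` (up) and `_` (down) separators are illustrative choices made here. A path-context is taken to be a triple of two leaf tokens and the syntactic path that connects them through their lowest common ancestor. Real extractors typically also bound path length and width, which this sketch omits.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: a syntactic label, children, and a token on leaves."""
    label: str
    children: List["Node"] = field(default_factory=list)
    token: Optional[str] = None


def leaves_with_paths(node, prefix=()):
    """Yield (leaf, root-to-leaf tuple of nodes) for every leaf, in DFS order."""
    path = prefix + (node,)
    if not node.children:
        yield node, path
    for child in node.children:
        yield from leaves_with_paths(child, path)


def path_contexts(root):
    """Return (start_token, path, end_token) for every unordered pair of leaves."""
    contexts = []
    for (a, pa), (b, pb) in combinations(leaves_with_paths(root), 2):
        # Length of the shared root prefix; index i - 1 is the lowest common ancestor.
        i = 0
        while i < min(len(pa), len(pb)) and pa[i] is pb[i]:
            i += 1
        up = [n.label for n in reversed(pa[i:])]   # from leaf a up towards the ancestor
        top = pa[i - 1].label                      # lowest common ancestor
        down = [n.label for n in pb[i:]]           # from the ancestor down to leaf b
        path = "^".join(up + [top]) + "_" + "_".join(down)
        contexts.append((a.token, path, b.token))
    return contexts


# Toy tree for `x = y + 1` (labels are assumptions, not a real C grammar).
toy = Node("Assign", [
    Node("Name", token="x"),
    Node("BinOp", [Node("Name", token="y"), Node("Const", token="1")]),
])

for ctx in path_contexts(toy):
    print(ctx)
# ('x', 'Name^Assign_BinOp_Name', 'y')
# ('x', 'Name^Assign_BinOp_Const', '1')
# ('y', 'Name^BinOp_Const', '1')
```

A function is then represented as the bag of such triples, which is the kind of input a path-based model like Code2vec aggregates into a single vector for classification.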

research
03/23/2021

Tracing Vulnerable Code Lineage

This paper presents results from the MSR 2021 Hackathon. Our team invest...
research
02/14/2022

What Do They Capture? – A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been pro...
research
05/25/2022

VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

This paper presents VulBERTa, a deep learning approach to detect securit...
research
03/29/2023

An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

The presence of software vulnerabilities is an ever-growing issue in sof...
research
02/05/2022

GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network

With the continuous extension of the Industrial Internet, cyber incident...
research
11/18/2019

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in field...
research
07/18/2022

What does Transformer learn about source code?

In the field of source code processing, the transformer-based representa...
