REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

09/15/2023
by   Chaozheng Wang, et al.
0

Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep learning methods have been applied to this area and have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited, as they tend to focus on a single programming language and have relatively small datasets. Moreover, many benchmarks tend to be outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit messages as explanations. To address these issues, we propose an automated collecting framework REEF to collect REal-world vulnErabilities and Fixes from open-source repositories. We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs. Furthermore, we propose a neural language model-based approach to generate high-quality vulnerability explanations, which is key to producing informative fix messages. Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations. The dataset we collect contains 4,466 CVEs with 30,987 patches (including 236 CWE) across 7 programming languages with detailed related information, which is superior to existing benchmarks in scale, coverage, and quality. Evaluations by human experts further confirm that our framework produces high-quality vulnerability explanations.

READ FULL TEXT

page 1

page 5

research
07/19/2021

CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software

Data-driven research on the automated discovery and repair of security v...
research
04/16/2021

Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

In this paper, we address the problem of automatic repair of software vu...
research
02/07/2022

Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods

Users around the world rely on software-intensive systems in their day-t...
research
10/18/2021

A ground-truth dataset of real security patches

Training machine learning approaches for vulnerability identification an...
research
03/21/2020

An Empirical Study on Benchmarks of Artificial Software Vulnerabilities

Recently, various techniques (e.g., fuzzing) have been developed for vul...
research
02/24/2022

Automatically Mitigating Vulnerabilities in x86 Binary Programs via Partially Recompilable Decompilation

When vulnerabilities are discovered after software is deployed, source c...
research
07/21/2023

Exploring Security Commits in Python

Python has become the most popular programming language as it is friendl...

Please sign up or login with your details

Forgot password? Click here to reset