Exploring Security Commits in Python

07/21/2023
by   Shiyu Sun, et al.
0

Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by 'silent' security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To increase the variety of security commits, we build the pilot dataset from GitHub by filtering keywords within the commit messages. Since not all commits provide commit messages, we further construct the augmented dataset by understanding the semantics of code changes. To build the augmented dataset, we propose a new graph representation named CommitCPG and a multi-attributed graph learning model named SCOPY to identify the security commit candidates through both sequential and structural code semantics. The evaluation shows our proposed algorithms can improve the data collection efficiency by up to 40 percentage points. After manual verification by three security experts, PySecDB consists of 1,258 security commits and 2,791 non-security commits. Furthermore, we conduct an extensive case study on PySecDB and discover four common security fix patterns that cover over 85 insight into secure software maintenance, vulnerability detection, and automated program repair.

READ FULL TEXT
research
07/19/2021

CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software

Data-driven research on the automated discovery and repair of security v...
research
10/18/2021

A ground-truth dataset of real security patches

Training machine learning approaches for vulnerability identification an...
research
06/24/2020

Exploring the Security Awareness of the Python and JavaScript Open Source Communities

Software security is undoubtedly a major concern in today's software eng...
research
04/11/2023

A Data Set of Generalizable Python Code Change Patterns

Mining repetitive code changes from version control history is a common ...
research
07/27/2021

A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI

Different security issues are a common problem for open source packages ...
research
09/15/2023

REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Software plays a crucial role in our daily lives, and therefore the qual...

Please sign up or login with your details

Forgot password? Click here to reset