Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries

by Nima Shiri Harzevili et al.

Machine learning (ML) libraries are now widely used in many domains, including autonomous driving, medicine, and other critical industries. Vulnerabilities in such libraries can have irreparable consequences. However, the characteristics of these software security vulnerabilities have not been well studied. In this paper, to bridge this gap, we take the first step towards characterizing and understanding the security vulnerabilities of five well-known ML libraries: TensorFlow, PyTorch, Scikit-learn, Pandas, and NumPy. To do so, we collected a total of 596 security-related commits and explored five major factors: 1) vulnerability types, 2) root causes, 3) symptoms, 4) fixing patterns, and 5) fixing efforts of security vulnerabilities in ML libraries. The findings of this study can help developers better understand software security vulnerabilities across different ML libraries and gain better insight into their weaknesses. To make our findings actionable, we further developed DeepMut, an automated mutation testing tool, as a proof-of-concept application of our findings. DeepMut is designed to assess the adequacy of existing test suites of ML libraries against security-aware mutation operators extracted from the vulnerabilities studied in this work. We applied DeepMut to the TensorFlow kernel module and found more than 1k alive mutants not considered by the existing test suites. The results demonstrate the usefulness of our findings.
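To make the idea of a security-aware mutation operator concrete, here is a minimal Python sketch. It is not DeepMut's implementation, and the abstract does not specify its operators. The `RemoveGuard` operator, the `safe_get` function, and both test suites are hypothetical, chosen only to illustrate how a mutant that removes a validation check can stay alive under a weak test suite.

```python
import ast
import textwrap

# Hypothetical code under test: a lookup guarded by a bounds check.
SOURCE = textwrap.dedent('''
    def safe_get(values, index):
        if index < 0 or index >= len(values):
            raise IndexError("index out of range")
        return values[index]
''')


class RemoveGuard(ast.NodeTransformer):
    """Hypothetical operator: delete `if` statements that only raise."""

    def visit_If(self, node):
        self.generic_visit(node)
        only_raises = all(isinstance(stmt, ast.Raise) for stmt in node.body)
        if only_raises and not node.orelse:
            return None  # drop the validation check entirely
        return node


def load(source, mutate=False):
    """Compile the source, optionally mutated, and return `safe_get`."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(RemoveGuard().visit(tree))
    namespace = {}
    exec(compile(tree, "<mutant>" if mutate else "<original>", "exec"), namespace)
    return namespace["safe_get"]


def weak_suite(fn):
    """Checks only the valid-input path."""
    return fn([1, 2, 3], 1) == 2


def strong_suite(fn):
    """Also checks that an out-of-range index is rejected."""
    if fn([1, 2, 3], 1) != 2:
        return False
    try:
        fn([1, 2, 3], -1)
    except IndexError:
        return True
    return False


if __name__ == "__main__":
    mutant = load(SOURCE, mutate=True)
    # A suite that still passes on the mutant leaves the mutant alive.
    print("weak suite kills mutant:  ", not weak_suite(mutant))
    print("strong suite kills mutant:", not strong_suite(mutant))
```

In this sketch, the weak suite passes on both the original and the mutant, so the mutant stays alive and exposes a gap in the tests. The strong suite fails on the mutant because the unguarded lookup silently accepts a negative index.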




You Really Shouldn't Roll Your Own Crypto: An Empirical Study of Vulnerabilities in Cryptographic Libraries

The security of the Internet rests on a small number of open-source cryp...

On the Use of Refactoring in Security Vulnerability Fixes: An Exploratory Study on Maven Libraries

Third-party library dependencies are commonplace in today's software dev...

What Do Developers Ask About ML Libraries? A Large-scale Study Using Stack Overflow

Modern software systems are increasingly including machine learning (ML)...

Machine Learning Containers are Bloated and Vulnerable

Today's software is bloated leading to significant resource wastage. Thi...

ConFL: Constraint-guided Fuzzing for Machine Learning Framework

As machine learning gains prominence in various sectors of society for a...

VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries

The identification of vulnerabilities is a continuous challenge in softw...

Mutation Testing framework for Machine Learning

This is an article or technical note which is intended to provides an in...
