Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue

03/22/2022
by Rui Shu, et al.

Background: Machine learning techniques have been widely used and have demonstrated promising performance in many software security tasks, such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced, since the percentage of observed vulnerabilities is usually very low. Goal: To help security practitioners address class imbalance in software security data and build better prediction models with resampled datasets. Method: We introduce an approach called Dazzle, an optimized version of conditional Wasserstein Generative Adversarial Networks with gradient penalty (cWGAN-GP). Dazzle explores the architecture hyperparameters of cWGAN-GP with a novel optimizer called Bayesian Optimization. We use Dazzle to generate minority class samples, which are then used to resample the original imbalanced training dataset. Results: We evaluate Dazzle on three software security datasets: Moodle vulnerable files, Ambari bug reports, and JavaScript function code. We show that Dazzle is practical to use and demonstrates promising improvement over existing state-of-the-art oversampling techniques such as SMOTE (e.g., with an average of about 60 ...). Conclusion: Based on this study, we suggest using optimized GANs as an alternative method for addressing class imbalance in security vulnerability data.
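
The resampling step described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: `sample_minority` is a hypothetical stand-in that jitters real minority rows with Gaussian noise, where Dazzle would instead draw samples from a trained cWGAN-GP generator. The function names, the noise scale, and the toy data are all assumptions made for illustration. Only the training set is resampled; a test set would be left untouched.

```python
import random
from collections import Counter


def sample_minority(minority_rows, n_samples, rng):
    """Hypothetical stand-in for a trained conditional generator.

    It jitters randomly chosen real minority rows with small Gaussian noise;
    a GAN-based approach would sample from the learned generator instead.
    """
    synthetic = []
    for _ in range(n_samples):
        base = rng.choice(minority_rows)
        synthetic.append([value + rng.gauss(0.0, 0.05) for value in base])
    return synthetic


def resample_training_set(features, labels, minority_label=1, seed=0):
    """Append synthetic minority rows until the minority matches the majority."""
    rng = random.Random(seed)
    counts = Counter(labels)
    if counts[minority_label] == 0:
        raise ValueError("no minority samples to learn from")
    deficit = max(counts.values()) - counts[minority_label]
    minority_rows = [x for x, y in zip(features, labels) if y == minority_label]
    synthetic = sample_minority(minority_rows, deficit, rng)
    new_features = list(features) + synthetic
    new_labels = list(labels) + [minority_label] * deficit
    return new_features, new_labels


# Toy training data (assumed): 8 non-vulnerable rows (label 0), 2 vulnerable (label 1).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_balanced, y_balanced = resample_training_set(X, y)
balanced_counts = Counter(y_balanced)
```

After the call, `X_balanced` holds the original rows followed by the synthetic minority rows, and `balanced_counts` reports equal counts for both classes. Swapping a different generator into `sample_minority` leaves the rest of the resampling logic unchanged.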
