On the Scalability of Big Data Cyber Security Analytics Systems

11/28/2021
by Faheem Ullah, et al.

Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Apache Spark) to collect, store, and analyze large volumes of security event data for detecting cyber-attacks. The volume of digital data in general, and of security event data in particular, is increasing exponentially. The velocity with which security event data is generated and fed into a BDCA system is unpredictable. Therefore, a BDCA system should be highly scalable to deal with unpredictable increases and decreases in the velocity of security event data. However, there has been little effort to investigate the scalability of BDCA systems in order to identify and exploit sources of scalability improvement. In this paper, we first investigate the scalability of a Spark-based BDCA system with default Spark settings. We then identify Spark configuration parameters (e.g., execution memory) that can significantly impact the scalability of a BDCA system. Based on the identified parameters, we finally propose a parameter-driven adaptation approach, SCALER, for optimizing a system's scalability. We conducted a set of experiments by implementing a Spark-based BDCA system on a large-scale OpenStack cluster and ran them with four security datasets. We found that (i) a BDCA system with default Spark configuration parameters deviates from ideal scalability by 59.5%, (ii) [...] impact scalability, and (iii) SCALER improves the BDCA system's scalability by 20.8% compared to the scalability with the default Spark parameter settings. The findings of our study highlight the importance of exploring the parameter space of the underlying big data framework (e.g., Apache Spark) for scalable cyber security analytics.

04/24/2018

Automated Big Traffic Analytics for Cyber Security

Network traffic analytics technology is a cornerstone for cyber security...
03/03/2020

A Survey on Big Data for Network Traffic Monitoring and Analysis

Network Traffic Monitoring and Analysis (NTMA) represents a key componen...
05/22/2019

Simulation-Based Cyber Data Collection Efficacy

Building upon previous research in honeynets and simulations, we present...
02/09/2018

Architectural Tactics for Big Data Cybersecurity Analytic Systems: A Review

Context: Big Data Cybersecurity Analytics is aimed at protecting network...
12/26/2018

Greening Big Data Networks: The Impact of Veracity

The continuous increase in big data applications, in number and types, c...
11/02/2019

Weibull Racing Time-to-event Modeling and Analysis of Online Borrowers' Loan Payoff and Default

We propose Weibull delegate racing (WDR) to explicitly model surviving u...
06/24/2019

AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data

Recent development in computing, sensing and crowd-sourced data have res...