PicoDomain: A Compact High-Fidelity Cybersecurity Dataset

08/20/2020
by Craig Laprade, et al.

Analysis of cyber-relevant data has become an area of increasing focus. As a larger share of businesses and governments begin to understand the implications of cyberattacks, the impetus for better cybersecurity solutions has grown. Unfortunately, current cybersecurity datasets either offer no ground truth or do so only with anonymized data. The former makes results hard to verify, and the latter can remove valuable information. Additionally, most existing datasets are large enough to be unwieldy during prototype development. In this paper we present the PicoDomain dataset, a compact, high-fidelity collection of Zeek logs from a realistic intrusion using relevant Tools, Techniques, and Procedures. Although simulated on a small-scale network, the dataset consists of traffic typical of an enterprise network and can be used for rapid validation and iterative development of analytics platforms. We have validated this dataset using traditional statistical analysis and off-the-shelf Machine Learning techniques.
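As a rough illustration of the kind of quick statistical pass a compact set of Zeek logs allows, the sketch below reads a log in Zeek's default tab-separated format and counts connections per originating host. It is not the authors' method. The helper names `read_zeek_tsv` and `top_talkers` are hypothetical, and the sample rows are invented for the example and are not taken from PicoDomain. The sketch assumes the default tab separator and a `#fields` header line, as found in a standard `conn.log`.

```python
import io
from collections import Counter


def read_zeek_tsv(stream):
    """Yield one dict per record from a Zeek log in default TSV format.

    Assumes the default tab separator. Header lines start with '#';
    only the '#fields' line is used, to name the columns. Values are
    left as strings (including Zeek's '-' marker for unset fields).
    """
    fields = None
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]
            continue
        if fields is None:
            raise ValueError("data row found before a #fields header")
        yield dict(zip(fields, line.split("\t")))


def top_talkers(records, n=3):
    """Return the n originating hosts with the most connections."""
    return Counter(r["id.orig_h"] for r in records).most_common(n)


# Invented sample in conn.log style; NOT real PicoDomain data.
SAMPLE = (
    "#separator \\x09\n"
    "#fields\tts\tid.orig_h\tid.resp_h\tid.resp_p\tproto\n"
    "#types\ttime\taddr\taddr\tport\tenum\n"
    "1.0\t10.0.0.5\t10.0.0.1\t53\tudp\n"
    "2.0\t10.0.0.5\t10.0.0.1\t445\ttcp\n"
    "3.0\t10.0.0.7\t10.0.0.1\t53\tudp\n"
    "4.0\t10.0.0.5\t10.0.0.9\t3389\ttcp\n"
    "#close\t2020-01-01-00-00-00\n"
)

records = list(read_zeek_tsv(io.StringIO(SAMPLE)))
print(top_talkers(records, n=2))
```

For a real log file, the same generator can be fed an open file handle, for example `read_zeek_tsv(open("conn.log"))`, where the path is a placeholder.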

research
04/26/2020

Development of a High Fidelity Simulator for Generalised Photometric Based Space Object Classification using Machine Learning

This paper presents the initial stages in the development of a deep lear...
research
04/19/2022

Multifidelity Deep Operator Networks

Operator learning for complex nonlinear operators is increasingly common...
research
09/19/2021

A two-step machine learning approach for crop disease detection: an application of GAN and UAV technology

Automated plant diagnosis is a technology that promises large increases ...
research
07/25/2022

The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning

In general, large datasets enable deep learning models to perform with g...
research
02/26/2021

Multi-fidelity regression using artificial neural networks: efficient approximation of parameter-dependent output quantities

Highly accurate numerical or physical experiments are often time-consumi...
research
12/01/2018

HUMBI 1.0: HUman Multiview Behavioral Imaging Dataset

This paper presents a new dataset called HUMBI - a large corpus of high ...
research
08/30/2019

High-Fidelity State-of-Charge Estimation of Li-Ion Batteries Using Machine Learning

This paper proposes a way to augment the existing machine learning algor...