Statistical Models for the Number of Successful Cyber Intrusions

01/14/2019
by Nandi O. Leslie, et al.

We propose several generalized linear models (GLMs) to predict the number of successful cyber intrusions (or "intrusions") into an organization's computer network, where the rate at which intrusions occur is a function of the following observable characteristics of the organization: (i) Domain Name System (DNS) traffic classified by top-level domain (TLD); (ii) the number of network security policy violations; and (iii) a set of predictors that we collectively call the "cyber footprint", comprising the number of hosts on the organization's network, the organization's similarity to educational institution behavior (SEIB), and its number of records on scholar.google.com (ROSG). In addition, we evaluate the number of intrusions to determine whether these events follow a Poisson or a negative binomial (NB) probability distribution. We show that the NB GLM provides the best fit to the observed count data (the number of intrusions per organization), because the NB model allows the variance of the count data to exceed the mean, whereas the Poisson model constrains the variance to equal the mean. We also show that there are restricted, simpler NB regression models that omit selected predictors and improve the goodness-of-fit of the NB GLM for the observed data. With our model simulations, we identify certain TLDs in the DNS traffic as having a significant impact on the number of intrusions. In addition, we use the models and regression results to conclude that the number of network security policy violations is consistently predictive of the number of intrusions.
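
The Poisson-versus-NB comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: it fits intercept-only models (no TLD, policy-violation, or cyber-footprint predictors), estimates the NB dispersion by the method of moments rather than maximum likelihood, and uses made-up counts. The names `TOY_COUNTS`, `poisson_loglik`, `nb_loglik`, and `compare_models` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical intrusion counts for 20 organizations (illustrative only,
# not data from the paper).
TOY_COUNTS = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 12, 0, 3, 0, 1, 0, 0, 7]


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson counts with mean mu."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def nb_loglik(counts, mu, alpha):
    """Log-likelihood of i.i.d. NB counts with mean mu and
    variance mu + alpha * mu**2 (alpha > 0 is the dispersion)."""
    r = 1.0 / alpha       # NB size parameter
    p = r / (r + mu)      # NB success probability
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
        for k in counts
    )


def compare_models(counts):
    """Fit intercept-only Poisson and NB models and report AIC for each.

    Assumes the sample mean is positive. The dispersion alpha is a
    method-of-moments estimate, a simplification of a full ML fit.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((k - mean) ** 2 for k in counts) / (n - 1)
    # Solve variance = mean + alpha * mean**2 for alpha; floor it so the
    # NB model stays well defined when the data are not overdispersed.
    alpha = max((variance - mean) / mean ** 2, 1e-8)
    ll_poisson = poisson_loglik(counts, mean)
    ll_nb = nb_loglik(counts, mean, alpha)
    return {
        "mean": mean,
        "variance": variance,
        "alpha": alpha,
        "aic_poisson": 2 * 1 - 2 * ll_poisson,  # one parameter: mu
        "aic_nb": 2 * 2 - 2 * ll_nb,            # two parameters: mu, alpha
    }


result = compare_models(TOY_COUNTS)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

On these toy counts the sample variance is well above the sample mean, and the NB model attains the lower AIC, which is the pattern the abstract reports for the observed intrusion data. A regression version with predictors would replace the constant mean with a log-linear function of the covariates.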


