1 Introduction
Cyber-security professionals play a vital role in assessing and predicting vulnerabilities within cyber-physical systems, which often form part of an organisation’s or state’s critical digital infrastructure. As outsider threats become more prevalent and sophisticated, there is increasing pressure on experts to provide timely and comprehensive assessments within the context of the rapidly changing cyber-physical ecosystem. As cyber-systems increase in both ubiquity and complexity, methods to quantify and handle error in subjective measurements from experts need to be developed [13]. It has been demonstrated across many industry sectors that as complexity increases accurate risk assessment decreases [10]. Enabling the effective reconstruction of overall assessments from component and attribute ratings would streamline the process of updating overall system vulnerability assessments, in line with shifts in this ecosystem.
Both objective and subjective measures of risk provide useful information to aid decision making in vulnerability assessment [4, 5]. Several methods can be used to assess vulnerability and risk in a cyber-security system, such as vulnerability scanning tools [18] or the Common Vulnerability Scoring System (CVSS) [15], which gives qualitative severity ratings of low, medium, and high; CVSS Version 3 extends these ratings to include none and critical [8]. However, when using CVSS, the information necessary to complete the calculation may be missing. Hubbard and Seiersen [11] argue that assessing risk in terms of low, medium, and high ratings is highly subjective and open to error, and they therefore suggest that cyber-security risk should be described quantitatively. This would help quantify which areas of risk cyber-security professionals perceive as important and, from that, move towards understanding how those perceived risks correspond to actual attacks and their success or failure.
Assessments of risk or of the likelihood of an attack are inherently uncertain [1, 2]. Objective measures may carry uncertainty because the measures themselves are imprecisely defined [2]. Subjective assessments (collected from experts) carry uncertainty because, for example, the experts are not familiar with the particular technology, the scenario lacks sufficient detail, or individual personality shapes the judgement [12, 16, 20].
Between-expert uncertainty is often modelled implicitly, for example, through probability distributions [7] or uncertainty measures [5]. These methods model the between-expert uncertainty, but they do not capture within-expert uncertainty. Choi et al. [4] capture within-expert uncertainty by enabling experts to express knowledge and uncertainty through terms such as very small, small and large, and by using fuzzy sets to represent the uncertainty of these words. However, this assumes that the experts share the same degree of uncertainty regarding the meanings of these terms. After capturing uncertainty, it may be modelled and handled through methods such as Dempster-Shafer evidence theory [6, 9] or fuzzy logic [4, 14, 19].

We propose explicitly capturing uncertainty in experts' individual judgements using an interval-valued response format, as previously introduced in [17]. Experts provide ratings along a continuous scale to quantify, for example, the perceived difficulty of an attack on a component. Using an interval captures both the expert's rating (position on the axis) and the degree of uncertainty associated with the response (width of the interval). Fig. 1 shows an example of a narrow rating (slightly uncertain) and a wide rating (highly uncertain). In this manner, uncertainty is captured as an integral aspect of the judgement itself, through a coherent and intuitive response format. The novel contribution of this paper is to show that an interval-valued response scale can be used to effectively capture uncertainty in expert judgements, and that this can be used to better predict both the magnitude and the uncertainty of risk in vulnerability assessments.
Fig. 1: Example interval-valued responses on a continuous scale: a narrow interval indicates a slightly uncertain rating, while a wide interval indicates a highly uncertain rating.
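For illustration (values ours, not from the study), two hypothetical responses centred on the same point of the scale convey the same rating but very different uncertainty:

$$[45, 55] \;\Rightarrow\; \text{position } 50,\ \text{width } 10, \qquad [30, 80] \;\Rightarrow\; \text{position } 50,\ \text{width } 50.$$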
In this paper, we assess the importance of a variety of attributes in determining the overall vulnerability of components being attacked or evaded within a cyber-system; these include the maturity of the technology and the frequency with which a given attack is reported (the full sets of attributes are listed in Tables 1 and 2). We also capture the overall difficulty of attacking or evading each component. Our core aim is to understand how the component attributes contribute to the overall difficulty of an attack and, equally importantly, how uncertainty in component attributes affects not only the uncertainty in the overall vulnerability of a component, but also the overall difficulty itself. For example, experts might perceive an attack as more difficult if there are fewer tools available to aid in this effort. However, experts might also perceive an attack as more difficult if they are uncertain about the availability of tools. From this, we can learn what additional insight is gained by capturing uncertainty through interval-valued responses. Specifically, we wish to answer: (i) how is the overall vulnerability of a component affected by attribute ratings, and by uncertainty around attribute ratings? (ii) how is the uncertainty around the overall vulnerability of a component affected by attribute ratings, and by uncertainty around attribute ratings? We find that although cyber-security experts only assessed attributes that were previously deemed likely to be important to component vulnerability, only some of these attributes have a significant effect. Similarly, we discover that although uncertainty around some attributes affects the uncertainty around the component as a whole, this is not consistent across all attributes. We also find that the uncertainty around some attribute ratings makes a significant contribution to the overall vulnerability ratings themselves.
2 Methods
2.1 Data Collection
The data was collected from experts in CESG (Communications-Electronics Security Group), which was the information security arm of GCHQ (Government Communications Headquarters) in the United Kingdom; CESG has since been replaced by the NCSC (National Cyber Security Centre). A total of 38 cyber-security experts at CESG assessed a range of components that are commonly encountered during a cyber-attack. They rated these on both overall difficulty to either attack or evade, as appropriate, and on several attributes that might affect this difficulty. Two types of components (also referred to as hops) were assessed: those that require an attacker to attack the component (referred to as attack) and those that require the attacker to bypass the component (referred to as evade). Examples of the hops assessed include bypass gateway content checker and overcome client lockdown, but any hop may be assessed using this method.
Tables 1 and 2 list the attack and evade attributes, respectively (variable notations are also provided, which are used in the next section). Attack hops are defined by seven attributes and evade hops by three attributes. In addition, both are described by their overall difficulty. Our aim is to understand how the hop attributes and the uncertainty around these attributes relate to the perceived overall difficulty of attacking/evading the hop, and to the uncertainty around this difficulty. Experts were asked to provide interval-valued ratings to enable them to coherently express the uncertainty associated with their responses; these ratings were provided on a scale from 0 to 100.
The cyber-security components chosen for this study are designed to be representative of a mainstream government system, which would include assets at Business Impact Level 3 (BIL3) – an intermediate category of impact [3]. CESG describes such a system as typically including remote site and mobile working, backed by core services and back-end office integrated systems, such as telemetry devices and associated systems used by the emergency services. Compromising assets at BIL3 might have large-scale negative effects, including but not limited to: disruption to regional power supply, key local transport systems, emergency or other important local services for up to 24 hours, local loss of telecoms, risk to an individual’s safety, damage to intelligence operations, hindrance to low level crime detection and prosecution, or financial loss to the UK government or a leading financial company in the order of millions (GBP) [3].
Table 1: Attack hop attributes and overall difficulty.

| var | attribute | description |
|---|---|---|
| $x_1$ | complexity | How complex is the target component (e.g. in terms of size of code, number of sub-components)? |
| $x_2$ | interaction | How much does the target component process/interact with any data input? |
| $x_3$ | frequency | How often would you say this type of attack is reported in the public domain? |
| $x_4$ | availability of tool | How likely is it that there will be a publicly available tool that could help with this attack? |
| $x_5$ | inherent difficulty | How inherently difficult is this type of attack? (i.e. how technically demanding would it be to do from scratch, with no tools to help.) |
| $x_6$ | maturity | How mature is this type of technology? |
| $x_7$ | going unnoticed | How easy is it to carry this attack out without being noticed? |
| $y$ | overall difficulty | Overall, how difficult would it be for an attacker to do this? |
Table 2: Evade hop attributes and overall difficulty.

| var | attribute | description |
|---|---|---|
| $v_1$ | complexity | How complex is the job of providing this kind of defence? |
| $v_2$ | availability of information | How likely is it that there will be publicly available information that could help with evading defence? |
| $v_3$ | maturity | How mature is this type of technology? |
| $y$ | overall difficulty | Overall, how difficult would it be for an attacker to do this? |
2.2 Analysis
We use linear mixed effects modelling (an extension of linear regression) to determine the contribution of each of the hop attributes, as rated by experts, to overall hop difficulty. We also assess the contribution of the associated uncertainty in these ratings, captured through the interval-valued response format. The midpoint ($x^{m}$) of the interval-valued response is used as a single-valued numeric rating of the attribute, and the width ($x^{w}$) of the response is used to represent the uncertainty around this rating. Note, of course, that larger widths are only possible towards the centre of the scale; as the midpoint approaches the edge of the scale, a wide interval cannot exist. Also, note that while experts provided ratings in the $[0, 100]$ range, these data were standardised, through z-transformation, before entry into the model.
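As a concrete sketch of this preprocessing step (file and column names are our own assumptions, not the study's; `zscore` is from the MATLAB Statistics and Machine Learning Toolbox):

```matlab
% Derive model inputs from raw interval responses on the 0-100 scale.
% Columns 'lower' and 'upper' (interval endpoints) are illustrative.
raw = readtable('attack_hop_responses.csv');

mid   = (raw.lower + raw.upper) / 2;   % attribute rating (interval midpoint)
width =  raw.upper - raw.lower;        % rating uncertainty (interval width)

% Standardise via z-transformation before entry into the model
mid_z   = zscore(mid);
width_z = zscore(width);
```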
This approach estimates the contribution of each attribute's midpoint and width together upon the same outcome variable, in the form of $\beta$ weights. These variables are entered as fixed effects. The inclusion of random intercepts also allows the model to account for potential between-expert and between-hop differences in baseline ratings. In addition, this technique allows us to examine the combined effects of attribute rating and uncertainty (e.g. high certainty may have an opposite effect on overall difficulty when relating to a high or low attribute rating). We model this through the inclusion of two-way interaction terms ($x^{m} \times x^{w}$) pertaining to the midpoint and width of each attribute.

Four separate analyses are reported. These were conducted for the dependent variables of:
- attack hop overall difficulty rating (interval midpoint)
- attack hop overall uncertainty (interval width)
- evade hop overall difficulty rating (interval midpoint)
- evade hop overall uncertainty (interval width)
For each of these analyses, an initial model was created. These included fixed effects of all hop attribute ratings, all hop attribute widths and all two-way interactions, along with random intercepts for both expert and hop. Following this, a stepwise variable reduction process was applied to each model, in order to remove variables that were found not to significantly contribute to the respective outcome variable. The $\beta$ weights of the variables retained in the final models were then interpreted as estimates of the (significant) contribution of each of these factors to the respective outcome variable.
Table 1 lists the variables used to denote the attributes of attack hops. The sum of all simple effects for the attack hop attribute midpoints (ratings) is

$$A^{m}_{ij} = \sum_{k=1}^{7} \beta_k\, x^{m}_{kij} \qquad (1)$$

where $\beta_k$ is the coefficient and $x^{m}_{kij}$ is the value (midpoint) of attribute $k$ (e.g. $k = 1$, complexity) for expert $i$ and hop $j$. Throughout, the outcome variable $y_{ij}$ may be either the midpoint ($y^{m}_{ij}$) or the width ($y^{w}_{ij}$) of the overall difficulty.
The sum of all simple effects for the attack hop attribute widths (uncertainty) is

$$A^{w}_{ij} = \sum_{k=1}^{7} \beta_{7+k}\, x^{w}_{kij} \qquad (2)$$

where $x^{w}_{kij}$ is the width of attribute $k$ (e.g. $k = 1$, complexity) for expert $i$ and hop $j$.
The sum of the interactions between the midpoints and widths of the attack hop attributes is

$$A^{mw}_{ij} = \sum_{k=1}^{7} \beta_{14+k}\, x^{m}_{kij}\, x^{w}_{kij} \qquad (3)$$
Our initial model formula to explain the overall difficulty rating midpoints ($y^{m}_{ij}$) and widths ($y^{w}_{ij}$) of attack hops is then

$$y_{ij} = \beta_0 + A^{m}_{ij} + A^{w}_{ij} + A^{mw}_{ij} + u_i + h_j + \epsilon_{ij} \qquad (4)$$

where $y_{ij}$ reflects the model's outcome variable, which may be either $y^{m}_{ij}$ (midpoints) or $y^{w}_{ij}$ (widths), for expert $i$ on hop $j$; $\beta_0$ denotes the fixed intercept; $u_i$ and $h_j$ denote the respective random intercepts for expert and hop; and $\epsilon_{ij}$ represents the error. The remaining terms (within $A^{m}_{ij}$, $A^{w}_{ij}$ and $A^{mw}_{ij}$) carry the coefficients $\beta_1, \dots, \beta_{21}$ of the fixed effects of the hop attributes.
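A hedged sketch of how the initial attack hop model of Eq. (4) might be specified with MATLAB's `fitlme` (Statistics and Machine Learning Toolbox) is given below. The table and column names are our own illustrative assumptions: `x1_m`/`x1_w` hold the standardised midpoint and width of attribute $x_1$, and so on, while `y` holds the chosen outcome (overall difficulty midpoint or width).

```matlab
% Initial attack-hop model: all attribute midpoints, all widths, all
% two-way midpoint-by-width interactions, plus crossed random
% intercepts for expert and hop (Wilkinson notation).
formula = ['y ~ x1_m + x2_m + x3_m + x4_m + x5_m + x6_m + x7_m' ...
           '  + x1_w + x2_w + x3_w + x4_w + x5_w + x6_w + x7_w' ...
           '  + x1_m:x1_w + x2_m:x2_w + x3_m:x3_w + x4_m:x4_w'  ...
           '  + x5_m:x5_w + x6_m:x6_w + x7_m:x7_w'              ...
           '  + (1|expert) + (1|hop)'];

lme = fitlme(tbl, formula);   % tbl: one row per expert-by-hop response
disp(lme.Coefficients)        % fixed-effect beta weights, SEs, t and p
```

The evade hop model of Eq. (8) would be specified in the same way, with the three $v$ attributes in place of the seven $x$ attributes.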
We perform likewise calculations for the evade hops (variables listed in Table 2). The sum of effects for the midpoints of the evade hops is

$$E^{m}_{ij} = \sum_{k=1}^{3} \gamma_k\, v^{m}_{kij} \qquad (5)$$

for the widths is

$$E^{w}_{ij} = \sum_{k=1}^{3} \gamma_{3+k}\, v^{w}_{kij} \qquad (6)$$

and for the interactions is

$$E^{mw}_{ij} = \sum_{k=1}^{3} \gamma_{6+k}\, v^{m}_{kij}\, v^{w}_{kij} \qquad (7)$$

Our initial model formula to explain the overall difficulty rating midpoints ($y^{m}_{ij}$) and widths ($y^{w}_{ij}$) of evade hops is then

$$y_{ij} = \gamma_0 + E^{m}_{ij} + E^{w}_{ij} + E^{mw}_{ij} + u_i + h_j + \epsilon_{ij} \qquad (8)$$
Each of the initial models, as presented above, was then subjected to a backwards stepwise variable elimination procedure. During this, fixed effects were iteratively assessed, and those that did not significantly contribute to the overall model were removed. Specifically, this process began by selecting, from the pool of all non-significant fixed effects, the effect with the t-statistic closest to zero. This variable was then removed, and the resulting model directly compared with the preceding one using the theoretical likelihood ratio test. This was implemented through the MATLAB `fitlme` and `compare` functions. If the benefit of retaining the variable in question was found to be non-significant, then the model with the lower Bayesian Information Criterion (BIC) was retained for the next iteration. This procedure continued until a final model was determined, within which all fixed effects were statistically significant.
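The elimination loop might look roughly as follows. This is a sketch under our own assumptions: `fitlme` and `compare` are real toolbox functions, whereas `dropTerm`, a helper that rebuilds the formula string without a named fixed effect, is hypothetical and not shown.

```matlab
model = fitlme(tbl, formula);              % initial model (ML fit by default)
while true
    coefs = model.Coefficients;
    ns = coefs.pValue > .05 & ~strcmp(coefs.Name, '(Intercept)');
    if ~any(ns), break; end                % all fixed effects significant

    % Candidate for removal: the non-significant effect whose
    % t-statistic is closest to zero
    nsNames = coefs.Name(ns);
    nsT     = coefs.tStat(ns);
    [~, k]  = min(abs(nsT));

    candidate = dropTerm(formula, nsNames{k});   % hypothetical helper
    reduced   = fitlme(tbl, candidate);

    res = compare(reduced, model);         % theoretical likelihood ratio test
    lrtNonSig = res.pValue(2) > .05;       % retention benefit non-significant?
    lowerBIC  = reduced.ModelCriterion.BIC < model.ModelCriterion.BIC;
    if lrtNonSig && lowerBIC
        model   = reduced;                 % keep the lower-BIC model, iterate
        formula = candidate;
    else
        break;                             % retain the current model
    end
end
```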
3 Results
3.1 Attack Hops
Table 3 shows all effects retained in the final model with the outcome variable of overall attack hop difficulty (midpoints), following the stepwise variable reduction process. These results indicate that a number of factors make a substantial contribution. Attacks were rated as less difficult if they are frequently reported or if helpful tools are widely available. By contrast, attacks were rated as more difficult if they have a greater inherent difficulty or relate to more mature technologies. Attacks were also rated as more difficult when technological maturity was uncertain and, perhaps surprisingly, when easier to carry out unnoticed. The latter might reflect some underlying factor; for instance, some attacks may be difficult to conduct, but also difficult to detect. The significant interaction term ($x^{m}_{4} \times x^{w}_{4}$) indicates a combined effect of reported tool availability and uncertainty around this. This likely reflects that a hop is rated as more difficult to attack when experts are certain that tool availability is low, but less difficult when experts are certain that it is high. Unsurprisingly, the inherent difficulty rating was found to have the most robust effect.
Table 3: Final model of overall attack hop difficulty ratings (interval midpoints).

| Fixed effects | Estimate | SE | t | p |
|---|---|---|---|---|
| Intercept ($\beta_0$) | .012 | .066 | .175 | .861 |
| Frequency ($x^{m}_{3}$) | -.223 | .044 | -5.065 | < .001 |
| Availability Tool ($x^{m}_{4}$) | -.201 | .044 | -4.574 | < .001 |
| Inherent Difficulty ($x^{m}_{5}$) | .357 | .030 | 11.890 | < .001 |
| Maturity ($x^{m}_{6}$) | .126 | .030 | 4.159 | < .001 |
| Going Unnoticed ($x^{m}_{7}$) | .142 | .027 | 5.194 | < .001 |
| Maturity ($x^{w}_{6}$) | .071 | .027 | 2.612 | .009 |
| Availability Tool ($x^{m}_{4} \times x^{w}_{4}$) | .077 | .036 | 2.168 | .031 |

| Random effects | Estimate |
|---|---|
| Expert intercept | .183 |
| Hop intercept | .204 |
| Residual | .502 |

N = 532, DF = 524, AIC = 896.7, BIC = 943.6
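As a worked reading of the interaction in Table 3 (standardised units, so $\pm 1$ denotes one standard deviation above or below the mean), the combined tool-availability contribution to the rated difficulty is:

$$-.201\, x^{m}_{4} + .077\, x^{m}_{4}\, x^{w}_{4} = \begin{cases} +.278 & \text{availability low, certain } (x^{m}_{4} = -1,\ x^{w}_{4} = -1)\\ -.278 & \text{availability high, certain } (x^{m}_{4} = +1,\ x^{w}_{4} = -1)\\ +.124 & \text{availability low, uncertain } (x^{m}_{4} = -1,\ x^{w}_{4} = +1)\\ -.124 & \text{availability high, uncertain } (x^{m}_{4} = +1,\ x^{w}_{4} = +1) \end{cases}$$

That is, certainty amplifies the effect of the availability rating in either direction, consistent with the interpretation above.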
Table 4 shows all effects retained in the final model with the outcome variable of uncertainty surrounding overall attack hop difficulty (widths). Even more factors were retained in this final model. Results indicated that experts were more certain about the vulnerability of hops on which attacks were reported more frequently, likely due to familiarity. They were also more certain regarding hops that relate to mature technologies, or for which tool availability was rated low. By contrast, overall uncertainty significantly increased in line with attribute uncertainty for reported attack frequency, tool availability, inherent difficulty, and ease of going unnoticed. Two interaction terms were also retained, indicating that the effects of uncertainty around these attributes on overall uncertainty were significantly modulated by attribute rating, or vice versa. These can be interpreted together with the main effects, depending upon direction. For example, in the case of ease of going unnoticed, there is a relatively large positive main effect of attribute uncertainty and a smaller, but significant, negative interaction term. This indicates that overall ratings were most uncertain when going unnoticed was considered difficult but uncertain, and most certain when going unnoticed was considered difficult with certainty. The effect of attribute uncertainty was reduced, though still substantial, for hops rated as easier to attack unnoticed. For maturity, however, there is a negative main effect of attribute rating and a negative interaction term of comparable size. This indicates that overall uncertainty was greatest when maturity was rated low but uncertain. Also, while an increase in maturity rating tended to increase the certainty of overall ratings, this effect was driven by cases in which maturity was itself uncertain. Of all effects in this analysis, uncertainty surrounding the inherent difficulty rating was found to be the most robust.
Table 4: Final model of uncertainty around overall attack hop difficulty (interval widths).

| Fixed effects | Estimate | SE | t | p |
|---|---|---|---|---|
| Intercept ($\beta_0$) | -.031 | .036 | -.857 | .392 |
| Frequency ($x^{m}_{3}$) | -.116 | .045 | -2.614 | .009 |
| Availability Tool ($x^{m}_{4}$) | .131 | .045 | 2.934 | .003 |
| Maturity ($x^{m}_{6}$) | -.093 | .031 | -3.013 | .003 |
| Frequency ($x^{w}_{3}$) | .141 | .035 | 4.034 | < .001 |
| Availability Tool ($x^{w}_{4}$) | .095 | .039 | 2.420 | .016 |
| Inherent Difficulty ($x^{w}_{5}$) | .406 | .037 | 10.959 | < .001 |
| Going Unnoticed ($x^{w}_{7}$) | .268 | .036 | 7.399 | < .001 |
| Maturity ($x^{m}_{6} \times x^{w}_{6}$) | -.122 | .035 | -3.484 | < .001 |
| Going Unnoticed ($x^{m}_{7} \times x^{w}_{7}$) | -.080 | .035 | -2.270 | .024 |

| Random effects | Estimate |
|---|---|
| Expert intercept | .127 |
| Hop intercept | .000 |
| Residual | .609 |

N = 532, DF = 522, AIC = 1066.3, BIC = 1121.7
3.2 Evade Hops
Table 5 shows all effects retained in the final model with the outcome variable of overall evade hop difficulty (midpoints). Four fixed effects were retained. Experts rated a hop as less difficult to evade when more information is available to aid with this, but as more difficult to evade when they were uncertain about the availability of such information. Overall evasion difficulty was also higher for hops relating to more mature technologies. A negative interaction term was evident for ratings of hop complexity: certainty about a hop being more complex was associated with it being rated more difficult to evade, while certainty about a hop being less complex was associated with it being rated easier to evade. The availability of information was found to have the most robust effect.
Table 5: Final model of overall evade hop difficulty ratings (interval midpoints).

| Fixed effects | Estimate | SE | t | p |
|---|---|---|---|---|
| Intercept ($\gamma_0$) | -.023 | .133 | -.173 | .863 |
| Availability Information ($v^{m}_{2}$) | -.240 | .049 | -4.895 | < .001 |
| Maturity ($v^{m}_{3}$) | .177 | .051 | 3.459 | < .001 |
| Availability Information ($v^{w}_{2}$) | .142 | .049 | 2.878 | .004 |
| Complexity ($v^{m}_{1} \times v^{w}_{1}$) | -.105 | .053 | -1.993 | .047 |

| Random effects | Estimate |
|---|---|
| Expert intercept | .457 |
| Hop intercept | .340 |
| Residual | .772 |

N = 418, DF = 413, AIC = 1081.8, BIC = 1114.0
Table 6: Final model of uncertainty around overall evade hop difficulty (interval widths).

| Fixed effects | Estimate | SE | t | p |
|---|---|---|---|---|
| Intercept ($\gamma_0$) | -.000 | .058 | -.000 | 1.000 |
| Complexity ($v^{w}_{1}$) | .241 | .046 | 5.200 | < .001 |
| Availability Information ($v^{w}_{2}$) | .440 | .045 | 9.683 | < .001 |
| Maturity ($v^{w}_{3}$) | .134 | .045 | 2.982 | .003 |

| Random effects | Estimate |
|---|---|
| Expert intercept | .070 |
| Hop intercept | .159 |
| Residual | .643 |

N = 418, DF = 414, AIC = 863.0, BIC = 891.2
Table 6 shows all effects retained in the final model with the outcome variable of uncertainty surrounding overall evade hop difficulty (widths). These results show that experts were more uncertain in their overall hop rating when they were more uncertain about any of the three attributes of a given hop: complexity, information availability, or maturity. No significant main effects of attribute rating position, nor any interaction terms, were found. Uncertainty surrounding information availability was found to have the most robust effect.
4 Conclusions
We analyse ratings provided by cyber-security experts that pertain to a range of potentially important component (hop) attributes, previously identified as commonly occurring within attack vectors of mainstream government cyber-systems. Importantly, these ratings were obtained through interval-valued responses, which enable experts to indicate both their rating and the uncertainty associated with this rating in a single, integrated response.
Our analyses provide a ‘proof of concept’ for interval-valued data capture applied to the field of cyber-security. We identify key factors that contribute to both component vulnerability and uncertainty, depending on whether a hop requires compromising or only bypassing. For example, the availability of information has the largest impact on the overall difficulty of evading a component, while uncertainty around the inherent difficulty of an attack has the largest impact on its overall uncertainty.
Uncertainty in experts’ attribute ratings is found to be valuable in determining not only overall uncertainty, but also overall hop vulnerability. In a number of specific cases, this information explained variance over and above the discrete midpoints of attribute ratings. For instance, when predicting the overall difficulty of attack hops, uncertainty around the maturity of technology of a given hop was associated with a significant increase in difficulty rating for that hop. In other cases, we found that in order to best explain overall difficulty ratings it was necessary to consider an interaction effect, between the position and width of responses. Sometimes, the combination of both factors provided a better predictor than either did alone. It was the novel use of an interval-valued response format that made it possible to coherently capture this uncertainty, alongside traditional ratings.
This study provides initial empirical evidence for the potential added-value offered by capturing interval-valued responses to model expert uncertainty. This is demonstrated in the case of modelling vulnerabilities within cyber-systems comprising multiple components, each with varying attributes, as is characteristic of cyber-physical systems. The benefit of using interval-valued responses was found using a comparatively low-complexity linear modelling approach. While such an approach is unsuited to capturing varying effects of responses along the response scale (for instance, it cannot account for a tendency for responses to saturate towards the extremities), its simplicity facilitates interpretation of the results. In future work, we will investigate the use of generalised additive models, within which the relationships between independent and dependent variables may be non-linear. Additionally, we are pursuing an ongoing programme of research to demonstrate the efficacy of interval-valued responses, both in terms of capturing uncertainty and improving predictive power, with reference to real-world ground-truth.
References
- [1] Aven, T., Renn, O.: On risk defined as an event where the outcome is uncertain. Journal of Risk Research 12(1), 1–11 (2009)
- [2] Black, P.E., Scarfone, K., Souppaya, M.: Cyber security metrics and measures. Wiley Handbook of Science and Technology for Homeland Security pp. 1–15 (2008)
- [3] CESG: Extract from HMG IA Standard No.1 Business Impact Level Tables. CESG (2009)
- [4] Choi, H.H., Cho, H.N., Seo, J.W.: Risk assessment methodology for underground construction projects. Journal of Construction Engineering and Management 130(2), 258–272 (2004)
- [5] Duan, Y., Cai, Y., Wang, Z., Deng, X.: A novel network security risk assessment approach by combining subjective and objective weights under uncertainty. Applied Sciences 8(3) (2018). https://doi.org/10.3390/app8030428, http://www.mdpi.com/2076-3417/8/3/428
- [6] Feng, N., Li, M.: An information systems security risk assessment model under uncertain environment. Applied Soft Computing 11(7), 4332–4340 (2011)
- [7] Fielder, A., Konig, S., Panaousis, E., Schauer, S., Rass, S.: Uncertainty in cyber security investments. arXiv preprint arXiv:1712.05893 (2017)
- [8] FIRST: CVSS v3.0 specification document, https://www.first.org/cvss/specification-document
- [9] Gao, H., Zhu, J., Li, C.: The analysis of uncertainty of network security risk assessment using Dempster-Shafer theory. In: 2008 12th International Conference on Computer Supported Cooperative Work in Design. pp. 754–759. IEEE (2008)
- [10] Gardner, D.: Risk: The science and politics of fear. Random House (2009)
- [11] Hubbard, D.W., Seiersen, R.: How to measure anything in cybersecurity risk. John Wiley & Sons (2016)
- [12] Kahneman, D., Slovic, S.P., Slovic, P., Tversky, A.: Judgment under uncertainty: Heuristics and biases. Cambridge University Press (1982)
- [13] Koubatis, A., Schonberger, J.Y.: Risk management of complex critical systems. International journal of critical infrastructures 1(2-3), 195–215 (2005)
- [14] Linda, O., Manic, M., Vollmer, T., Wright, J.: Fuzzy logic based anomaly detection for embedded network security cyber sensor. In: 2011 IEEE Symposium on Computational Intelligence in Cyber Security (CICS). pp. 202–209. IEEE (2011)
- [15] Mell, P., Scarfone, K., Romanosky, S.: A complete guide to the common vulnerability scoring system version 2.0. In: Published by FIRST-Forum of Incident Response and Security Teams. vol. 1, p. 23 (2007)
- [16] Miller, S., Appleby, S., Garibaldi, J.M., Aickelin, U.: Towards a more systematic approach to secure systems design and analysis. International Journal of Secure Software Engineering (IJSSE) 4(1), 11–30 (2013)
- [17] Miller, S., Wagner, C., Aickelin, U., Garibaldi, J.M.: Modelling cyber-security experts' decision making processes using aggregation operators. Computers & Security 62, 229–245 (2016)
- [18] Munir, R., Disso, J.P., Awan, I., Mufti, M.R.: A quantitative measure of the security risk level of enterprise networks. In: 2013 Eighth International Conference on Broadband and Wireless Computing, Communication and Applications. pp. 437–442. IEEE (2013)
- [19] Sikos, L.F.: Handling uncertainty and vagueness in network knowledge representation for cyberthreat intelligence. In: 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). pp. 1–6. IEEE (2018)
- [20] Slovic, P.: The perception of risk. Routledge (2016)