Identifying Relevant Information Cues for Vulnerability Assessment Using CVSS

03/20/2018 ∙ by Luca Allodi, et al. ∙ TU Eindhoven

The assessment of new vulnerabilities is an activity that accounts for information from several data sources and produces a ‘severity’ score for the vulnerability. The Common Vulnerability Scoring System (CVSS) is the reference standard for this assessment. Yet, no guidance currently exists on which information aids a correct assessment and should therefore be considered. In this paper we address this problem by evaluating which information cues increase (or decrease) assessment accuracy. We devise a block design experiment with 67 software engineering students with varying vulnerability information and measure scoring accuracy under different information sets. We find that baseline vulnerability descriptions provided by standard vulnerability sources provide only part of the information needed to achieve an accurate vulnerability assessment. Further, we find that additional information on assets, attacks, and vulnerability type contributes to increasing the accuracy of the assessment; conversely, information on known threats misleads the assessor, decreases assessment accuracy, and should be avoided when assessing vulnerabilities. These results are a step towards formalizing vulnerability communication, for example to fully automate security assessments.




1. Introduction

Addressing software vulnerabilities is an important process in any software development project (Howard and Lipner, 2006) to maintain software quality and mitigate the risk of attack for users. Several standards, such as PCI-DSS for the management of credit card information and NIST’s SCAP protocol (adopted, for example, by U.S. DoD directive 8500.01), require the use of the Common Vulnerability Scoring System (CVSS) (, 2015b) as the metric of choice for vulnerability measurement and prioritisation (PCI, 2010; Quinn et al., 2010). The CVSS specification (, 2015b) describes a framework that the assessor follows to transform information about the vulnerability into a CVSS score, and provides a number of ‘dimensions’ or ‘metrics’ over which the assessor performs his or her evaluation. For example, the assessor may determine that the vulnerability can be reached remotely, and assign a Network value to the CVSS metric Attack Vector; similarly, he or she may conclude that a successful attack requires the victim user to perform specific actions, and assign a Required value to the CVSS metric User Interaction.

The result of these assessments depends strongly on what information on the vulnerability is available to the assessor. Notably, this information may vary substantially, ranging from general descriptions such as “Unspecified vulnerability in [..] allows local users to affect availability via vectors related to Kernel” to more technically detailed information (Christey and Martin, 2013). Whereas the information one can gather generally covers the type of vulnerability, the attack procedure, and the existence of threats (Holm and Afridi, 2015), no guidance currently exists on which information should be considered when performing an assessment over the CVSS metrics. For example, analysing the attack procedure may provide details on the position of the attacker w.r.t. the vulnerable software component (captured by the Attack Vector CVSS metric), but may not reveal useful information to evaluate which privileges are required to exploit the vulnerability (captured by the Privileges Required metric). This lack of guidance hinders the development and use of automatic tools that could summarize the available information for the assessor performing a CVSS evaluation of the vulnerability.

In this study we evaluate which information cues can aid the vulnerability assessment process as guided by the CVSS standard, and should therefore be readily provided to the assessor. This paper’s contributions can be summarized as follows:

  1. Following guidelines from current standards (ISO/IEC, 2014) and recent literature (Holm and Afridi, 2015; Roschke et al., 2009), we identify four information categories over which vulnerabilities are described: Assets (Stoneburner et al., 2002), Attack (Roschke et al., 2009), Vulnerability type (ISO/IEC, 2014), and Known threats (Holm and Afridi, 2015).

  2. Building on recent research on the automatic identification of ‘requirement smells’ (Femmer et al., 2016), we evaluate the number of information cues (i.e. phrases consisting of one or more words) associated with each of the identified information categories, and their effect on assessment error.

  3. We ask 67 students to score a set of 16 vulnerabilities using CVSS. To evaluate the effect of different information cues, we devise a block experiment design in which each student is randomly assigned to a treatment group (treatments integrate baseline vulnerability descriptions with information provided by the standard body for CVSS), and compare assessment errors to identify which information cues are effective in aiding the final assessment and which are not.

This paper unfolds as follows: Section 2 discusses related work. Section 3 outlines our research goal and questions, experiment setup, metrics, and hypotheses. We then present our results (Section 4) and discuss their implications (Section 5). Sections 6 and 7 discuss threats to validity and conclude.

2. Background and Related Work

In security engineering, controlled experiments have been performed to measure the effectiveness and efficiency of vulnerability analysis techniques and applications (Scandariato et al., 2013; Allodi and Massacci, 2014; Allodi et al., 2013), of security patterns in helping software designers (Yskout et al., 2015), and of different security methods for risk assessment (Labunets et al., 2013).

Similarly, several authors have studied the relation between vulnerability measures and risk scenarios. The operational aspects of integrating security measures in production environments have been studied, among others, by Dashevskyi et al. (Dashevskyi et al., 2016), who investigate settings where vulnerabilities are included in third-party components; Zhang et al. (Zhang et al., 2013), who predict bug fixing times by employing a Markov model based on field data; Zimmermann et al. (Zimmermann et al., 2010), who investigate the discrepancies between user-supplied bug information and the information needed by developers; and Zhao et al. (Zhao et al., 2016), who evaluate the effect of early discussion on bug fixing. We complement these findings by focusing on vulnerability information and evaluating which information aids the vulnerability fixing process.

Proposed measures for the identification of vulnerabilities in code rely on features of code such as code complexity and code churn (Shin et al., 2011), whereas other authors propose keyword-based text-mining procedures to forecast vulnerabilities (Walden et al., 2014). Thompson et al. (Thompson et al., 2016) investigated the cognitive effort spent when breaking down software engineering tasks such as bug fixing.

To aid a correct understanding of software requirements, natural language processing techniques such as keyword extraction have been used to detect quality defects in natural language specifications (Femmer et al., 2016). Experiments in this area often relate to factors such as the correctness and the positive or negative tone of requirements (Mund et al., 2015), and to grammatical features such as the use of passive or active voice. While our approach is similar, we detect information cues in vulnerability descriptions to associate them with assessment errors, as opposed to measuring ‘bad wording’ in software requirements.

2.1. The Common Vulnerability Scoring System

The CVSS framework specification is the worldwide standard for vulnerability assessment and has been drafted by a dedicated Special Interest Group (SIG). The CVSS framework provides a number of dimensions over which a vulnerability is assessed based on the available information on the vulnerability. These dimensions are classified into three groups of metrics: Base metrics (capturing the technical characteristics of the vulnerability), Temporal metrics (capturing characteristics of the vulnerability that change over time), and Environmental metrics (capturing conditions that change across deployment environments). The Base Metric Group is by far the most commonly used in practice (Naaliel et al., 2014; Houmb et al., 2010) and is the one officially used to describe vulnerabilities in NIST’s National Vulnerability Database (NVD) (NVD, 2015).

The Base Score assessment is organized in two conceptually different groups of sub-metrics (a third Base metric, Scope, is not reported here for brevity as it is not used in this study): Exploitability metrics reflect the means by which an attacker can deliver a successful attack, whereas Impact metrics provide an assessment of the consequences of a successful attack on the impacted system.

Exploitability metrics under CVSS v3 are measured over four dimensions: Attack Vector (AV), Attack Complexity (AC), Privileges Required (PR) and User Interaction (UI). Impact metrics in CVSS v3 are measured over the triad Confidentiality, Integrity and Availability. Table 1 provides a summary description of the CVSS v3 Base metrics. The full reference is the official specification document (, 2015b).

Exploitability metrics
ID Metric Description Values
AV Attack Vector Reflects how remote the attacker can be to deliver the attack against the vulnerable component. The more remote, the higher the score. Physical, Local, Adjacent Network, Network.
AC Attack Complexity Reflects the existence of conditions beyond the attacker’s control that must hold for the attack to be successful. High, Low.
PR Privileges Required Reflects the privileges the attacker needs to have on the vulnerable system to exploit the vulnerable component. High, Low, None.
UI User Interaction Reflects the need for user interaction to deliver a successful attack. Required, None.
Impact metrics
ID Metric Description Values
C Confidentiality Measures the impact to the confidentiality of information. None, Low, High.
I Integrity Measures the impact to the integrity of information. None, Low, High.
A Availability Measures the impact to the availability of the impacted component. None, Low, High.
Table 1. Summary description of CVSS v3 Base metrics
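This study scores each metric individually rather than computing the numeric Base score, but it may help to see how the Table 1 values combine into one. The sketch below is a minimal Python implementation of the CVSS v3 Base score for Scope Unchanged only, consistent with Scope being excluded here. The metric weights and equations are reproduced from our reading of the FIRST CVSS v3 specification, and the `roundup` helper follows the integer-based Roundup definition introduced in CVSS v3.1 to avoid floating-point artefacts; both are assumptions to be checked against the official specification before any real use.

```python
import math

# Base metric weights for Scope Unchanged, as we read them from the
# FIRST CVSS v3 specification (verify against the official document).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope Unchanged weights only
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Smallest number with one decimal place that is >= value.

    Integer-based definition (CVSS v3.1) that avoids floating-point
    artefacts of a naive math.ceil(value * 10) / 10.
    """
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, c: str, i: str, a: str) -> float:
    """CVSS v3 Base score for a Scope Unchanged vulnerability."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])  # Impact Sub-Score
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0  # no impact on C, I or A means a score of 0
    return roundup(min(impact + exploitability, 10))
```

For instance, `base_score("N", "L", "N", "N", "H", "H", "H")` gives 9.8, while `base_score("N", "H", "N", "R", "L", "N", "N")` gives 3.1; a vector with no impact on C, I, and A always scores 0.0.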

2.2. Information categories for vulnerability measurement

We evaluate the effect of the following information categories on the accuracy of CVSS assessments:

Assets. Security assessment and management standards such as NIST 800-30 and Common Criteria (ISO/IEC, 2012; Stoneburner et al., 2002) define the concept of ‘asset’ as key to correctly evaluating the severity of the vulnerability impact. Information in this category includes details on the type of affected system (e.g. a server or a client) or the component affected by the vulnerability (e.g. an operating system or a virtual machine).

Attack. Expert interviews conducted by Holm et al. (Holm and Afridi, 2015), alongside other studies (Fruhwirth and Mannisto, 2009), identify information regarding attack procedures as important to conduct an accurate vulnerability assessment. Attack procedures describe the actions that an attacker must perform to exploit the vulnerability: for example, the attacker may need to launch a man-in-the-middle attack, or inject code in a webpage.

Vulnerability type. ISO 29147 (ISO/IEC, 2014) conceptualizes vulnerability information as related to a description of the vulnerability and its impact. This includes information on the type of vulnerability and its causes in the program’s code. For example, an erroneous bound checking of a memory array may lead to memory corruption vulnerabilities; similarly, erroneous input validation on a web form may lead to cross-site-scripting (XSS) vulnerabilities.

Known threat. Several studies (Barnum and McGraw, 2005; Roschke et al., 2009; Holm and Afridi, 2015) suggest that information on existing threats should also be provided to aid a better vulnerability assessment. This information includes details on the existence of a proof-of-concept exploit (PoC), active exploitation in the wild, or incidents linked to the specific vulnerability.

3. Methodology

In this paper, following the discussion in Sec. 2.2, we investigate the following research question: How does information on {Asset, Attack, Vuln. type, Known threat} impact assessment errors?

3.1. Experimental settings

To address this research question for each of the four information categories, we perform an experiment in which subjects are asked to score sixteen vulnerabilities using CVSS. Each vulnerability is associated with its description from the National Vulnerability Database (NVD) (NVD, 2015) and a treatment consisting of additional information on the vulnerability (on top of its baseline NVD description) provided by the CVSS SIG (, 2015a). Table 2 reports example vulnerability descriptions and treatments used for the experiment. The column ‘Treatment effect’ reports the effect of the treatment on the accuracy of the assessment, which is discussed in detail in Section 4.

Example of four CVE descriptions and treatments assigned to students. We obtained these descriptions from the NVD. Treatments are obtained from the official CVSSv3 example guide (, 2015a). The column ‘Treatment effect’ outlines the effect of the treatment on the error rate. Indicated p-values are Holm-corrected for multiple comparisons over CVSS metrics. We highlighted in bold relevant excerpts that explain the treatment effect. Significance of the treatment effect is evaluated with a Wilcoxon rank-sum test.
CVE NVD Description Treatment Treatment effect
CVE-2014-3566 The SSL protocol 3.0, as used in OpenSSL through 1.0.1i and other products, uses nondeterministic CBC padding, which makes it easier for man in the middle attackers to obtain plaintext data via a padding-oracle attack, aka the “POODLE” issue. A typical treatment is that a victim has visited a web server and her web browser now contains a cookie that an attacker wishes to steal. For a successful attack, the attacker must be able to modify network traffic between the victim and this web server, and both victim and system must be willing to use SSL 3.0 for encryption. Decrease error on AC; decrease error on UI.
CVE-2012-0384 Cisco IOS 12.2 through 12.4 and [..] before 3.2.2SG, when AAA authorization is enabled, allow remote authenticated users to bypass intended access restrictions and execute commands via a (1) HTTP or (2) HTTPS session, aka Bug ID CSCtr91106. This vulnerability is post authentication on the administrative interface of the Cisco device. Therefore to attack a typical installation, the attacker would need access to the trusted / internal side of the IOS. Increase error on PR.
Table 2. Example of vulnerability descriptions and treatments given to the students.

Subjects were given 90 minutes to complete the assessment irrespective of their treatment group; hence, each subject had on average about 5.6 minutes per vulnerability (90 minutes over sixteen vulnerabilities). In accordance with the literature on the subject (Pennington and Tuttle, 2007), the time limit was selected on the basis of trial experiments previously conducted in similar settings.

3.2. Vulnerabilities and Subject Selection

The sixteen vulnerabilities employed in the experiment are obtained from the official CVSS v3 Example document drafted by the SIG for CVSS (, 2015a). The vulnerabilities included by the SIG have been chosen to represent the full set of CVSS metrics, and are actively used for training purposes by members of the SIG consortium within their respective organizations. Each vulnerability in the document is associated with its official public description from the National Vulnerability Database (NVD, 2015) and additional information added by the CVSS SIG. The subjects of this study are 67 students enrolled in the software engineering study program who registered for a software security course.

3.3. Measures

Information cues

To quantify the amount of information in a vulnerability description for each information category identified in Sec. 2.2 (Asset, Attack, Vulnerability type, Known threat), we employ a methodology originally developed to automatically identify ‘smells’ in software requirement specifications (Femmer et al., 2016). The original methodology employs keyword matching to identify standard-defined criteria for the quality of requirements in the analysed text. As no such standard exists for software vulnerability descriptions, in our study we identify keywords relevant to each of the identified information categories by manually analysing over 100 randomly sampled vulnerabilities from NVD. Keywords are selected as indicators of what information is present in the description. For example, the keyword ‘remote attacker’ indicates that the vulnerability description explicitly reports information related to the Attack information category. Information cues are measured as the number of keyword matches in a baseline vulnerability description and in the corresponding treatment. Table 3 reports a sample of the keywords identified for each information category. The full keyword list is available in the online appendix.

Information categories Definition Reference Keywords
Asset Assets are entities that users or vendors value and contain vulnerabilities. (, 2015b; Stoneburner et al., 2002; ISO/IEC, 2005) hardware, guest virtual machine, host, vm, device, client, server, operating system, version, product, affected version, affected product, vulnerable, vulnerable software, vulnerable hardware, affected software, affected hardware, software
Attack Actions and entities that can adversely act on assets by exploiting vulnerabilities. (Holm and Afridi, 2015; Roschke et al., 2009; Barnum and McGraw, 2005) attacker, malicious user, remote authenticated user, remote user, man in the middle, unauthenticated remote attacker, spoofing, inject code, manipulate pointers, cache poisoning, open malicious file, birthday attack
Vulnerability type Describes the technical flaws that can be exploited and the impact of the exploitation. (ISO/IEC, 2014) improper bounds checking, insufficient randomness, memory corruption, buffer overflow, cross-site scripting, broken authentication, insecure cryptographic storage, failure to restrict URL access, cross-site request forgery (CSRF)
Known threat Describes known threats that can exploit the vulnerability. (Holm and Afridi, 2015; Barnum and McGraw, 2005) known threats, threat, known attacks, information about known threats, exploit, proof-of-concept, incident activity, incident, known incident
Table 3. Definitions of information categories and selection of respective keywords.
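The keyword-matching measure can be sketched as follows. This is an illustrative approximation rather than the authors' tool: the keyword subset is taken from Table 3, while the matching rules (case-insensitive, hyphens treated as spaces, longest phrase first, non-overlapping matches, an optional plural "s") are our own assumptions, and `count_cues` is a hypothetical helper name.

```python
import re
from collections import Counter

# Illustrative subset of the Table 3 keywords; the full list is in the
# online appendix and is not reproduced here.
KEYWORDS = {
    "Asset": ["server", "client", "host", "device", "operating system",
              "product", "software"],
    "Attack": ["attacker", "remote authenticated user", "remote user",
               "man in the middle", "inject code", "spoofing"],
    "Vulnerability type": ["buffer overflow", "memory corruption",
                           "cross-site scripting", "insufficient randomness"],
    "Known threat": ["exploit", "proof-of-concept", "known attacks", "incident"],
}


def _normalise(text: str) -> str:
    """Lower-case, treat hyphens as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower().replace("-", " ")).strip()


# Map each normalised keyword back to its information category.
_KW_TO_CAT = {_normalise(kw): cat for cat, kws in KEYWORDS.items() for kw in kws}

# One alternation with the longest keywords first, so that at any position the
# most specific phrase wins; "s?" accepts simple plurals such as "attackers".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(kw) for kw in sorted(_KW_TO_CAT, key=len, reverse=True))
    + r")s?\b"
)


def count_cues(text: str) -> Counter:
    """Count non-overlapping keyword matches per information category."""
    return Counter(_KW_TO_CAT[m.group(1)] for m in _PATTERN.finditer(_normalise(text)))
```

Under these assumptions, the CVE-2014-3566 NVD description in Table 2 yields two Attack cues ("man in the middle", "attackers") and one Asset cue ("products"), while the CVE-2012-0384 description yields a single Attack cue ("remote authenticated users").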

Assessment errors

To evaluate assessment errors, we compare the subjects’ CVSS assessments of the vulnerabilities with those performed by the CVSS SIG. In this study we do not consider the magnitude or direction of the error, but only the presence of a correct (0) or wrong (1) assessment for each CVSS metric (AV, AC, UI, PR, C, I, A), for all vulnerabilities.
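A minimal sketch of this binary error encoding, assuming that both the student's and the SIG's assessments are written as CVSS v3 vector strings (e.g. "AV:N/AC:H/..."). The vector notation is standard CVSS, but the function names are hypothetical and the example vectors below are invented for illustration; magnitude and direction of the error are deliberately ignored, as in the text.

```python
METRICS = ("AV", "AC", "PR", "UI", "C", "I", "A")


def parse_vector(vector: str) -> dict:
    """Parse 'AV:N/AC:H/...' (optionally prefixed by 'CVSS:3.0/') into a dict."""
    parts = [p for p in vector.split("/") if p and not p.startswith("CVSS:")]
    return dict(p.split(":", 1) for p in parts)


def assessment_errors(student: str, reference: str) -> dict:
    """Per-metric error: 1 if the student's value differs from the reference, else 0.

    A metric the student left out counts as an error.
    """
    s, r = parse_vector(student), parse_vector(reference)
    return {m: int(s.get(m) != r[m]) for m in METRICS}


# Hypothetical reference and student vectors.
reference = "CVSS:3.0/AV:N/AC:H/PR:N/UI:R/S:U/C:L/I:N/A:N"
student = "AV:N/AC:L/PR:N/UI:N/C:L/I:N/A:N"
errors = assessment_errors(student, reference)
```

Here the student errs only on AC and UI, so `errors` marks those two metrics with 1 and the other five with 0.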

Subject characteristics

Each subject was asked to complete a background questionnaire. We collected data on: the security expertise of the student; software engineering expertise; years of prior work experience; years of enrollment in a Computer Science major; university courses completed. Students were asked both to self-assess their expertise and to answer a set of multiple-choice technical questions on relevant areas of software security and engineering. Each technical question has only one correct answer. The questionnaire is available in the online appendix. Results are discussed in Section 4.1.

3.4. Hypotheses


Asset

Because Asset provides information regarding the target of the attack (e.g. a browser or a server), we expect this information category to reduce assessment errors on the impact metrics C, I and A. For example, an attack on a browser may violate the Confidentiality of information stored in cookies or browsing history, whereas an attack on a server may affect the service Availability. We formulate the following null hypothesis:

Hypothesis 1 (H1₀).

H1₀: The Asset information category does not reduce error rates for the C, I, A metrics.


Attack

Information on Attack adds details on the actions that the attacker has to perform to exploit the vulnerability. Therefore, we expect this information category to reduce assessment error for the AV metric (position of the attacker with respect to the vulnerable component) and the AC metric (conditions outside of the attacker’s control). Additionally, the attacker’s actions may give significant indications of the impact of the vulnerability. For example, a cache poisoning attack requires the attacker to modify some cached record (e.g. a DNS response) such that, at the next request, the victim receives the counterfeit information added by the attacker; this may lead to spoofing attacks, with clear repercussions on at least C and I. Similarly, denial of service attacks may indicate losses on A. We formulate the following null hypothesis:

Hypothesis 2 (H2₀).

H2₀: The Attack information category does not reduce error rates for the AV, AC, C, I, A metrics.

Vulnerability type

Information on Vulnerability type provides details on the complexity of an attack, e.g. by specifying that the vulnerability is due to insufficient randomness in a specific variable. Information regarding specific vulnerability types (e.g. cross-site-scripting vulnerabilities) and required authentication levels gives indications on PR and UI. We formulate the following null hypothesis:

Hypothesis 3 (H3₀).

H3₀: The Vulnerability type information category does not reduce error rates for the AC, PR, UI metrics.

Known threat

According to the CVSS specification, the Base metrics should only consider information on the technical characteristics of the vulnerability. Known threat information may be relevant in subsequent assessments to evaluate the risk of attack (e.g. involving the CVSS Temporal metrics (, 2015b)), but may confuse the baseline assessment of the vulnerability. For example, Known threat information may increase error on AC, as the existence of known threats may suggest that the vulnerability can be easily exploited, e.g. by building on an existing PoC. Similarly, information on known attacks may bias impact assessments towards those of the known incidents. Therefore, we expect Known threat to be generally detrimental to the assessment of AC and C, I, A. We formulate the following null hypothesis:

Hypothesis 4 (H4₀).

H4₀: The Known threat information category does not increase error rates for the AC, C, I, A metrics.

3.5. Experimental procedure

Before the experiment, subjects were given a lecture on vulnerability assessment with CVSS. The lecture covered all aspects of the standard required for the experiment. To increase the subjects’ confidence in the procedure, a demo session scoring five vulnerabilities from the CVSS documentation (not included in the experiment) was performed during the lecture.

Subjects were given a handout with the official CVSS specification and a printed spreadsheet containing the sixteen vulnerability descriptions. Subjects were randomly assigned to a treatment group and received additional information on each vulnerability together with the NVD description. Subjects had to 1) complete the questionnaire described in Sec. 3.3; 2) read each vulnerability description; 3) indicate which value for each of the CVSS metrics in Tab. 1 best reflects the vulnerability description.
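The random assignment step could be implemented as below. The text does not describe how randomisation was carried out or, in this section, how many treatment groups were used; shuffling and then dealing students round-robin is one simple way to obtain balanced groups, and the group count of 4 in the usage line is purely hypothetical.

```python
import random


def assign_groups(students, n_groups, seed=None):
    """Randomly assign students to n_groups groups of (almost) equal size.

    Students are shuffled, then dealt round-robin, so group sizes differ
    by at most one. A fixed seed makes the assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    return {student: i % n_groups for i, student in enumerate(shuffled)}


# Hypothetical: 67 student indices spread over 4 treatment groups.
groups = assign_groups(range(67), n_groups=4, seed=1)
```

Each of the 67 student indices is mapped to a group 0 to 3, with group sizes of 16 or 17.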

3.6. Analysis procedure

To test our hypotheses we employ a set of multilevel mixed-effects logistic regression models of the form

$\mathrm{logit}\,\Pr(Err_{m,s,v} = 1) = \beta_0 + \boldsymbol{\beta}_C^{\top}\mathbf{C}_s + \boldsymbol{\beta}_I^{\top}\mathbf{I}_v + \epsilon_s + \epsilon_v$,

where $Err_{m,s,v}$ reflects the presence (1) or absence (0) of an assessment error on metric $m$ by student $s$ over vulnerability $v$; $\mathbf{C}_s$ is the control vector of subject characteristics, and $\mathbf{I}_v$ is the vector of information cues for each category measured on vulnerability $v$. The remaining terms account for random effects at the first level in the hierarchy, students ($\epsilon_s$), and at the second, vulnerabilities ($\epsilon_v$). Each hypothesis is evaluated according to the sign of the respective coefficient and its significance.
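To make the structure of this model concrete, the sketch below simulates binary assessment errors from a random-intercept logistic model: a fixed part (intercept, one student control, per-vulnerability cue counts) plus normally distributed intercepts for students and for vulnerabilities. It only illustrates the data-generating process such a model assumes and does not fit it; the logit link, the single scalar control, and all parameter names are simplifying assumptions for illustration.

```python
import math
import random


def logistic(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate_errors(controls, cues, beta0, beta_c, beta_i,
                    sd_student, sd_cve, seed=0):
    """Draw errors[s][v] in {0, 1} from a random-intercept logistic model.

    controls: one control value per student (e.g. security expertise)
    cues:     one cue vector per vulnerability (one entry per category)
    """
    rng = random.Random(seed)
    u_student = [rng.gauss(0.0, sd_student) for _ in controls]  # student intercepts
    u_cve = [rng.gauss(0.0, sd_cve) for _ in cues]              # vulnerability intercepts
    errors = []
    for s, c_s in enumerate(controls):
        row = []
        for v, i_v in enumerate(cues):
            eta = (beta0 + beta_c * c_s
                   + sum(b * x for b, x in zip(beta_i, i_v))
                   + u_student[s] + u_cve[v])
            row.append(1 if rng.random() < logistic(eta) else 0)  # Bernoulli draw
        errors.append(row)
    return errors
```

With a strongly negative intercept (e.g. `beta0=-50.0`) virtually every simulated assessment is correct, and with a strongly positive one virtually every assessment is wrong; realistic values would come from a fitted model.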

4. Results

4.1. Overview of subjects

Before executing the experiment, we asked students to fill out a questionnaire that provides an overview of their background (twenty multiple-choice questions) and relevant security and software engineering expertise (six self-assessment questions and six technical questions). All questions were divided into security and software engineering questions. Of the 67 participants, 14 were Bachelor students and the rest were Master students. 36% of the participants had part-time jobs. On the technical security questions, the mean score was 0.57, with 1 indicating all correct answers and 0 no correct answer. The standard deviation is relatively small at 0.27 points. Similar scores are observed for the software engineering technical questions.

4.2. Illustrative analysis example

Table 2 reports two example vulnerability descriptions for which treatments have a significant effect on assessment errors for at least one CVSS metric. We report one vulnerability for which the treatment decreases the error (CVE-2014-3566) and one for which it increases the error (CVE-2012-0384). Information that explains the difference is highlighted in bold in the Table. In the following, the correct metric assessment is reported next to the CVE vulnerability identifier:

CVE-2014-3566 (AC:High, UI:Required). Students who received the treatment were less likely to err at identifying: (a) conditions outside of the attacker’s control, as the treatment specifies that “[the] attacker must be able to modify network traffic between the victim and this web server”, suggesting a man-in-the-middle condition; (b) the requirement on UI, as the treatment specifies that the attack is possible only after “[the victim] has visited a web server”.

CVE-2012-0384 (PR:Low). The treatment significantly increases the chances of error on PR. The treatment states that to trigger the vulnerability “the attacker would need access to the trusted / internal side of the IOS.” Any user authenticated in the network would be able to access the interface (i.e. only non-privileged authentication to the network is required). However, the additional information that “the vulnerability is post authentication on the administrative interface of the Cisco device” can be misleading, in that the attacker does not need to be logged in to the administrative panel, but only capable of reaching it from the network (in which he/she must be authenticated).

In our examples, additional information could either aid or hinder a correct assessment, for example through misleading wording of relevant information (e.g. CVE-2012-0384): in accordance with previous findings in software engineering (Eppler and Mengis, 2004; Pennington and Tuttle, 2007), both the quantity and the quality of information may affect task execution. Unfortunately, neither can serve as a realistic requirement for an informative vulnerability description, as neither provides clear guidance on which information cues should be provided.

4.3. Treatment effect on assessment error

To identify the effect of the measured information cues we employ a set of mixed-effects regression analyses. For model selection we relied on the Akaike Information Criterion (AIC); the considered control variables are: security expertise of the student; software engineering expertise; work experience; years of enrollment in a Computer Science major; university courses completed. The only significant student characteristic is security expertise. Correlation between the independent variables is always below 0.2.
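AIC trades off fit against model size as AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximised likelihood; among candidate models, the one with the lowest AIC is preferred. The sketch below applies this rule; the candidate names, log-likelihoods, and parameter counts are invented for illustration and are not taken from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(candidates: dict) -> str:
    """Return the name of the candidate model with the lowest AIC.

    candidates maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fits of nested models with different control variables.
models = {
    "cues only": (-1000.0, 10),
    "cues + security expertise": (-995.0, 11),
    "cues + all controls": (-993.5, 15),
}
best = select_by_aic(models)
```

With these made-up numbers the AICs are 2020, 2012 and 2017, so the model adding only security expertise is selected: the extra controls improve the likelihood too little to justify four more parameters.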

We first check for the possible correlation between length of vulnerability description (expressed as word counts) and error rates, and find that neither the length of the original NVD description nor the length of the treatment text have significant effects on the observed error. We therefore proceed with the analysis of the effect of the information cues.

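For our final regression, the regressors are the counts of information cues measured in the original NVD description and those added by the assigned treatment, together with the security expertise of the student. All variables are standardized. The final regression equation over the binomial response variable representing assessment error is

$\mathrm{logit}\,\Pr(Err_{m,s,v} = 1) = \beta_0 + \beta_1\,SecExp_s + \sum_{k} \beta^{NVD}_{k}\,Cues^{NVD}_{k,v} + \sum_{k} \beta^{T}_{k}\,Cues^{T}_{k,s,v} + \epsilon_s + \epsilon_v$,

where $Err_{m,s,v}$ is the error of student $s$ on metric $m$ of vulnerability $v$, $SecExp_s$ is the security expertise of student $s$, $k$ ranges over the information categories (Known threat appears only among the treatment cues $Cues^{T}$), and $\epsilon_s$ and $\epsilon_v$ are the student and vulnerability random intercepts.

Results are reported in Table 4.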
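The two preprocessing checks mentioned in this section, standardising every regressor and confirming that the regressors are only weakly correlated (the text reports pairwise correlations always below 0.2), can be sketched in plain Python. The column data in the usage example are made up; using the population standard deviation is an assumption, and a constant column would cause a division by zero.

```python
import math
from itertools import combinations


def standardise(xs):
    """z-score a column: subtract the mean and divide by the population SD."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def pearson(xs, ys):
    """Pearson correlation as the mean product of the two z-scored columns."""
    zx, zy = standardise(xs), standardise(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def max_abs_correlation(columns: dict) -> float:
    """Largest absolute pairwise Pearson correlation among the columns."""
    return max(abs(pearson(columns[a], columns[b]))
               for a, b in combinations(columns, 2))


# Hypothetical cue-count columns.
cols = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, -1, 1]}
```

Here `x` and `y` are perfectly correlated while `z` is uncorrelated with both, so `max_abs_correlation(cols)` is 1.0; on real regressors the check would pass only if this value stays below the chosen threshold.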

Regression results for our equations. p-values for the fixed effects are computed using Satterthwaite’s approximation for degrees of freedom, as provided by an R package. Standard errors are indicated in parentheses. Regression coefficients are reported for the information cues all students received (as provided in the original NVD description of the vulnerability) and for the additional information cues included in the treatment. All variables are standardized. The original NVD descriptions do not contain any information regarding Known threats, which is therefore only relevant for the provided treatments. An ANOVA test indicates that the intercepts for students and CVEs vary significantly across subjects and vulnerabilities.

model AV model AC model UI model PR model C model I model A
Fixed effects
(Intercept) -0.804 -0.667 -2.035 -1.008 0.504 0.088 -0.511
(0.203) (0.200) (0.376) (0.331) (0.224) (0.246) (0.312)
Security expertise -0.088 -0.007 -0.233 -0.137 -0.190 -0.246 -0.080
(0.081) (0.098) (0.110) (0.128) (0.100) (0.093) (0.095)
Information cues from original description
Assets 0.384 0.097 -0.550 0.246 -0.035 0.314 -0.696
(0.198) (0.192) (0.383) (0.328) (0.221) (0.232) (0.320)
Attack -0.555 0.062 0.111 -0.209 -0.013 -0.034 0.388
(0.211) (0.196) (0.386) (0.333) (0.228) (0.238) (0.327)
Vulnerability type 0.085 -0.330 -0.362 -0.530 0.169 0.032 -0.150
(0.149) (0.135) (0.228) (0.166) (0.143) (0.139) (0.163)
Additional information cues from treatment
Assets -0.191 0.169 -0.282 -0.054 0.121 -0.069 -0.100
(0.129) (0.123) (0.175) (0.131) (0.113) (0.125) (0.108)
Attack -0.036 -0.280 -0.238 -0.031 -0.313 -0.202 -0.183
(0.116) (0.127) (0.183) (0.139) (0.112) (0.114) (0.120)
Vulnerability type -0.108 -0.067 -0.563 -0.103 -0.015 -0.072 -0.052
(0.098) (0.116) (0.171) (0.125) (0.100) (0.100) (0.112)
Known threats 0.176 0.528 0.494 -0.206 0.325 0.017 0.278
(0.128) (0.138) (0.177) (0.138) (0.121) (0.127) (0.131)
Variance of random intercepts
Student 0.045 0.263 0.101 0.637 0.314 0.210 0.200
CVE 0.464 0.436 1.821 1.473 0.636 0.706 1.400
Pseudo-R² (fixed effects) 0.09 0.09 0.14 0.09 0.03 0.05 0.09
Pseudo-R² (fixed and random effects) 0.22 0.25 0.46 0.44 0.25 0.26 0.39
Signif. codes: ‘***’ 0.001; ‘**’ 0.01; ‘*’ 0.05; ‘.’ 0.1.
Table 4. Regression results

A negative, significant coefficient indicates a decrease in the chances of error; a positive, significant coefficient indicates an increase. Security expertise tends to reduce error, although it is not a significant factor for all metrics. Overall, we find consistent estimates for each information category. In general, information cues on Attack and Vulnerability type aid the scoring for all metrics. Asset yields mixed results, whereas Known threat is always counter-productive. The fixed effects account for about 10% of the overall variance in the model across all metrics, with only a few exceptions in either direction (14% for UI, 3% for C). Including the random effects, the models account for between 22% and 46% of the variance, indicating a good overall fit.
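Because the response is binomial, the Table 4 coefficients are on the log-odds scale (assuming the usual logit link). The sketch below converts two entries of the AC model into error probabilities for a "typical" student and CVE (random intercepts at zero) with all standardised regressors at their mean, and then raises the treatment Known threats cues by one standard deviation. This is an illustrative reading of the table, not an analysis reported by the authors.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Model AC in Table 4: intercept and treatment Known threats coefficient.
INTERCEPT_AC = -0.667
KNOWN_THREAT_AC = 0.528

# Typical student and CVE (random intercepts = 0), standardised regressors at 0.
p_baseline = inv_logit(INTERCEPT_AC)
# Same, with Known threat cues one standard deviation above their mean.
p_known_threat = inv_logit(INTERCEPT_AC + KNOWN_THREAT_AC)
# Multiplicative change in the odds of an AC error per standard deviation.
odds_ratio = math.exp(KNOWN_THREAT_AC)
```

This gives an error probability on AC of roughly 0.34 at baseline versus roughly 0.47 with one additional standard deviation of Known threat cues, i.e. an odds ratio of about 1.7.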

RQ1: How does information category Asset impact assessment errors?

Information on Assets decreases error rates on A; for example, if the vulnerable asset is a server, an attack is likely to compromise service availability. However, we find that information on Assets increases the error on the AV metric, albeit the effect is only weakly significant. Some assets (e.g. a router or a server) may be correlated with AV:Network assessments, whereas in specific cases the attacker may need to be locally authenticated on the asset. We provide two examples of this from our experiment in the next section. We reject the null hypothesis of Hyp. 1 for A and accept the alternative that there is a decrease in error. We do not reject the null for C and I.

RQ2: How does information category Attack impact assessment errors?

This information category improves accuracy on AV, as it can clearly indicate the position of the attacker. Similarly, we find a negative effect on error for AC; for example, a man-in-the-middle attack suggests AC:High (FIRST, 2015b). For the CVSS impact triad (C, I, A), information regarding the attack decreases error for Confidentiality and Integrity; for example, a cache poisoning attack implies an impact on the integrity of the cached information. We reject the null hypothesis of Hyp. 2 for AV, AC, C, and I, and accept the alternative that there is a decrease in error. We do not reject the null for A.

RQ3: How does information category Vulnerability type impact assessment errors?

For AC, information on the type of vulnerability improves assessment accuracy. For example, specifying that the vulnerability is caused by insufficient randomness (e.g. of a hash function) may indicate that the attacker will typically have to find a collision before actively exploiting the vulnerability. Vulnerability type also reduces the chances of error on UI; for example, a cross-site scripting vulnerability typically requires the user to click on a malicious link. The effect on PR is similar: this information cue may clarify whether some level of privilege is required to launch the attack. For example, privilege escalation vulnerabilities typically require some level of authentication. We reject the null hypothesis of Hyp. 3 for AC, PR, and UI, and accept the alternative that there is a decrease in error.

RQ4: How does information category Threat impact assessment errors?

In general, we find that this information cue increases the chances of error. According to the CVSS specification, the Base metrics should only consider information on the technical characteristics of the vulnerability. Hence, the existence of an exploit or of a demonstrated attack is unnecessary information that the assessor must nonetheless process. For example, information on the existence of a demonstrated attack may increase the error on AC, as previously discussed (cf. Section 3.4). Similar considerations apply to the other metrics. We reject the null hypothesis of Hyp. 4 for AC, C, and A, and accept the alternative that there is an increase in error. We do not reject the null for I.
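In CVSS v3.0, information on exploit availability does have a place, but outside the Base metrics. It is captured by the Temporal metric Exploit Code Maturity (E), which scales an already computed Base score. The sketch below transcribes the Temporal equation and weights as we read them from the v3.0 specification (FIRST, 2015b). The helper names are ours. The rounding function follows the floating-point-safe procedure later published with CVSS v3.1, which implements the same ‘round up to one decimal’ rule. The weights should be checked against the specification before any reuse.

```python
# CVSS v3.0 Temporal metric weights ("X" = Not Defined).
EXPLOIT_CODE_MATURITY = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}
REMEDIATION_LEVEL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}
REPORT_CONFIDENCE = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value (float-safe variant from CVSS v3.1)."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (int_input // 10_000 + 1) / 10.0


def temporal_score(base_score: float, e: str = "X", rl: str = "X", rc: str = "X") -> float:
    """Temporal score: E, RL and RC scale the Base score; they never change it."""
    return roundup(
        base_score
        * EXPLOIT_CODE_MATURITY[e]
        * REMEDIATION_LEVEL[rl]
        * REPORT_CONFIDENCE[rc]
    )


base = 9.8  # illustrative Base score; knowledge of exploits must not alter it
for maturity in ("U", "P", "F", "H"):
    print(maturity, temporal_score(base, e=maturity))
```

With this illustrative Base score of 9.8, the loop yields 9.0, 9.3, 9.6, and 9.8 for E = U, P, F, and H. The Base score itself is left unchanged, which is the separation the specification intends.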

Hyp.     Information cue   Reject null for   Do not reject for   Effect on error
Hyp. 1   Assets            A                 C, I                Decrease
Hyp. 2   Attack            AV, AC, C, I      A                   Decrease
Hyp. 3   Vuln. type        AC, PR, UI        -                   Decrease
Hyp. 4   Known threats     AC, C, A          I                   Increase

5. Discussion

Our results indicate that ‘baseline’ vulnerability descriptions can be significantly improved by including additional information. Information on Attack and Vulnerability type is particularly effective in increasing the accuracy of vulnerability assessment, reducing error across the whole set of Exploitability metrics (cf. Table 1): AV, AC, UI, and PR.

In our sample, additional information on attacker actions significantly decreases error on AC, indicating that Attack information was missing from the original text. Similarly, the Attack information added by our treatments significantly decreases the error for C and I. Our results suggest that security expertise helps in interpreting this information (e.g. a ‘cache poisoning’ attack).

The information on Vulnerability type conveyed by standard vulnerability descriptions does not seem to be significantly improved by our treatment for AC and PR, whereas there is a highly significant improvement in assessment accuracy for UI. Information regarding the type of vulnerability, such as a file-based buffer overflow or a cross-site scripting vulnerability, should be included in vulnerability descriptions. This is again in accordance with the negative, significant coefficient for security expertise, indicating that expertise matters for correctly understanding the type of vulnerability.

An interesting finding is that information on Assets contributes to increasing error on AV. Certain information on Assets may correlate with certain AV values (e.g. if the vulnerable asset is a server, AV:Network assessments may be more likely). For example, of the students erroneously classified CVE-2014-6271 as AV:Local, likely because the original NVD description specifies that the vulnerability affects “GNU Bash through 4.3 [..] [in] situations in which setting the environment occurs across a privilege boundary from Bash execution”. Here the vulnerable asset is clearly GNU Bash, which may suggest that the user needs to be authenticated locally to reach the vulnerability. However, in the worst case the vulnerability can be reached without any local access to the environment, as specified in the description: “the vulnerability can be exploited by [..] mod_cgid modules in the Apache HTTP Server, scripts executed by unspecified DHCP clients [..]”, which indicates a Network vector for the attack. Similarly, in our sample, 89% of the students erroneously categorized CVE-2012-1516 as AV:Local. The description reports that “it is possible to manipulate data pointers within the Virtual Machine Executable (VMX) process”, which suggests that the user needs to be locally authenticated on the machine to access the process. Local access is not actually required, however, as the vulnerability can be reached through the “handler function for RPC commands” (FIRST, 2015a), a procedure for sending remote commands to a process. Both examples suggest that a more precise definition of Attacker actions may help reduce this effect.
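To show why such misclassifications matter, the sketch below transcribes the CVSS v3.0 Base score equations and weights as we read them from the specification (FIRST, 2015b). It compares a network and a local attack vector for an otherwise identical worst-case vector: no privileges, no user interaction, and high C/I/A impact. The vector strings are illustrative and are not claimed to be the official scores of the CVEs discussed above. The helper names are ours, and rounding uses the floating-point-safe procedure published with CVSS v3.1.

```python
# CVSS v3.0 Base metric weights, as transcribed from the specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # used when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value (float-safe variant from CVSS v3.1)."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (int_input // 10_000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    # Combined C/I/A impact, then the Scope-specific impact equation.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Same worst-case vector, differing only in the Attack Vector.
print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # network
print(base_score("AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # local
```

Under these assumptions, moving from AV:N to AV:L lowers the score from 9.8 to 8.4. That change is enough to move the vulnerability from Critical to High on the v3.0 qualitative severity scale.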

Finally, security experts regard information on Known threats as being of primary importance for assessing vulnerability risk (Holm and Afridi, 2015). However, we find first evidence that it consistently increases the chances of error, as the CVSS Base metrics should only consider technical details of the vulnerability.

Following Devanbu et al.’s recommendations on the impact of empirical findings on software practices (Devanbu et al., 2016), we further discuss practical implications of this work.

Implications for vulnerability communication. Our results suggest that baseline vulnerability descriptions contain only part of the information that leads to an accurate CVSS assessment. Additional information on Attack, Vulnerability type, and Assets may result in more informative vulnerability descriptions. In light of our results, standards and best practices for vulnerability communication, including CVSS itself, could provide guidelines for writing informative vulnerability descriptions (ISO/IEC, 2014). Our results suggest that including information of the Threat category should be discouraged. Further, our results identify dimensions over which vulnerability information can be automatically categorized and provided to vulnerability assessors.
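The automatic categorization mentioned above could start from something as simple as keyword matching. The sketch below tags a description with the four cue categories used in this study and sets aside Known threats cues before Base scoring. The keyword lists, function names, and example sentence are hypothetical placeholders, not the keywords used to measure information cues in this study. Whole-word keyword matching is only a crude stand-in for a proper text classifier.

```python
import re

# Hypothetical keyword lists for illustration only; NOT the study's keywords.
CUE_KEYWORDS = {
    "Asset": ["server", "router", "client", "library"],
    "Attack": ["man-in-the-middle", "cache poisoning", "remote", "crafted"],
    "Vulnerability type": [
        "buffer overflow", "cross-site scripting",
        "privilege escalation", "insufficient randomness",
    ],
    "Known threats": [
        "exploited in the wild", "actively exploited",
        "exploit is available", "proof of concept",
    ],
}


def tag_cues(description: str) -> dict[str, list[str]]:
    """Return, per cue category, the keywords found as whole words or phrases."""
    text = description.lower()
    found: dict[str, list[str]] = {}
    for category, keywords in CUE_KEYWORDS.items():
        hits = [kw for kw in keywords
                if re.search(r"\b" + re.escape(kw) + r"\b", text)]
        if hits:
            found[category] = hits
    return found


def split_for_base_scoring(description: str):
    """Separate cues useful for Base scoring from Known threats cues to set aside."""
    cues = tag_cues(description)
    set_aside = cues.pop("Known threats", [])
    return cues, set_aside


desc = ("A crafted request to the Apache HTTP Server allows remote attackers "
        "to trigger a buffer overflow; a working exploit is available.")
keep, ignore = split_for_base_scoring(desc)
print(keep)
print(ignore)
```

Here the Asset, Attack, and Vulnerability type hits are kept for the assessor, while the phrase about an available exploit is set aside. This mirrors the recommendation to discourage Threat information in Base assessments.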

Implications for software security practices. Our findings can help practitioners identify information that is significant for a vulnerability assessment on each specific metric (Pennington and Tuttle, 2007). For example, the assessor evaluating the AV metric may look specifically for Attack information. Similar considerations apply to the other metrics (cf. Table 4). Additionally, the assessor should deliberately ignore any information on Known threats, if present. Replications of this work could make it possible to build ‘confidence intervals’ around vulnerability assessments that account for errors in the estimate. These intervals could then be taken into account when prioritizing vulnerability fixes.
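The metric-specific guidance above can be encoded as a small lookup derived from the summary of hypothesis tests (Hyp. 1 to 4). For each CVSS metric, the lookup lists the cue categories whose null hypothesis was rejected with a decrease in error (cues to look for) and those rejected with an increase (cues to set aside). The names are hypothetical, and the sketch only restates those outcomes without adding new evidence. Since the text recommends ignoring Known threats for every metric, the ‘ignore’ list marks where harm was measured, not the only place it applies.

```python
# Outcomes of Hyp. 1 to 4: cue -> (metrics where the null was rejected, effect on error).
REJECTED = {
    "Assets": (["A"], "decrease"),
    "Attack": (["AV", "AC", "C", "I"], "decrease"),
    "Vulnerability type": (["AC", "PR", "UI"], "decrease"),
    "Known threats": (["AC", "C", "A"], "increase"),
}


def cues_for_metric(metric: str) -> tuple[list[str], list[str]]:
    """Return (cues to look for, cues to set aside) when scoring one CVSS metric."""
    helpful: list[str] = []
    harmful: list[str] = []
    for cue, (metrics, direction) in REJECTED.items():
        if metric in metrics:
            (helpful if direction == "decrease" else harmful).append(cue)
    return helpful, harmful


for metric in ("AV", "AC", "PR", "UI", "C", "I", "A"):
    look_for, ignore = cues_for_metric(metric)
    print(f"{metric}: look for {look_for or '-'}; ignore {ignore or '-'}")
```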

6. Threats to validity

Conclusion validity. To avoid introducing noise in the vulnerability descriptions and treatments used in the experiment, all descriptions and treatments were taken from the official documentation released by the standardisation team, which is used for official training on the standard.

Internal validity. Results may be confounded by order of treatment or learning effects. As we cannot cover all treatment combinations, our experiment design is not full-factorial. However, we accounted for all combinations of treatments for similar vulnerabilities that might confound results. The keywords used to measure information cues in the vulnerability descriptions were identified independently by three authors of the paper. To minimize chances of bias, the experiment was performed before the final example documentation was publicly released.

External validity. Following Höst et al. (2000), we consider students suitable subjects for relative performance measures. Students were informed that the exercise was not graded. All received the same training on CVSS at the beginning of our experiment. We controlled for potentially relevant characteristics of our subjects, including security expertise and work experience.

7. Conclusions

In this paper we investigate which information cues aid vulnerability assessment by humans. We based our definition of relevant information on current standards and best practices (ISO/IEC, 2014; Stoneburner et al., 2002), and on recent research findings by other authors (Holm and Afridi, 2015; Roschke et al., 2009). Our results provide a first indication that, in general, additional information cues on Asset, Attack, and Vulnerability type on top of the baseline vulnerability descriptions may aid the assessment process, whereas information cues on Threat hinder it.

An interesting avenue for future research is to explicitly consider the effect of information security knowledge by devising experiments with security professionals. Additionally, this work opens the way to research that considers measures of complexity, to evaluate whether there are thresholds beyond which the cognitive performance of the assessor decays.

8. Acknowledgments

This project has been partly supported by the University of Trento, Italy, through the EIT Digital Master School 2016 - Security and Privacy programme, and by NWO through the SpySpot project no. Grant #3.


  • PCI (2010) 2010. PCI Council PCI DSS Requirements and Security Assessment Procedures, Version 2.0. (2010).
  • NVD (2015) 2015. NIST National Vulnerability Database (NVD). (2015). [online]
  • Allodi and Massacci (2014) Luca Allodi and Fabio Massacci. 2014. Comparing vulnerability severity and exploits using case-control studies. ACM Transactions on Information and System Security (TISSEC) 17, 1 (August 2014).
  • Allodi et al. (2013) Luca Allodi, Shim Woohyun, and Fabio Massacci. 2013. Quantitative assessment of risk reduction with cybercrime black market monitoring. In Proc. of IWCC’13.
  • Barnum and McGraw (2005) Sean Barnum and Gary McGraw. 2005. Knowledge for software security. IEEE Security & Privacy 3, 2 (2005), 74–78.
  • Christey and Martin (2013) Steve Christey and Brian Martin. 2013. Buying into the bias: why vulnerability statistics suck. (July 2013).
  • Dashevskyi et al. (2016) Stanislav Dashevskyi, Achim D Brucker, and Fabio Massacci. 2016. On the Security Cost of Using a Free and Open Source Component in a Proprietary Product. In International Symposium on Engineering Secure Software and Systems. Springer, 190–206.
  • Devanbu et al. (2016) Prem Devanbu, Thomas Zimmermann, and Christian Bird. 2016. Belief & Evidence in Empirical Software Engineering. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 108–119.
  • Eppler and Mengis (2004) Martin J Eppler and Jeanne Mengis. 2004. The concept of information overload: A review of literature from organization science, accounting, marketing, MIS, and related disciplines. The Information Society 20, 5 (2004), 325–344.
  • Femmer et al. (2016) Henning Femmer, Daniel Méndez Fernández, Stefan Wagner, and Sebastian Eder. 2016. Rapid quality assurance with Requirements Smells. Journal of Systems and Software (2016).
  • FIRST (2015a) FIRST. 2015a. Common Vulnerability Scoring System v3.0: Example Document. Technical Report. FIRST, Available at
  • FIRST (2015b) FIRST. 2015b. Common Vulnerability Scoring System v3.0: Specification Document. Technical Report. FIRST, Available at
  • Fruhwirth and Mannisto (2009) Christian Fruhwirth and Tomi Mannisto. 2009. Improving CVSS-based vulnerability prioritization and response with context information. In Proceedings of the 2009 3rd international Symposium on Empirical Software Engineering and Measurement. IEEE Computer Society, 535–544.
  • Holm and Afridi (2015) Hannes Holm and Khalid Khan Afridi. 2015. An expert-based investigation of the Common Vulnerability Scoring System. Computers & Security 53 (2015), 18–30.
  • Höst et al. (2000) Martin Höst, Björn Regnell, and Claes Wohlin. 2000. Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empirical Software Engineering 5, 3 (2000), 201–214.
  • Houmb et al. (2010) Siv Hilde Houmb, Virginia NL Franqueira, and Erlend A Engum. 2010. Quantifying security risk level from CVSS estimates of frequency and impact. 83, 9 (2010), 1622–1634.
  • Howard and Lipner (2006) Michael Howard and Steve Lipner. 2006. The Security Development Lifecycle. Microsoft Press.
  • ISO/IEC (2005) ISO/IEC. 2005. Information technology - Security techniques - Information security management systems - Requirements. ISO/IEC 27001. International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC).
  • ISO/IEC (2012) ISO/IEC. 2012. Common Criteria for Information Technology Security Evaluation. ISO/IEC 15408. International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC).
  • ISO/IEC (2014) ISO/IEC. 2014. Information technology - Security techniques - Vulnerability disclosure. ISO/IEC 29147. International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC).
  • Labunets et al. (2013) Katsiaryna Labunets, Fabio Massacci, Federica Paci, et al. 2013. An experimental comparison of two risk-based security methods. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, 163–172.
  • Mund et al. (2015) Jakob Mund, Henning Femmer, M Daniel, and Jonas Eckhardt. 2015. Does Quality of Requirements Specifications matter? Combined Results of Two Empirical Studies. In Proc. of the 9th International Symposium on Empirical Software Engineering and Measurement (ESEM ’15).
  • Naaliel et al. (2014) Mendes Naaliel, Duraes Joao, and Madeira Henrique. 2014. Security Benchmarks for Web Serving Systems. In Proc. of ISSRE’14.
  • Pennington and Tuttle (2007) Robin Pennington and Brad Tuttle. 2007. The effects of information overload on software project risk assessment. Decision Sciences 38, 3 (2007), 489–526.
  • Quinn et al. (2010) Stephen D. Quinn, Karen A. Scarfone, Matthew Barrett, and Christopher S. Johnson. 2010. SP 800-117. Guide to Adopting and Using the Security Content Automation Protocol (SCAP) Version 1.0. Technical Report. NIST.
  • Roschke et al. (2009) Sebastian Roschke, Feng Cheng, Robert Schuppenies, and Christoph Meinel. 2009. Towards Unifying Vulnerability Information for Attack Graph Construction. Springer Berlin Heidelberg, Berlin, Heidelberg, 218–233.
  • Scandariato et al. (2013) R. Scandariato, J. Walden, and W. Joosen. 2013. Static analysis versus penetration testing: A controlled experiment. In Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on. 451–460.
  • Shin et al. (2011) Yonghee Shin, Andrew Meneely, Laurie Williams, and Jason A. Osborne. 2011. Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities. 37, 6 (2011), 772–787.
  • Stoneburner et al. (2002) Gary Stoneburner, Alice Y. Goguen, and Alexis Feringa. 2002. SP 800-30. Risk Management Guide for Information Technology Systems. Technical Report. Gaithersburg, MD, United States.
  • Thompson et al. (2016) C. Albert Thompson, Gail C. Murphy, Marc Palyart, and Marko Gašparič. 2016. How Software Developers Use Work Breakdown Relationships in Issue Repositories. In Proceedings of the 13th International Conference on Mining Software Repositories (MSR ’16). ACM, New York, NY, USA, 281–285.
  • Walden et al. (2014) James Walden, Jeff Stuckman, and Riccardo Scandariato. 2014. Predicting vulnerable components: Software metrics vs text mining. In 2014 IEEE 25th International Symposium on Software Reliability Engineering. IEEE, 23–33.
  • Yskout et al. (2015) Koen Yskout, Riccardo Scandariato, and Wouter Joosen. 2015. Do Security Patterns Really Help Designers?. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, 292–302.
  • Zhang et al. (2013) Hongyu Zhang, Liang Gong, and Steve Versteeg. 2013. Predicting bug-fixing time: an empirical study of commercial software projects. In Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 1042–1051.
  • Zhao et al. (2016) Yu Zhao, Feng Zhang, Emad Shihab, Ying Zou, and Ahmed E Hassan. 2016. How Are Discussions Associated with Bug Reworking?: An Empirical Study on Open Source Projects. In Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. ACM, 21.
  • Zimmermann et al. (2010) Thomas Zimmermann, Rahul Premraj, Nicolas Bettenburg, Sascha Just, Adrian Schroter, and Cathrin Weiss. 2010. What makes a good bug report? IEEE Transactions on Software Engineering 36, 5 (2010), 618–643.