Most modern software uses third-party open source libraries, packages, or frameworks that are referred to as dependencies. A Black Duck report found that 98% of the 1,546 commercial codebases audited in 2020 contained open source packages, with an average of 528 packages per codebase (Synopsys, 2021). However, known vulnerabilities in dependencies are one of the top ten security risks (OWASP, 2020). The Black Duck audit also found that 84% of the codebases contained at least one publicly known vulnerability in their open source dependencies (Synopsys, 2021).
Software composition analysis (SCA) tools report known vulnerabilities in the open source dependencies of a software application. However, these tools may differ in how they detect dependencies and in the vulnerability databases they maintain. A comparative study is yet to be performed to review the existing SCA tools and determine if and how they differ. Furthermore, not all alerts generated by SCA tools are relevant or high priority to developers (Pashchenko et al., ). Whether and how existing SCA tools aid developers in assessing the risk of vulnerabilities in the context of the client application needs to be studied to guide future research.
The goal of this study is to aid security practitioners and researchers in understanding the vulnerability reporting by software composition analysis tools through a comparative study of these tools on a real-world case study. Our research questions are:
RQ1: What are the differences between vulnerability reports produced by the different software composition analysis (SCA) tools?
RQ2: What metrics are presented by the SCA tools to aid in the risk assessment of dependency vulnerabilities?
The remainder of the paper is structured as follows: Section 2 introduces the key concepts and terminology; Sections 3 and 4 explain the evaluation case study and the studied SCA tools; Section 5 presents the findings of this paper, followed by a discussion and the limitations of the findings; Section 8 discusses related work, followed by the conclusion.
2. Key Concepts & Terminologies:
Dependency: When a software application uses an open source package, the package is referred to as a dependency of the application. Typically, an application declares a specific version or a range of valid versions of a package as its dependency in a manifest file that we refer to as a dependency file. However, an application may also use open source packages or code fragments without explicit declaration (Haddad, 2020). In the remainder of this paper, we use ‘dependency’ to mean a specific version of a package. For example, version 1.0.0 and version 2.0.0 of the same package A are considered distinct dependencies but the same package.
The dependencies declared through dependency files are resolved by a package manager; pom.xml and package.json are the dependency files for the Maven and npm package managers, respectively. Dependencies that a software application accesses directly from its own code are called direct dependencies. The direct dependencies may in turn depend on other open source packages, which must also be present for the application to run successfully. Such packages are called transitive dependencies. Therefore, for most package managers, including Maven and npm, the whole dependency structure is hierarchical and forms a tree. The depth of a dependency refers to its level in the dependency tree, with direct dependencies having a depth of one.
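As an illustration of the depth definition above, the following Python sketch derives each dependency's depth from the indentation of `mvn dependency:tree`-style output. The sample tree and artifact coordinates are hypothetical, and the three-character indentation step is an assumption about the output format.

```python
# Sketch: derive each dependency's depth from `mvn dependency:tree` text output.
# Direct dependencies have depth 1, per the definition above.

def parse_depths(tree_output: str) -> dict:
    """Map each group:artifact coordinate to its depth in the dependency tree."""
    depths = {}
    for line in tree_output.splitlines():
        # Strip tree-drawing prefixes ("+- ", "\- ", "|  ") to isolate the artifact.
        stripped = line.lstrip("+\\|- ")
        if not stripped:
            continue
        # Each level of nesting adds three prefix characters.
        depth = (len(line) - len(stripped)) // 3
        group, artifact, *_ = stripped.split(":")
        depths[f"{group}:{artifact}"] = depth
    return depths

sample = """\
com.example:app:jar:1.0.0
+- org.springframework:spring-webmvc:jar:4.1.2:compile
|  \\- org.springframework:spring-core:jar:4.1.2:compile
\\- junit:junit:jar:4.12:test
"""
depths = parse_depths(sample)
# spring-webmvc and junit are direct (depth 1); spring-core is transitive (depth 2)
```

A production parser would also track scope (the last colon-separated field) and handle omitted-duplicate markers, but the indentation rule above is what determines depth.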
Vulnerability: NIST (National Institute of Standards and Technology (NIST), 2012) defines a vulnerability as "weakness in an information system, system security procedures, internal controls, or implementation that could be exploited or triggered by a threat source." If a vulnerability is exploited by a threat source, the potential for loss or damage is referred to as the risk of the vulnerability. Vulnerabilities can be discovered in already-released versions of software packages. If reported, the respective package maintainers can fix the vulnerability in a new version. When a dependency of a software application is subject to publicly known vulnerabilities, it is referred to as a vulnerable dependency.
Software Composition Analysis (SCA): SCA is a part of application analysis that deals with managing open source use. SCA tools typically generate an inventory of all the open source components in a software product and analyze license compliance and the presence of any known vulnerabilities in them. By the vulnerability detection capability of SCA tools, we mean their ability to identify and report known vulnerabilities in the open source components used by a software application.
Disclosed and Discovered Vulnerabilities: The National Vulnerability Database (NVD) (23) is the U.S. government repository of publicly accessible, standards-based vulnerability management data. The primary reference system for publicly disclosed vulnerabilities in the NVD database is the Common Vulnerabilities and Exposures (CVE) system, developed by Mitre (22), where each vulnerability is referenced by a unique CVE identifier.
Additionally, SCA tools augment NVD vulnerabilities/CVEs with vulnerabilities found in other databases, such as npm Security Advisories (24), Sonatype OSS Index (41), and GitHub security advisories (13), that do not necessarily have a CVE identifier. Similarly, SCA tools can also have proprietary techniques to discover vulnerabilities in open source packages (Catabi-Kalman, ; Zhou and Sharma, 2017) as explained in Section 4.1. In this paper, vulnerabilities reported by SCA tools that do not have an associated CVE identifier are referred to as Non-CVEs.
2.1. Maven:
Maven is a package manager for Java projects.
Dependency Scopes: Maven dependencies can have six different scopes (Project, ): compile, provided, runtime, test, system, and import. The scopes determine the build phase in which a dependency is used and whether the dependency propagates transitively.
Dependency Mediation: When multiple versions of a package appear in the dependency tree, Maven picks the one with the nearest definition. Therefore, a single project usually resolves a single version of a package as a dependency, read from a local repository. In the dependency file (pom.xml), developers generally specify a single version for each dependency. Version numbers can have up to five parts indicating major, minor, or incremental changes.
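Maven's nearest-definition rule can be sketched as follows. This is a minimal model with hypothetical versions; it ignores exclusions and `dependencyManagement` overrides.

```python
# Sketch of Maven's "nearest definition" mediation: when a package appears at
# several places in the tree with different versions, the occurrence closest to
# the root wins; ties go to the first declaration.

def mediate(occurrences: list) -> str:
    """occurrences: (depth, version) pairs in declaration order.
    Returns the version Maven would resolve."""
    best_depth = min(d for d, _ in occurrences)
    return next(v for d, v in occurrences if d == best_depth)

# Package C appears at depth 2 (via some direct dependency A, as v1.0)
# and at depth 1 (declared directly, as v2.0): the direct declaration wins.
resolved = mediate([(2, "1.0"), (1, "2.0")])
```

When both occurrences sit at the same depth, `mediate([(2, "1.0"), (2, "1.5")])` returns `"1.0"`, mirroring Maven's first-declaration tie-break.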
2.2. Node Package Manager (npm):
Dependency Scopes: npm has two primary dependency scopes, prod (production) and dev (development), which indicate the phase in which a dependency is required.
Dependency Mediation: npm copies all the dependencies into a project sub-directory called ‘node_modules’, mirroring the structure of the dependency tree. If two dependencies A and B both depend on the same package C, two separate copies of package C are placed inside packages A and B. Therefore, the same dependency can be introduced to the root application through multiple routes, and the same package can even appear as a dependency in multiple versions. npm thus has a concept, absent in Maven, called the dependency path: each unique path through which a dependency is introduced to the root application. In npm, developers can list a range of versions of a package that are valid as a dependency. npm also has lock files – snapshots of the entire dependency tree and the resolved version of each dependency at a given time – which can instruct npm to install exactly the versions specified. npm packages follow the SemVer format (2) for version numbering.
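The dependency-path concept can be illustrated with a small sketch. The nested tree below is a hypothetical `node_modules` layout, not OpenMRS data.

```python
# Sketch: enumerate npm dependency paths. Unlike Maven, the same package can be
# reached along several paths, and even in several versions.

def dependency_paths(tree: dict, root: str) -> list:
    """Return every root-to-dependency path in a nested {name: subtree} dict."""
    paths = []
    def walk(name, subtree, prefix):
        path = prefix + [name]
        if prefix:  # skip the zero-length path to the root itself
            paths.append(path)
        for child, sub in subtree.items():
            walk(child, sub, path)
    walk(root, tree[root], [])
    return paths

tree = {"app": {"a@1.0": {"c@2.0": {}}, "b@1.0": {"c@2.1": {}}}}
paths = dependency_paths(tree, "app")
# Package c is introduced twice, via a and via b, in two different versions,
# so a vulnerability in c may need to be fixed along each path separately.
```

Here `paths` contains four entries: the two direct dependencies plus two distinct paths to package c.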
3. Evaluation Case Study: OpenMRS
OpenMRS is a web application serving as an electronic medical record platform (27). A particular configuration of OpenMRS that can be installed and upgraded as a unit is referred to as a distribution. The general-purpose distribution of OpenMRS is the “Reference Application Distribution” (28). We choose Version 2.10.0 of this distribution, released on April 6, 2020 (the latest release at the time of this study), as our evaluation subject. In the remainder of the paper, we refer to the whole distribution simply as “OpenMRS”.
OpenMRS consists of 44 projects that are hosted in their own separate repositories on GitHub. Out of the 44 projects, 39 are Maven projects and 1 is an npm project; the other 4 projects are each composed of a Maven and an npm project. Based on the OpenMRS structure, we scope our study to Maven and npm dependencies. We use the OpenMRS SDK (29) to automate building, testing, and running the individual projects and to assemble the full application in this study.
3.1. Why OpenMRS?
Choosing test cases to evaluate software security tools can be a complex task. For the comparison of security tools, Delaitre et al. (Delaitre et al., 2018) note that the test case should contain a sufficient and diverse set of security weaknesses. OpenMRS depends on many third-party packages, as will be seen in Section 3.2; and, being a web application, it is composed of several heterogeneous components, such as a database, content generation engines, and client-side code, increasing the probability of a large, diverse set of vulnerable dependencies.
An alternative to a single case study would be to run the tools on a group of diverse projects. However, three of the selected tools (Steady, Commercial A, and Commercial B) are (a) resource- and time-consuming to set up and run; (b) subject to specific requirements, e.g., acceptance tests for interactive binary instrumentation and unit tests for executability tracing; and (c) in the case of the commercial tools, subject to permission issues. In contrast, focusing on a single case study enables us to manually investigate the differences in the tools’ results.
OpenMRS has also been used in security research in the past (Crain, 2017; Tøndel et al., 2019; de Abajo and Ballestero, 2012; Lamp et al., 2018; Rizvi et al., 2015; Amir-Mohammadian et al., 2016). (Lamp et al., 2018) evaluated OpenMRS for medical system security requirements; (Rizvi et al., 2015) evaluated OpenMRS for access control checking; while (Amir-Mohammadian et al., 2016) studied OpenMRS for correct audit logging.
3.2. OpenMRS: Dependency Overview
In this section, we provide an overview of the Maven and npm dependencies of OpenMRS. We parse the dependency tree of each project through the native mvn dependency:tree and npm list commands, recording each dependency’s scope and depth in the dependency tree.
Table 1 provides a dependency overview of OpenMRS. Note that, for Maven projects, there can be internal dependencies – that is – a project within the OpenMRS distribution can be listed as a dependency of another project. We do not count internal dependencies in Table 1. Also, npm projects can contain lock files, such as shrinkwrap.json and package-lock.json, which we do not consider.
Table 1: Dependency overview of OpenMRS.

| | Maven | npm |
|---|---|---|
| No. of projects | 43 | 5 |
| Total unique packages | 311 | 1,498 |
| Median dependency per project | 127.0 | 840.5 |
| Median dependency path per project | NA | 1,675.0 |
| Median depth of dependencies | 2 | 4 |
| Max. depth of dependencies | 7 | 12 |
| Median Provided dependencies | 99.0 | NA |
| Median Compile dependencies | 3.0 | NA |
| Median Runtime dependencies | 5.0 | NA |
| Median Test dependencies | 24.5 | NA |
| Median Production dependencies | NA | 202.5 |
| Median Production dependency path | NA | 366.0 |
| Median Developer dependencies | NA | 807.5 |
| Median Developer dependency path | NA | 1,613.5 |
4. SCA Tools
In this section, we explain the criteria we used to select the SCA tools, describe the tools, and explain how we scanned OpenMRS and analyzed the reports produced by the tools.
4.1. Selection Criteria
To identify the existing SCA tools from both industrial offerings and the latest research, we performed an academic literature search and a web search with the following keywords: (vulnerable OR open source OR software) AND (dependency OR package OR library OR component OR composition) AND (detection OR scan OR tool OR analysis). From the relevant search results, we filtered the tools with the following inclusion criteria: a) scans either Maven or npm projects; b) we have access to an executable tool; and c) offers unique features when compared with already selected tools. From our selection process, we selected nine tools. Two of the tools are not freely available, and their license agreements prevent us from naming them; we refer to them as Commercial A and Commercial B. Out of the nine selected tools, four scan both Maven and npm projects, one scans only npm projects, and four scan only Maven projects.
We observed that SCA tools primarily differ in three dimensions:
Vulnerability database: To report the list of known vulnerabilities, the tools need a database. Tools can pull vulnerability data from third-party source(s) such as NVD CVEs (23). Additionally, SCA tools can maintain their own vulnerability database where they collect and verify vulnerability data through different techniques (Catabi-Kalman, ; Zhou and Sharma, 2017).
Dependency scanning source: SCA tools can detect open source dependencies from dependency manifest files, source code, or binaries. Typically, dependency files are the common source for resolving a project’s dependencies, as the package managers themselves do.
Additional analysis to infer dependency use: Tools can perform additional static and/or dynamic analysis to infer how the dependencies are being used by an application.
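As a minimal sketch of the first two dimensions – manifest-based scanning against a vulnerability database – the following matches dependencies declared in a `package.json`-style manifest against a toy advisory table. The advisory entry is invented, and real manifests declare version ranges that must first be resolved to concrete versions.

```python
# Sketch: look up declared dependencies from a package.json-style manifest in a
# (hypothetical) advisory database keyed by exact (package, version) pairs.

import json

advisories = {  # (package, version) -> advisory ids; placeholder data
    ("lodash", "4.17.11"): ["ADVISORY-1"],
}

manifest = json.loads('{"dependencies": {"lodash": "4.17.11", "left-pad": "1.3.0"}}')

alerts = [
    (pkg, ver, adv)
    for pkg, ver in manifest["dependencies"].items()
    for adv in advisories.get((pkg, ver), [])
]
# Only the dependency with a matching advisory entry produces an alert.
```

A real scanner would also walk transitive dependencies and match version ranges rather than exact versions, which is precisely where the tools studied here start to diverge.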
4.2. Tool description
For the selected tools, we describe (a) if they scan Maven or npm dependencies, their (b) data source, (c) scanning technique, and (d) how we performed the scan for this study.
Snyk: Snyk scans both Maven and npm projects. The tool works by scanning dependency files (39) and maintains its own vulnerability database (40). We ran the freely available command line tool (Version 1.382.0) through the command snyk test --all-projects --dev --json.
GitHub Dependabot: Dependabot scans both Maven and npm projects hosted on GitHub. GitHub maintains its own vulnerability database (13), pulling data from NVD and npm advisories; additionally, maintainers on GitHub can publish vulnerabilities in their own projects. We hosted the 44 studied projects on the first author’s GitHub account and retrieved the Dependabot alerts through the GitHub API.
Maven Security Versions (MSV): This tool only scans Maven projects (46) through dependency files. We ran this tool through its Maven plugin.
npm audit: This is the native tool of the npm package manager for scanning npm projects. The tool works by scanning dependency files and maintains its own vulnerability database (25). We used the npm audit --json command.
Eclipse Steady: This tool only scans Java (Maven) projects. The tool performs additional analysis to assess the execution of vulnerable code in the dependencies of an application (12). The approach implemented is described in (Ponta et al., 2018) and (Plate et al., 2015). The tool requires a manual setup, along with the vulnerability database provided by the tool. We used Version 3.1.10 of this tool, set up in a virtual machine with 16 GB RAM and 4 processor cores. Steady hosts its vulnerability data set on GitHub (42); the data set contains patch commit information for each vulnerability. We imported the data source updated on January 24, 2020, and then ran the tool’s patch analysis feature to identify the code constructs involved in each vulnerability. For reachability analysis of the identified vulnerabilities, Steady performs three analyses: 1) static call graph construction; 2) executing JUnit tests to analyze executability traces; and 3) JVM instrumentation through integration testing. We were unable to complete the third analysis, as the tool presumably ran out of memory after running for ten days.
WhiteSource: WhiteSource has a GitHub bot named “WhiteSource Bolt” (47) which scans both Maven and npm projects. WhiteSource also maintains its own vulnerability database (48). We connected the GitHub bot with our hosted repositories on GitHub and retrieved the issues created by WhiteSource through GitHub API.
Commercial A: This tool has scientific papers discussing its approach (not cited to maintain blindness). We contacted their research team and provided them with the repository links for the studied projects. They returned scan reports for Maven dependencies only, covering 37 projects, and reported that the automated scans failed for the remaining projects, which may have required manual intervention. This tool offers static analysis by default and dynamic analysis as an option to identify vulnerable call chains; we received results with only static analysis performed on the code. The tool maintains its own vulnerability database.
Commercial B: We used the free cloud edition of the tool, where it only scanned the Java dependencies (customer support informed us that the tool does not scan front-end libraries). The tool checks for the reachability of vulnerabilities in dependencies through interactive application security testing – that is – monitoring the dependencies in use when an application is run and interacted with, either through automated testing or human testers. The tool uses third-party vulnerability databases, including NVD, which it curates itself to enhance accuracy. To run this tool on OpenMRS, we made use of 123 test cases provided by OpenMRS for integration testing that interact with the application through a Selenium WebDriver. We connected OpenMRS to this tool and used the integration test suite to interact with the application.
We collected the scan reports separately for 44 projects for 8 tools. For Commercial B, which analyzes the application during runtime, we get a single report for the whole OpenMRS distribution. As vulnerability data gets updated over time, we ran all the tools during September 2020 to ensure a fair comparison, except for Steady whose vulnerability data is from January 2020.
Table 2: Scan results for the Maven projects. Cells show totals, with the median per project in parentheses.

| Tool | Alerts | Vulnerable Dependencies | Vulnerable Packages | Vulnerabilities | CVEs | Non-CVEs | Scan Time (min) |
|---|---|---|---|---|---|---|---|
| OWASP DC | 12,466 (254.0) | 332 (38.0) | 149 (36.0) | 313 (117.0) | 289 | 24 | 14.4 |
| Snyk | 4,902 (66.0) | 96 (6.0) | 46 (6.0) | 189 (23.0) | 178 | 11 | 15.1 |
| Dependabot | 136 (0.0) | 20 (0.0) | 11 (0.0) | 61 (0.0) | 61 | 0 | NA |
| MSV | 3,197 (58.0) | 36 (12.0) | 14 (12.0) | 36 (22.0) | 36 | 0 | 3.4 |
| Steady | 2,489 (51.0) | 91 (20.0) | 39 (19.0) | 97 (41.0) | 89 | 8 | 385.0 |
| WhiteSource | 434 (0.0) | 76 (0.0) | 44 (0.0) | 146 (0.0) | 127 | 19 | NA |
| Commercial A | 2,998 (70.0) | 107 (24.0) | 53 (24.0) | 208 (70.0) | 187 | 21 | NA |
Table 3: Scan results for the npm projects. Cells show totals, with the median per project in parentheses.

| Tool | Alerts | Dependency Paths | Vulnerable Dependencies | Vulnerable Packages | Vulnerabilities | CVEs | Non-CVEs | Scan Time (min) |
|---|---|---|---|---|---|---|---|---|
| OWASP DC | 1,379 (208.0) | 498 (72.0) | 239 (71.0) | 160 (57.0) | 234 (71.0) | 78 | 156 | 4.4 |
| Snyk | 2,210 (135.0) | 1,004 (44.0) | 90 (20.0) | 54 (17.0) | 121 (26.0) | 79 | 42 | 1.0 |
| Dependabot | 97 (8.0) | NA | 32 (1.0) | 30 (1.0) | 45 (4.0) | 29 | 16 | NA |
| npm audit | 1,266 (37.0) | 852 (28.0) | 58 (12.0) | 45 (12.0) | 62 (16.0) | 31 | 31 | 0.1 |
| WhiteSource | 205 (32.0) | 205 (32.0) | 89 (14.0) | 55 (9.0) | 96 (18.0) | 58 | 38 | NA |
4.3. Analyzing Tool Results
Below, we discuss the metrics and information that we processed from the tool reports to answer our research questions.
Quantity of Alerts: When a project is scanned by a tool, the tool reports a raw count of alerts identified on the project. However, the alerts do not represent either unique dependencies or unique vulnerabilities. We observed that the same alerts can be repeated in tools’ reports for various reasons. The alert count, however, may indicate the amount of audit effort required from the developers.
Tracking unique dependencies, dependency paths, packages, and vulnerabilities: The definitions of these four metrics, as used in this study, are provided in Section 2. When processing the analysis reports from all the tools, we store the data in a relational database schema, keeping an identifier for each unique package, dependency (package:version), dependency path, and CVE identifier. For the non-CVEs, all tools except OWASP DC and Commercial A provide a tool-specific identifier. While OWASP DC and Commercial A provide no reliable identifier to track unique non-CVEs, upon manual inspection we noticed that the vulnerability description along with the affected package(s) is a reliable way to track them. However, we have no reliable way to map non-CVEs across different tool reports.
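The tracking scheme described above can be sketched as a small relational schema. Table and column names here are our own illustration, not the study's exact schema.

```python
# Sketch of a relational schema for tracking unique packages, dependencies
# (package:version), dependency paths, and vulnerabilities across tool reports.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE package    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dependency (id INTEGER PRIMARY KEY,
                         package_id INTEGER REFERENCES package,
                         version TEXT, UNIQUE (package_id, version));
CREATE TABLE dep_path   (id INTEGER PRIMARY KEY,
                         dependency_id INTEGER REFERENCES dependency,
                         path TEXT UNIQUE);
CREATE TABLE alert      (tool TEXT,
                         dependency_id INTEGER REFERENCES dependency,
                         cve_id TEXT);  -- NULL cve_id would mark a non-CVE
""")
con.execute("INSERT INTO package (name) VALUES ('lodash')")
con.execute("INSERT INTO dependency (package_id, version) VALUES (1, '4.17.11')")
# Two tools alerting on the same CVE collapse into one unique vulnerability:
con.execute("INSERT INTO alert VALUES ('Snyk', 1, 'CVE-2019-10744')")
con.execute("INSERT INTO alert VALUES ('npm audit', 1, 'CVE-2019-10744')")
unique_vulns = con.execute("SELECT COUNT(DISTINCT cve_id) FROM alert").fetchone()[0]
```

The `UNIQUE` constraints are what make raw alert counts and unique-entity counts diverge, as discussed under the previous metric.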
Scan time indicates the total number of minutes a tool took to scan all the projects. We have no scan time for Dependabot and WhiteSource, as they are GitHub cloud services; we collected their issues and alerts at the end of September, at least two weeks after hosting the repositories. Commercial B monitors dependencies during runtime interaction and therefore also has no definite scan time.
Other information: Tools had additional information in their reports, generally to aid developers in assessing the risk of the alerts and to help in fixing them. We also collected these additional data, which will be explained in Section 5 when discussing the findings.
Manual analysis of the tools’ report: To understand why there are differences in the tools’ results, we manually inspected the tools’ results. We specifically focused on the project coreapps as this is the project with the largest dependency count and includes both Maven and npm dependencies. The first author went through results from all the tools for coreapps, and categorized the differences. The second author then independently went through the results from the studied tools and verified the categorization done by the first author.
5. Findings
In this section, we present descriptive statistics on how the SCA tools differed in vulnerability detection and a manual analysis of why the tools differed (RQ1), and a characterization of the metrics provided by the studied tools for aiding risk assessment of vulnerabilities in dependencies (RQ2).
5.1. RQ1: What are the differences between vulnerability reports produced by the different software composition analysis (SCA) tools?
The alert counts are higher than the counts of unique vulnerabilities or dependency paths, as discussed in Section 4.3. While the total alert count repeats the same vulnerabilities found across projects, some tools also repeat the same alert within a project due to modular project structure. The unique dependency count is in turn higher than the unique package count: different versions of the same package may be declared as dependencies in different projects, and npm can have multiple versions of the same package as dependencies even within a single project. We now discuss how the SCA tools differed in their reporting:
Table 4: For each tool, the scope breakdown of vulnerable dependencies (VDs) for Maven and npm, the number of direct VDs across all projects, and the maximum depth of VDs.
The tools differed both in identifying unique vulnerable dependencies and unique vulnerabilities: OWASP DC detects the highest number of unique dependencies and unique vulnerabilities for both Maven and npm projects. However, our analysis in Section 5.2 indicates that more may not necessarily be better. Conversely, Commercial B, which monitors the dependencies in use during runtime, detected the lowest number of vulnerable dependencies for Maven projects. MSV and Dependabot detected the lowest number of unique vulnerabilities for Maven and npm projects, respectively.
Five out of the eight tools for Maven reported non-CVEs, while all five tools for npm did. We observe that npm packages have a higher proportion of non-CVEs to CVEs than Maven packages. OWASP DC reports more non-CVEs than any of the other four tools for npm projects. However, as OWASP DC does not provide an identifier for non-CVEs, we tracked its unique non-CVEs through the vulnerability description and affected packages, which may have resulted in duplicate counting of the same vulnerabilities.
Only 2 out of the 5 tools that scanned npm projects report vulnerable dependency paths: In npm, the same package A can be introduced transitively through multiple direct dependencies and therefore, can lie in multiple dependency paths. The developers may need to fix each path separately if there is a vulnerability in package A. We find that only npm audit and Snyk report all possible dependency paths to each unique vulnerability.
Tools show non-overlap in reported vulnerabilities and dependencies: We measured how much the unique vulnerable dependencies reported by the tools overlap with each other. The heat maps in Figure 1 show the overlap ratio across tool pairs for both Maven and npm projects. For a tool pair (A, B), the heat map shows how many dependencies reported by A were also reported by B and vice versa. For example, for Maven projects, 54% of Snyk’s reported vulnerable dependencies were also reported by WhiteSource; conversely, 68% of WhiteSource’s reported Maven dependencies were also reported by Snyk. The Venn diagrams in panels (a) and (b) demonstrate the non-overlap in dependencies for three representative tools. We cannot show such a heat map for all unique vulnerabilities, as we were unable to cross-reference non-CVEs across tools; however, we found non-overlap in reported CVEs across tools as well. Panels (c) and (d) demonstrate the non-overlap in CVEs for OWASP DC, Snyk, and WhiteSource.
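The overlap ratio behind the heat maps can be computed as follows. The dependency sets are invented for illustration; note the ratio is asymmetric, which is why the heat map differs across the diagonal.

```python
# Sketch: pairwise overlap ratio between two tools' sets of reported
# vulnerable dependencies (identified here as "package:version" strings).

def overlap(a: set, b: set) -> float:
    """Fraction of a's findings that b also reports (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0

reports = {
    "ToolA": {"pkg1:1.0", "pkg2:2.0", "pkg3:3.0", "pkg4:1.1"},
    "ToolB": {"pkg1:1.0", "pkg2:2.0", "pkg5:0.9"},
}
ab = overlap(reports["ToolA"], reports["ToolB"])  # 2 of A's 4 findings are in B
ba = overlap(reports["ToolB"], reports["ToolA"])  # 2 of B's 3 findings are in A
```

A full heat map is just this function evaluated over every ordered pair of tools.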
Tools detected vulnerable dependencies across all scopes and depths: Table 4 shows a breakdown of scan results per dependency scope and what portion of the reported vulnerable dependencies are introduced directly by OpenMRS. We find that reported vulnerabilities are mostly introduced through transitive dependencies, except for Dependabot and WhiteSource. The latter two tools assist GitHub projects in automatically fixing the vulnerable dependencies (by upgrading to a safer version), which may explain the high rate of direct dependencies in their reporting.
We find SCA tools to vary widely in the reporting of known vulnerabilities, for both Maven and npm dependencies.
5.2. Why do the tools differ on vulnerability reporting?
Below, we list the reasons (in no particular order) that we characterized through manual analysis for the differences in the tools’ results:
Only OWASP DC reported vulnerabilities in internal dependencies: As mentioned in Section 3.2, OpenMRS can have internal dependencies which were reported only by OWASP DC. OWASP DC reported 200 dependencies which are OpenMRS projects. However, these 200 dependencies contain only 14 CVEs and 6 non-CVEs. OpenMRS projects are divided into many sub-modules and OWASP DC reports the same vulnerability separately for each sub-module, which results in an inflation of reported dependencies.
Same vulnerabilities can be repeated over multiple packages: We observe that tools may report the same vulnerability across many related packages, such as the dependents of a vulnerable package. For example, CVE-2014-3625 was reported only for spring-webmvc by MSV, Snyk, Steady, and Commercial A, whereas OWASP DC reported this CVE for five separate Spring packages, as NVD simply lists the whole spring-framework as affected by this CVE. Similarly, OWASP DC detected functions of npm packages as individual dependencies: in the lodash package, OWASP DC detected 31 functions, such as lodash._baseassign and lodash._reevaluate, separately besides the package itself, and repeated the same 7 vulnerabilities for each of them, while the other tools simply reported the lodash package as vulnerable.
Prior work reported that relying on the Common Platform Enumeration (CPE) identifier that comes with CVE data may be a reason behind inaccurate vulnerability-to-package mapping (Kinzer, ). For example, OWASP DC reported the same 17 CVEs for activeio-core, activemq-core, and kahadb, as they all map to the same CPE identifier, while the other tools reported only activemq-core.
Tools may have different mappings of vulnerabilities to affected package versions: Incorrect mapping of a vulnerability to the affected version range of a package may result in inaccurate alerts. For example, for commons-beanutils:1.7.0, OWASP DC, WhiteSource, and Commercial A reported CVE-2014-0114 and CVE-2019-10086; MSV reported only CVE-2014-0114; Dependabot reported only CVE-2019-10086; and Snyk reported no CVEs at all. To investigate this difference, we looked into Snyk’s vulnerability database (40) and found that Snyk lists affected version ranges for the two CVEs that exclude the version OpenMRS uses, and therefore considers it free of these vulnerabilities. Dependabot likewise lists an affected range for CVE-2014-0114 that excludes the OpenMRS version, but lists all versions below the fix as affected by CVE-2019-10086. In NVD, the affected versions for the two CVEs are listed simply as upper-bounded ranges.
Similarly, in the npm ecosystem, CVE-2018-1000620 in cryptiles:0.2.2 was detected by all tools except Snyk. We found that Snyk’s database lists an affected version range that excludes this version, while the NVD data simply lists an upper-bounded version range as affected by the CVE.
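The effect of divergent affected-version mappings can be sketched as follows. The version ranges here are hypothetical, not the actual entries of any studied database.

```python
# Sketch: two databases record different affected ranges for the same CVE, so
# the same dependency version is flagged by one database and not the other.

def affected(version: str, lower: str, upper: str) -> bool:
    """Inclusive-lower, exclusive-upper comparison on dotted numeric versions."""
    key = lambda v: tuple(int(p) for p in v.split("."))
    return key(lower) <= key(version) < key(upper)

dep = "1.7.0"
db1 = ("1.8.0", "1.9.2")  # hypothetical: only 1.8.x affected -> no alert
db2 = ("0.0.0", "1.9.2")  # hypothetical: all versions below the fix -> alert
flag1 = affected(dep, *db1)
flag2 = affected(dep, *db2)
```

Both databases can be internally consistent; the disagreement about the range boundaries alone is enough to produce the contradictory reports observed above.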
Dependabot reported transitive dependencies through lock files: We notice that Dependabot typically detects only direct dependencies. The two cases where Dependabot reported transitive dependencies were due to: a) the Maven dependency file explicitly declaring the required version of the transitive dependency, and b) a lock file being present in the repository declaring the resolved versions of the full dependency tree. Further, no tool other than Dependabot reported dependencies from lock files; Dependabot detected 15 vulnerable dependencies this way.
Commercial B only reported vulnerabilities in dependencies under use during runtime: As Commercial B tracks dependencies through interaction testing as explained in Section 4.2, the tool only reported dependencies that were under use by OpenMRS during integration testing, which explains the low count of dependencies reported.
The state of CVEs may result in differences in tools’ results: After a CVE is published, it may become reserved, disputed, or rejected based on new information (7). We observed that the state of CVEs may be one possible reason behind differences in CVE reporting, as SCA tools need to be updated in a timely manner and verify changes in CVE states. For example, CVE-2019-10768 and CVE-2020-7676 in npm projects were detected by Snyk, Dependabot, and WhiteSource but not by OWASP DC and npm audit. However, the latter two tools reported one of them as a non-CVE with a more elaborate explanation. The other CVE is awaiting reanalysis (subject to further changes), which may be why it is not incorporated by the latter tools. Further, we found four rejected CVEs reported by WhiteSource and Snyk that were not reported by the other tools.
Tools can report unique non-CVEs not reported by other tools: Comparing non-CVEs across different tools requires manual analysis, as there is no common identifier. We manually examined a random sample of dependencies reported to have non-CVEs by multiple tools and found that each tool reported non-CVEs not reported by any other tool in the study set.
For example, we observe the following for angular:1.6.1: OWASP DC reports two improper input validation vulnerabilities not reported by any other tool. While Snyk, WhiteSource, and Dependabot reported a similar XSS vulnerability, Snyk and WhiteSource each also reported unique XSS vulnerabilities not reported by the others, and Snyk additionally reported a unique denial-of-service vulnerability. npm audit did not report any of these non-CVEs. We noticed similar differences in non-CVEs for other packages as well, e.g., lodash and ws.
We categorize 8 reasons behind differences in vulnerability reporting among the studied SCA tools, such as inconsistency in vulnerability to affected package version mapping.
5.3. RQ2: What metrics are presented by the SCA tools to aid in the risk assessment of dependency vulnerabilities?
When a vulnerability lies in a dependency, the risk of the vulnerability may need to be determined by how the application uses the dependency – that is – in the context of the dependency. We have observed that the studied SCA tools reported several metrics in scan reports to aid in such contextual assessment. We characterize these metrics into five categories:
5.3.1. Code analysis-based metrics
Tools may analyze source code or binaries to infer dependency usage and vulnerability reachability. Three of the studied tools, Steady, Commercial A, and Commercial B, use code analysis-based metrics for the Java language:
[Table 5: Reachability analysis by Steady and Commercial A. Steady, static analysis (2,489 total alerts): 2,095 (84.2%), 340 (13.7%), 54 (2.1%). Steady, dynamic analysis (2,489 total alerts): 2,437 (97.9%), 11 (0.4%), 41 (1.6%). Commercial A, vulnerable call chains: total alerts, vulnerable method calls, total vulnerable call chains, median call chains per method.]
Reachability Analysis: Tools can curate their vulnerability database with details on which part of the code (e.g., method, class) is involved in a specific vulnerability. Tools can then infer whether the vulnerable code is reachable from the dependent application through static and/or dynamic analysis. Steady and Commercial A provide reachability analysis for each vulnerability in a dependency.
Steady constructs static call graphs of an application to infer reachability, referred to as potentially executable (static analysis). Steady also collects execution traces through unit testing to determine whether the vulnerable code is actually executed (dynamic analysis). Commercial A similarly performs static analysis to identify vulnerable call chains, that is, call chains from the application code that reach the vulnerable method of the dependency.
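As an illustration, static reachability over a call graph reduces to graph search from the application's entry points to the methods known to be vulnerable. The sketch below is a minimal approximation in Python, not Steady's or Commercial A's actual implementation; the method names and graph are hypothetical.

```python
from collections import deque

def reachable_vulnerable_methods(call_graph, entry_points, vulnerable_methods):
    """Breadth-first search: which vulnerable methods can the app reach?

    call_graph: dict mapping a method name to the set of methods it calls.
    """
    seen = set()
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        if method in seen:
            continue
        seen.add(method)
        queue.extend(call_graph.get(method, ()))
    return seen & set(vulnerable_methods)

# Hypothetical call graph: the app reaches a vulnerable parser method,
# while a second vulnerable method is never called.
graph = {
    "app.main": {"lib.api.load"},
    "lib.api.load": {"lib.parser.parse"},
    "lib.parser.parse": set(),
    "lib.crypto.sign": set(),
}
vulnerable = ["lib.parser.parse", "lib.crypto.sign"]
print(reachable_vulnerable_methods(graph, ["app.main"], vulnerable))
# prints {'lib.parser.parse'}
```

A dynamic analysis, by contrast, would record which of these methods are actually hit while running the test suite rather than inferring reachability from the graph.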
Table 5 shows the reachability analysis from Steady and Commercial A. We find that for 84.2% of the alerts, Steady did not find the corresponding dependency to be used by the dependent application. Further, Steady found only 2.1% of the alerts to be potentially executable and 1.6% to be actually executed. However, we found a disconnect between the findings of static and dynamic analysis. For only 13 alerts did both static and dynamic analysis find the vulnerable code to be in use. Moreover, for 11 alerts where dynamic analysis found the vulnerable code to be actually executed, static analysis did not find any part of the dependency containing the vulnerability to be in use at all. This observation may indicate limitations of reachability analysis. Similar to Steady, Commercial A found few cases where the vulnerable code of a dependency can actually be reached from the application source code, identifying 93 distinct call chains.
Static analysis, such as call graph construction for Java, is known to have limitations (Sui et al., 2020). The effectiveness of dynamic analysis, such as Steady's, also depends on having a good test suite and test coverage. We see that OpenMRS projects reach only around 20% test coverage in Steady. The limited test coverage may have affected Steady's findings.
Dependency Usage: A client application may use only a subset of the functionality offered by a dependency. The proportion of a dependency's code used by the application may indicate the probability of a vulnerability being reachable. Steady and Commercial B report how many of a dependency's classes, out of the total available, are used. For example, Commercial B found 203 out of 414 classes (49%) of spring-web and 790 out of 4,414 classes (17.9%) of groovy-all to be used by OpenMRS.
5.3.2. Package-based metrics:
The characteristics of the dependency package itself may indicate the risk associated with it.
Package security rating: Commercial B provides a letter grade reflecting its assessment of the security of a package. The tool calculates the rating based on the package's age, count of released versions, and number of known vulnerabilities. Of the 17 packages identified as vulnerable by Commercial B, 16 have an F rating and one has a D rating.
5.3.3. Dependency characteristics-based metrics:
The scope and depth of the dependency may indicate the risk of the vulnerability it contains in the context of the application.
Dependency scope: For Maven projects, only Steady reported the scope of each dependency. For npm projects, Snyk and npm audit mention the dependency scope.
Dependency depth: Risk may be associated with how deep a dependency lies within the dependency tree. Only Snyk and Steady indicate whether a dependency is direct or transitive for Maven projects. For npm, only Snyk and npm audit report all possible dependency paths for each vulnerability, indicating the possible depths.
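To make the depth metric concrete, the sketch below enumerates every path from an application to a given package in a dependency tree, the kind of path listing that Snyk and npm audit provide. It assumes an acyclic tree; the tree and package names are hypothetical.

```python
def dependency_paths(tree, root, target, path=None):
    """Enumerate all paths from `root` to `target` in a dependency tree.

    tree: dict mapping each package to the list of its direct dependencies.
    A path of length 2 means a direct dependency; longer paths are transitive.
    """
    path = (path or []) + [root]
    if root == target:
        return [path]
    found = []
    for dep in tree.get(root, []):
        found.extend(dependency_paths(tree, dep, target, path))
    return found

# Hypothetical tree: "minimist" is both a direct and a transitive dependency.
tree = {
    "app": ["minimist", "mkdirp"],
    "mkdirp": ["minimist"],
}
for p in dependency_paths(tree, "app", "minimist"):
    print(" > ".join(p))
# app > minimist
# app > mkdirp > minimist
```

A vulnerability reachable only through the longer path would sit at depth two, which some developers weigh as lower risk than a direct dependency.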
5.3.4. Vulnerability-based metrics:
The characteristics of the vulnerability itself can be used in assessing risk. We found three types of information provided by the tools:
Severity: The industry standard for rating the severity of a vulnerability is the Common Vulnerability Scoring System (CVSS) (5), and CVSS scores are publicly available for CVEs. For non-CVEs, Snyk and Commercial A also present a CVSS score. However, Dependabot and npm audit present a severity rating on a scale of their own for both CVEs and non-CVEs. For both tools, the scale consists of four levels similar to the CVSS3 levels: low, moderate, high, and critical.
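For reference, the CVSS v3.x specification maps base scores to qualitative ratings with fixed cut-offs. The sketch below encodes that standard mapping, with "moderate" standing in for the specification's "medium" as in the Dependabot and npm audit scales.

```python
def cvss3_rating(score):
    """Qualitative severity rating for a CVSS v3.x base score (0.0-10.0)."""
    if score == 0.0:
        return "none"
    if score <= 3.9:
        return "low"
    if score <= 6.9:
        return "medium"    # "moderate" in Dependabot/npm audit terms
    if score <= 8.9:
        return "high"
    return "critical"

assert cvss3_rating(5.3) == "medium"
assert cvss3_rating(9.8) == "critical"
```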
Available exploits: The availability of known exploits may contribute to assessing the risk of a vulnerability in a dependency. For each vulnerability, Snyk provides information on whether an exploit is publicly available. For the 310 Snyk vulnerabilities, Snyk reports that 218 do not have a public exploit, 10 have a functional exploit, 37 have a proof-of-concept exploit, and 45 have unproven exploits. Commercial A also reports on available exploits.
Popularity: How popular or well-known a vulnerability is may indicate the probability that it will be exploited in the wild. Steady integrates Google Trends analysis for each vulnerability in its reports, indicating the count of search hits for the CVE identifier within the past 30 days.
5.3.5. Confidence in alert validity:
SCA tools may provide a confidence rating for each alert, which may aid developers in prioritizing auditing.
Evidence count: As OWASP DC detects dependencies by scanning multiple sources, it can provide an evidence count as proof for each dependency. The tool provides a confidence label, from Low to Highest, based on this evidence count. For all alerts except 4 Maven alerts, OWASP DC provided either a High or Highest confidence rating. Our tool survey highlights the research need for a common risk measurement framework, specific to dependency vulnerabilities, that can be adopted by SCA tools.
6. Discussion
Below we discuss observations and implications from our work:
Accurate mapping of vulnerabilities to affected package versions should be ensured by the tools to avoid false positives: We showed examples in Section 5.2 of how inconsistency in vulnerability-to-package mapping can result in differences between tools' results. We also find inconsistencies in which version range different tools list as affected for a specific CVE. Our findings highlight the importance of maintaining the accuracy of the tools' vulnerability databases.
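The impact of such range inconsistencies can be illustrated with a minimal version check. The versions and ranges below are hypothetical, showing how two databases that disagree on the fixing version classify the same installed version differently.

```python
def parse(version):
    """Parse a simple "major.minor.patch" version string into an int tuple."""
    return tuple(int(part) for part in version.split("."))

def is_affected(version, introduced, fixed):
    """True if version lies in [introduced, fixed), a common advisory convention."""
    return parse(introduced) <= parse(version) < parse(fixed)

# The same CVE with two hypothetical fixing versions on record:
installed = "4.17.11"
print(is_affected(installed, "0.0.0", "4.17.12"))  # True: database A raises an alert
print(is_affected(installed, "0.0.0", "4.17.11"))  # False: database B stays silent
```

A one-patch-level disagreement in the recorded fixing version is thus enough for one tool to alert and another to stay silent on the identical dependency.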
Non-CVEs should be reported to the CVE database to establish their validity and enable cross-tool vulnerability mapping: We find that SCA tools report known vulnerabilities in dependencies that do not have a CVE identifier. To understand why these non-CVEs might not have been incorporated into the CVE database, we look at their publication dates. Of the 53 non-CVEs reported by Snyk, 41 were published before 2020, while of the 54 non-CVEs reported by WhiteSource, 50 were published before 2020. As CVE validation usually takes around three months, developers may question why a long-published vulnerability still lacks a CVE identifier. Furthermore, we see that different tools can report the same non-CVEs; however, cross-tool referencing cannot be done automatically without a common identifier like a CVE. We suggest reporting non-CVEs to the CVE database to establish validity, make cross-tool referencing easy, and make the vulnerabilities widely known.
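The cross-referencing problem can be sketched as follows: alerts carrying a CVE identifier group automatically across tools, while non-CVEs fall through to manual analysis. The report contents and the tool-specific identifiers below are hypothetical.

```python
def cross_reference(reports):
    """Group alerts from multiple tools by (CVE id, package).

    reports: dict mapping tool name -> list of (vuln_id, package) alerts.
    Alerts without a CVE identifier cannot be matched automatically.
    """
    by_cve, unmatched = {}, []
    for tool, alerts in reports.items():
        for vuln_id, package in alerts:
            if vuln_id.startswith("CVE-"):
                by_cve.setdefault((vuln_id, package), set()).add(tool)
            else:
                unmatched.append((tool, vuln_id, package))
    return by_cve, unmatched

# Hypothetical reports: one shared CVE, two tool-specific non-CVE ids.
reports = {
    "Snyk": [("CVE-2020-7676", "angular"), ("SNYK-JS-EXAMPLE-0001", "angular")],
    "WhiteSource": [("CVE-2020-7676", "angular"), ("WS-2020-0000", "angular")],
}
matched, unmatched = cross_reference(reports)
# matched: {("CVE-2020-7676", "angular"): {"Snyk", "WhiteSource"}}
# unmatched: the two non-CVE alerts, which need manual comparison
```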
Developers may employ more than one tool to leverage different vulnerability databases: The reporting of unique non-CVEs by the tools highlights the existence of known vulnerabilities in the wild that are not necessarily tracked by a centralized database. SCA tools may be delayed in incorporating such non-CVEs, if they do not miss them entirely. Prior work has shown that unawareness of a vulnerability is a major reason developers do not update vulnerable dependencies (Kula et al., 2018). Therefore, our findings suggest developers may use multiple SCA tools to be informed of all known vulnerabilities in a timely manner.
Tools should suggest fix options while explaining the risk of any potential backward incompatibility: Of the studied tools, Snyk, Dependabot, npm audit, and WhiteSource offer automated fix suggestions. However, prior work has found that fear of breaking changes is one of the primary reasons developers do not want to update vulnerable dependencies (0patch.com). Tools may provide a more in-depth analysis of what code changes accompany a given version change in a dependency and whether there is any possibility of introducing regression bugs.
Developers may need to evaluate the security risk of dependency vulnerabilities on a case-by-case basis: Prior research has shown that not all vulnerabilities in dependencies may be relevant to the client application (Pashchenko et al., 2020). We find that two tools offer reachability analysis for each dependency vulnerability. However, future research is required to evaluate whether developers in the real world actually use such metrics and find them useful. The lack of a common risk assessment framework among SCA tools suggests that developers will have to evaluate vulnerability alerts on a case-by-case basis, relying on their expertise with the project codebase and on how the dependencies are used by the specific project.
7. Limitations
We evaluate 9 SCA tools on one web application at a certain release point, which poses a threat to the generalizability of our findings. However, OpenMRS consists of 44 projects with 547 Maven and 2,213 npm dependencies, making our case study suitable for a comparative evaluation. Further, the single case study enables us to look in depth, through manual analysis, at why the tools' results differed. Nevertheless, our findings may not generalize to other package ecosystems, such as Ruby or Rust. Similarly, any other application with different dependency management practices may have yielded different findings. For example, an application that actively keeps its dependencies updated will have few vulnerable dependencies overall and therefore would not show large differences among tools' results. However, that would not invalidate the differences we observed for the studied SCA tools on OpenMRS. Another threat to external validity involves the selection of the SCA tools. We explain our selection criteria in Section 4.1. As we are unable to cover all existing SCA tools, we do not claim the findings in Sections 5.2 and 5.3 to be exhaustive.
Another limitation of our study involves the absence of ground truth. In the context of SCA alerts, there are three steps to building such a ground truth: a) determining all the open source dependencies in use; b) determining the correctness of the reported vulnerability data; and c) determining the exploitability of the dependency vulnerabilities. These steps are nontrivial to perform manually for real software and may be more suitable for a synthetic test subject (Delaitre et al., 2018). Regarding exploitability, no tool except Commercial B claims to filter out dependencies based on any usage criteria, while Steady and Commercial A only offer additional analysis to aid in such contextual assessment. Therefore, exploitability would not be a fair comparison criterion for the studied SCA tools and is out of scope for this study. Similarly, how developers respond to alerts from SCA tools is also out of the scope of this study. Further, for RQ1, we do not conduct any statistical test to measure the significance of differences, as we only perform a single case study.
8. Related Work
9. Conclusion and Future Work
We thank the RealSearch group and anonymous reviewers for their feedback. Our research was funded by NSA.
-  Security patching is hard. Note: https://0patch.com/files/SecurityPatchingIsHard_2017.pdf Cited by: §6.
-  About semantic versioning. Note: https://docs.npmjs.com/about-semantic-versioning Cited by: §2.2.
-  (2016) Correct audit logging: theory and practice. In International Conference on Principles of Security and Trust, pp. 139–162. Cited by: §3.1.
-  Why do organizations trust snyk to win the open source security battle?. Note: https://snyk.io/blog/why-snyk-wins-open-source-security-battle/ Cited by: §2, item 1.
-  Common vulnerability scoring system. Note: https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System Cited by: §5.3.4.
-  (2017) Open source security assessment as a class project. Journal of Computing Sciences in Colleges 32 (6), pp. 41–53. Cited by: §3.1.
-  CVE states. Note: https://cve.mitre.org/cve/identifiers/ Cited by: item 7.
-  (2012) Overview of the most important open source software: analysis of the benefits of openmrs, openemr, and vista. In Telemedicine and e-health services, policies, and applications: Advancements and developments, pp. 315–346. Cited by: §3.1.
-  (2018) On the impact of security vulnerabilities in the npm package dependency network. In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 181–191. Cited by: §8.
-  (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering 24 (1), pp. 381–416. Cited by: §8.
-  (2018) SATE V report: ten years of static analysis tool expositions. Technical report, National Institute of Standards and Technology. Cited by: §3.1, §7.
-  Eclipse steady 3.1.14 (incubator project). Note: https://eclipse.github.io/steady/about/ Cited by: §4.2.
-  GitHub advisory database. Note: https://github.com/advisories Cited by: §2, §4.2.
-  (2020) An open guide to evaluating software composition analysis tools. Cited by: §2, §6.
-  (2015) In dependencies we trust: how vulnerable are dependencies in software modules?. Cited by: §8.
-  How does dependency-check work?. Note: https://jeremylong.github.io/DependencyCheck/general/internals.html Cited by: §4.2.
-  (2017) Structure and evolution of package dependency networks. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 102–112. Cited by: §8.
-  Using cpes for open-source vulnerabilities? think again. Note: https://www.veracode.com/blog/managing-appsec/using-cpes-open-source-vulnerabilities-think-again Cited by: item 3.
-  (2018) Do developers update their library dependencies?. Empirical Software Engineering 23 (1), pp. 384–417. Cited by: §6.
-  (2018) The danger of missing instructions: a systematic analysis of security requirements for mcps. In 2018 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), pp. 94–99. Cited by: §3.1.
-  MITRE CVE database. Note: https://cve.mitre.org/ Cited by: §2.
-  National vulnerability database. Note: https://nvd.nist.gov/vuln Cited by: §2, item 1.
-  NPM security advisories. Note: https://www.npmjs.com/advisories Cited by: §2, §4.2.
-  (September 2012) Guide for conducting risk assessments, NIST special publication 800-30. Note: https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final [Online; accessed 7-Oct-2020] Cited by: §2.
-  OpenMRS around the world. Note: http://guide.openmrs.org/en/ Cited by: §3.
-  OpenMRS reference application distribution. Note: https://wiki.openmrs.org/display/docs/OpenMRS+Reference+Application+Distribution Cited by: §3.
-  OpenMRS sdk. Note: https://wiki.openmrs.org/display/docs/OpenMRS+SDK Cited by: §3.
-  (2020) Top 10-2017: the ten most critical web application security risks. Cited by: §1.
-  (2020) Vuln4Real: a methodology for counting actually vulnerable dependencies. IEEE Transactions on Software Engineering. Cited by: §6, §8.
-  A qualitative study of dependency management and its security implications. Proc. of CCS 20. Cited by: §1, §6, §8, §9.
-  (2015) Impact assessment for vulnerabilities in open-source software libraries. In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 411–420. Cited by: §4.2, §8.
-  (2018) Beyond metadata: code-centric and usage-based analysis of known vulnerabilities in open-source software. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 449–460. Cited by: §4.2, §8.
-  (2020) Detection, assessment and mitigation of vulnerabilities in open source dependencies. Empirical Software Engineering, pp. 1–41. Cited by: §8.
-  Introduction to the dependency mechanism. Note: http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html Cited by: §2.1.
-  Why patching software is hard: technical challenges. Note: challenges-/a/d-id/1330181 Cited by: §9.
-  (2015) Relationship-based access control for openmrs. arXiv preprint arXiv:1503.06154. Cited by: §3.1.
-  Snyk open source security management. Note: https://support.snyk.io/hc/en-us/articles/360000925438-What-does-Snyk-access-and-store-when-scanning-a-project- Cited by: §4.2.
-  Snyk vulnerability db. Note: https://snyk.io/vuln Cited by: §4.2, item 4.
-  Sonatype oss index. Note: https://ossindex.sonatype.org/ Cited by: §2.
-  Steady vulnerability dataset. Note: https://github.com/SAP/project-kb Cited by: §4.2.
-  (2020) On the recall of static call graph construction in practice. Cited by: §5.3.1.
-  (2021) 2021 open source security and risk analysis report. Note: https://www.synopsys.com/software-integrity/resources/analyst-reports/open-source-security-risk-analysis.html Cited by: §1.
-  Collaborative security risk estimation in agile software development. Information & Computer Security. Cited by: §3.1.
-  Victims software vulnerability scanner. Note: https://blog.victi.ms/ Cited by: §4.2.
-  WhiteSource bolt for github. Note: https://github.com/apps/whitesource-bolt-for-github Cited by: §4.2.
-  WhiteSource vulnerability database. Note: https://www.whitesourcesoftware.com/vulnerability-database/ Cited by: §4.2.
-  (2017) Automated identification of security issues from commit messages and bug reports. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 914–919. Cited by: §2, item 1.