The past two decades saw a considerable increase in software reuse, particularly of open-source components, in both commercial and free software. Automated package management tools allow developers to find and integrate third-party components into their projects with minimal effort. While automated dependency management simplifies software reuse, it may contribute to the phenomenon of software bloat. As Gkortzis et al. put it, “code reuse cuts both ways”, since “a system can become more secure by relying on mature dependencies, or more insecure by exposing a larger attack surface via exploitable dependencies”.
In practice, only a fraction of the functionality (and code) of a dependency may actually be needed, and entire components could be redundant. Even if some dependency code is not reachable when included in a given application (and thus it can be considered dead code in that context), it can still contribute to extending the attack surface of that application, e.g., because it includes gadget classes leading to deserialization vulnerabilities (see https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data).
A promising way to reduce the attack surface of an application is to remove the unused parts of its dependencies. A number of recent publications explore this direction, proposing new techniques and tools that are typically demonstrated on open-source projects (see Sec. 4). Given the potential of these tools to increase the security of enterprise applications, we conducted a case study to evaluate whether they could be adopted in practice at SAP.
In this paper, we study a real-world commercial Java application that is part of an SAP product, and we investigate the ability of three existing software debloating tools to distinguish the dependency classes that are used from those that could be removed without compromising the correct behaviour of the application. We propose a methodology to evaluate (i) how the removal of the classes reported as redundant impacts the attack surface of the bundled application and (ii) how this affects the correct execution of the application.
Each tool reported a considerable number of classes as redundant. After these classes were removed, the existing application tests continued to pass. We detected a (formerly) vulnerable class among those removed, which is an example of a small but tangible reduction in the attack surface. A manual review of the classes identified as redundant, however, revealed that none of the tools we considered was able to identify a class that is dynamically loaded at runtime and that the developer confirmed as being required.
The remainder of the paper is structured as follows. Sec. 2 provides an overview and comparison of (the selected) state-of-the-art debloating tools. Sec. 3 introduces the case-study methodology, and summarizes its results. Sec. 4 summarizes related work, and Sec. 5 concludes the paper briefly outlining possible future work.
2 Debloating Tools
Table 1: Main characteristics of the three debloating tools.
| ||DepClean||Maven Shade||ProGuard|
|Original use-case||Remove dependency declaration (Maven projects)||Create self-contained Uber-Jar (Maven projects)||Shrink and obfuscate Java archives (Maven-independent)|
|Approach||Bytecode analysis (starting from project classes, considers literals to cover reflection)||Bytecode analysis (starting from project classes)||Bytecode reachability analysis (starting from entry points)|
|Slicing granularity||Java archives (class-level info available as debug info)||Java classes||Java class members|
|Analysis input||Compiled project classes||Compiled project classes||Compiled project classes and classes (members) specified as entry points|
|Analysis output||Modified POM file (with removed/excluded dependencies)||Uber-Jar with needed classes (unmodified bytecode)||Uber-Jar with needed classes (bytecode potentially shrunk and obfuscated)|
In our study, we consider both mature, widely used open-source tools and open-source software debloating tools that are readily available, sufficiently documented, and easily applicable to Java applications that use Apache Maven (https://maven.apache.org/) as build system. Table 1 summarizes and compares the main characteristics of the three open-source tools.
Apache Maven Shade (https://maven.apache.org/plugins/maven-shade-plugin/; Maven Shade for short) is a mature and well-established plug-in for Maven that creates self-contained Java archives (Uber-Jars), to be used at application runtime. Uber-Jars include all the application classes as well as all classes of the (runtime and compile time) dependencies.
As of version 1.4, Maven Shade supports the minimization of Uber-Jars such that only classes actually required for the artifact are re-bundled (option minimizeJar). The set of needed classes is computed using jdependency (https://github.com/tcurdt/jdependency), which uses ASM (https://asm.ow2.io/) to search the bytecode for referenced classes.
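To illustrate what "searching the bytecode for referenced classes" can mean, the following minimal sketch reads the constant pool of a single class file and lists every class it names. It is not the jdependency or ASM implementation: it uses only the JDK (Java 11 or later is assumed), and the class name `ReferencedClasses` and its methods are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: lists the classes named in the constant pool of one class file. */
final class ReferencedClasses {

    /** Returns the internal names (e.g. "java/lang/Object") of all CONSTANT_Class entries. */
    static Set<String> of(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // entries are numbered 1..count-1
            String[] utf8 = new String[count];
            List<Integer> classNameIndexes = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                        // Utf8
                    case 7: classNameIndexes.add(in.readUnsignedShort()); break;  // Class
                    case 5: case 6: in.skipBytes(8); i++; break;  // Long, Double take two slots
                    case 15: in.skipBytes(3); break;              // MethodHandle
                    case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            // Resolve each Class entry to the Utf8 entry holding its name.
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndexes) {
                names.add(utf8[index]);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper: analyses a class file available to the system class loader. */
    static Set<String> ofResource(String path) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(path)) {
            if (in == null) {
                throw new IllegalArgumentException("resource not found: " + path);
            }
            return of(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ReferencedClasses.ofResource("java/util/ArrayList.class")` includes the class itself and its superclass `java/util/AbstractList`. Array types also appear as class entries (in descriptor form), so a real analysis would need to normalize them. Classes named only in configuration files never show up in such a scan.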
ProGuard (https://github.com/Guardsquare/proguard) is a widely adopted obfuscator and shrinker for Java and Kotlin (Android) applications. It is typically applied to mobile applications to reduce download times and protect intellectual property via obfuscation.
Given Java archives and the specification of entry points as input, ProGuard recursively identifies the classes and class members that can be reached. Many other configuration options enable and fine-tune additional ProGuard features, such as field or method removal, obfuscation, method inlining, and class merging.
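The recursive identification of reachable code can be pictured as a worklist traversal of a reference graph. The sketch below is a simplified illustration of that idea, not ProGuard's actual algorithm. The `Reachability` class and the reference map are hypothetical, and a node may stand for a class or a class member.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of reachability analysis starting from entry points. */
final class Reachability {

    /**
     * @param references  for each node, the nodes it refers to directly
     * @param entryPoints nodes that must be kept unconditionally
     * @return the entry points plus everything transitively reachable from them
     */
    static Set<String> reachable(Map<String, Set<String>> references,
                                 Collection<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>(entryPoints);
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String target : references.getOrDefault(current, Set.of())) {
                if (seen.add(target)) { // first visit: schedule its references too
                    work.push(target);
                }
            }
        }
        return seen;
    }
}
```

With references `App -> A`, `A -> B` and `C -> D`, starting from `App` yields `App`, `A` and `B`, while `C` and `D` are reported as unreachable. The finer the nodes (members instead of classes), the fewer references are followed, which is why member-level shrinking can report fewer used classes.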
DepClean identifies and removes bloated dependencies, i.e., dependencies that are part of the dependency tree of the project under analysis but whose code is not used (neither directly nor indirectly) by the application. Unlike Maven Shade and ProGuard, DepClean does not aim to produce a compact Java archive for use at runtime, but rather to simplify the dependency tree at development time. Moreover, it is meant to work at the granularity of entire Java archives. The tool can be configured to produce detailed information about used code at the granularity of Java classes (option createResultJson), which makes it possible to evaluate its ability to identify the classes required by the application.
DepClean extends the Apache Maven Dependency Analyzer and uses ASM for bytecode analysis in order to build a Dependency Usage Tree, which extends the standard Maven dependency tree with edge labels to indicate whether direct, transitive and inherited project dependencies are used or bloated respectively. DepClean also parses the constant pool table of Java class files to cover dynamic, reflection-based invocations done through string literals and string concatenations.
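As a rough illustration of how string literals can hint at reflective usage, the sketch below matches literals found in a class against the names of known dependency classes. This is an assumption about the general idea only and does not reproduce DepClean's implementation. The `ReflectionCandidates` class and all names in the example are hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: string literals that name a known class are treated as usages. */
final class ReflectionCandidates {

    /**
     * @param stringLiterals  literals extracted from the constant pool of the project classes
     * @param knownClassNames fully qualified names of all dependency classes
     * @return dependency classes whose name occurs as a literal, e.g. in Class.forName("...")
     */
    static Set<String> find(Collection<String> stringLiterals, Set<String> knownClassNames) {
        Set<String> hits = new TreeSet<>();
        for (String literal : stringLiterals) {
            // Accept both "com.example.Foo" and the internal form "com/example/Foo".
            String candidate = literal.replace('/', '.');
            if (knownClassNames.contains(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

A heuristic of this kind can only see names that appear in the bytecode. Class names that are read at runtime from a resource or configuration file remain invisible to it.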
3 Case Study
Our case study investigates the ability of existing debloating tools to minimize the dependencies of an industrial Java application without breaking it. In particular, we consider the ability of the tools presented in Sec. 2 to identify the code required by the application at class level. As ProGuard also supports shrinking at a finer granularity to remove class members, we consider this tool in two flavours: ProGuard (member-level), which shrinks used classes at member level, and ProGuard (class-level), which leaves used classes untouched. Finally, we focus on the effect that the size reduction has on the attack surface of the application.
3.1 Methodology
To compare the existing debloating tools, we use the following methodology to apply them to Maven projects.
Vanilla execution. We build and test the application, without using any debloating tool, in order to collect the information required for the comparison. The dependencies of Maven projects are specified in a pom.xml file and have a scope that determines the phase of the build process in which they are required. As we focus on the reduction of code required in production, we consider the scopes compile and runtime as the target of the debloating tools. Consistent with previous literature, we execute the existing tests and use their results as a proxy for semantic preservation.
Concretely, the vanilla execution:
- Ensures that the application successfully builds with all tests passing (mvn install succeeds);
- Collects all test cases, all application class names, and all compile and runtime dependencies with the class names therein;
- Detects the presence of vulnerable classes.
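Collecting the class names contained in a dependency can be done by listing the entries of its Java archive. The following sketch shows one way to do so with the JDK's `java.util.jar.JarFile`. The `JarClassLister` class is a hypothetical helper, not part of the tooling used in the study.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarFile;

/** Illustrative sketch: collects the names of the classes contained in a Java archive. */
final class JarClassLister {

    /** Converts an entry name such as "org/example/Foo.class" to "org.example.Foo". */
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Returns the names of all .class entries of the given archive, sorted. */
    static Set<String> classNames(Path jar) {
        try (JarFile file = new JarFile(jar.toFile())) {
            Set<String> names = new TreeSet<>();
            file.stream()
                .map(entry -> entry.getName())
                .filter(name -> name.endsWith(".class"))
                .forEach(name -> names.add(toClassName(name)));
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this simple listing also returns entries such as `package-info.class`. Whether these should be counted as classes is a choice the sketch leaves open.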
Tool execution. Our investigation targets the ability of the tools to identify all and only the dependency code required by the application. As a result, we perform the debloating step outside of the tools, and rely on them only for providing the set of required classes. Accordingly, the tool execution comprises the following steps:
(1) Run DepClean, Maven Shade, and the two ProGuard variants.
(2) Transform the tool output into a file containing the list of used classes (if not already available).
(3) Collect the names of all classes of compile and runtime dependencies reported by the tools as used by the application.
(4) Copy those classes from the output of the tool to target/classes.
(5) Adjust the pom.xml to remove all compile and runtime dependencies.
(6) Run the existing tests on the debloated application.
(7) Detect the presence of vulnerable classes in the debloated application.
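The comparison between the vanilla execution and a tool execution boils down to set operations on class names. The sketch below illustrates this under the assumption that a list of known vulnerable class names is available. How that list is obtained is outside the scope of the sketch, and `DebloatReport` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: compares all dependency classes with those a tool reports as used. */
final class DebloatReport {
    final Set<String> removed = new TreeSet<>();           // reported as redundant
    final Set<String> vulnerableRemoved = new TreeSet<>(); // vulnerable and no longer bundled
    final Set<String> vulnerableKept = new TreeSet<>();    // vulnerable and still bundled

    DebloatReport(Set<String> allDependencyClasses,
                  Set<String> usedClasses,
                  Set<String> knownVulnerable) {
        for (String name : allDependencyClasses) {
            boolean kept = usedClasses.contains(name);
            if (!kept) {
                removed.add(name);
            }
            if (knownVulnerable.contains(name)) {
                (kept ? vulnerableKept : vulnerableRemoved).add(name);
            }
        }
    }
}
```

The size of `removed` gives the reduction in number of classes, while `vulnerableRemoved` and `vulnerableKept` correspond to the vulnerable-class counts before and after debloating.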
The list of used classes (cf. step (2)) from Maven Shade and from the two ProGuard variants was created by listing the content of the Jar artifact produced by the tool. For DepClean, we enabled the configuration setting createResultJson and rewrote the class names contained in the result file into a plain list in order to have the same output format for each tool. Also note that for both ProGuard variants we had to create an application-specific configuration file listing all application and test classes as entry points for the analysis (keep option), and we disabled all optimization and obfuscation features.
As the shrinking option of ProGuard always removes unused methods of a class unless a configuration forces the tool to keep it untouched, for the member-level variant we run the tool as-is. For the class-level variant, to get the list of used untouched classes, we iterate steps (1) and (2), adding the used classes as additional entry points (to be kept as-is) in the configuration file until no additional class is reported as used.
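The iteration described above is a fixpoint computation: the set of classes to keep grows until a run reports nothing new. The following sketch models a tool run as a function from the kept classes to the classes reported as used. `KeepFixpoint`, the simulated tool in `demo()`, and all class names are hypothetical and only serve to make the example self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Illustrative sketch: re-runs a tool, keeping reported classes as-is, until nothing changes. */
final class KeepFixpoint {

    /**
     * @param entryPoints initial classes to keep (application and test classes)
     * @param runTool     one tool run: maps the kept classes to the classes reported as used
     * @return the entry points plus all classes reported as used in any iteration
     */
    static Set<String> iterate(Set<String> entryPoints,
                               Function<Set<String>, Set<String>> runTool) {
        Set<String> keep = new TreeSet<>(entryPoints);
        boolean grew = true;
        while (grew) {
            Set<String> used = runTool.apply(Collections.unmodifiableSet(keep));
            grew = keep.addAll(used); // false once a run reports no additional class
        }
        return keep;
    }

    /** Demo with a simulated tool that reports the direct references of the kept classes. */
    static Set<String> demo() {
        Map<String, Set<String>> references = Map.of(
            "App", Set.of("Lib1"),
            "Lib1", Set.of("Lib2"),
            "Lib3", Set.of("Lib4"));
        Function<Set<String>, Set<String>> simulatedTool = kept -> {
            Set<String> used = new TreeSet<>();
            for (String name : kept) {
                used.addAll(references.getOrDefault(name, Set.of()));
            }
            return used;
        };
        return iterate(Set.of("App"), simulatedTool);
    }
}
```

In the demo, the first run reports `Lib1`, the second additionally reports `Lib2`, and the third reports nothing new, so the loop stops with `App`, `Lib1` and `Lib2`. The loop terminates because the kept set can only grow and the number of classes is finite.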
In step (4) we copy the classes reported as used to the project’s target/classes folder so that they are available when running the existing tests, i.e., they are treated as application classes, and in step (5) we remove compile and runtime dependencies from the pom.xml. We opted for this custom debloating as it allows us to focus on the ability of the tools to identify the used classes while uniformly collecting the details required for measuring the reduction in terms of size and attack surface.
3.2 Subject Application
For our case study, we selected a Maven project that is part of SAP’s Energy Data Management solution. It uses JAXB and EclipseLink (https://www.eclipse.org/eclipselink/) to (un)marshal XML documents related to energy measurements. It is actively developed and several releases have already been made available to customers through deployment on the SAP Cloud Platform. The project is characterized as follows:
- 10 direct dependencies (2 compile, 2 provided, 6 test)
- 20 resolved dependencies (4 compile, 3 provided, 13 test)
- 260 application classes, 2725 compile dependency classes
- 62 test classes amounting to 446 test cases
The methodology of Sec. 3.1 was applied to the application above. We had a successful run of the vanilla execution and of the tool executions on Ubuntu 18.04 using JDK 1.8.0, Maven 3.8.1, Maven Shade 3.2.4, proguard-maven-plugin (https://wvengen.github.io/proguard-maven-plugin/) 2.3.1 (configured to use ProGuard version 7.0.1), and DepClean built from revision cbfc395 in https://github.com/castor-software/depclean.
Table 2: Results of the executions. Columns: Execution, Classes, Size (KB), Test success, Vulnerable classes.
Table 2 shows the results of the executions. In the vanilla execution we collected 2725 classes from the 4 compile dependencies of the application, amounting to 14.68 MB of disk space (cf. column “Size”). Both DepClean and ProGuard (class-level) reported the same set of 11 classes as being used. Maven Shade reported one additional class. ProGuard (member-level) was able to reduce the dependencies to a single class by removing all members not used by the application, which contained all the references to the other 10 classes reported by ProGuard (class-level), DepClean and Maven Shade. As a large share of classes were reported as redundant, the size on disk was significantly reduced in all cases. For all the tools, the existing tests were still passing on the debloated application we constructed as described in Sec. 3.1.
Table 3: Classes reported as used for each compile dependency. Columns: Dependency (Maven artifactId), Scope, Classes, DepClean, Maven Shade, and the two ProGuard variants.
Table 3 details, for each compile dependency, how many classes the tools report as used by the application. With the constraint of leaving the original classes unmodified, all tools identify as used the same set of 11 classes from commons-io. In contrast, ProGuard (member-level) shrinks unused members from a used class of commons-io, thus removing the 10 classes imported therein. Maven Shade is the only tool reporting a package-info.class from org.eclipse.persistence.moxy (moxy for short) as used. It contains a single runtime annotation applicable to other classes of the same package. However, as no other class of that package is reported as used, the annotation would not be applied to any class after debloating.
By manual inspection, we observed that the application makes use of a service implementation offered by the direct dependency moxy, declared according to the Java SPI (Service Provider Interface) mechanism. Java SPI allows service consumers to only reference the service interface, while the actual implementation is made available at runtime. None of the tools was able to identify the class specified in the Java SPI configuration file, thus no service implementation would be available.
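To see why such a class escapes bytecode analysis, recall that with Java SPI the name of the implementation class appears only in a provider-configuration file under META-INF/services/, which `java.util.ServiceLoader` reads at runtime. The sketch below parses the content of such a file into class names that could then be treated as additional entry points. `SpiProviders` and the provider names in the example are hypothetical, and the actual service used by the application is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: extracts provider class names from a META-INF/services file. */
final class SpiProviders {

    /**
     * Parses the content of a provider-configuration file: one class name per line,
     * '#' starts a comment, blank lines and duplicates are ignored.
     */
    static List<String> parse(String providerFileContent) {
        List<String> providers = new ArrayList<>();
        for (String line : providerFileContent.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty() && !providers.contains(name)) {
                providers.add(name);
            }
        }
        return providers;
    }
}
```

A consumer written as `ServiceLoader.load(SomeService.class)` references only the interface in its bytecode, so a debloating tool would have to scan the provider-configuration files of the dependencies (as this sketch does) to learn which implementation classes must be kept.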
Finally, we observed that one (formerly) vulnerable class was part of the application dependencies and has been removed by all debloating tools (cf. column “Vulnerable classes” of Table 2). The class is org.apache.commons.io.FilenameUtils, contained in commons-io, and subject to CVE-2021-29425 (fixed by https://github.com/apache/commons-io/commit/2736b6f).
Although the application has a considerable number of test cases, the case study shows the limitations of tests as an oracle for semantics preservation. The application developer confirmed that the unavailability of the service implementation class, reported as redundant by all tools, would break the application at runtime.
ProGuard allows manual configuration of entry points to be kept, so we ran the tool specifying the SPI service implementation class as an additional entry point. As a result, 209 classes of the direct dependency moxy are reported as used, as well as 34 and 1340 classes of its transitive dependencies asm and core, respectively (full Maven artifactId available in Table 3). Thus, considering also the results of our manual inspection, the application dependencies could be reduced by roughly half, still removing the (formerly) vulnerable class and reducing the attack surface of the application.
Our case study shows the potential of debloating dependencies on an industrial application of limited size and complexity (e.g., just four dependencies required at runtime) and already points out a critical need for improvement.
4 Related Work
The effect of software reuse on security is investigated by Gkortzis et al., who show empirical evidence of the relation between the size of a code base and its likelihood to contain vulnerabilities. Recently, Soto-Valero et al. conducted a large-scale study to assess the prevalence of bloated dependencies in the Maven ecosystem. In the same paper, they presented DepClean, one of the tools we used in our case study. Bruce et al. propose JShrink, a framework to debloat Java applications using static and dynamic analysis techniques. Though their implementation is only available as a replication package, the results they reported show the potential of the approach.
A related problem is how to measure the actual security improvements obtained by debloating: one approach is to count the number of known vulnerabilities (CVEs) removed. Other works use metrics based on the number of gadget chains that can be successfully removed.
This paper investigates well-known and readily available debloating tools to evaluate their ability to discriminate used from redundant code at class level, using an industrial application as a case study. Moreover, it quantifies the attack surface reduction in terms of removed vulnerable classes.
5 Conclusion
Considering the successful debloating of commons-io, including the removal of a (formerly) vulnerable class, the case study confirms the potential of debloating tools to reduce an application’s attack surface. It also shows that state-of-the-art tools could not handle a widely used, standard Java mechanism for dynamic class loading as used in the application at hand.
Future work should consider additional debloating tools and techniques to evaluate how they deal with dynamic features. Moreover, to be usable in industrial settings, debloating tools must handle software of increasing size and complexity and integrate easily into CI/CD pipelines, with limited manual configuration.
Acknowledgements. This work is partly funded by EU grants No. 952647 (AssureMOSS) and No. 830892 (Sparta).
References
- C. Soto-Valero, N. Harrand, M. Monperrus, and B. Baudry, “A comprehensive study of bloated dependencies in the Maven ecosystem,” Empirical Software Engineering, vol. 26, no. 3, p. 45, Mar. 2021. [Online]. Available: https://doi.org/10.1007/s10664-020-09914-8
- A. Gkortzis, D. Feitosa, and D. Spinellis, “Software reuse cuts both ways: An empirical analysis of its relationship with security vulnerabilities,” Journal of Systems and Software, vol. 172, p. 110653, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0164121220301199
- B. R. Bruce, T. Zhang, J. Arora, G. H. Xu, and M. Kim, “JShrink: In-depth investigation into debloating modern Java applications,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2020. New York, NY, USA: Association for Computing Machinery, 2020, pp. 135–146. [Online]. Available: https://doi.org/10.1145/3368089.3409738
- S. E. Ponta, H. Plate, A. Sabetta, M. Bezzi, and C. Dangremont, “A manually-curated dataset of fixes to vulnerabilities of open-source software,” in 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019, pp. 383–387.
- C. Qian, H. Koo, C. Oh, T. Kim, and W. Lee, “Slimium: Debloating the Chromium browser with feature subsetting,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 461–476. [Online]. Available: https://doi.org/10.1145/3372297.3417866