Despite the adoption of “configuration as code” that enforces rigorous review, validation, and canarying, configuration changes are still among the major causes of system failures and service incidents in today’s cloud and Internet services, as reported by numerous failure studies and news reports [4, 51, 52, 15, 10, 20, 38, 26, 24, 9]. In our experience, configuration changes that cause real-world system failures are typically more sophisticated than trivial mistakes (e.g., typos); instead, such changes often violate subtle constraints that are hard to spot via manual reviews or to capture with rule-based validation. The misconfigured values are often not absolutely wrong, but are not anticipated by the software system that uses them during runtime execution; sometimes, even valid configuration values can trigger previously unknown software bugs.
We argue that one essential missing piece in configuration management is testing, which can address the fundamental limitations of configuration validation discussed in §2. However, despite being treated as code, configuration changes are not tested like code changes. Certainly, configurations are not executable code but values, and thus cannot be directly tested on their own. On the other hand, configuration values can and should be tested together with code, in order to exercise the semantics and effects of the values. Essentially, once a value is configured (i.e., fixed), it is not much different from a constant value. A configured value can be evaluated like a constant in traditional software tests: by exercising the code that uses the value and asserting the execution behavior. We refer to such testing designed for configuration as configuration testing¹, which can be done at the unit, the integration, and the system level. This paper focuses on unit- and integration-level configuration testing [6, 40].
¹ In the past, the term was used to refer to configuration-aware software testing  and compatibility testing , which are completely different from the literal meaning of the term (testing configurations) used in this paper.
We advocate that configuration testing should be considered the primary defense against bad configuration changes (including both erroneous configuration values and software defects exposed by valid value changes). Figure 1 positions configuration testing in the state-of-the-art configuration management and rollout process, and compares it with software testing for code changes. We believe that configuration testing can address the fundamental limitations of de facto practices against bad configuration changes (discussed in §2).
We believe that configuration testing is practical, with few barriers to adoption. With modern reliability engineering practices and the DevOps movement [38, 20, 35, 9], as well as the significant impact of configuration changes, configuration management is already done in a systematic way (e.g., treated as code [38, 20]). This sets up the natural framework and process for configuration testing, as shown in Figure 1.
DevOps breaks the longstanding assumption that configurations are managed by traditional system administrators (sysadmins), portrayed as those who do not read or write code and do not understand a system’s internal implementation [50, 22]. In the era of configuration as code, configurations are managed by engineers who implement the software and test their code continuously. In §3.1, we show that many existing software tests naturally include the test logic for configurations, which indicates that configuration tests can be implemented and maintained like existing software tests. Note that configuration testing can also be applied in traditional sysadmin-based settings, where software developers implement configuration tests and release them to sysadmins.
Configuration testing can directly benefit from existing software testing techniques. Specifically, configuration testing can be built on top of existing testing frameworks and infrastructure. In principle, both configuration testing and traditional software testing exercise the software code under test and assert the expected behavior (e.g., program outputs). The difference lies in the testing purposes. Traditional software tests use a few representative configuration values (testing all possible values and their combinations is known to be prohibitive ). The representative values hardcoded in tests are typically commonly used configuration values, as the purpose is to examine the correctness of the code implementation. Configuration testing, on the other hand, tests the actual configured values to be deployed (which could be erroneous) by evaluating whether the code under test can use these values to deliver the expected behavior.
We show that many existing software tests can be reused and systematically transformed into configuration tests by parameterizing hardcoded configuration values and plugging in the actual configured values, so that the test logic evaluates both code and configurations (§3.1). We discuss the feasibility of automatically generating new configuration tests, which differs from traditional test generation that aims at high code coverage (§3.2). We also discuss measures of the adequacy and quality of configuration tests (§3.3 and §3.4), as well as continuous configuration testing for incremental changes in the context of continuous integration and deployment (§3.5).
This section reflects on the fundamental limitations of the state-of-the-art research and practices for defending against misconfigurations. The reflection drives the motivation of configuration testing.
2.1 Validation cannot replace testing
We argue that validation should not be the primary quality assurance for configuration changes, in response to the recent trend of investing in extensive validation [5, 38, 13, 34, 33, 39, 19, 55, 29].² The fundamental limitation of configuration validation is rooted in its segregation from the code logic: unlike testing, validation is agnostic to the code that uses configuration values, and instead relies on externally specified correctness rules, which often capture only a subset of the constraints required by the software. In our view, validation, as well as learning-based methods (§2.2), should supplement configuration testing, for example, by checking empirically good practices and hidden patterns, or by being used when configuration tests are not available.
² Validation refers to checking configuration values based on external specifications. It is different from testing, which evaluates how configuration values are internally used by the system .
Validation can hardly cover all the constraints. Essentially, validation checks configuration values against correctness rules derived from external specifications. It is prohibitively difficult to codify rules that capture all the constraints required by the software implementation. Our prior work  shows that one configuration value could have multiple different constraints, and constraints could be subtle and hard to codify into static rules.
Prior work proposes to automatically infer constraints from configuration file datasets [34, 33, 55, 54, 39], documents , and source code [50, 19, 31]. However, all those methods can only infer a few specific types of constraints, rather than the complete constraint set.
Validation cannot deal with software bugs exposed by valid configuration values. Validation is typically agnostic to how the system interprets and uses configuration values internally. Oftentimes, due to software bugs, the validation rules derived from external specifications do not match the constraints required by the actual implementation . In our experience, such bugs are not uncommon: the configuration values are not necessarily incorrect, but they are not anticipated by the code.³
³ One common pattern is the misinterpretation of configuration values due to undefined specifications and bugs in the code that parses and canonicalizes raw values from configuration files.
Validation is not cheaper than testing. Validation requires implementing and maintaining validation code, typically using separate languages and frameworks [5, 38, 13, 55]. Maintaining validation code requires substantial and continuous effort, because: 1) modern software systems expose enormous numbers of configuration parameters ; 2) each parameter could have multiple constraints; and 3) constraints change as the software evolves .
Empirical evidence. Recent studies reported that configuration validation is often deficient even with mature engineering practices [47, 38]. Our prior study reveals that 4.7%–38.6% of the configurations used in error handling and fault tolerance do not have checking code . As reported in Facebook’s study , despite extensive validation effort, erroneous configuration changes still found their way into production and resulted in service-level incidents, including both obvious and sophisticated misconfigurations. Specifically, 22% of the configuration-induced incidents were caused by configuration changes that exposed software bugs, which cannot be tackled by value-based validation.
Therefore, instead of trying to codify every single constraint into rules, one should directly test how configuration values are used by the code, i.e., the configuration testing proposed in this paper. This not only saves the effort of codifying and maintaining validation rules, but also covers the precise and complete constraints imposed by the code.
2.2 Learning-based methods are no magic
A frequently explored direction is to use machine learning techniques to discover configuration patterns from “big” configuration data. Our experience shows that learning algorithms hardly work unless they are scoped a priori with predefined rule types . Moreover, few learning-based methods provide guarantees on false positives or negatives. As observed by researchers at IBM, without explicit guarantees, “learning-based methods have rarely found use in production systems on a continuous basis .”
Misconfigurations may not be outliers, and vice versa. It is challenging to determine the correctness of configurations based on their values alone. Prior work proposes to use outlier detection algorithms to detect misconfigurations; however, outliers could come from special customization instead of misconfigurations. Misconfigurations may not be outliers either: default values are often the most commonly used values, but incorrectly staying with defaults is a common pattern of misconfigurations .
Datasets are not always available. Learning relies on large configuration datasets collected from independent sources. Such datasets are not always available in typical cloud settings where all the configurations are managed by the same operation team. Learning-based methods are more suitable for end-user software with large user bases, e.g., Windows-based applications [43, 41, 16, 53, 17].
2.3 System tests are not targeted
System tests are large-scale tests designed for evaluating end-to-end system behavior before deployment (cf. Figure 1). For configuration values, the key problem of existing system tests is the lack of coverage measures. For a configuration change, we often do not know whether the test run exercised the changed values and what to measure as the impact of the changes. Since some configuration values are used only under special conditions, testing steady states may not expose latent configuration errors . Test selection also becomes difficult: even for a small change that updates one configuration value, the common practice is to run all the system tests.
3 Configuration Testing
The key idea of configuration testing is to test configuration values by executing software code that uses these values and asserting expected behavior (e.g., outputs). Unlike traditional software tests that use hardcoded configuration values (for the purpose of finding bugs in the code), configuration testing exercises software programs with the actual configured values to be deployed in production. Figure 2 shows a unit-level configuration test, compared with a unit test shipped with the software.
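To make the contrast concrete, the following Java sketch runs one piece of test logic in two ways: once with a value the developer hardcoded (a traditional unit test) and once with the value that is about to be deployed (a configuration test). It does not reproduce Figure 2; the `Conf` class, the `Chunker` code under test, and the parameter name `io.chunk.size` are all hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a configuration object such as Hadoop's
// Configuration: string values looked up by key, with a typed getter.
class Conf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    int getInt(String key, int defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// Hypothetical code under test: splits a buffer into chunks whose size is
// read from the (hypothetical) parameter "io.chunk.size".
class Chunker {
    static List<byte[]> split(byte[] data, Conf conf) {
        int size = conf.getInt("io.chunk.size", 4096);
        if (size <= 0) {
            throw new IllegalArgumentException("io.chunk.size must be positive: " + size);
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += size) {
            int len = Math.min(size, data.length - off);
            chunks.add(Arrays.copyOfRange(data, off, off + len));
        }
        return chunks;
    }
}

class ChunkerTests {
    // Shared test logic: splitting and then concatenating must reproduce the
    // input. Any runtime exception counts as a test failure.
    static boolean roundTrips(Conf conf) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte[] c : Chunker.split(data, conf)) out.write(c, 0, c.length);
            return Arrays.equals(out.toByteArray(), data);
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Traditional unit test: a representative value hardcoded by the developer.
    static boolean unitTest() {
        Conf conf = new Conf();
        conf.set("io.chunk.size", "512");
        return roundTrips(conf);
    }

    // Configuration test: the same test logic, with the value that is about
    // to be deployed plugged in.
    static boolean configTest(String valueToDeploy) {
        Conf conf = new Conf();
        conf.set("io.chunk.size", valueToDeploy);
        return roundTrips(conf);
    }
}
```

Under these assumptions, `configTest("1024")` passes, while `configTest("0")` and `configTest("4k")` fail; `unitTest()` passes regardless of what is deployed, because its hardcoded value never changes.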
In essence, a configuration value is not fundamentally different from a constant value, once the value is configured (i.e., fixed). Traditional in-house testing evaluates constant values, but cannot effectively deal with configurable values due to the challenges of covering all possible values and their combinations that may occur in the field. Configuration testing does not attempt to explore the entire configuration space. Instead, it concretizes the configurable values with the actual configured values in the test code, and evaluates whether the software using the values behaves as expected.
Configuration testing can be done at the unit, the integration, and the system level to evaluate different scopes of the software system under test. In this section, we discuss preliminary results and open problems towards enabling configuration testing in large-scale production systems.
3.1 Reusing existing software tests
We find that many existing software tests, including unit, integration, and system tests, can be reused for configuration testing. This section focuses on unit and integration tests, but the ideas also apply to system tests.
Conceptually, reusing existing test code for configuration testing involves two steps: (1) parameterizing hardcoded configuration values in the test code, and (2) concretizing the parameterized value with the actual configured value to be deployed into production.
In our experience, the two steps can be systematically done based on well-defined configuration APIs in modern software systems.⁴ Typically, mature software projects have customized configuration APIs wrapping common libraries, such as java.util.Properties for Java in Hadoop, configparser for Python in OpenStack, and Thrift structures in Configerator . In Figure 2, HDFS uses a set of get and set APIs for retrieving and rewriting configuration values stored in a Configuration object. By replacing TEST_KEYFILE with the actual configured values (Line 29), we transform the original software test into a configuration test. Further, one can automatically transform existing software tests into configuration tests by rewriting configuration objects or intercepting configuration API calls. Note that such automated transformation may not lead to valid configuration tests if the test logic is specific to the original hardcoded values, which in our experience is not uncommon.
⁴ The observation has been validated by many studies on real-world systems [50, 38, 47, 35].
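One way to automate the transformation is to intercept the configuration getter so that values to be deployed override whatever the test hardcodes. The Java sketch below assumes a hypothetical `Conf` class with a static `DEPLOYED` overlay and a hypothetical parameter `session.timeout.ms`; in a real system such as HDFS, the same effect would require rewriting or instrumenting its actual configuration class, which is not shown here. The existing test itself is left untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration object whose getter is intercepted.
class Conf {
    // Values to be deployed. Static so that every Conf created inside a test
    // sees them; when present, they override values hardcoded by the test.
    static final Map<String, String> DEPLOYED = new HashMap<>();

    private final Map<String, String> local = new HashMap<>();

    void set(String key, String value) { local.put(key, value); }

    // Intercepted getter: deployed value > value set by the test > default.
    String get(String key, String defaultValue) {
        if (DEPLOYED.containsKey(key)) return DEPLOYED.get(key);
        return local.getOrDefault(key, defaultValue);
    }
}

// Hypothetical code under test.
class SessionManager {
    static long timeoutMillis(Conf conf) {
        return Long.parseLong(conf.get("session.timeout.ms", "30000"));
    }
}

// An existing test, unmodified: it hardcodes a value but only checks a
// generic property that should hold for any valid configuration.
class ExistingTest {
    static boolean timeoutIsPositive() {
        Conf conf = new Conf();
        conf.set("session.timeout.ms", "5000");
        try {
            return SessionManager.timeoutMillis(conf) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}

// Runs the unmodified test with the values to be deployed plugged in.
class ConfigTestRunner {
    static boolean runWith(Map<String, String> deployed) {
        Conf.DEPLOYED.clear();
        Conf.DEPLOYED.putAll(deployed);
        try {
            return ExistingTest.timeoutIsPositive();
        } finally {
            Conf.DEPLOYED.clear();
        }
    }
}
```

This test checks only a generic property, so it transfers cleanly. A test asserting `timeoutMillis(conf) == 5000` would fail for every other deployed value; that is the kind of hardcoded-value-specific test logic that makes automated transformation produce invalid configuration tests.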
The opportunity of reusing existing software tests for configuration testing. To investigate the feasibility of reusing existing tests, we analyze the test code of four systems in the Hadoop project, all of which implement unit and integration tests using JUnit and use configuration APIs similar to those in Figure 2. Table 1 shows that a significant number of existing tests use configuration values in test code: these tests create Configuration objects and pass them to the code under test. All these tests can potentially be reused for configuration testing. Specifically, many of these tests do not customize any configuration values in the test code, but take the default values stored in the Configuration object. We observe that these tests tend to be generic: they are supposed to work even when the values in the Configuration object are changed.
Table 2 shows that 90+% of the configuration parameters are exercised when running existing tests. We instrument configuration get APIs to count the configuration parameters retrieved at runtime during the execution of the test suite; all the studied systems retrieve configuration values on demand (when they need to use the values). Note that these numbers do not reflect the coverage metric based on the slice of a configuration value defined in §3.3.
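The instrumentation behind such counts can be as simple as recording every key passed to the getter. The sketch below assumes a hypothetical `Conf` class and a test suite packaged as a `Runnable`; like the numbers in Table 2, it measures retrieval only, which is weaker than the slice-based coverage of §3.3.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical configuration object with an instrumented getter.
class Conf {
    // Keys retrieved through get() during the current run, across all Conf
    // instances (not thread-safe; a sketch for single-threaded suites).
    static final Set<String> RETRIEVED = new HashSet<>();

    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    // Records the key, then behaves like an ordinary getter.
    String get(String key, String defaultValue) {
        RETRIEVED.add(key);
        return values.getOrDefault(key, defaultValue);
    }
}

class ExercisedParameters {
    // Runs the suite and returns the fraction of declared parameters whose
    // values were retrieved at least once.
    static double fractionRetrieved(Set<String> declared, Runnable suite) {
        Conf.RETRIEVED.clear();
        suite.run();
        if (declared.isEmpty()) return 0.0;
        long hit = declared.stream().filter(Conf.RETRIEVED::contains).count();
        return (double) hit / declared.size();
    }
}
```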
Preliminary results. We evaluate the effectiveness of configuration testing using 45 latent misconfigurations in the dataset of our prior work  for the systems listed in Tables 1 and 2. We find that all the evaluated latent misconfigurations can be captured by unit-level configuration testing. Most importantly, we find that the configuration tests that are able to catch these misconfigurations can be directly created by reusing existing tests shipped with the systems.
Specifically, 43 out of 45 can be detected by running existing test code with automated parameterization-and-plugin transformation, without any modifications; the remaining two require additional changes to the original test code (for setting up external dependencies).
Table 1: | Software | All Tests | Tests Using Configurations |
Table 2: | Software | # Total Parameters | # Parameters Exercised in Tests |
3.2 Creating new configuration tests
We envision software engineers implementing configuration tests, just as they implement software tests. Configuration tests require test framework support for parameterizing configuration values in test code and concretizing the parameterized values upon configuration changes. Such support can be built by extending existing test frameworks (e.g., on top of parameterized test support in JUnit).
Similar to software tests, configuration tests need to be maintained continuously to accommodate software evolution. For example, new tests need to be added when new configuration parameters are introduced, while existing tests need to be revised when the usage of configuration values changes in code. To assist engineers in creating new configuration tests, tooling can be built to identify and visualize code snippets that use configuration values, based on existing techniques for tracking configuration values in source code [47, 2, 3, 56, 30, 50].
Automatic configuration test generation is possible. In fact, it is likely a simpler problem than traditional test generation, which aims to explore all possible program paths . Configuration tests only need to cover program paths related to the target configuration values, which in our experience touch only a small part of the program and do not suffer from path explosion. A promising approach is to drive execution along configuration-related program paths by solving their path conditions for satisfiability.
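For the simplest case, a single integer parameter whose slice compares it against constants, satisfiability reduces to intersecting intervals. The Java sketch below enumerates every combination of branch outcomes and returns one witness value per feasible path. It is an illustrative stand-in for a real constraint solver, and it assumes the branch conditions have already been extracted from the code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

class PathSolver {
    enum Op { LT, LE, GT, GE }

    // A branch condition `value OP bound` on one integer configuration value.
    static final class Cond {
        final Op op;
        final int bound;
        Cond(Op op, int bound) { this.op = op; this.bound = bound; }
    }

    // The operator that holds exactly when the original one does not.
    static Op negate(Op op) {
        switch (op) {
            case LT: return Op.GE;
            case GE: return Op.LT;
            case LE: return Op.GT;
            default: return Op.LE; // GT
        }
    }

    // Finds a value in [min, max] for which each condition evaluates to the
    // requested outcome, by intersecting the half-intervals they define.
    static OptionalLong witness(List<Cond> conds, boolean[] outcomes, long min, long max) {
        long lo = min, hi = max;
        for (int i = 0; i < conds.size(); i++) {
            Cond c = conds.get(i);
            Op op = outcomes[i] ? c.op : negate(c.op);
            switch (op) {
                case LT: hi = Math.min(hi, (long) c.bound - 1); break;
                case LE: hi = Math.min(hi, c.bound); break;
                case GT: lo = Math.max(lo, (long) c.bound + 1); break;
                case GE: lo = Math.max(lo, c.bound); break;
            }
        }
        return lo <= hi ? OptionalLong.of(lo) : OptionalLong.empty();
    }

    // One test value per feasible combination of branch outcomes (path).
    static List<Long> generate(List<Cond> conds, long min, long max) {
        List<Long> values = new ArrayList<>();
        int n = conds.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] outcomes = new boolean[n];
            for (int i = 0; i < n; i++) outcomes[i] = (mask & (1 << i)) != 0;
            OptionalLong v = witness(conds, outcomes, min, max);
            if (v.isPresent()) values.add(v.getAsLong());
        }
        return values;
    }
}
```

For conditions `value < 1` and `value > 65536` over `[0, Integer.MAX_VALUE]`, the sketch yields 1, 0, and 65537; the path on which both conditions hold is infeasible and skipped. Slices involving several parameters, strings, or non-linear conditions are where a general solver would be needed.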
3.3 Adequacy and coverage
As a type of software testing, configuration testing needs adequacy criteria for selecting and evaluating configuration test cases. We find that code coverage metrics (statement, branch, and path ) are not suitable as adequacy criteria for configuration testing: high coverage of the entire code base is overkill for configuration testing.
We propose configuration coverage as an adequacy criterion of configuration testing. At a high level, configuration coverage describes whether or not the program slice of the target configuration value is covered by the configuration tests.
Configuration parameters. For a configuration test suite, a configuration parameter is covered if the tests exercise all the execution paths in the program slice of the parameter. The program slice of a configuration parameter can be generated using static or dynamic taint analysis that takes the parameter’s value as the initial taint and propagates taints through data- and control-flow dependencies, which is a common practice in prior work [47, 2, 3, 56, 30]. Thin slicing  is commonly used in practice to avoid over-tainting due to unbounded control-flow dependencies, while a broader slice definition  can be used in configuration testing to expose bugs triggered by configuration changes.
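Once slices are available, checking this criterion is a set computation. The sketch below assumes that a taint analysis (not shown) has already mapped each parameter to identifiers of the execution paths in its slice, and that test runs report which of those paths they exercised; the path identifiers are hypothetical, and the parameter names merely follow HDFS naming.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ConfigCoverage {
    // slices: parameter -> IDs of the execution paths in its program slice.
    // runs:   for each test, the IDs of the paths it exercised.
    // A parameter is covered when the exercised paths, taken over the whole
    // suite, include every path in its slice.
    static Map<String, Boolean> covered(Map<String, Set<String>> slices, List<Set<String>> runs) {
        Set<String> exercised = new HashSet<>();
        for (Set<String> run : runs) exercised.addAll(run);
        Map<String, Boolean> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : slices.entrySet()) {
            result.put(e.getKey(), exercised.containsAll(e.getValue()));
        }
        return result;
    }

    // Fraction of parameters that are covered.
    static double ratio(Map<String, Boolean> covered) {
        if (covered.isEmpty()) return 0.0;
        long n = covered.values().stream().filter(b -> b).count();
        return (double) n / covered.size();
    }
}
```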
Configuration changes. Given a configuration change, the tests should exercise not only the changed parameters, but also other parameters that depend on the changed ones. We define that a parameter P depends on another parameter Q if the program slice of P is affected by Q’s value. Common patterns of dependencies include both control- and data-flow dependencies. For example, P is only used when Q has a certain value (Q enables a feature and P controls the behavior of the feature), or P’s value is derived from Q’s value. In both cases, when Q’s value is changed, P should also be tested.
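Deciding which parameters a change should pull into testing amounts to a reachability query over the dependency relation. The Java sketch assumes the relation is given as a map from each parameter to the parameters whose slices its value affects. It follows dependencies transitively, which is one possible reading of the definition above rather than something the text prescribes; all parameter names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DependencyClosure {
    // dependents: q -> parameters whose program slice is affected by q's value.
    // Returns the changed parameters plus everything that (transitively)
    // depends on them, i.e., the parameters a change should cause to be tested.
    static Set<String> toTest(Collection<String> changed, Map<String, List<String>> dependents) {
        Set<String> result = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String q = work.pop();
            for (String p : dependents.getOrDefault(q, List.of())) {
                if (result.add(p)) work.push(p); // newly reached: explore its dependents
            }
        }
        return result;
    }
}
```

For example, if a hypothetical `cache.enabled` flag gates `cache.size` and `cache.ttl`, and `cache.evict.batch` is derived from `cache.size`, flipping the flag pulls all four parameters into testing, whereas changing only `cache.ttl` pulls in nothing else.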
We use the term “quality” to refer to the correctness and effectiveness of configuration test cases, measured by false negatives and false positives. The quality of configuration tests, especially those automatically transformed from existing tests (§3.1), should be carefully evaluated to make configuration testing effective in practice.
The quality of configuration test suites can be empirically evaluated using known good and bad configuration values to measure false positives and false negatives, respectively. However, collecting a comprehensive set of good and bad configuration values turns out to be non-trivial: knowing all the good and bad values is equivalent to knowing all the constraints of the configuration. Fuzzing and mutation-based methods [50, 14, 18, 58] can potentially be applied to generate representative misconfigurations, while good configuration values can be collected from historically used values  and community-based data sources .
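With labeled values in hand, the two rates follow directly. The sketch below models a configuration test as a `Predicate<String>` that passes or fails for a value, and includes a few hypothetical mutation operators for numeric values; they are illustrative only and are not the operators of any cited tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TestQuality {
    // Fraction of known-good values that the test rejects (false positives).
    static double falsePositiveRate(Predicate<String> test, List<String> good) {
        if (good.isEmpty()) return 0.0;
        return (double) good.stream().filter(test.negate()).count() / good.size();
    }

    // Fraction of known-bad values that the test accepts (false negatives).
    static double falseNegativeRate(Predicate<String> test, List<String> bad) {
        if (bad.isEmpty()) return 0.0;
        return (double) bad.stream().filter(test).count() / bad.size();
    }

    // Hypothetical mutation operators for a numeric value; the results are
    // candidate misconfigurations, not guaranteed-bad values.
    static List<String> mutate(String good) {
        List<String> out = new ArrayList<>();
        out.add("");          // missing value
        out.add("0");         // zero
        out.add("-" + good);  // sign flip
        out.add(good + "k");  // unit suffix the parser may not support
        return out;
    }
}
```

As a usage example, a test that accepts only integers in [1, 65536] rejects the known-good value 131072 (a false positive: the test is stricter than necessary if that value is indeed good), while it rejects all four mutants of 4096 (no false negatives).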
3.5 Continuous configuration testing
With continuous integration and deployment, configurations evolve in frequent updates that only change a small number of configuration values. For example, Facebook reported that 49.5% of configuration updates are two-line revisions .
The procedure of continuous configuration testing is in the same vein as regression testing in continuous integration and deployment. Given a configuration change, one should selectively run only the tests related to the changed configuration values, instead of the entire test suite, to reduce cost and improve efficiency. The key to testing incremental configuration changes is to associate each test with the configuration parameters whose impact the test can evaluate. This can be done by test selection based on the adequacy criteria discussed in §3.3.
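Once each test is associated with parameters, selection itself is a lookup. The sketch assumes the association (for example, derived from slice-based coverage) is given as a map; the test names are hypothetical, and the parameter names merely follow HDFS naming.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TestSelector {
    // testParams: test name -> parameters whose impact the test can evaluate.
    // Returns the tests that touch at least one changed parameter.
    static Set<String> select(Map<String, Set<String>> testParams, Collection<String> changed) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : testParams.entrySet()) {
            if (!Collections.disjoint(e.getValue(), changed)) selected.add(e.getKey());
        }
        return selected;
    }
}
```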
4 Open Problems
Despite its promises, configuration testing faces a number of open problems:
Automated test reuse. Automated or semi-automated methods for transforming existing software tests into configuration tests can significantly lower the barrier to adoption and bootstrap configuration test suites, given that mature software projects all have comprehensive software test suites. As discussed in §3.1, the major challenge is not parameterizing the hardcoded values in existing tests, but understanding the test code and analyzing its logic with regard to the configuration values. An effective reuse method should be able to differentiate hardcoded values that are specific to the test cases from those that are generic, or at least identify (and exclude) test cases that are specific to the hardcoded values.
Test generation. We believe that automated test generation can be done at the level of unit and integration tests, in a similar manner as test generation for software code. The feasibility has already been demonstrated by our prior work, PCheck ; the checking code generated by PCheck is essentially a test. On the other hand, the tests generated by PCheck are basic and do not incorporate much of the semantics derived from the code logic, due to PCheck’s limitations in dealing with dependencies and side effects, both of which can be addressed by configuration testing. Section 5 gives an in-depth, retrospective discussion on this matter.
Dependency analysis. Dependency analysis is essential to effective configuration testing, especially to test selection for incremental configuration changes as discussed in §3.3. While prior work has investigated methods to discover dependencies between configuration parameters and their values [55, 50, 33], none delivers sound and complete results. It may be reasonable to ask developers to encode dependencies when introducing new configuration parameters; still, a thorough understanding of the various types of configuration dependencies is needed.
Testing performance, security, and resource utilization. Most of the discussion in this paper implicitly focuses on correctness from the software program’s standpoint. However, the impact of a configuration change often goes beyond correctness properties, as configurations can affect performance, security, and resource utilization, as revealed in prior studies [2, 42, 32, 12, 49]. Configuration testing is not limited to correctness and should be applied to these other aspects of software systems as well. One challenge lies in analyzing the impact of configuration changes: unlike correctness, performance, security, and resource utilization are often neither straightforward nor deterministic to measure.
Testing code changes with deployed configuration. A natural extension to the idea of configuration testing is to run the configuration tests for code changes with the deployed configuration values. Such testing can catch bugs that are not exposed in traditional software testing due to the inconsistency between the configuration deployed in production and the configuration hardcoded in the software tests. Therefore, the configuration tests can be used for testing both configuration and code changes: the former plugs in the configuration to be deployed, while the latter plugs in the configuration already deployed. Note that the testing pipelines could still be separate due to the independence of code and configuration rollout.
5 A Retrospective Discussion
Given the impact of misconfigurations in real-world applications, especially cloud and Internet services [26, 24, 52, 20, 15, 32, 10, 4], recent effort on tackling misconfigurations has shifted from reactive methods (troubleshooting) [41, 56, 3, 2, 37, 21, 30, 45] to proactive methods (validation and error detection) [50, 47, 13, 34, 33, 38, 5, 29, 55]. Configuration testing is along this line, aiming at proactively capturing undesired system behavior introduced by configuration changes before production deployment.
As discussed in §2.1, existing configuration validation is segregated from the code using configurations, and can hardly cover all the constraints or deal with bugs exposed by configuration changes. Our prior work, PCheck , explores the feasibility of using the code from the original software to check configuration values. Despite the promising results, we have come to the conclusion that PCheck’s method is fundamentally limited.
First, PCheck is significantly incomplete due to its difficulty in dealing with external dependencies and avoiding side effects. PCheck detects only around 70% of the well-scoped misconfigurations  (all of which can be exposed by configuration testing, §3.1). Second, PCheck identifies misconfigurations solely based on generic signals (exceptions and error codes). It cannot deal with legal misconfigurations, which do not manifest through runtime anomalies but produce unexpected system behavior . Legal misconfigurations are major reliability threats in real-world systems .
Configuration testing addresses the above limitations. It can exercise code with side effects, and external dependencies can be mocked or auto-generated (cf. §3.2). Legal misconfigurations can be captured by asserting expected behavior, just as assertions are used in software tests.
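The difference from generic signals shows up on a value that is legal but unintended. In the Java sketch below, a hypothetical parameter is read in seconds, and an operator who assumes milliseconds sets it to 30000. Nothing throws, so an exception-based check accepts the value, while a configuration test that asserts an expected behavior (here, an assumed requirement that idle sessions expire within one hour) rejects it.

```java
import java.time.Duration;

// Hypothetical code under test: interprets its configured value in *seconds*.
class Sessions {
    static Duration idleTimeout(String configuredValue) {
        return Duration.ofSeconds(Long.parseLong(configuredValue));
    }
}

class LegalMisconfig {
    // Generic-signal check: accepts any value that does not raise an exception.
    static boolean noException(String value) {
        try {
            Sessions.idleTimeout(value);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Configuration test: asserts the expected behavior, assumed here to be
    // a positive idle timeout of at most one hour.
    static boolean expiresWithinAnHour(String value) {
        try {
            Duration d = Sessions.idleTimeout(value);
            return !d.isNegative() && !d.isZero() && d.compareTo(Duration.ofHours(1)) <= 0;
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

With the value 30000, the session timeout becomes 8 hours and 20 minutes: `noException` accepts it, while `expiresWithinAnHour` rejects it. The intended value, 1800, passes both checks.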
6 Concluding Remarks
To conclude, this paper proposes configuration testing as a key reliability engineering discipline for configuration management in large-scale production systems. The essence of treating configuration as code is to apply rigorous software engineering principles and techniques to configuration management, which should go beyond current practices. We hope that this paper will open the direction of configuration testing and inspire innovation and effort toward making testing a regular practice for system configuration.
We thank Darko Marinov for the invaluable discussions on the idea of configuration testing.
-  Adrion, W. R., Branstad, M. A., and Cherniavsky, J. C. Validation, Verification, and Testing of Computer Software. ACM Computing Surveys 14, 2 (June 1982), 159–192.
-  Attariyan, M., Chow, M., and Flinn, J. X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12) (October 2012).
-  Attariyan, M., and Flinn, J. Automating Configuration Troubleshooting with Dynamic Information Flow Analysis. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI’10) (October 2010).
-  Barroso, L. A., Hölzle, U., and Ranganathan, P. The Datacenter as a Computer: Designing Warehouse-Scale Machines, 3 ed. Morgan and Claypool Publishers, October 2018.
-  Baset, S., Suneja, S., Bila, N., Tuncer, O., and Isci, C. Usable Declarative Configuration Specification and Validation for Applications, Systems, and Cloud. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (Middleware’17), Industrial Track (December 2017).
-  Bland, M. Goto Fail, Heartbleed, and Unit Testing Culture. https://martinfowler.com/articles/testing-culture.html, June 2014.
-  Cadar, C., and Sen, K. Symbolic Execution For Software Testing: Three Decades Later. Communications of the ACM 56, 2 (February 2013), 82–90.
-  Davidovič, Š., and Beyer, B. Canary Analysis Service. Communications of the ACM 61, 5 (May 2018), 54–62.
-  Davidovič, Š., Murphy, N. R., Kalt, C., and Beyer, B. Configuration Design and Best Practices. The Site Reliability Workbook, Chapter 14, O’Reilly Media Inc. (August 2018), 301–314.
-  Gunawi, H. S., Hao, M., Suminto, R. O., Laksono, A., Satria, A. D., Adityatama, J., and Eliazar, K. J. Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages. In Proceedings of the 7th ACM Symposium on Cloud Computing (SoCC’16) (October 2016).
-  HDFS-7727. Check and verify the auto-fence settings to prevent failures of auto-failover. https://issues.apache.org/jira/browse/HDFS-7727.
-  Hoffmann, H., Sidiroglou, S., Carbin, M., Misailovic, S., Agarwal, A., and Rinard, M. Dynamic Knobs for Responsive Power-Aware Computing. In Proceedings of the 16th International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS’11) (March 2011).
-  Huang, P., Bolosky, W. J., Singh, A., and Zhou, Y. ConfValley: A Systematic Configuration Validation Framework for Cloud Services. In Proceedings of the 10th ACM European Conference on Computer Systems (EuroSys’15) (April 2015).
-  Keller, L., Upadhyaya, P., and Candea, G. ConfErr: A Tool for Assessing Resilience to Human Configuration Errors. In Proceedings of the 38th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’08) (June 2008).
-  Kendrick, S. What Takes Us Down? USENIX ;login: 37, 5 (October 2012), 37–45.
-  Kiciman, E., and Wang, Y.-M. Discovering Correctness Constraints for Self-Management of System Configuration. In Proceedings of the 1st International Conference on Autonomic Computing (ICAC’04) (May 2004).
-  Kushman, N., and Katabi, D. Enabling Configuration-Independent Automation by Non-Expert Users. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI’10) (October 2010).
-  Li, S., Li, W., Liao, X., Peng, S., Zhou, S., Jia, Z., and Wang, T. ConfVD: System Reactions Analysis and Evaluation Through Misconfiguration Injection. IEEE Transactions on Reliability, Early Access (September 2018).
-  Liao, X., Zhou, S., Li, S., Jia, Z., Liu, X., and He, H. Do You Really Know How to Configure Your Software? Configuration Constraints in Source Code May Help. IEEE Transactions on Reliability 67, 3 (September 2018), 832–846.
-  Maurer, B. Fail at Scale: Reliability in the Face of Rapid Change. Communications of the ACM 58, 11 (November 2015), 44–49.
-  Mickens, J., Szummer, M., and Narayanan, D. Snitch: Interactive Decision Trees for Troubleshooting Misconfigurations. In Proceedings of the 2nd USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SYSML’07) (April 2007).
-  Moskowitz, A. Software Testing for Sysadmin Programs. USENIX ;login: 40, 2 (April 2015), 37–45.
-  Mukelabai, M., Nešić, D., Maro, S., Berger, T., and Steghöfer, J.-P. Tackling Combinatorial Explosion: A Study of Industrial Needs and Practices for Analyzing Highly Configurable Systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE’18) (September 2018).
-  Nagaraja, K., Oliveira, F., Bianchini, R., Martin, R. P., and Nguyen, T. D. Understanding and Dealing with Operator Mistakes in Internet Services. In Proceedings of the 6th USENIX Conference on Operating Systems Design and Implementation (OSDI’04) (December 2004).
-  Narla, C., and Salas, D. Hermetic Servers. https://testing.googleblog.com/2012/10/hermetic-servers.html, October 2012.
-  Oppenheimer, D., Ganapathi, A., and Patterson, D. A. Why Do Internet Services Fail, and What Can Be Done About It? In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS’03) (March 2003).
-  Palatin, N., Leizarowitz, A., Schuster, A., and Wolff, R. Mining for Misconfigured Machines in Grid Systems. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06) (August 2006).
-  Perry, A., and Luebbe, M. Testing for Reliability. Site Reliability Engineering, Chapter 17, O’Reilly Media Inc. (April 2016), 183–204. https://landing.google.com/sre/sre-book/chapters/testing-reliability/.
-  Potharaju, R., Chan, J., Hu, L., Nita-Rotaru, C., Wang, M., Zhang, L., and Jain, N. ConfSeer: Leveraging Customer Support Knowledge Bases for Automated Misconfiguration Detection. In Proceedings of the 35th International Conference on Very Large Data Bases (VLDB’15) (August 2015).
-  Rabkin, A., and Katz, R. Precomputing Possible Configuration Error Diagnosis. In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE’11) (November 2011).
-  Rabkin, A., and Katz, R. Static Extraction of Program Configuration Options. In Proceedings of the 33rd International Conference on Software Engineering (ICSE’11) (May 2011).
-  Rabkin, A., and Katz, R. How Hadoop Clusters Break. IEEE Software Magazine 30, 4 (July 2013), 88–94.
-  Santolucito, M., Zhai, E., Dhodapkar, R., Shim, A., and Piskac, R. Synthesizing Configuration File Specifications with Association Rule Learning. In Proceedings of the 2017 ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’17) (October 2017).
-  Santolucito, M., Zhai, E., and Piskac, R. Probabilistic Automated Language Learning for Configuration Files. In Proceedings of the 28th International Conference on Computer Aided Verification (CAV’16) (July 2016).
-  Sayagh, M., Kerzazi, N., Adams, B., and Petrillo, F. Software Configuration Engineering in Practice: Interviews, Surveys, and Systematic Literature Review. IEEE Transactions on Software Engineering, Preprint (2018).
-  Sridharan, M., Fink, S. J., and Bodík, R. Thin Slicing. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI’07) (June 2007).
-  Su, Y.-Y., and Flinn, J. Automatically Generating Predicates and Solutions for Configuration Troubleshooting. In Proceedings of the 2009 USENIX Annual Technical Conference (USENIX ATC’09) (June 2009).
-  Tang, C., Kooburat, T., Venkatachalam, P., Chander, A., Wen, Z., Narayanan, A., Dowell, P., and Karl, R. Holistic Configuration Management at Facebook. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15) (October 2015).
-  Tuncer, O., Bila, N., Isci, C., and Coskun, A. K. ConfEx: An Analytics Framework for Text-Based Software Configurations in the Cloud. Tech. Rep. RC25675 (WAT1803-107), IBM Research, March 2018.
-  Wacker, M. Just Say No to More End-to-End Tests. https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html, April 2015.
-  Wang, H. J., Platt, J. C., Chen, Y., Zhang, R., and Wang, Y.-M. Automatic Misconfiguration Troubleshooting with PeerPressure. In Proceedings of the 6th USENIX Conference on Operating Systems Design and Implementation (OSDI’04) (December 2004).
-  Wang, S., Li, C., Hoffmann, H., Lu, S., Sentosa, W., and Kistijantoro, A. I. Understanding and Auto-Adjusting Performance-Sensitive Configurations. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’18) (March 2018).
-  Wang, Y.-M., Verbowski, C., Dunagan, J., Chen, Y., Wang, H. J., Yuan, C., and Zhang, Z. STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support. In Proceedings of the 17th Large Installation Systems Administration Conference (LISA’03) (October 2003).
-  Weiser, M. Program Slicing. In Proceedings of the 5th International Conference on Software Engineering (ICSE’81) (March 1981).
-  Whitaker, A., Cox, R. S., and Gribble, S. D. Configuration Debugging as Search: Finding the Needle in the Haystack. In Proceedings of the 6th USENIX Conference on Operating Systems Design and Implementation (OSDI’04) (December 2004).
-  Xu, T., Jin, L., Fan, X., Zhou, Y., Pasupathy, S., and Talwadker, R. Hey, You Have Given Me Too Many Knobs! Understanding and Dealing with Over-Designed Configuration in System Software. In Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE’15) (August 2015).
-  Xu, T., Jin, X., Huang, P., Zhou, Y., Lu, S., Jin, L., and Pasupathy, S. Early Detection of Configuration Errors to Reduce Failure Damage. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16) (November 2016).
-  Xu, T., and Marinov, D. Mining Container Image Repositories for Software Configurations and Beyond. In Proceedings of the 40th International Conference on Software Engineering (ICSE’18), New Ideas and Emerging Results (May 2018).
-  Xu, T., Naing, H. M., Lu, L., and Zhou, Y. How Do System Administrators Resolve Access-Denied Issues in the Real World? In Proceedings of the 35th Annual CHI Conference on Human Factors in Computing Systems (CHI’17) (Denver, CO, May 2017).
-  Xu, T., Zhang, J., Huang, P., Zheng, J., Sheng, T., Yuan, D., Zhou, Y., and Pasupathy, S. Do Not Blame Users for Misconfigurations. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13) (November 2013).
-  Xu, T., and Zhou, Y. Systems Approaches to Tackling Configuration Errors: A Survey. ACM Computing Surveys 47, 4 (July 2015).
-  Yin, Z., Ma, X., Zheng, J., Zhou, Y., Bairavasundaram, L. N., and Pasupathy, S. An Empirical Study on Configuration Errors in Commercial and Open Source Systems. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11) (October 2011).
-  Yuan, C., Lao, N., Wen, J.-R., Li, J., Zhang, Z., Wang, Y.-M., and Ma, W.-Y. Automated Known Problem Diagnosis with Event Traces. In Proceedings of the 1st ACM European Conference on Computer Systems (EuroSys’06) (April 2006).
-  Yuan, D., Xie, Y., Panigrahy, R., Yang, J., Verbowski, C., and Kumar, A. Context-based Online Configuration Error Detection. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11) (June 2011).
-  Zhang, J., Renganarayana, L., Zhang, X., Ge, N., Bala, V., Xu, T., and Zhou, Y. EnCore: Exploiting System Environment and Correlation Information for Misconfiguration Detection. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14) (March 2014).
-  Zhang, S., and Ernst, M. D. Automated Diagnosis of Software Configuration Errors. In Proceedings of the 35th International Conference on Software Engineering (ICSE’13) (May 2013).
-  Zhang, S., and Ernst, M. D. Which Configuration Option Should I Change? In Proceedings of the 36th International Conference on Software Engineering (ICSE’14) (May 2014).
-  Zhang, S., and Ernst, M. D. Proactive Detection of Inadequate Diagnostic Messages for Software Configuration Errors. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA’15) (Baltimore, MD, July 2015).
-  Zhu, H., Hall, P. A. V., and May, J. H. R. Software Unit Test Coverage and Adequacy. ACM Computing Surveys 29, 4 (December 1997), 366–427.