Where Are The Gaps? A Systematic Mapping Study of Infrastructure as Code Research

07/13/2018 ∙ by Akond Rahman, et al.

Context: Infrastructure as code (IaC) is the practice of automatically configuring system dependencies and provisioning local and remote instances. Practitioners consider IaC a fundamental pillar for implementing DevOps practices, which help them rapidly deliver software and services to end-users. Information technology (IT) organizations, such as GitHub, Mozilla, Facebook, Google, and Netflix, have adopted IaC. A systematic mapping study of existing IaC research can help researchers identify potential research areas related to IaC, for example, the defects and security flaws that may occur in IaC scripts. Objective: The objective of this paper is to help researchers identify research areas related to infrastructure as code (IaC) by conducting a systematic mapping study of IaC-related research. Methodology: We conduct our research study by searching six scholar databases. We collect a set of 33,887 publications by using seven search strings. By systematically applying inclusion and exclusion criteria, we identify 31 publications related to IaC. We identify the topics addressed in these publications by applying qualitative analysis. Results: We identify four topics studied in IaC-related publications: (i) framework/tool for infrastructure as code; (ii) use of infrastructure as code; (iii) empirical study related to infrastructure as code; and (iv) testing in infrastructure as code. According to our analysis, 52% of the studied publications propose a framework or tool to implement the practice of IaC or extend the functionality of an existing IaC tool. Conclusion: As defects and security flaws can have serious consequences for the deployment and development environments in DevOps, along with other topics, we observe the need for research studies that investigate defects and security flaws in IaC.




1 Introduction

Infrastructure as code (IaC) is the practice of automatically configuring system dependencies and provisioning local and remote instances Humble:2010:CD . Use of IaC scripts is essential to implementing automated deployment, as is done in a continuous deployment process. Popular IaC technologies, such as Chef (https://www.chef.io/chef/) and Puppet (https://puppet.com/), provide utilities to automatically configure and provision software deployment infrastructure using cloud instances. Information technology (IT) organizations, such as Ambit Energy ambit:pup , GitHub (https://speakerdeck.com/kpaulisse/puppetconf-2016-scaling-puppet-and-puppet-culture-at-github), Mozilla cd:adage:parnin , and Netflix cd:adage:parnin , use these utilities to provision cloud-based instances, such as Amazon Web Services (AWS, https://aws.amazon.com/), and to manage databases and user accounts on both local and remote computing instances. For example, Puppet provides the ‘sshkey resource’ to install and manage secure shell (SSH) host keys, and the ‘service resource’ to manage software services automatically puppet-doc . Use of IaC scripts has helped IT organizations increase their deployment frequency. For example, Ambit Energy uses IaC scripts and has increased its deployment frequency by a factor of 1,200 ambit:pup .

Interest in the practice of IaC has grown among both practitioners cd:adage:parnin and researchers JiangAdamsMSR2015  SharmaPuppet2016 . As shown in Figure 1, Google Trends (https://trends.google.com/trends/explore?date=all&q=Infrastructure%20as%20Code) data for the search term ‘Infrastructure as Code’ provide further evidence that interest in IaC as a topic is growing. The x-axis presents months, and the y-axis presents the ‘Interest Over Time’ metric determined by Google Trends. According to Figure 1, interest in IaC has increased steadily since 2015.

Even though interest in IaC is growing steadily, the current state of IaC research remains under-explored. A summary of existing literature in a research domain can help researchers get an overview of that domain and identify potential research topics that could benefit from systematic investigation. One strategy to summarize existing literature for a particular research domain is to conduct a systematic mapping study sms:feldt . Through a systematic mapping study, researchers can identify gaps and can group existing research for a certain domain sms:feldt . The identified gaps can potentially direct future research in that domain KITCHENHAM:IMP:SMS . Researchers have conducted systematic mapping studies in numerous domains of software engineering, for example, technical debt debt:sms , testing testcode:sms  test:effort:sms , and software visualization vis:val:sms . Despite growing interest in IaC, we observe limited evidence of systematic mapping studies in the domain of IaC. We conduct a systematic mapping study in the domain of IaC, which can be beneficial in two ways: (i) identifying what research problems have already been addressed in the domain of IaC; and (ii) identifying research problems that could benefit from further research.

Figure 1: Interest in IaC as a search topic since 2004 based on Google Trends data. Interest in IaC has steadily increased since 2015.

The objective of this paper is to help researchers identify research areas related to infrastructure as code (IaC) by conducting a systematic mapping study of IaC-related research.

We answer the following research questions:

  • RQ1: What topics have been studied in infrastructure as code (IaC)-related publications?

  • RQ2: What are the temporal publication trends for infrastructure as code (IaC)-related research topics?

  • RQ3: What are the temporal trends for the use of infrastructure as code (IaC)-related tools, as mentioned in IaC-related publications?

We follow the guidelines of Petersen et al. sms:feldt and conduct a systematic mapping study to identify which research topics are being studied in the domain of IaC. First, we search six scholar databases, namely IEEE Xplore (http://ieeexplore.ieee.org/Xplore/home.jsp), ACM Digital Library (https://dl.acm.org/), the IET Digital Library (http://digital-library.theiet.org/), Springer Link (https://link.springer.com/), ScienceDirect (https://www.sciencedirect.com/), and Wiley Online Library (https://onlinelibrary.wiley.com/). Using seven search strings, we obtain a set of 33,887 publications. By systematically applying inclusion and exclusion criteria kitchen:guide:swe , we obtain 31 IaC-related publications. We follow Kitchenham’s guidelines Kitchenham:Quality to assess the quality of our set of 31 publications. We apply qualitative analysis qual:coding:book to generate topics from the content of the collected publications. Next, we investigate the overall and topic-wise temporal trends of the collected IaC-related publications. We also characterize the temporal trends of the use of IaC-related tools in our set of 31 publications.

Contributions: We list our contributions as follows:

  • A list of topics studied in IaC-related publications;

  • An evaluation of the temporal trends for IaC-related publications; and

  • An evaluation of the quality of IaC-related publications.

We organize the rest of the paper as follows: in Section 2 we describe necessary background and related academic publications. We describe our methodology in Section 3. We present our findings in Section 4 and discuss possible implications of the findings in Section 5. We list the limitations of our systematic mapping study in Section 6. Finally, we conclude the paper in Section 7.

2 Background and Related Work

In this section we first provide a brief background on IaC and systematic mapping studies. Then we describe related academic publications.

2.1 Background

In this section, we provide background on IaC and systematic mapping studies.

2.1.1 Background on Infrastructure as Code (IaC)

Practitioners attribute the concept of infrastructure as code to Chad Fowler’s blog published in 2013 (https://www.oreilly.com/ideas/an-introduction-to-immutable-infrastructure). The phrase ‘as code’ in IaC corresponds to applying traditional software engineering practices, such as code review and version control, to IaC scripts cd:adage:parnin  Humble:2010:CD . To automatically provision infrastructure, programmers follow a specific syntax and write configurations in a similar manner to software source code. IaC scripts use domain-specific languages (DSLs) ShambaughRehearsal2016 . Organizations that implement DevOps practices widely use commercial tools, such as Puppet, to implement IaC Humble:2010:CD  JiangAdamsMSR2015  ShambaughRehearsal2016 . IaC scripts are also known as ‘configuration as code’ scripts SharmaPuppet2016  Humble:2010:CD .

We describe the typical workflow of IaC development as follows: programmers make changes to the required IaC scripts and submit them to a version control system such as Git (https://git-scm.com/). Once changes are submitted, a build in a continuous integration (CI) tool (e.g., Travis CI) is triggered. The CI tool runs the static analysis checks and test cases specified by the development team. If all static analysis checks and tests pass, the CI tool integrates the changes.

2.1.2 Background on Systematic Mapping Studies

A systematic mapping study provides a ‘map’ or overview of a research area by (i) classifying papers and results based on relevant categories and (ii) counting the frequency of work in each of those categories. The output of a systematic mapping study identifies the coverage of research studies in a particular area sms:feldt . Systematic mapping studies can be beneficial in identifying research studies relevant to a topic sms:feldt . Systematic mapping studies differ from systematic literature reviews (SLRs) sms:feldt : systematic mapping studies are exploratory in nature, whereas the purpose of SLRs is to provide synthesized summaries that answer well-defined research questions slr:sms:esem . Systematic mapping studies are important because they provide a basis for future research KITCHENHAM:IMP:SMS .

2.2 Related Work

Our systematic mapping study is closely related to research studies on IaC and to prior work that has conducted systematic mapping studies in other areas of software engineering. We briefly describe both in the following subsections:

2.2.1 Prior Research on IaC

Our paper is related to empirical studies that have focused on IaC technologies, such as Puppet. Sharma et al. SharmaPuppet2016 investigated anti-patterns in IaC scripts and proposed 13 implementation and 11 design anti-patterns. Hanappi et al. Hanappi:2016:pupp:converge investigated how convergence of Puppet scripts can be automatically tested and proposed an automated model-based test framework. Jiang and Adams JiangAdamsMSR2015 investigated the co-evolution of IaC scripts and other software artifacts, such as build files and source code, and reported that IaC scripts experience frequent churn. Ikeshita et al. Ikeshita:IaC:Reduction proposed and evaluated a framework to reduce test suites for IaC. Weiss et al. Weiss:Tortoise proposed and evaluated ‘Tortoise’, a tool that automatically corrects erroneous configurations in IaC scripts. Hummer et al. Hummer:IaC proposed a framework to enable automated testing of IaC scripts.

We observe that researchers have a growing interest in the field of IaC. Motivated by this observation, we conduct a systematic mapping study of IaC research in this paper.

2.2.2 Prior Research on Systematic Mapping Studies

The use of systematic mapping studies is common in software engineering, for example, in the domains of technical debt, domain-specific languages, and software requirements. Li et al. debt:sms conducted a systematic mapping study with 94 publications related to technical debt management and observed the necessity of dedicated technical debt management tools in software engineering. Kosar et al. dsl:sms conducted a systematic mapping study with 390 publications related to domain-specific languages (DSLs) and reported that the DSL community focuses more on the development of new techniques than on evaluating the effectiveness of the proposed DSL techniques. Novais et al. evol:vis studied 125 papers related to software evolution visualization and observed a lack of empirical research in the area. Jalali and Wohlin global:sms studied 77 papers related to the adoption of agile practices in global software engineering and reported that in the majority of the papers agile practices were modified with respect to the context and situational requirements. Kitchenham metrics:sms studied 100 software metrics-related publications and observed that empirical validation is a key focus of software metrics-related papers. Condori-Fernandez et al. softevol:sms reviewed 46 publications related to software requirement specification and reported that understandability is the most commonly evaluated aspect of software requirement specification studies. Engstrom and Runeson spl:sms studied 64 publications on software product line testing and advocated for stronger validation research methods to provide a better foundation for software product line testing. Paternoster et al. startup:sms extracted 213 software engineering practices from 43 publications related to software start-ups and reported that software start-ups choose software engineering work practices opportunistically, adapting and configuring them later. Elberzhager et al. test:effort:sms studied 144 publications on reducing software testing effort and reported that researchers have focused most on automation and prediction approaches. Yusifoglu et al. testcode:sms studied 60 publications on software test code engineering and observed that the two leading avenues of research in this area are tools and methods. Seriai et al. vis:val:sms studied 87 publications related to the validation of software visualization tools and observed a lack of maturity in such validation. Riaz et al. maria:sms studied 30 publications on software patterns and observed that maintenance is the most commonly investigated domain in the research field of software patterns.

The above-mentioned prior work illustrates the usage of systematic mapping studies in several areas of software engineering. We take motivation from these studies, and conduct a systematic mapping study in the area of IaC. Through our systematic mapping study we aim to identify the research areas that need attention in the field of IaC.

3 Methodology

We conduct a systematic mapping study following the guidelines of Petersen et al. sms:feldt . In this section, we describe the methodology to conduct our systematic mapping study. The methodology is divided into four phases, which we describe in the following subsections:

3.1 Phase One: Search

The first phase of finding IaC-related publications is to search the scholar databases. For our paper, we select six scholar databases following the guidelines of Kuhrmann et al. emse:slr:guide . These six scholar databases are: Institute of Electrical and Electronics Engineers (IEEE) Xplore (http://ieeexplore.ieee.org/Xplore/home.jsp), Association for Computing Machinery (ACM) Digital Library (https://dl.acm.org/), the Institution of Engineering and Technology (IET) Digital Library (http://digital-library.theiet.org/), Springer Link (https://link.springer.com/), ScienceDirect (https://www.sciencedirect.com/), and Wiley Online Library (https://onlinelibrary.wiley.com/). We select these six scholar databases because they are recommended for conducting systematic mapping studies and literature reviews emse:slr:guide .

For searching the scholar databases, we construct a set of search strings. The construction process can be described as follows:

  • Step-1: First, we perform an exploratory search in Google Scholar using the string “infrastructure as code”, as infrastructure as code (IaC) is the topic of our systematic mapping study. Based on the search results, we observe that the string ‘infrastructure’ can also refer to infrastructure in other disciplines, such as civil engineering. Therefore, to limit our search scope to the area of IaC, we add the string ‘software engineering’, deriving the search string “infrastructure as code AND software engineering”.

  • Step-2: From the search results of Step-1, we observe that “configuration as code” is used as a synonym for “infrastructure as code” SharmaPuppet2016 . Similar to the search string “infrastructure as code AND software engineering”, we add the search string “configuration as code AND software engineering”. IaC scripts are also referred to as configuration scripts Humble:2010:CD , so we create another search string, “configuration script AND software engineering”.

  • Step-3: From the top five search results obtained from Steps 1 and 2, we observe that publications that study IaC also use the keywords ‘devops’ and ‘Puppet’. Therefore, as the third search string we use “devops AND puppet”. As Ansible, CFEngine, and Chef are commonly used tools to implement IaC SharmaPuppet2016 , we also include three more search strings: “devops AND ansible”, “devops AND chef”, and “devops AND cfengine”. We do not consider ‘devops’ alone as a search string, as it can yield results that are applicable to DevOps only, such as definitions and best practices of DevOps.

Altogether, we obtain the following seven search strings:

  • “infrastructure as code AND software engineering”

  • “configuration as code AND software engineering”

  • “configuration script AND software engineering”

  • “devops AND puppet”

  • “devops AND ansible”

  • “devops AND chef”

  • “devops AND cfengine”

We search each of the six scholar databases using the above-mentioned search strings. Our search process results in a collection of publications that we filter using inclusion and exclusion criteria, described in Section 3.2.

Quasi-Gold Set: We use seven search strings in our search process. These search strings may yield search results that do not include IaC-related publications, which motivates us to validate the derived search strings. We validate our set of search strings by applying the ‘quasi-sensitivity’ metric proposed by Zhang and Babar Zhang:Babar:SLR . The quasi-sensitivity (QS) approach validates whether our set of search strings is sufficient to identify IaC-related publications. The QS metric requires a ‘quasi-gold’ set of publications, which we identify as follows:

  • First, we identify peer-reviewed publications that cite any of the following literature: ‘Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation’ Humble:2010:CD , ‘Pro Puppet’ ProPuppet:Book , ‘Infrastructure as Code: Managing Servers in the Cloud’ kief:iac:book , and ‘DevOps for Developers’ devops:hatterman . These publications discuss in detail how to implement the practices of DevOps and continuous deployment. As IaC is one of the fundamental pillars of implementing continuous deployment and DevOps Humble:2010:CD , our assumption is that peer-reviewed publications that cite any of these books are potentially relevant to a systematic mapping study of IaC.

  • Second, we exclude publications that are not peer-reviewed or not written in English.

  • Third, we exclude publications that are not related to IaC by reading the titles of the collected publications. If we are unable to determine relevance from the title, we read the publication completely. We use two raters to mitigate subjectivity: the first and second authors separately conduct this step. Upon completion, the agreement level and Cohen’s Kappa score cohens:kappa are recorded. Disagreements are resolved through discussion.

    After completing this step, we obtain a quasi-gold set of publications for our systematic mapping study.

We calculate the quasi-sensitivity metric (QSM) using Equation 1:

QSM = (count of quasi-gold set publications retrieved by our search strings) / (total count of publications in the quasi-gold set)   (1)

As a hypothetical example, if the count of IaC-related publications in the quasi-gold set that are retrieved using our search strings is 9, and the total count of publications in our quasi-gold set is 10, then the quasi-sensitivity score is 9/10 = 0.9.
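As a minimal sketch, the quasi-sensitivity computation can be written in a few lines of Python; the ‘QG’ publication identifiers below are illustrative, not the study’s actual records:

```python
def quasi_sensitivity(retrieved, quasi_gold):
    """Fraction of the quasi-gold set that the search strings retrieve.

    retrieved: publications returned by the search strings
    quasi_gold: the independently constructed quasi-gold set
    """
    quasi_gold = set(quasi_gold)
    hits = quasi_gold & set(retrieved)
    return len(hits) / len(quasi_gold)

# Hypothetical example from the text: 9 of 10 quasi-gold publications retrieved.
gold = [f"QG{i}" for i in range(1, 11)]          # QG1 .. QG10
found = gold[:9] + ["unrelated-paper"]           # 9 hits plus an unrelated result
print(quasi_sensitivity(found, gold))            # → 0.9
```

A score of 1.0 indicates that the search strings retrieve every publication in the quasi-gold set, as the study later reports for its own string set.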


3.2 Phase Two: Inclusion and Exclusion Criteria

Search results obtained from using our search strings on the six databases contain irrelevant results that are out of scope for our research study. We filter those results using the following inclusion and exclusion criteria:

  • Exclusion Criteria:

    • Publications that are not peer-reviewed, for example, books

    • Publications published before 2000. IaC-related concepts, such as DevOps, continuous delivery, continuous deployment, and continuous integration, were first introduced after 2000 and have gained in popularity since then. By selecting publications published in or after 2000, we expect to collect the IaC-related publications needed for the systematic mapping study.

  • Inclusion Criteria:

    • Publications must be written in English

    • Publications must be available for download

    • The title, keywords, abstract, and introduction of the paper make it explicit that the paper is related to IaC

Upon applying the inclusion and exclusion criteria, we obtain a set of publications that we use for our analysis. Before answering the RQs using this set, we assess the quality of these publications, as described in Section 3.3.
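As a hedged sketch, the criteria above can be expressed as a single predicate over publication records; the dictionary keys here are hypothetical illustrations, not part of the study’s actual tooling:

```python
def passes_criteria(pub):
    """Return True if a publication survives the inclusion/exclusion criteria."""
    if not pub["peer_reviewed"]:        # exclusion: books, grey literature
        return False
    if pub["year"] < 2000:              # exclusion: published before 2000
        return False
    if pub["language"] != "English":    # inclusion: written in English
        return False
    if not pub["downloadable"]:         # inclusion: available for download
        return False
    return pub["explicitly_iac"]        # inclusion: title/abstract/intro mention IaC

# Hypothetical records:
kept = {"peer_reviewed": True, "year": 2016, "language": "English",
        "downloadable": True, "explicitly_iac": True}
dropped = dict(kept, year=1999)         # fails the pre-2000 exclusion
```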

3.3 Phase Three: Quality Analysis

Kitchenham et al. Kitchenham:Quality proposed a set of criteria to evaluate the quality of software engineering publications. In their study, they used these criteria to assess whether the quality of software engineering publications is increasing or decreasing as time progresses. A higher quality score indicates that the publication of interest states its objectives clearly, has actionable findings, discusses its limitations, and has a clear presentation structure. We use Kitchenham et al.’s Kitchenham:Quality criteria to assess the quality of our set of publications related to IaC:

  • Q1 (Aim): Do the authors clearly state the aim of the research?

  • Q2 (Units): Do the authors describe the sample and experimental units?

  • Q3 (Design): Do the authors describe the design of the experiment?

  • Q4 (Data Collection): Do the authors describe the data collection procedures and define the measures?

  • Q5 (Data Analysis): Do the authors define the data analysis procedures?

  • Q6 (Bias): Do the authors discuss potential experimenter bias?

  • Q7 (Limitations): Do the authors discuss the limitations of their study?

  • Q8 (Clarity): Do the authors state the findings clearly?

  • Q9 (Usefulness): Is there evidence that the Experiment/Quasi-Experiment can be used by other researchers/practitioners?

Based on the answer to each of the above-mentioned nine questions, a rater provides a score: 1 (not at all); 2 (somewhat); 3 (mostly); or 4 (fully). A higher score for a question indicates that the authors of the paper have provided detailed descriptions, which can be helpful for replication and sound analysis Kitchenham:Quality  maria:sms . As this process involves subjectivity, we use two raters who independently rate each question for each publication. We report the average score for each question and for each publication.

Upon completion of this step, we obtain an assessment of quality for the collected publications that we use to answer our RQs.

Threats Reported in IaC-related Publications: When conducting research studies, validity threats may arise that either need to be accounted for or acknowledged as potential limitations. Explicit reporting of threats or limitations is indicative of high quality in an academic publication Burcham:hotsos2017  Kitchenham:Quality . Furthermore, such investigation can guide future researchers to be aware of what types of threats can occur when IaC-related research is conducted on certain topics. For each paper, we identify what types of threats have been reported using Wohlin et al.’s wohlin:ese four categories of validity threats:

  • Conclusion Validity: Conclusion validity evaluates to what extent researchers have drawn conclusions from their analysis without violating statistical assumptions and while maintaining sufficient statistical power wohlin:ese .

  • Internal Validity: Internal validity evaluates to what extent researchers can make causal inferences from their empirical study wohlin:ese .

  • Construct Validity: Construct validity evaluates to what extent the experiment measures what it is designed to measure wohlin:ese .

  • External Validity: External validity measures generalizability, i.e., to what extent the results reported in the publication can be generalized to other contexts wohlin:ese .

We do not make judgments about threats that the authors have not reported.

3.4 Answer to Research Questions

We describe the methodology to answer the three research questions as follows:

3.4.1 Answer to RQ1: What topics have been studied in infrastructure as code (IaC)-related publications?

In RQ1 we focus on identifying the topics, which summarize the research avenues pursued in IaC-related publications.

Answering RQ1 involves identifying topics that emerge from the IaC-related publications of interest. Each rater extracts sentences from a publication that convey important information about the topic of the publication (deemed “raw text”). Each rater applies qualitative analysis qual:coding:book to extract the topics of the sentences as verbatim phrases (deemed “initial codes”). These initial codes are abstracted into “topics” based upon commonalities observed among the initial codes.

We use Figure 2 to illustrate our qualitative coding process. We first start with the extraction of raw text from a publication. Next, from the extracted ‘raw text’ we derive initial codes. As demonstrated in Figure 2, from the raw text ‘Detailed test reports are created at the end of a test suite, which facilitate tracking down the root cause of failures and issues of non-idempotence’, we extract four initial codes: ‘test suite’, ‘test report’, ‘failures’, and ‘non-idempotence’. Finally, we generate the topic ‘Testing’ from the four initial codes.

The process of generating topics is subjective, which we account for by deploying two raters, who independently generate topics from the collected publications. Two topic names determined to be synonyms are counted as an agreement. Disagreements are resolved through discussion. Upon completion, we measure the agreement level on the generated topics and record the Cohen’s Kappa score cohens:kappa .
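Cohen’s Kappa corrects the raw agreement rate for agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the chance agreement from each rater’s label frequencies. A minimal Python sketch for two raters’ labels (the labels below are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items (nominal labels)."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:          # degenerate case: both raters use a single label
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields 1.0, while agreement no better than chance yields 0.0; Landis and Koch’s ranges then map the score to labels such as ‘fair’ or ‘almost perfect’.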

Figure 2: An example of how we use qualitative coding to generate topics from the set of IaC-related publications.

Answers to RQ1 will provide a list of topics that are studied in IaC-related publications. Each publication in our publication set can relate to more than one of the identified topics.

3.4.2 Answer to RQ2: What are the temporal publication trends for infrastructure as code (IaC)-related research topics?

We answer RQ2 using two approaches. First, we compute the overall trend of IaC-related publications by calculating how many IaC-related publications are published in each year since 2000. Second, we compute the temporal trend exhibited by each identified topic, by calculating the count of publications belonging to each topic that are published in each year. These two approaches give us two categories of temporal trends: (i) an overall trend; and (ii) temporal trends of IaC-related publications per topic.
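The two counting steps above amount to per-year tallies, overall and per topic. A minimal sketch, with hypothetical publication records (a publication may contribute to several topics):

```python
from collections import Counter, defaultdict

def temporal_trends(publications):
    """publications: iterable of (year, topics) pairs."""
    overall = Counter()
    per_topic = defaultdict(Counter)
    for year, topics in publications:
        overall[year] += 1
        for topic in topics:
            per_topic[topic][year] += 1
    return overall, per_topic

# Hypothetical records, not the study's actual data.
pubs = [
    (2015, ["Framework/Tool"]),
    (2016, ["Testing", "Empirical Study"]),
    (2016, ["Framework/Tool"]),
]
overall, per_topic = temporal_trends(pubs)
print(overall[2016])                      # → 2
print(per_topic["Framework/Tool"][2016])  # → 1
```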

3.4.3 Answer to RQ3: What are the temporal trends for the use of infrastructure as code (IaC)-related tools, as mentioned in IaC-related publications?

The focus of RQ3 is to investigate which IaC tools are used to conduct IaC-related research. We take motivation from prior research that has conducted mapping studies to understand tool usage in other domains, such as testing sms:testing:tool and global software engineering sms:global:tool . By answering RQ3, we can gain insights into what types of tools have been reported in prior IaC-related publications and the corresponding tasks the tools have been used to accomplish. Consider the example of empirical studies: as a hypothetical example, let us assume that our analysis shows Puppet scripts to be used for conducting empirical analysis. Such information reveals the availability of Puppet scripts along with the repository sources (e.g., GitHub), and can be helpful for researchers who are interested in conducting empirical studies related to IaC.

We answer RQ3 by analyzing each publication in our set. We determine that a publication uses a tool T if any of the following criteria is satisfied:

  • T is used to implement a framework or methodology;

  • T is used to provision a system;

  • scripts from T are used to conduct an empirical study; or

  • scripts from T are used to validate a proposed framework or methodology

4 Results

In this section, we first provide the count of publications that we derive using our search process, along with the publications that belong to our quasi-gold set. Next, we provide answers to the three RQs in the following sub-sections.

In Table 1, we report the count of publications for each scholar database. Altogether, we obtain 33,887 publications, all collected on December 30, 2017. From this set of 33,887 publications we remove duplicates, leaving 14,015 publications. Next, we remove publications that are not written in English, identifying 10,567 publications. Finally, we remove 727 publications that are not peer-reviewed, obtaining our set of 9,840 publications. All 9,840 publications are accessible and available for download.

Quasi-Gold Set: Altogether, we identify 11 IaC-related publications that belong to the quasi-gold set. The first and second authors respectively identify 9 and 11 publications, nine of which are common between the two authors. Both authors agree upon the nine publications identified by the first author. The recorded Cohen’s Kappa is 0.2; according to Landis and Koch Landis:Koch:Kappa:Range , the agreement level is ‘fair’. The first and second authors resolve their disagreements by discussing their ratings and the contents of the disagreed publications. Upon discussion, two more publications are added to the set of nine publications identified by the first author. Using Equation 1 reported in Section 3.1, we record a QS score of 1.0 for the collected publications. According to our QS score, our set of search strings identifies all publications in our quasi-gold set. The list of publications included in our quasi-gold set is available in Table 1 of the Appendix. As an example, the publication ‘Cloud WorkBench: Benchmarking IaaS Providers based on Infrastructure-as-Code’ is the first publication in our quasi-gold set and is labeled ‘QG1’.

Scholar Database Count
ACM Digital Library 420
IEEE Xplore 6,167
ScienceDirect 5,099
Springer Link 16,019
Wiley Online 3,793
IET 2,389
Table 1: Search Results for Scholar Databases
Index Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
Aim Units Design Data Collection Data Analysis Bias Limitations Clarity Usefulness
S1 4.0 3.5 4.0 3.5 3.0 1.5 2.0 3.0 4.0
S2 4.0 4.0 4.0 3.5 4.0 3.5 2.5 4.0 3.5
S3 2.5 1.5 2.0 1.0 1.0 1.0 1.0 2.5 4.0
S4 3.0 4.0 4.0 4.0 3.5 1.5 2.5 4.0 4.0
S5 3.5 4.0 4.0 4.0 4.0 2.0 2.5 4.0 4.0
S6 3.0 4.0 4.0 3.5 4.0 1.5 2.0 4.0 4.0
S7 2.5 1.0 2.0 1.0 1.0 1.0 1.0 3.0 3.5
S8 4.0 4.0 4.0 4.0 4.0 1.5 4.0 4.0 4.0
S9 4.0 3.0 3.5 2.5 3.0 1.5 2.0 3.0 4.0
S10 4.0 4.0 4.0 3.5 4.0 2.0 4.0 4.0 4.0
S11 2.5 1.0 2.0 1.0 1.0 1.0 1.0 1.5 1.5
S12 3.0 3.5 3.5 3.5 3.5 1.0 1.0 2.5 3.0
S13 3.0 2.5 1.5 1.5 1.5 1.0 1.0 3.5 3.5
S14 2.5 1.0 1.0 1.0 1.0 1.0 1.0 2.5 2.5
S15 3.0 2.5 2.5 2.5 2.0 1.0 1.0 3.0 3.0
S16 3.0 2.5 2.0 1.5 1.5 1.0 1.5 3.0 2.5
S17 3.0 2.5 2.5 1.5 2.5 1.0 1.5 3.0 3.0
S18 3.0 3.5 3.5 3.0 3.5 1.0 3.0 3.5 3.5
S19 3.0 2.5 3.5 3.5 3.0 1.5 3.5 4.0 3.0
S20 3.0 1.5 3.5 1.5 1.5 1.0 1.5 3.5 3.0
S21 3.0 4.0 4.0 3.5 4.0 1.5 1.5 4.0 4.0
S22 3.0 2.5 2.5 2.0 2.0 1.5 2.0 3.0 3.0
S23 2.5 3.5 2.0 3.0 3.5 1.5 1.5 3.5 2.5
S24 3.0 4.0 4.0 3.0 3.0 1.5 4.0 4.0 4.0
S25 3.0 3.5 4.0 3.5 3.0 1.0 1.0 3.5 3.5
S26 3.0 3.0 4.0 2.0 2.0 1.5 1.5 3.0 3.0
S27 4.0 2.5 3.0 2.0 2.0 1.0 1.5 3.5 3.0
S28 3.0 2.0 3.0 1.5 1.5 1.0 1.0 2.5 3.0
S29 3.0 2.0 3.0 1.5 2.5 1.0 2.5 3.0 2.5
S30 3.0 2.5 3.5 2.5 3.0 1.0 1.5 3.5 3.0
S31 3.0 2.5 3.5 2.5 2.5 1.0 1.5 3.0 3.0
Avg. 3.1 2.8 3.1 2.5 2.6 1.3 1.9 3.3 3.3
Table 2: Quality Assessment of the 31 Publications

From the collected set of 9,840 publications, we determine whether each publication is related to IaC. The first and second authors individually complete this step. Both authors first read the titles of each of the 9,840 publications to determine whether a publication is related to IaC, identifying 85 and 98 publications, respectively. Next, by reading the abstracts and introductions of these publications, the first and second authors respectively identify 36 and 39 publications, 26 of which are common between the two authors. We record a Cohen’s Kappa of 0.81; according to Landis and Koch Landis:Koch:Kappa:Range , the agreement level is ‘almost perfect’. The first and second authors resolve their disagreements by discussing their ratings and the contents of the disputed publications.
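The inter-rater agreement reported above is Cohen’s Kappa, which compares observed agreement to the agreement expected by chance. A minimal sketch, using hypothetical ratings rather than the paper’s data:

```python
# Sketch of Cohen's Kappa for two raters with binary include/exclude labels.
# The ratings below are hypothetical; they do not reproduce the paper's data.

def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(labels_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal rates per category.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: 1 = include as IaC-related, 0 = exclude.
labels_a = [1, 1, 1, 0, 0, 0, 0, 0]
labels_b = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(labels_a, labels_b), 2))  # 0.47
```

By the Landis and Koch scale cited above, 0.2 falls in the ‘fair’ band and 0.81 in the ‘almost perfect’ band.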

After resolving disagreements between the first and second authors, we identify a set of 31 publications, which we use to answer our three RQs. The name of each publication is listed in Table 2 of the Appendix. We index each publication as ‘S#’; for example, the index ‘S1’ refers to the publication ‘Cloud WorkBench: Benchmarking IaaS Providers Based on Infrastructure-as-Code’. We acknowledge that this set of publications related to IaC is small. One possible reason can be attributed to the availability of artifacts: adoption of IaC is yet to become widespread enough to facilitate more research in the area of IaC.

We also evaluate the publications’ quality using the guidelines provided by Kitchenham et al. Kitchenham:Quality . We report our findings in Table 2. Each cell in the table corresponds to the average of the quality scores determined by the two raters, who are the first and second authors of the paper. For example, publication S1 has a quality score of 4.0 for quality criterion Q1. Each quality criterion is listed with its theme, as stated in Section 3.3. For example, quality criterion Q1 concerns whether a publication’s aim or goal is clearly stated. In Table 2, we report the average score across all 31 publications for each quality criterion in the ‘Avg.’ row. Cells highlighted in bold indicate scores that are higher than the average for the corresponding quality criterion. For example, S1 has a higher score than the average of all 31 publications for quality criterion Q1.

For four quality checks, Q1, Q3, Q8, and Q9, the average score is higher than 3.0, which implies that our set of publications satisfies the checks of clearly stating the aim of the publication, describing the design of the experiment, clearly stating findings, and identifying findings that are actionable for other researchers and practitioners. The average score is between 2.0 and 3.0 for quality checks Q2, Q4, and Q5. These three quality checks respectively present the quality criteria of describing the sample and experimental units, describing the data collection procedures, and defining the data analysis procedures. Clear description of data collection and data analysis procedures can help in replicating research studies and in advancing research in the area of IaC. Based on our findings, we recommend that researchers who conduct IaC-related research clearly define and describe their data collection and data analysis procedures.

The scores are less than 2.0 for two quality checks, Q6 and Q7, which respectively correspond to the discussion of potential experimental bias and the discussion of threats in the publication. Based on Kitchenham et al.’s guidelines Kitchenham:Quality  Kitchenham:Guideline:SWE , research publications should clearly report potential experimenter bias and the threats related to the research study. Future research studies can take our findings into account when conducting IaC-related research, and report the limitations and potential bias that may occur in their research studies.

In summary, our findings indicate that publications related to IaC can have actionable findings and suggestions for practitioners and researchers, but lack the quality checks needed for proper and complete presentation of their findings. Based on our findings, we recommend that researchers report their IaC-related research findings by following the best practices suggested by Kitchenham et al. Kitchenham:Quality .

Threats Reported in IaC-related Publications

: We also summarize which threats are reported in IaC-related publications. Altogether, we consider four categories of threats: construct validity, conclusion validity, internal validity, and external validity. We observe that of the 31 publications, only 7 (22.5%) explicitly report threats or limitations. A complete mapping between each of these seven publications and the reported threat categories is available in Table 3. In each cell, we report whether a category of threats is reported in a publication. For example, we observe that no conclusion validity threats are reported in S2.

Our findings suggest that IaC-related publications do not adequately report the threats to their research studies. We advocate for better reporting of research threats in IaC-related publications, following the guidelines of Wohlin et al. wohlin:ese .

Index Conclusion Construct Internal External
S2 N Y Y Y
S5 N Y N Y
S8 N Y Y N
S9 N Y N N
S10 Y Y Y Y
S18 N Y N Y
S24 N N N Y
Table 3: Reported Threats for Each Publication

4.1 Answer to RQ1: What topics have been studied in infrastructure as code (IaC)-related publications?

We identify the topics that have been researched in the area of IaC by applying qualitative analysis. Through our qualitative analysis, we identify four topics. A publication can belong to multiple topics, implying that the identified topics are not orthogonal to each other. The topics are: (1) Framework/Tool for infrastructure as code (Framework/Tool); (2) Use of infrastructure as code (Use of IaC); (3) Empirical study related to infrastructure as code (Empirical); and (4) Testing in infrastructure as code (Testing).

A complete mapping between each of the 31 publications and their corresponding topics is available in Table 4. We describe each topic, along with the count of publications for each topic, as follows:

Topic Publication
Framework/Tool S6, S10, S12, S13, S15, S18, S19, S20, S21, S22, S24, S25, S26, S29, S30, S31
Use of IaC S1, S3, S7, S9, S14, S16, S17, S18, S23, S27, S28
Empirical Study S2, S4, S5, S8, S10, S11, S23
Testing S4, S5, S6, S11
Table 4: Mapping Between Each Topic and Publication
  • Framework/Tool for infrastructure as code (16): The most frequently studied topic in IaC-related publications is related to frameworks or tools. In these publications, authors propose a framework or a tool either to implement the practice of IaC or to extend a functionality of IaC. We briefly describe a few publications related to ‘Framework/Tool for IaC’:

    Authors in S12 observed that a wide variety of reusable DevOps artifacts, such as Chef cookbooks and Puppet modules, are shared, but these artifacts are usually bound to specific tools. The authors proposed a novel framework that generates standard Topology and Orchestration Specification for Cloud Applications (TOSCA, https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tosca) based DevOps artifacts to consolidate DevOps artifacts from different sources. Later, the authors of S12 extended their work in S19, where they constructed a run-time framework using an open source tool-chain to support integration of a variety of DevOps artifacts. In S22, the authors propose the ‘hidden master’ framework to assess the survivability of IaC scripts when they are under attack. S24 proposes a tool called ConfigValidator that validates IaC artifacts, such as Docker images, by writing rules to detect configurations. In S10, the authors propose and evaluate Tortoise, which automatically fixes configurations in Puppet scripts. In S20, ‘Charon’ is proposed to implement the practice of IaC.

    Existing tools can be limiting, which may motivate researchers to propose frameworks or tools that mitigate these limitations. For example, in S20, the authors observed that existing commercial IaC tools make assumptions about configuration models, which may not suit the purposes of all IT organizations. The authors of S20 proposed Charon, a tool that implements IaC while mitigating this limitation.

  • Use of infrastructure as code (11): Publications related to this topic discuss how IaC can be used in different domains of software engineering, such as monitoring of systems and automated deployment of enterprise applications. We briefly describe the publications related to this topic as follows:

    S1 uses IaC to build a benchmark tool to assess the performance of cloud applications. Authors in S3 and S14 discuss how IaC can be used to implement DevOps. S7 focuses on how Ansible can be used to automatically provision an enterprise application. In S9, the authors investigated the feasibility of using Puppet modules to deploy a software-as-a-service (SaaS) application. They observed that Puppet modules are adequate for provisioning SaaS applications, but come with an extra layer of complexity. In S17, the authors propose ‘DevOpsLang’, which uses Chef to automatically deploy a chat application. Authors in S18 propose the ABS Modeling Language, which uses IaC to deploy an e-commerce application. In S23, the authors interview practitioners from 10 companies on the use of IaC in continuous deployment. The authors reported that IaC scripts have fundamentally changed how IT organizations manage their servers. They also reported that, similar to software code bases, IaC code bases churn frequently. Authors in S27 propose Omnia, which uses IaC to create a framework for monitoring DevOps operations.

    Our analysis suggests that the use of IaC is not limited to implementing automated deployment and DevOps, but extends to creating monitoring applications for software systems. One possible explanation is the ability to express system configurations in a programmatic manner using IaC scripts.

  • Empirical study related to infrastructure as code (7): Publications that have conducted empirical studies related to IaC can be divided into two groups: publications focused on testing, and publications focused on non-testing issues. The three publications related to testing are S4, S5, and S11. The four publications that have conducted empirical analysis but are not focused on testing are S2, S8, S10, and S23. In S2, the authors observed that IaC scripts churn frequently, making them susceptible to defects. The authors of S2 used 265 open source repositories to quantify the co-evolution of IaC scripts with software source code and software test code. In S8, the authors studied code anti-patterns that may cause maintainability issues during IaC script development and maintenance. The authors of S8 mined 4,621 open source GitHub repositories to identify code anti-patterns that can occur in IaC scripts. The authors of S10 proposed Tortoise, an automated program repair tool that fixes configurations in Puppet scripts. In S23, the authors interviewed practitioners and synthesized how practitioners from 10 companies use IaC scripts to implement continuous deployment. They observed that IaC scripts churn frequently, and are prone to defects that are difficult to debug.

  • Testing in infrastructure as code (4): We identify four publications that address the topic of testing for IaC scripts. From these four publications, we observe researchers’ interest in testing the idempotence property of IaC scripts, such as Chef and Puppet scripts. In IaC, the deployed system is expected to converge to the desired state; whether or not the deployed system has reached the desired state is referred to as idempotence Hummer:IaC . In S6, the authors proposed a testing framework that tests whether Puppet scripts reach convergence. S5 proposed a framework to test idempotence in IaC. Their approach used a state transition-based modeling approach to generate test cases for testing the idempotence of Chef scripts. In S4, the authors reported that the approach suggested in S5 generates too many test cases, and proposed an approach to reduce the number of test cases needed for testing idempotence. The approach proposed in S4 combined testing and static verification approaches to generate the test cases needed to test idempotence.

    Based on our analysis, we observe a lack of empirical studies that focus on test coverage, test practices, and testing techniques. We advocate for research studies that investigate other aspects of testing, such as test coverage and testing practices.
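The idempotence property described above can be illustrated with a toy convergence check; `apply_config` and the state dictionary are hypothetical stand-ins for a real IaC tool’s resource application, not code from any of the surveyed publications:

```python
# Toy convergence check in the spirit of the idempotence testing discussed
# above: applying an already-converged configuration again must not change
# the system state. The resources below are illustrative.

def apply_config(state):
    """Drive the system state toward the desired state (illustrative)."""
    new_state = dict(state)
    new_state["package:nginx"] = "installed"
    new_state["service:nginx"] = "running"
    return new_state

def is_idempotent(apply_fn, initial_state):
    """A step is idempotent if a second application yields the same state
    as the first, i.e., the system has converged."""
    once = apply_fn(initial_state)
    twice = apply_fn(once)
    return once == twice

print(is_idempotent(apply_config, {}))  # True
```

A configuration step that, for example, appended a line to a file on every run would fail this check, since each application would change the state.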

4.2 Answer to RQ2: What are the temporal publication trends for infrastructure as code (IaC)-related research topics?

We answer RQ2 by first providing the count of publications published each year. We present our findings in Table 5. Even though our search process includes publications starting from the year 2000, the earliest IaC-related publication in our set, based on publication date, is from 2012. The highest publication count is nine, for the year 2017.

Year Count
2012 1
2013 3
2014 8
2015 4
2016 6
2017 9
Table 5: Frequency of IaC-related Publications

We also analyze the frequency of publications for each topic. We present our findings in Table 6. We observe that for the topics ‘Framework/Tool’, ‘Use of IaC’, and ‘Empirical Study’, publication frequency increases after 2014, which is consistent with the overall trend in which IaC-related publications increase after 2014. We cannot make a similar observation for the remaining topic, as the count of publications may not be sufficient to report any trend.

Topic 2012 2013 2014 2015 2016 2017
Framework/Tool 0 1 2 4 4 5
Use of IaC 1 0 2 4 1 3
Empirical Study 0 2 0 1 1 3
Testing 0 2 0 0 1 1
Table 6: Frequency of Publications per Year for Each Topic

4.3 Answer to RQ3: What are the temporal trends for the use of infrastructure as code (IaC)-related tools, as mentioned in IaC-related publications?

We answer RQ3 by reporting the IaC tools used to conduct the research reported in our collection of 31 publications. We first report the name of each IaC tool and how many times it was used in our set of publications in Table 7. The ‘Tool’ column presents the name of the tool, followed by a reference. The ‘Count Of Publications’ column presents the count of publications that have used a certain IaC tool. We observe that 12 IaC tools are used in the 31 publications, with the highest usage occurring for Chef: authors of 8 (25.8%) IaC-related publications used Chef to conduct their research studies.

Tool Count Of Publications
ABS Modeling Language s18:abs 1
Ansible (https://www.ansible.com/) 1
Argon s31:end2end 2
Charon charon:s20 1
Chef (https://www.chef.io/chef/) 8
ConfigValidLang s24:configvalidator 1
DevOpsLang s17:devopslang 1
Foreman (https://www.theforeman.org/) 1
Juju (https://jujucharms.com/) 3
Omnia s37:omnia 1
Puppet (https://puppet.com/) 6
Vagrant (https://www.vagrantup.com/) 1
Table 7: Usage of IaC Tools

We report the tool usage for the publications included in each topic in Table 8. Each tool is reported in the ‘Tool’ column, and the count of each tool’s usage in the publications for each topic is presented in each cell. For the topic ‘Framework/Tool’, we observe a variety of tools being used. In the case of ‘Empirical Study’ and ‘Testing’, tool usage is limited to Chef and Puppet. Our findings indicate that for conducting empirical studies in the area of IaC, scripts of popular tools such as Chef and Puppet may be used more than those of other tools such as Ansible or Juju.

Tool Framework/Tool Use of IaC Empirical Testing
ABS Modeling Language 1 1 0 0
Ansible 0 1 0 0
Argon 2 0 0 0
Charon 1 0 0 0
Chef 3 1 3 3
ConfigValidLang 1 0 0 0
DevOpsLang 0 1 0 0
Foreman 1 0 0 0
Juju 3 0 0 0
Omnia 0 1 0 0
Puppet 3 1 3 1
Vagrant 1 1 0 0
Table 8: Usage of IaC Tools Amongst Topics

We also report the usage of IaC tools per year, as reported in our set of 31 publications, in Table 9. For the year 2012, we do not observe any publication in our set that uses an IaC tool to conduct IaC-related research. Findings from Table 9 suggest that from 2013 to 2017, the use of two commercial tools, Chef and Puppet, is higher than that of other IaC tools.

Tool 2012 2013 2014 2015 2016 2017
ABS Modeling Language 0 0 0 1 0 0
Ansible 0 0 0 1 0 0
Argon 0 0 0 0 0 2
Charon 0 1 0 0 0 0
Chef 0 2 2 3 0 1
ConfigValidLang 0 0 0 0 0 1
DevOpsLang 0 0 1 0 0 0
Foreman 0 0 0 0 1 0
Juju 0 0 1 2 0 0
Omnia 0 0 0 0 0 1
Puppet 0 0 0 2 3 1
Vagrant 0 0 1 0 0 0
Table 9: Usage of IaC Tools per Year as Reported in Publications

5 Discussion

In this section, we describe the implications of our systematic mapping study in the following sub-sections:

5.1 Research in IaC: State of the Art

We have identified 31 IaC-related publications from 9,840 search results. Our findings indicate that, as a research area, IaC is relatively new. Such an observation, however, is not surprising: in the field of software engineering, IaC has only recently become popular, with the increased popularity of DevOps and continuous deployment. As the use of IaC grows in the future, both in the open source and proprietary domains, we expect to see more research studies that investigate different avenues of research, for example, anti-patterns, barriers to adopting and using IaC, and code quality.

We identify four topics, with ‘Framework/Tool’ being the most prevalent topic with respect to publication count. One possible explanation can be attributed to the usage of IaC by different teams. For example, finding existing IaC tools limiting, the authors of S20 introduce a new tool to implement the practice of IaC. Our conjecture is that, depending on the needs of IT organizations, new frameworks or tools related to IaC are being proposed in publications.

We also observe that, compared to other software engineering research areas, publications related to empirical studies and testing are infrequent. We provide three possible explanations:

  • IT organizations have not adopted IaC at a wide scale and, as a result, empirical studies related to their experiences and challenges have not been reported

  • IT organizations that have adopted IaC are not open in sharing their experiences

  • Researchers do not have access to the necessary resources to perform empirical studies and other forms of research in the area of IaC

We do not observe any publication related to defects and security flaws. One possible explanation can be attributed to the lack of research resources: to conduct studies related to defects, validation, and verification for IaC, researchers need access to relevant artifacts, for example, scripts, bug reports, and vulnerability reports, which may be non-trivial to obtain.

5.2 Variety of Tools

In Table 8, we have reported 12 IaC tools that are used in our set of 31 publications. The three most frequently used tools are Chef, Puppet, and Juju. All three tools are used for commercial purposes. Our findings suggest that tools used commercially by practitioners, such as Chef and Puppet, can be better suited for future IaC-related research. Open source code repositories, such as GitHub (https://github.com/), PuppetForge (https://forge.puppet.com/), and Chef Cookbooks (https://supermarket.chef.io/cookbooks), can be a good source for conducting IaC research.

5.3 Potential Research Avenues in IaC

Our findings reveal that researchers are yet to explore certain avenues for IaC. We do not observe publications that study defects and security flaws. We also observe a lack of empirical studies in the area of anti-patterns; only one publication studies anti-patterns in IaC scripts. We highlight potential research avenues that researchers may want to explore in the future:

  • Anti-patterns: Anti-patterns are recurring practices in software engineering that can have potential negative consequences Brown:AP . In our set of 31 publications, only one publication (S8) addressed the subject of anti-patterns. However, that study is limited to code anti-patterns. Researchers can explore what other anti-patterns can exist for IaC, for example, process anti-patterns, system architecture anti-patterns, security anti-patterns, and project management anti-patterns.

  • Defect Analysis: Defects in IaC scripts can have serious consequences; for example, a defect in an IaC script caused a wide-scale outage for GitHub (https://github.com/blog/1759-dns-outage-post-mortem). Based on our analysis, we do not observe existing IaC-related publications that study defects. We encourage researchers to investigate which characteristics of IaC correlate with defects, and how such defects can be mitigated.

  • Security: As IaC scripts are used to configure software systems and cloud instances at scale, an error that violates security objectives nist:cia can compromise the entire system. In our set of 31 publications, we do not observe any publication that focuses on security issues. Researchers can systematically study which security flaws are exhibited in IaC scripts and what the consequences of such security flaws are, and provide guidelines on how such flaws can be mitigated.

  • Knowledge and Training: Similar to any new technology, users who are new to IaC can face challenges. The challenges of learning and implementing IaC could be of interest to researchers. Such challenges can also inform recommendations on how course curricula can be designed, so that students as well as practitioners are well-prepared to fulfill IaC-related tasks in industry.

  • Industry best practices: Based on our analysis, we do not observe any research study that systematically characterizes best practices for IaC implementation. Such characterization can be helpful both for IT organizations that want to implement IaC and for IT organizations that have already started implementing IaC. Syntheses of industry best practices exist for other domains, such as DevOps rahman:csed:devsecops , security bsiim:2009 , and continuous deployment cd:adage:parnin  me:agile:cd . Similar research initiatives to characterize industry best practices may also benefit IaC adopters.
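As a toy illustration of the security avenue listed above, a first step could be scanning IaC scripts for hard-coded secrets; the pattern and the Puppet snippet below are hypothetical examples, not drawn from any of the surveyed publications:

```python
import re

# Illustrative sketch: flag hard-coded secrets in an IaC script, one kind of
# security flaw the study suggests investigating. The regular expression and
# the script text are hypothetical examples.
SECRET_PATTERN = re.compile(
    r"(password|secret|private_key)\s*(=>|=|:)\s*['\"][^'\"]+['\"]",
    re.IGNORECASE,
)

def find_hardcoded_secrets(script_text):
    """Return (line number, stripped line) pairs that appear to embed a secret."""
    return [(i, line.strip())
            for i, line in enumerate(script_text.splitlines(), start=1)
            if SECRET_PATTERN.search(line)]

puppet_snippet = """\
user { 'deploy':
  ensure   => present,
  password => 'sup3rs3cret',
}
"""
print(find_hardcoded_secrets(puppet_snippet))  # [(3, "password => 'sup3rs3cret',")]
```

A systematic study would of course need a far more precise detector and an evaluation of its false positives, but even this sketch shows that IaC scripts are amenable to automated analysis.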

5.4 Towards Better Reporting of Research Findings

As reported in Section 4, none of the publications in our set has a perfect score of 4.0 for all quality checks. We also observe that the majority of publications have actionable findings and suggestions for practitioners and researchers, but do not pass all quality checks. Researchers can take our findings into account when reporting future IaC-related research results. We advise researchers to follow the guidelines provided by experts Shaw:Good:SWE  Kitchenham:Quality  wohlin:ese when reporting their findings related to IaC research.

6 Threats to Validity

We discuss the limitations of our systematic mapping study as follows:

  • Internal Validity: We acknowledge that our search process may not be comprehensive. As described in Section 3, we use six scholar databases. We have not considered other scholar databases, such as Scopus (https://www.scopus.com/freelookup/form/author.uri), which may include relevant IaC publications.

    Our use of seven search strings may also not be comprehensive, as the search strings may have left out IaC-related publications during our search process. We mitigate this threat by calculating the quasi-sensitivity metric (QSM), which yields a score of 1.0.

  • Conclusion Validity: We apply a set of inclusion criteria to select publications related to IaC. We acknowledge that the process of selecting these publications can be subjective, with the potential of missing IaC-related publications. We mitigate this subjectivity by using two raters who individually determine which publications are related to IaC.

    We apply qualitative analysis to determine the topics discussed in IaC-related publications. We determine these topics by extracting qualitative codes, following the guidelines of qualitative analysis Stol:2016:GT:ICSE . We acknowledge that the process of generating topics can be subjective. We mitigate this limitation by using two qualitative raters.

  • External Validity: Our analysis depends on our set of 31 publications collected on December 30, 2017. Furthermore, we use certain scholar databases, which may not include all publications relevant to our paper. Due to these issues, the generalizability of our findings can be limited. We mitigate this threat by using six scholar databases recommended by Kuhrmann et al. emse:slr:guide .

7 Conclusion

IaC is a fundamental practice in implementing continuous deployment. As the adoption of DevOps amongst IT organizations becomes increasingly popular, IaC can be an important research topic in the field of software engineering. A systematic mapping study can characterize existing research studies in the field of IaC and identify open research areas. The goal of such a study is to help researchers identify potential research areas related to IaC.

We accomplish this goal by conducting a systematic mapping study in the field of IaC. Using six scholar databases, we collect 31 publications related to IaC, systematically filtered from 33,887 publications. We generate four topics by performing qualitative analysis on the collected publications: (i) framework/tool for infrastructure as code; (ii) use of infrastructure as code; (iii) empirical study related to infrastructure as code; and (iv) testing in infrastructure as code. We observe ‘Framework/Tool for infrastructure as code’ to be the most prevalent topic, followed by ‘Use of infrastructure as code’. Our findings suggest that current research in IaC has mostly focused on implementing or extending the practice of IaC. We also observe 12 tools used in our set of 31 publications. The most frequently used tool is Chef, followed by Puppet.

As defects and security flaws in IaC scripts can have serious consequences, we advocate for research studies that address code quality issues such as defects and security flaws, along with other research avenues. With respect to reporting research results, we advise researchers to follow the guidelines on writing good publications Shaw:Good:SWE  Kitchenham:Quality , so that the expected quality checks of research studies are fulfilled. We hope our systematic mapping study facilitates further research in the area of IaC.


8 References


  • (1) J. Humble, D. Farley, Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation, 1st Edition, Addison-Wesley Professional, 2010.
  • (2) Puppet, Ambit energy’s competitive advantage? it’s really a devops software company, Tech. rep., Puppet (April 2018).
    URL https://puppet.com/resources/case-study/ambit-energy
  • (3) C. Parnin, E. Helms, C. Atlee, H. Boughton, M. Ghattas, A. Glover, J. Holman, J. Micco, B. Murphy, T. Savor, M. Stumm, S. Whitaker, L. Williams, The top 10 adages in continuous deployment, IEEE Software 34 (3) (2017) 86–95. doi:10.1109/MS.2017.86.
  • (4) Puppet Labs, Puppet Documentation, https://docs.puppet.com/, [Online; accessed 10-October-2017] (2017).
  • (5) Y. Jiang, B. Adams, Co-evolution of infrastructure and source code: An empirical study, in: Proceedings of the 12th Working Conference on Mining Software Repositories, MSR ’15, IEEE Press, Piscataway, NJ, USA, 2015, pp. 45–55.
    URL http://dl.acm.org/citation.cfm?id=2820518.2820527
  • (6) T. Sharma, M. Fragkoulis, D. Spinellis, Does your configuration code smell?, in: Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, ACM, New York, NY, USA, 2016, pp. 189–200. doi:10.1145/2901739.2901761.
    URL http://doi.acm.org/10.1145/2901739.2901761
  • (7) K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, EASE’08, BCS Learning & Development Ltd., Swindon, UK, 2008, pp. 68–77.
    URL http://dl.acm.org/citation.cfm?id=2227115.2227123
  • (8) B. A. Kitchenham, D. Budgen, O. P. Brereton, Using mapping studies as the basis for further research – a participant-observer case study, Information and Software Technology 53 (6) (2011) 638 – 651, special Section: Best papers from the APSEC. doi:https://doi.org/10.1016/j.infsof.2010.12.011.
    URL http://www.sciencedirect.com/science/article/pii/S0950584910002272
  • (9) Z. Li, P. Avgeriou, P. Liang, A systematic mapping study on technical debt and its management, J. Syst. Softw. 101 (C) (2015) 193–220. doi:10.1016/j.jss.2014.12.027.
    URL http://dx.doi.org/10.1016/j.jss.2014.12.027
  • (10) V. G. Yusifoğlu, Y. Amannejad, A. B. Can, Software test-code engineering: A systematic mapping, Information and Software Technology 58 (2015) 123 – 147. doi:https://doi.org/10.1016/j.infsof.2014.06.009.
    URL http://www.sciencedirect.com/science/article/pii/S0950584914001487
  • (11) F. Elberzhager, A. Rosbach, J. Münch, R. Eschbach, Reducing test effort: A systematic mapping study on existing approaches, Inf. Softw. Technol. 54 (10) (2012) 1092–1106. doi:10.1016/j.infsof.2012.04.007.
    URL http://dx.doi.org/10.1016/j.infsof.2012.04.007
  • (12) A. Seriai, O. Benomar, B. Cerat, H. Sahraoui, Validation of software visualization tools: A systematic mapping study, in: 2014 Second IEEE Working Conference on Software Visualization, 2014, pp. 60–69. doi:10.1109/VISSOFT.2014.19.
  • (13) B. Kitchenham, S. Charters, Guidelines for performing systematic literature reviews in software engineering, Tech. Rep. EBSE 2007-001, Keele University and Durham University Joint Report (2007).
    URL http://www.dur.ac.uk/ebse/resources/Systematic-reviews-5-8.pdf
  • (14) B. Kitchenham, D. I. K. Sjoberg, T. Dyba, P. Brereton, D. Budgen, M. Host, P. Runeson, Trends in the quality of human-centric software engineering experiments–a quasi-experiment, IEEE Trans. Softw. Eng. 39 (7) (2013) 1002–1017. doi:10.1109/TSE.2012.76.
    URL http://dx.doi.org/10.1109/TSE.2012.76
  • (15) J. Saldana, The coding manual for qualitative researchers, 1st Edition, Sage, London, UK, 2015.
  • (16) R. Shambaugh, A. Weiss, A. Guha, Rehearsal: A configuration verification tool for puppet, SIGPLAN Not. 51 (6) (2016) 416–430. doi:10.1145/2980983.2908083.
    URL http://doi.acm.org/10.1145/2980983.2908083
  • (17) F. Q. B. da Silva, A. L. M. Santos, S. C. B. Soares, A. C. C. França, C. V. F. Monteiro, A critical appraisal of systematic reviews in software engineering from the perspective of the research questions asked in the reviews, in: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’10, ACM, New York, NY, USA, 2010, pp. 33:1–33:4. doi:10.1145/1852786.1852830.
    URL http://doi.acm.org/10.1145/1852786.1852830
  • (18) O. Hanappi, W. Hummer, S. Dustdar, Asserting reliable convergence for configuration management scripts, SIGPLAN Not. 51 (10) (2016) 328–343. doi:10.1145/3022671.2984000.
    URL http://doi.acm.org/10.1145/3022671.2984000
  • (19) K. Ikeshita, F. Ishikawa, S. Honiden, Test suite reduction in idempotence testing of infrastructure as code, in: S. Gabmeyer, E. B. Johnsen (Eds.), Tests and Proofs, Springer International Publishing, Cham, 2017, pp. 98–115.
  • (20) A. Weiss, A. Guha, Y. Brun, Tortoise: Interactive system configuration repair, in: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, IEEE Press, Piscataway, NJ, USA, 2017, pp. 625–636.
    URL http://dl.acm.org/citation.cfm?id=3155562.3155641
  • (21) W. Hummer, F. Rosenberg, F. Oliveira, T. Eilam, Automated testing of chef automation scripts, in: Proceedings Demo & Poster Track of ACM/IFIP/USENIX International Middleware Conference, MiddlewareDPT ’13, ACM, New York, NY, USA, 2013, pp. 4:1–4:2. doi:10.1145/2541614.2541632.
    URL http://doi.acm.org/10.1145/2541614.2541632
  • (22) T. Kosar, S. Bohra, M. Mernik, Domain-specific languages: A systematic mapping study, Information and Software Technology 71 (2016) 77–91. doi:10.1016/j.infsof.2015.11.001.
    URL http://www.sciencedirect.com/science/article/pii/S0950584915001858
  • (23) R. L. Novais, A. Torres, T. S. Mendes, M. Mendonça, N. Zazworka, Software evolution visualization: A systematic mapping study, Inf. Softw. Technol. 55 (11) (2013) 1860–1883. doi:10.1016/j.infsof.2013.05.008.
    URL http://dx.doi.org/10.1016/j.infsof.2013.05.008
  • (24) S. Jalali, C. Wohlin, Agile practices in global software engineering - a systematic map, in: Proceedings of the 2010 5th IEEE International Conference on Global Software Engineering, ICGSE ’10, IEEE Computer Society, Washington, DC, USA, 2010, pp. 45–54. doi:10.1109/ICGSE.2010.14.
    URL http://dx.doi.org/10.1109/ICGSE.2010.14
  • (25) B. Kitchenham, What’s up with software metrics? - a preliminary mapping study, J. Syst. Softw. 83 (1) (2010) 37–51. doi:10.1016/j.jss.2009.06.041.
    URL http://dx.doi.org/10.1016/j.jss.2009.06.041
  • (26) N. Condori-Fernandez, M. Daneva, K. Sikkel, R. Wieringa, O. Dieste, O. Pastor, A systematic mapping study on empirical evaluation of software requirements specifications techniques, in: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, ESEM ’09, IEEE Computer Society, Washington, DC, USA, 2009, pp. 502–505. doi:10.1109/ESEM.2009.5314232.
    URL http://dx.doi.org/10.1109/ESEM.2009.5314232
  • (27) E. Engström, P. Runeson, Software product line testing - a systematic mapping study, Inf. Softw. Technol. 53 (1) (2011) 2–13. doi:10.1016/j.infsof.2010.05.011.
    URL http://dx.doi.org/10.1016/j.infsof.2010.05.011
  • (28) N. Paternoster, C. Giardino, M. Unterkalmsteiner, T. Gorschek, P. Abrahamsson, Software development in startup companies: A systematic mapping study, Inf. Softw. Technol. 56 (10) (2014) 1200–1218. doi:10.1016/j.infsof.2014.04.014.
    URL http://dx.doi.org/10.1016/j.infsof.2014.04.014
  • (29) M. Riaz, T. Breaux, L. Williams, How have we evaluated software pattern application? A systematic mapping study of research design practices, Information and Software Technology 65 (2015) 14–38. doi:10.1016/j.infsof.2015.04.002.
    URL http://www.sciencedirect.com/science/article/pii/S0950584915000774
  • (30) M. Kuhrmann, D. M. Fernández, M. Daneva, On the pragmatic design of literature studies in software engineering: an experience-based guideline, Empirical Software Engineering 22 (6) (2017) 2852–2891. doi:10.1007/s10664-016-9492-y.
    URL https://doi.org/10.1007/s10664-016-9492-y
  • (31) H. Zhang, M. A. Babar, P. Tell, Identifying relevant studies in software engineering, Inf. Softw. Technol. 53 (6) (2011) 625–637. doi:10.1016/j.infsof.2010.12.010.
    URL http://dx.doi.org/10.1016/j.infsof.2010.12.010
  • (32) S. Krum, W. V. Hevelingen, B. Kero, J. Turnbull, J. McCune, Pro Puppet, 2nd Edition, Apress, Berkeley, CA, USA, 2013.
  • (33) K. Morris, Infrastructure as code: managing servers in the cloud, O’Reilly Media, Inc., 2016.
  • (34) M. Hüttermann, DevOps for Developers, 1st Edition, Apress, Berkeley, CA, USA, 2012.
  • (35) J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46. doi:10.1177/001316446002000104.
    URL http://dx.doi.org/10.1177/001316446002000104
  • (36) M. Burcham, M. Al-Zyoud, J. C. Carver, M. Alsaleh, H. Du, F. Gilani, J. Jiang, A. Rahman, O. Kafalı, E. Al-Shaer, L. Williams, Characterizing scientific reporting in security literature: An analysis of ACM CCS and IEEE S&P papers, in: Proceedings of the Hot Topics in Science of Security: Symposium and Bootcamp, HoTSoS, ACM, New York, NY, USA, 2017, pp. 13–23. doi:10.1145/3055305.3055307.
    URL http://doi.acm.org/10.1145/3055305.3055307
  • (37) C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer Publishing Company, Incorporated, 2012.
  • (38) Q. Yang, J. J. Li, D. M. Weiss, A survey of coverage-based testing tools, The Computer Journal 52 (5) (2009) 589–597. doi:10.1093/comjnl/bxm021.
  • (39) J. Portillo-Rodríguez, A. Vizcaíno, M. Piattini, S. Beecham, Tools used in global software engineering: A systematic mapping review, Information and Software Technology 54 (7) (2012) 663–685. doi:10.1016/j.infsof.2012.02.006.
    URL http://www.sciencedirect.com/science/article/pii/S0950584912000493
  • (40) J. R. Landis, G. G. Koch, The measurement of observer agreement for categorical data, Biometrics 33 (1) (1977) 159–174.
    URL http://www.jstor.org/stable/2529310
  • (41) B. A. Kitchenham, S. L. Pfleeger, L. M. Pickard, P. W. Jones, D. C. Hoaglin, K. E. Emam, J. Rosenberg, Preliminary guidelines for empirical research in software engineering, IEEE Trans. Softw. Eng. 28 (8) (2002) 721–734. doi:10.1109/TSE.2002.1027796.
    URL http://dx.doi.org/10.1109/TSE.2002.1027796
  • (42) S. de Gouw, M. Lienhardt, J. Mauro, B. Nobakht, G. Zavattaro, On the integration of automatic deployment into the ABS modeling language, in: S. Dustdar, F. Leymann, M. Villari (Eds.), Service Oriented and Cloud Computing, Springer International Publishing, Cham, 2015, pp. 49–64.
  • (43) J. Sandobalin, E. Insfran, S. Abrahao, End-to-end automation in cloud infrastructure provisioning, in: Proceedings of Information Systems Development: Advances in Methods, Tools and Management, ISD ’17, 2017.
  • (44) E. Dolstra, R. Vermaas, S. Levy, Charon: Declarative provisioning and deployment, in: Proceedings of the 1st International Workshop on Release Engineering, RELENG ’13, IEEE Press, Piscataway, NJ, USA, 2013, pp. 17–20.
    URL http://dl.acm.org/citation.cfm?id=2663360.2663365
  • (45) S. Baset, S. Suneja, N. Bila, O. Tuncer, C. Isci, Usable declarative configuration specification and validation for applications, systems, and cloud, in: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference: Industrial Track, Middleware ’17, ACM, New York, NY, USA, 2017, pp. 29–35. doi:10.1145/3154448.3154453.
    URL http://doi.acm.org/10.1145/3154448.3154453
  • (46) J. Wettinger, U. Breitenbücher, F. Leymann, Devopslang - bridging the gap between development and operations, in: Proceedings of the 3rd European Conference on Service-Oriented and Cloud Computing (ESOCC 2014), Lecture Notes in Computer Science (LNCS), Springer-Verlag, 2014, pp. 108–122.
  • (47) M. Miglierina, D. A. Tamburri, Towards Omnia: A monitoring factory for quality-aware DevOps, in: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ICPE ’17 Companion, ACM, New York, NY, USA, 2017, pp. 145–150. doi:10.1145/3053600.3053629.
    URL http://doi.acm.org/10.1145/3053600.3053629
  • (48) W. H. Brown, R. C. Malveau, H. W. S. McCormick, T. J. Mowbray, AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, 1st Edition, John Wiley & Sons, Inc., New York, NY, USA, 1998.
  • (49) R. Kissel, Glossary of key information security terms, Diane Publishing, 2011.
  • (50) A. Rahman, L. Williams, Software security in DevOps: Synthesizing practitioners’ perceptions and practices, in: 2016 IEEE/ACM International Workshop on Continuous Software Evolution and Delivery (CSED), 2016, pp. 70–76. doi:10.1109/CSED.2016.021.
  • (51) G. McGraw, B. Chess, S. Migues, Building security in maturity model, Fortify & Cigital.
  • (52) A. A. U. Rahman, E. Helms, L. Williams, C. Parnin, Synthesizing continuous deployment practices used in software development, in: 2015 Agile Conference, 2015, pp. 1–10. doi:10.1109/Agile.2015.12.
  • (53) M. Shaw, Writing good software engineering research papers: Minitutorial, in: Proceedings of the 25th International Conference on Software Engineering, ICSE ’03, IEEE Computer Society, Washington, DC, USA, 2003, pp. 726–736.
    URL http://dl.acm.org/citation.cfm?id=776816.776925
  • (54) K.-J. Stol, P. Ralph, B. Fitzgerald, Grounded theory in software engineering research: A critical review and guidelines, in: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, ACM, New York, NY, USA, 2016, pp. 120–131. doi:10.1145/2884781.2884833.
    URL http://doi.acm.org/10.1145/2884781.2884833