Agile Risk Management for Multi-Cloud Software Development

01/10/2020 ∙ by Victor Muntes-Mulero, et al. ∙ CA Technologies

Companies in all sectors are experiencing a profound digital transformation that puts software at the core of their businesses. To react to continuously changing user requirements and dynamic markets, companies need to build robust workflows that increase their agility so that they remain competitive. This increasingly rapid transformation, especially in domains like IoT or Cloud computing, poses significant challenges to guaranteeing high-quality software, since dynamism and agile short-term planning reduce the ability to detect and manage risks. In this paper, we describe the main challenges related to managing risk in agile software development, building on the experience of more than 20 agile coaches operating continuously for 15 years with hundreds of teams in industries in all sectors. We also propose a framework to manage risks that considers those challenges and supports collaboration, agility, and continuous development. We then describe an implementation of that framework in a tool that handles risks and mitigation actions associated with the development of multi-cloud applications. The methodology and the tool have been validated by a team of evaluators who were asked to consider their use in developing an urban smart mobility service and an airline flight scheduling system.




1 Introduction

Organizations in all industries recognize that they are continuing their evolution into technology and data companies McKendrick , and that their business models are being partially or fully transformed by software. One of the main drivers of this transformation has been the adoption of paradigms such as IoT or Cloud computing. Cloud computing allows companies to create new business models that scale with demand, to migrate their operations to the Cloud, and to create new service models such as Software as a Service (SaaS) mell . Large technology companies acknowledge the importance of SaaS and Cloud computing. Users also express a preference for a subscription-based model, since it makes customers, rather than the product or the transaction, the focus of the companies.

Gartner estimates that by 2022 the main barrier to IoT adoption will still be security Gartner . Northbridge indicated in their latest Future of Cloud Computing Survey Northbridge that security was still the top concern and inhibitor of cloud adoption. Furthermore, Article 25 of the recent GDPR gdpr addresses data protection by design and by default, underlining that considering privacy from the beginning is essential to address privacy successfully. Additionally, there is an increasing need to create trustworthy systems. For example, Yan et al. yan2014survey published a survey describing the different dimensions of trust for Internet of Things systems. Their work concludes that risk management is an essential piece to guarantee trustworthiness.

Privacy or security-by-design can only be achieved if risk management is performed from the beginning of the software development cycle. Continuous and agile risk management processes are of great importance in IoT and Cloud computing, since these environments have a complex and distributed network of services that increases the attack surface. Multi-cloud applications have components deployed over different infrastructures provided by different Cloud Service Providers (CSPs). In this context, risks related to accountability, assurance, agility or even financial aspects become even more challenging. Risk analysis can also guide the selection of CSPs, and several authors have proposed methodologies to consider risks in addition to quality of service or cost Omerovic , especially in the context of multi-cloud applications gupta15 . Therefore, any component is subject to risk analysis considerations, whether it runs on premises, is remotely hosted by a cloud service provider, or is offered as a service (by a CSP, a device in an IoT, etc.). Poor risk management combined with a reactive strategy usually forces companies to continuously refactor their application architectures to improve overall software quality and security, incurring technical debt and high re-implementation costs Boehm03 .

The demands of software-driven businesses and the need for fast innovation are forcing organisations to replace traditional software development methods such as Waterfall with alternatives that are agile and support continuous development and delivery models in an aggressively changing market. The principles that drive DevOps and Agile methodologies, with their focus on transparency and collaboration between teams, can help minimise the risks in the application and the inconsistencies in the way risks are managed. Nevertheless, the Agile Manifesto suggests that Agile teams focus on delivering the simplest code that supports the needs of their customers. In their attempt to adhere to this bottom-up principle, Agile teams take an incremental and iterative approach that focuses on delivering individual software capabilities, usually derived from functional requirements (FRs). This diminishes the relevance of system-wide quality attributes and other aspects related to non-functional requirements (NFRs), which in general are not well managed in agile methodologies Paetsch03 ; Medeiros17 . It also negatively affects code quality and the quality of the overall software architecture. As a consequence, the elicitation and management of risks connected to NFRs become increasingly important in Agile, to mitigate the negative effect of relegating NFRs to second-class citizens.

To our knowledge, only initial work has been done to conceptualize agile risk management techniques, and effective control of NFR-related risks in agile methodologies remains an open question. This paper describes challenges that deserve more attention and then proposes and validates a solution. The paper is organized as follows. Section 2 analyses how non-functional requirements are currently managed and the main obstacles related specifically to risk management. This analysis helps us describe, in Section 3, unsolved problems that pose challenges in agile risk management. To address these challenges, in Section 4 we propose a framework that supports collaborative and agile risk analysis in environments that require continuous software delivery, in order to detect risks and generate sets of mitigation actions. Section 5 describes how this framework and the associated tool set have been implemented for the development of multi-cloud applications. This tool set was validated by a team of evaluators whose feedback is analyzed in Section 6. Finally, Section 7 draws some conclusions and outlines future lines of work.

2 Related Work

This section discusses related work on risk management methods for software development, non-functional requirements management in Agile methodologies, continuous software development, and risk management in distributed agile development.

Traditional risk management for software development

Implementing risk management processes is essential, especially for complex, high-risk projects Wallmuller:2002:RMS:643566.643578 . The business domain can also affect the need for risk analysis. Less volatile domains, such as supply chain software, do not change quickly. Other businesses, in contrast, continuously monitor their customers and react to changes, modifying the requirements accordingly.

In general, previous work focuses on classical schedule, budget and scope risk analysis. When risk management techniques are used for software projects managed through waterfall methodologies, risks are thoroughly analyzed at the beginning of the project. However, when requirements change during the project lifetime, the risk analysis may become obsolete rao2016study . Waterfall models assume that requirements are clearly defined in advance, in the design stage, and that, in general, they remain fairly immutable, which tends to be unrealistic for projects in a rapidly changing market. Although waterfall processes include change management mechanisms that allow requirements to change, risk management is seldom involved, and the overall risk analysis is not updated.

A number of risk assessment methodologies include quantitative metrics, such as the probability of occurrence, the effort to implement control measures, or even their cost. Others consider qualitative metrics, e.g. an appraisal of the project staff's motivation. There are many quantitative risk methodologies and tools, like RiskWatch (accessed 2017-05-09) or ISRAM Karabacak:2005:IIS:2625876.2626120 . There are also many qualitative risk methodologies, such as OCTAVE Alberts02 , CORAS lund2010model or STRIDE STRIDE , one of the most commonly known in the hosted software arena. OWASP OWASP:2013 is an open standard for risk-aware software development that aims at making software security visible, so that organisations are able to make informed decisions. DREAD Howard06 is a successor of STRIDE and provides another approach specialised in multi-stack applications. Pasha et al. Pasha recently performed a detailed and thorough analysis of different risk management approaches, for both small and large-scale software systems. In addition to reviewing the state of the art, the authors also proposed a risk mitigation methodology that is very complete but lacks agility.

Agile risk management for software development

In general, it is accepted that the management of risks and other non-functional requirements is ill-defined in agile approaches Ramos18 . Several lightweight NFR methodologies for agile processes have been presented, such as Domah15 or Farid12 , where the authors combine FRs and NFRs in one framework and compute a risk-based requirements implementation sequence. More recently, Medeiros et al. Medeiros17 proposed an approach to specifying requirements based on design practices targeted at the developer. It is worth highlighting the work by Moran Moran2014 ; Moran2014b , where the author explicitly tackles issues related to risk management for agile software development. Moran proposes a risk-modified Kanban board and user story map. To our knowledge, this is the only research work that could be framed within the research challenges described in this paper. Some products that provide scaled agile capabilities, such as codeBeamer (Risk Management in codeBeamer), also include a mechanism to assign risks to both functional and non-functional requirements. However, their methods to define and manage risks are very similar to traditional risk management methods.

Most of the traditional risk management methodologies described above focus on the assessment of risks at a single point in time. However, these techniques do not address one of the most prominent challenges in today's adaptive, pivot-oriented world: the continuously evolving risk profile. Managing continuous changes in software development has been studied from different perspectives. Practices such as “release early, release often” have been promoted and adopted in open source software development feller05 , and they have been shown to benefit software quality and consistency Michlmayr15 . Continuous software integration Stahl14 may be the best-known industry practice related to continuous software development. The increasing focus on security and privacy has also generated work on continuous security and regulatory compliance. Fitzgerald et al. FitzgeraldS17 published a roadmap and agenda for continuous software engineering. Ameller et al. ameller17 propose re-planning the current release every time an activity triggers the need for an updated release plan.

Agile methods were initially devised for small projects with co-located developers in contexts where safety and security were not critical Ambler01 . However, the use of these methods has been extended to large projects with distributed development teams Fitzgerald06 and to safety-critical systems fitzgerald13 . In the latter paper, Fitzgerald et al. discuss R-Scrum (Regulated Scrum), an adaptation of Scrum to support compliance in regulated environments such as medical devices, railway, or aviation. They propose adding new ceremonies, artifacts and roles so that compliance can be assessed at the end of each sprint.

The distributed agile development (DAD) approach is being adopted by more and more software companies. The idea of DAD is to combine the quality and speed benefits of agile with the cost benefits of distributed software development (DSD). This combination generates significant risks, considering the contradictory nature of agile and DSD Mudumba10 . Shrivastava et al. Shrivastava2017 present a risk management framework for DAD, studying risks related to the software development life cycle, project management, group awareness, etc. However, this work focuses on analysing risk factors that threaten the successful completion of a software development project, rather than risks threatening non-functional requirements. In Aslam2017 , the authors performed a systematic literature review on risks and control mechanisms for DSD, whose results could be extrapolated to a system supporting DAD.

Finally, it is worth noting that security tends to be given lower priority, sometimes unintentionally, when it is treated as just one of the non-functional requirements of an application. Continuous security Merkow11 aims at making security a high-priority concern throughout all phases of the software development process. The authors identify nine building blocks for continuous application software security, including, for example, employee training and awareness, creating a security software group, or giving non-functional requirements the same importance as functional requirements. As an example, they suggest creating user stories to capture non-functional requirements. However, this does not provide a complete solution to the challenges in Section 3 since, for example, collaboration is not expressly addressed.

3 Challenges in risk management for agile software development

In this section we present the main challenges for agile risk management related to software development. Before describing the challenges, it is worth discussing how risks are managed in Agile. The first wave of Agile adoption in many companies involved structuring software development through Scrum teams. When it comes to application risk management, most companies cannot afford to have risk management experts in each team. A frequently used solution is to have dedicated staff responsible for detecting the most important risks in a centralized way. The use of frameworks to scale Agile, such as SAFe (Scaled Agile Framework), only marginally improves this situation. In this case, organisations structure software development by creating teams of teams that plan and synchronise their work through Program Increments (PIs). When a new PI is planned (typically every 10 weeks), all the actors participating in the software development process attend the planning meeting. PI planning usually starts with a vision and a prioritized list of new product features, which are then decomposed into user stories. The user stories are defined by product owners and created by the different Agile teams. These teams are in charge of drafting plans but also of analyzing risks and impediments. In fact, user stories replace traditional functional requirements and describe intended system behavior. Consequently, risk analysis is performed in relation to expected features and user stories. Because of this, non-functional requirements tend to be unintentionally diminished in importance Merkow11 and are commonly ignored during risk analysis, since they are usually not represented through user stories.

During PI planning, risks are identified by each team and then aggregated into a program risk sheet that is reviewed in a plenary session. The group then discusses and categorizes program risks and impediments. After PI planning, once risks have been classified and actions and owners have been established, all teams are assumed to be aware of the potential risks and impediments detected during the meeting. They are expected to act accordingly throughout the sprints in that PI. Additionally, the program leadership team typically tracks unresolved risks to ensure the right coordination occurs. Unfortunately, risks are not usually mapped to user stories. Since PIs are guided by customer requirements, the detected risks remain a relegated part of the development process. Risk management in an Agile environment must be integrated into the process: present in PI planning, in Sprint planning, and in any other ceremony. It must be a light but continuous activity.
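The gap described above can be made concrete with a short Python sketch. It aggregates per-team risks into a program risk sheet and lists the risks that are not mapped to any user story. This is only an illustration under stated assumptions: the `Risk` record, its fields and both function names are hypothetical and are not part of SAFe or of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    risk_id: str
    team: str
    description: str
    # ids of the user stories this risk has been linked to (often empty in practice)
    story_ids: list[str] = field(default_factory=list)


def build_program_risk_sheet(team_risks):
    """Merge the risk lists produced by each team into one program sheet keyed by risk id."""
    sheet = {}
    for risks in team_risks.values():
        for risk in risks:
            sheet[risk.risk_id] = risk
    return sheet


def unlinked_risks(sheet):
    """Return the ids of program risks that are not mapped to any user story."""
    return sorted(risk_id for risk_id, risk in sheet.items() if not risk.story_ids)


team_risks = {
    "team-a": [Risk("R1", "team-a", "Vendor lock-in on storage service", ["US-12"])],
    "team-b": [Risk("R2", "team-b", "Backups are not encrypted")],
}
sheet = build_program_risk_sheet(team_risks)
print(unlinked_risks(sheet))  # ['R2']
```

In a tool that keeps such a mapping, the unlinked list gives the program leadership team an explicit view of the risks that no sprint work currently addresses.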

The challenges included in this section are derived from the analysis of the related work and are also based on CA Technologies' Rally active coaching. Our conclusions build on the experience of more than 20 agile coaches operating continuously for 15 years with an ever-increasing scale of engagement. The authors of this paper accumulated more than 10 years of experience coaching companies to help them adopt Agile, between 2008 and 2017. We have directly interacted with more than 1,000 individual contributors, with a total reach of more than 8,000 people on hundreds of development teams across six continents. Our conclusions are based on face-to-face coaching work in at least 9 countries and on interactions with companies in all kinds of industries, from video games to health care, aeronautics and government, to give some examples. As a result of this analysis, we have identified several challenges that must be tackled to integrate risk analysis into agile software development, in particular risk analysis related to NFRs.

C1. Traditional risk analysis practices for software development do not easily translate to Agile. Traditionally, risk analysis was performed "at design time" in the waterfall method, where design, development and operations are sequential and discrete worlds. These approaches usually rely on a risk management expert. This becomes impractical for small and medium enterprises, both in terms of cost and of finding the proper resources. It also becomes impractical for companies working towards the adoption of more agile software development methodologies. There are some proposals for agile risk management Moran2014 . However, they do not take into account existing risk and threat analysis techniques such as STRIDE STRIDE or DREAD Howard06 , or other approaches for managing security risks like OCTAVE Alberts02 , which are more focused on NFRs.

C2. Analysis of risks should be continuous. Systems and architectures evolve continuously. In Agile, risk management is often neglected as part of a backlog or sprint plan, which leads to the view that it is not important until the later stages of a development project. In SAFe, PI planning sessions are useful to explore new risks that are foreseeable at the time of planning. However, a risk analysis every 3 months may not be sufficient when risks can be detected by any actor at any time. This exposes a general problem: agile methodologies fail to address risks and other non-functional requirements effectively and in a structured manner. Risk analysis methodologies are needed that are adapted to agile contexts but still achieve the level of analysis and detail provided by traditional risk assessment and mitigation techniques, in particular for NFRs. Fitzgerald et al. FitzgeraldS17 illustrate how Lean Thinking can be applied to continuous software engineering. However, they do not explicitly tackle challenges related to continuous risk management. Research into continuous risk management can provide mechanisms and tools that address the challenge of analysing risks continuously. A balance has to be struck between managing risk during the development process and not overloading or slowing the momentum of an agile methodology.

C3. Teams do not have sufficient expertise on risk analysis. Agile teams are usually cross-functional and optimised for communication and delivery of value. This could facilitate the work of specialised staff who analyse risks across several teams. However, it is not possible to allocate a risk assessment expert to each agile team. This encourages some level of centralisation, but at the same time it requires transparency. It also requires agile/Scrum team members to be more capable of participating in risk analysis and of proposing mitigation strategies when necessary. Consequently, lack of expertise is an important issue for self-managed agile teams, but also for small or medium enterprises that cannot afford risk experts in each team. Detecting the most prominent risks and deciding the best mitigation actions may be difficult. Similarly, deciding the likelihood and impact of a particular risk may be very subjective and, therefore, difficult to assess and measure. In addition, domain-specific risk analysis further aggravates the situation, as it demands a higher level of expertise. While training is an essential aspect Merkow11 , lack of expertise may always be present and needs to be mitigated.

C4. Tools to manage risk in Agile do not foster collaboration. To improve the control of risk in Agile, tools that provide transparency and enable the collaboration of all stakeholders involved in the process are crucial. In this sense, it is very important to create new tools that can be easily integrated with other common agile tools used to manage software development. Current tools to analyse and manage risks are quite limited. They are usually implemented as Excel spreadsheets that are shared during PI planning but do not allow for further collaboration beyond this face-to-face meeting. Collaborative tools that transparently engage all stakeholders potentially involved in analysing risks, during PI planning and afterwards, are generally lacking. Generic tools such as Kanban-style boards are increasingly used by the software industry to schedule work and represent user stories, features, etc. In this sense, some works suggest the use of risk-modified Kanban boards Moran2014 ; Moran2014b . The detection of new risks or the planning of mitigation actions should immediately propagate information through these software development management tools, and even trigger warnings or actions. Collective inter-team code ownership adds a further challenge to handling risk collaboratively. User stories impact several components, and many of them typically need to modify the same shared component. As a result, multiple teams frequently modify the code associated with a single component. Consequently, collective ownership makes it more difficult to control potential risks related to a particular component. Efficient tools to facilitate inter-team risk analysis do not exist.
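The propagation requirement and the collective ownership problem described above can be sketched as a small publish/subscribe structure. Every team that modifies a component is recorded as a stakeholder of that component, and registering a risk on the component notifies all of them. This is a hypothetical Python illustration: `RiskNotifier` and its methods are not the API of any existing agile tool, and a real integration would go through each tool's own notification mechanism.

```python
from collections import defaultdict


class RiskNotifier:
    """Hypothetical sketch: propagate risks on shared components to every team that touches them."""

    def __init__(self):
        self._stakeholders = defaultdict(set)  # component -> teams that have modified it
        self._inbox = defaultdict(list)        # team -> warnings received

    def record_change(self, team, component):
        # Collective ownership: any team that modifies a component becomes a stakeholder.
        self._stakeholders[component].add(team)

    def register_risk(self, component, description):
        """Send a warning to all stakeholder teams and return their names, sorted."""
        teams = sorted(self._stakeholders.get(component, set()))
        for team in teams:
            self._inbox[team].append(f"[{component}] {description}")
        return teams

    def inbox(self, team):
        return list(self._inbox.get(team, []))


notifier = RiskNotifier()
notifier.record_change("team-a", "payments")
notifier.record_change("team-b", "payments")
notifier.record_change("team-c", "search")
print(notifier.register_risk("payments", "API keys stored in plain text"))  # ['team-a', 'team-b']
```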

Beyond these four challenges, organizational culture is also an intrinsic obstacle to improving risk management. Cultural changes of any type require time and care to be successful. The challenge in this case is to develop a cultural change methodology that places risk management as a central and critical part of an agile methodology. Cultural change of this type is often top-down, implemented by management mandating risk management as an integral part of every sprint. This can lead to varying levels of adoption by the development staff.

4 A New Framework for Agile Risk Management

In this section, we propose a new agile risk analysis framework to facilitate the creation of tools for agile risk management. This framework addresses the four challenges (C1-C4) described in Section 3. It facilitates translating traditional risk analysis practices for software development to agile contexts and allows for continuous risk analysis. It also enables the main stakeholders, from agile team members to any other level in SAFe, to collaborate.

This framework facilitates the control of risk associated with the assets of a system. What constitutes an asset may depend on the type of system and the main focus of the risk analysis. For instance, the components in the application architecture of a software system being developed may be the assets to be analyzed. In a different situation, however, the risks to be analyzed may be associated with user stories or product features.

We conceive risk management as a continuous activity in which risks may be considered and evaluated at the different stages of the software development life-cycle (C2). Further, the assessment process is designed to be simple and visual. This eases collaboration in a multi-disciplinary team of application development stakeholders (C4), frequently referred to as the DevOps Team. We use recommendations, constraints, and rules to guide and support the whole team. These minimise the impact of having self-managed teams without adequate risk management capabilities (C3). Lastly, to address the agility challenge (C1) but also to foster continuity and collaboration, we adopt the Kanban process board philosophy to produce a risk assessment process board.

Our framework uses a Kanban-style pull system. The status of each asset with respect to a predefined risk analysis methodology is expressed through the different columns of the Kanban board. This makes our framework agnostic to any specific risk analysis methodology. A tool compliant with this framework links a Kanban-style representation to a particular risk analysis methodology, mapping each step of that methodology to one of the columns of the board. This naturally addresses most of our challenges (C1, C2 and C4), as it provides a mechanism to include traditional risk analysis methodologies in an agile-ready tool such as the Kanban board. Kanban-style boards are used in continuous deployment environments, fostering collaboration among team members and among teams. They provide an intuitive visual representation of the current status of the system with respect to risks/threats and the implemented mitigation strategies. They also allow new risks and threats to be introduced and re-evaluated in relation to each system asset, in line with continuous software delivery approaches.

4.1 Input data assumptions

The framework presented in this paper considers that risk assessment will rely on three essential sources of information:

  • The definition of the system/application’s assets. For example, these assets can be components of the application architecture, components in the infrastructure of a physical system, or user stories or features of a particular software product.

  • The set of constraints or rules that the team has selected. As examples, these rules may be used to prevent moving an asset until some restriction has been met, to decide when warnings or specific messages should be shown, or to indicate that an approval from some member of the team must be obtained before performing an action. These rules will enforce the validity of the risk analysis and guide non-experienced users.

  • The knowledge database. This database contains information about types of assets, risks, and mitigation actions, as well as their relationships. This information may be used to automate some processes and provide recommendations that will guide the team through the different steps of the selected risk analysis methodology. This knowledge database may be adapted to the domain of the application to be developed and extended by the users.
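As a rough illustration, the three inputs listed above could be modelled with a few Python data classes. Everything here is an assumption made for the sketch and is not prescribed by the framework. This includes the class names, the asset types, and the convention that a rule returns an error message (or `None` when it is satisfied).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Asset:
    name: str
    asset_type: str  # e.g. an architecture component type, or "user-story"


@dataclass
class KnowledgeBase:
    """Relationships between asset types, risks and mitigation actions."""
    risks_by_type: dict = field(default_factory=dict)        # asset type -> list of risks
    mitigations_by_risk: dict = field(default_factory=dict)  # risk -> list of mitigation actions

    def known_risks(self, asset):
        return list(self.risks_by_type.get(asset.asset_type, []))

    def mitigations(self, risk):
        return list(self.mitigations_by_risk.get(risk, []))

    def extend(self, asset_type, risk, mitigation):
        # Users may extend the knowledge database with domain-specific entries.
        risks = self.risks_by_type.setdefault(asset_type, [])
        if risk not in risks:
            risks.append(risk)
        self.mitigations_by_risk.setdefault(risk, []).append(mitigation)


# A rule receives an asset and a target column name; it returns an error message, or None if satisfied.
Rule = Callable[[Asset, str], Optional[str]]


@dataclass
class RiskAssessmentInputs:
    assets: list
    rules: list
    knowledge: KnowledgeBase


kb = KnowledgeBase()
kb.extend("database", "Data breach", "Encrypt data at rest")
db = Asset("orders-db", "database")
inputs = RiskAssessmentInputs(assets=[db], rules=[lambda asset, column: None], knowledge=kb)
print(kb.known_risks(db), kb.mitigations("Data breach"))
```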

4.2 Mapping of existing risk analysis methodologies

Our framework proposal assumes that methodologies for risk analysis can be divided into steps that can be implemented sequentially. Let us define $S = \{s_1, s_2, \ldots, s_n\}$ as the set of steps of a given methodology. Our framework involves defining a set of columns $C = \{c_1, c_2, \ldots, c_n\}$ in the Kanban-like board, with $n$ columns ($|C| = |S| = n$). With these two sets we can create a mapping so that each column $c_i$ of the Kanban board refers to step $s_i$ of the methodology. For example, OWASP OWASP:2013 proposes 5 steps for rating risks: identify risks, estimate likelihood, estimate impact, determine severity of risk, and decide what to fix. We would then have 5 different columns in our Kanban board.

Figure 1: Kanban-like board in which each column generates a “Collection of Elements” that is stored in a storage system.

Figure 1 depicts a generic representation of the Kanban approach proposed in this framework, where each Kanban column generates a collection of elements. These collections of elements are stored in a storage system. The actual content and semantics of these data will depend on the selected risk analysis methodology. For instance, a column may store collections of vulnerabilities, threats, risks or mitigation actions, to give some examples.
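The mapping between methodology steps and board columns, and the per-column collections of elements, can be sketched in Python as follows. The sketch makes two assumptions. First, the steps are given as an ordered list, here the five OWASP steps listed above. Second, the storage system is an in-memory dictionary keyed by asset and column. `RiskBoard` and its methods are hypothetical names, and movement rules are deliberately left out.

```python
from collections import defaultdict

# The five OWASP risk rating steps as listed in Section 4.2, in order.
OWASP_STEPS = [
    "identify risks",
    "estimate likelihood",
    "estimate impact",
    "determine severity",
    "decide what to fix",
]


class RiskBoard:
    """Kanban-like board whose i-th column corresponds to the i-th step of a methodology."""

    def __init__(self, steps):
        self.columns = list(steps)        # one column per step, in the same order
        self.position = {}                # asset -> index of its current column
        self.storage = defaultdict(list)  # (asset, column index) -> elements generated there

    def add_asset(self, asset):
        self.position[asset] = 0          # new assets enter the first column

    def column_of(self, asset):
        return self.columns[self.position[asset]]

    def record(self, asset, element):
        # Store an element (threat, risk, mitigation, ...) generated in the asset's current column.
        self.storage[(asset, self.position[asset])].append(element)

    def elements(self, asset, column_index):
        return list(self.storage.get((asset, column_index), []))

    def advance(self, asset):
        # Pull the asset into the next column; no approval rules are checked here.
        if self.position[asset] < len(self.columns) - 1:
            self.position[asset] += 1


board = RiskBoard(OWASP_STEPS)
board.add_asset("api-gateway")
board.record("api-gateway", "Denial of service")
board.advance("api-gateway")
print(board.column_of("api-gateway"))  # estimate likelihood
```

Because the board only depends on the ordered list of steps, switching to a different methodology means passing a different list, which mirrors the methodology-agnostic design described in the text.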

4.3 Agile Risk Analysis Automation

As mentioned before, one of the aims of this framework is to reduce the impact of having teams with little risk-related expertise by adding some automation. This allows team members who may not be experts in risk analysis to participate in the implementation of risk analysis methodologies with the support of the system, thus tackling challenge C3. We propose solving this challenge from two perspectives. On the one hand, and depending on the risk methodology selected, some columns may rely on a recommendation system to support the stakeholders. Such a system may recommend vulnerabilities, threats, risks or mitigation actions. In multi-tenant environments, recommendations may be based on anonymized information collected from other users of a particular tool. All this information is available in the knowledge database mentioned before.
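One simple way to produce such recommendations is to rank the mitigation actions that other users chose for the same risk by how often they were chosen, as sketched below in Python. This frequency-based ranking is an illustrative assumption, not the recommender of any specific tool. The `history` format, a list of anonymized (risk, mitigation) pairs, is also hypothetical.

```python
from collections import Counter


def recommend_mitigations(history, risk, top_k=3):
    """Rank mitigation actions for `risk` by how often other users chose them.

    `history` is a hypothetical anonymized log of (risk, mitigation) pairs collected
    across tenants; tenant identifiers are assumed to have been removed already.
    """
    counts = Counter(mitigation for logged_risk, mitigation in history if logged_risk == risk)
    # Ties keep the order in which the mitigations were first seen in the log.
    return [mitigation for mitigation, _ in counts.most_common(top_k)]


history = [
    ("SQL injection", "Use parameterized queries"),
    ("SQL injection", "Deploy a web application firewall"),
    ("SQL injection", "Use parameterized queries"),
    ("Data loss", "Schedule off-site backups"),
]
print(recommend_mitigations(history, "SQL injection"))
# ['Use parameterized queries', 'Deploy a web application firewall']
```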

On the other hand, our framework defines a set of constraints on the movements of the assets represented in the Kanban-like board. These constraints depend on the risk analysis process linked to the board. We propose defining a set of rules that constrain which actions can be performed in a particular step, depending on the methodology chosen. Any tool compliant with this framework should include a Movement Approval Module. When a component is moved from an origin column of the Kanban-like board to a target column, a query is generated. This query evaluates a condition relating the elements of the origin column to the elements generated in previous columns.

Figure 2: The Movement Approval Module is activated by each movement of a component between Kanban columns. This module checks conditions and allows or disallows the movement.

Figure 2 describes this process. The movement of a component generates a call to the Movement Approval Module, which is in charge of accepting or rejecting a drag-and-drop action. To do this, the module takes into account a set of rules that impose restrictions on the risk analysis process. Two examples of such rules are:

  • A component cannot be moved from column $C_i$ to a column $C_j$ where $j > i + 1$ (i.e. columns cannot be skipped).

  • A component cannot be moved from column $C_i$ to a column $C_j$ where $j > i$, if there exists an element generated in column $C_{i-1}$ that does not have an element generated in column $C_i$ linked to it (e.g. you cannot move a component to the mitigation actions phase if there are threats that do not have risks associated with them while the component was in the risk definition column).

The query generator will generate queries on the storage system to collect the data necessary to validate the conditions imposed by those rules.
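A minimal Python sketch of a Movement Approval Module for these two example rules is shown below. It assumes that columns are numbered from 0, that the first rule rejects any move with j > i + 1, and that the second rule requires every element generated in column i-1 to be linked from some element generated in column i before a forward move. The `Element`, `Component` and `approve_move` names are hypothetical, and the data is held in memory rather than fetched by generated queries from the storage system.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Something generated while a component sat in a column (a threat, a risk, ...)."""
    ident: str
    column: int                                   # index of the column that generated it
    links: set[str] = field(default_factory=set)  # ids of linked elements in the previous column


@dataclass
class Component:
    name: str
    column: int                                   # column the component currently sits in
    elements: list[Element] = field(default_factory=list)


def approve_move(comp: Component, target: int) -> tuple[bool, str]:
    """Hypothetical Movement Approval Module: returns (approved, explanation)."""
    i, j = comp.column, target
    # Rule 1: columns cannot be skipped.
    if j > i + 1:
        return False, f"cannot skip from column {i} to column {j}"
    # Rule 2: before moving forward, every element generated in column i-1
    # must be linked from at least one element generated in column i.
    if j > i and i > 0:
        linked = {ref for e in comp.elements if e.column == i for ref in e.links}
        orphans = [e.ident for e in comp.elements
                   if e.column == i - 1 and e.ident not in linked]
        if orphans:
            return False, f"unlinked elements in column {i - 1}: {', '.join(orphans)}"
    return True, "ok"


# Hypothetical columns: 0 = threat identification, 1 = risk definition, 2 = mitigation actions.
db = Component("database", 1, [Element("T1", 0), Element("T2", 0), Element("R1", 1, {"T1"})])
print(approve_move(db, 2))  # (False, 'unlinked elements in column 0: T2')
```

The explanation string is what a tool could surface as the warning message when it returns the component to its origin column.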

When a movement is rejected, the tool may generate different types of feedback including the automatic return of the component to the origin column, a warning message providing explanations that justify why the movement is not legal in that context, a warning icon in the component that provides a justification upon click, etc.

4.4 Additional Aspects

A tool implementing this framework may allow some of the elements generated in a column to be marked as “deferred”. An element marked as deferred can be ignored by the Movement Approval Module. For instance, suppose a vulnerability of a component is detected, but the architect knows that it will not be important during the first year of the project, so its analysis can be deferred. If the vulnerability is marked as deferred, the system allows the component to be moved to the following column without that vulnerability blocking the risk analysis of the component.
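Deferred elements can be folded into the check that looks for unlinked elements, as in the Python sketch below. The `deferred` flag and the `blocking_orphans` helper are hypothetical names; the sketch only assumes that a deferred element is ignored when deciding whether a component may move forward.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    ident: str
    column: int                                   # column that generated the element
    links: set[str] = field(default_factory=set)  # linked elements in the previous column
    deferred: bool = False                        # hypothetical "deferred" marker


def blocking_orphans(elements: list[Element], current: int) -> list[str]:
    """Unlinked elements of column current-1 that block a forward move.

    Deferred elements are skipped, so they never block the component.
    """
    linked = {ref for e in elements if e.column == current for ref in e.links}
    return [e.ident for e in elements
            if e.column == current - 1 and e.ident not in linked and not e.deferred]


# V1 will not matter during the first year, so the architect defers it.
elements = [
    Element("V1", 0, deferred=True),
    Element("V2", 0),
    Element("R1", 1, {"V2"}),
]
print(blocking_orphans(elements, 1))  # []
```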

The proposed framework also allows for tools to include additional support forms to prepare a component to be moved to the next column. These forms may be different and generated ad-hoc depending on the current column of each component and the semantics of that particular column in the mapped risk analysis methodology. For instance, for risk definition we may provide forms to connect each component to risks and scores for evaluating those risks.

5 Risk Assessment for Multi-cloud Applications

Figure 3: Outline of risk assessment process flow. Steps inside the dotted box are performed automatically.

In order to evaluate the proposed framework, we developed a tool to support a risk assessment methodology for multi-cloud application development. A multi-cloud application distributes its components over heterogeneous cloud resources but, from the user’s perspective, works in an integrated and transparent way. Risks associated with these applications include those of their individual components, but also those related to overall security and to data communication among components. In this section, we first detail the risk methodology that we selected for our tool and then describe how it was implemented. Note that the risk methodology is presented only as an illustrative example: our agile risk management framework is agnostic to any particular methodology, and other methodologies with different steps could be mapped onto it.

5.1 Selected methodology for the tool

Figure 3 outlines the flow of the selected Risk Assessment methodology, which is compliant with our proposed framework and inspired by the OCTAVE methodology Alberts02 . The selected methodology is composed of four main steps to be followed when the team wants to perform the risk assessment of a component:

Risk identification. As mentioned above, a knowledge database is used to support the user while identifying the risks. Depending on the type of asset, a subset of the possible risks is shown. The user is then asked to select one or more risks from the knowledge database. The user is also allowed to add new risks to the database if the ones suggested do not cover the specificities of the asset being analyzed.
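The sketch below illustrates, in Python, a knowledge database that returns the subset of risks relevant to an asset type and lets users register new ones. The `KnowledgeBase` class and its in-memory storage are assumptions for illustration; a real tool would back this with persistent, shared storage.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical in-memory knowledge database of risks indexed by asset type."""

    def __init__(self) -> None:
        self._risks: dict[str, set[str]] = defaultdict(set)

    def add_risk(self, asset_type: str, risk: str) -> None:
        """Register a risk, e.g. when the suggested ones do not fit the asset."""
        self._risks[asset_type].add(risk)

    def suggest(self, asset_type: str) -> list[str]:
        """Return the risks known for this asset type, sorted for display."""
        return sorted(self._risks.get(asset_type, set()))


kb = KnowledgeBase()
kb.add_risk("database", "SQL injection")
kb.add_risk("database", "Unencrypted backups")
kb.add_risk("message queue", "Message tampering")
print(kb.suggest("database"))  # ['SQL injection', 'Unencrypted backups']
```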

Risk evaluation. For each risk that the user has selected, the likelihood and potential impact of the risk are evaluated: the user is required to provide the likelihood and impact of each threat. With this information, the Composite Risk Index (CRI) Banerjee of the risk is computed following Equation 1. Both likelihood and impact are rated on a scale of 1-9 and then quantised to a scale of 1-5, so their product, the CRI, ranges from 1 to 25.

$$\mathrm{CRI} = \mathrm{Likelihood} \times \mathrm{Impact} \qquad (1)$$
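The CRI computation can be sketched in Python as follows. The text does not specify how the 1-9 ratings are quantised to 1-5, so the mapping used here (rounding score × 5/9 up to the next integer) is a hypothetical choice; any monotone mapping onto 1-5 keeps the CRI within 1-25.

```python
def quantise(score: int) -> int:
    """Map a 1-9 rating onto 1-5 (hypothetical mapping: ceil(score * 5 / 9))."""
    if not 1 <= score <= 9:
        raise ValueError("rating must be between 1 and 9")
    return (score * 5 + 8) // 9  # integer ceiling of score * 5 / 9


def composite_risk_index(likelihood: int, impact: int) -> int:
    """CRI = likelihood x impact on the quantised 1-5 scales, hence 1..25."""
    return quantise(likelihood) * quantise(impact)


print(composite_risk_index(4, 6))  # 3 * 4 = 12
print(composite_risk_index(9, 9))  # 25
```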


Mitigation actions selection. This step allows the user to discover the means to mitigate each of the risks. After evaluating the risk scores, risks are categorised according to their CRI level as those requiring treatment (high and medium risk level) and those that may not require treatment (low risk level). The knowledge database will present the most probable mitigation actions, but the user is free to add any other action from the knowledge base.
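The triage of risks by CRI level might look like the following Python sketch. The methodology does not state the CRI thresholds that separate low, medium and high risks, so `HIGH_FROM` and `MEDIUM_FROM` are hypothetical values chosen only for illustration.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical cut-offs on the 1-25 CRI scale; the methodology does not fix them.
HIGH_FROM = 15
MEDIUM_FROM = 6


def level(cri: int) -> Level:
    if cri >= HIGH_FROM:
        return Level.HIGH
    if cri >= MEDIUM_FROM:
        return Level.MEDIUM
    return Level.LOW


def requires_treatment(cri: int) -> bool:
    """High and medium risks require treatment; low risks may not."""
    return level(cri) is not Level.LOW


cri_by_risk = {"SQL injection": 20, "Verbose error pages": 4, "Unencrypted backups": 9}
# Risks requiring treatment, most severe first.
to_treat = [name for name, cri in sorted(cri_by_risk.items(), key=lambda kv: -kv[1])
            if requires_treatment(cri)]
print(to_treat)  # ['SQL injection', 'Unencrypted backups']
```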

Risk status evaluation. Once the security controls for a risk have been selected, the user is asked to select its ROAM status. ROAM Baah is a common agile risk management classification whose acronym stands for:

  • Resolved - the risk has been answered and avoided or eliminated.

  • Owned - the risk has been allocated to someone who has responsibility for doing something about it.

  • Accepted - the risk has been accepted and it has been agreed that nothing will be done about it.

  • Mitigated - action has been taken so the risk has been mitigated, either reducing the likelihood or reducing the impact.

Note that only risks with status Accepted or Mitigated are considered fully addressed. Owned is treated as a pending status, so the risk mitigation analysis must continue. A Resolved status discards the prior risk analysis altogether, since the threat is no longer considered relevant.
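The ROAM rules above can be encoded as in the following Python sketch. It assumes that Resolved risks are simply excluded from a component's remaining analysis; `Roam`, `is_fully_addressed`, `pending_risks` and `component_done` are hypothetical names.

```python
from enum import Enum


class Roam(Enum):
    RESOLVED = "Resolved"
    OWNED = "Owned"
    ACCEPTED = "Accepted"
    MITIGATED = "Mitigated"


def is_fully_addressed(status: Roam) -> bool:
    """Only Accepted and Mitigated risks count as fully addressed."""
    return status in (Roam.ACCEPTED, Roam.MITIGATED)


def pending_risks(statuses: dict[str, Roam]) -> list[str]:
    """Owned risks are pending: their mitigation analysis must continue."""
    return [risk for risk, status in statuses.items() if status is Roam.OWNED]


def component_done(statuses: dict[str, Roam]) -> bool:
    """Assumes Resolved risks drop out of the analysis; all others must be fully addressed."""
    return all(is_fully_addressed(s) for s in statuses.values() if s is not Roam.RESOLVED)


statuses = {"R1": Roam.MITIGATED, "R2": Roam.ACCEPTED, "R3": Roam.RESOLVED, "R4": Roam.OWNED}
print(pending_risks(statuses), component_done(statuses))  # ['R4'] False
```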

5.2 Implementation of the tool

Once the methodology had been defined, we developed a tool to support the development of multi-cloud applications. The tool was also designed to address the challenges identified earlier.

As we described in the previous section, our risk assessment framework relies on three sources of information:

  • The definition of the application assets: in this case, the assets that the tool considers are the components of the application architecture. Components may range from small components of the architecture, in the form of specific-purpose libraries running on premises, to complex and general components in the architecture, including sub-components or complex services offered by cloud service providers, by devices in an IoT ecosystem, etc. In the area of cloud applications, there have been many attempts at defining a domain-specific language (DSL) that can describe cloud applications Moran ; CloudView ; CloudML . It is worth mentioning that the OASIS technical committee called TOSCA (Topology and Orchestration Specification for Cloud Applications) is developing an open standard that provides a language to describe cloud components and their relationships TOSCA . In our case, we have chosen CAMEL (Cloud Application Modelling and Execution Language) CAMEL , a DSL akin to TOSCA that allows users to specify multiple aspects of cross-cloud applications, such as provisioning and deployment, service-level objectives, metrics, scalability rules, providers, security controls, execution contexts, and execution histories. Using CAMEL, the development team is able to describe the architecture and the deployment requirements at a high level of abstraction and independently of any cloud provider.

  • The set of rules that the team has selected: for simplicity, we have not established roles, and every user has the same responsibility over the risk assessment process. The rules incorporated into the tool are: (i) a component cannot be moved from column $C_i$ to a column $C_j$ where $j > i + 1$; (ii) a component cannot be moved to the Mitigation actions selection step unless all its risks have been evaluated and their CRI calculated; (iii) a component cannot be moved to the Validation column unless all its risks have at least one security control; and (iv) the risk analysis of a component cannot be considered fully addressed unless all its risks have been Accepted or Mitigated.

  • The knowledge database: in order to assess the risks, our tool uses a risk model based on the OWASP risk modelling OWASP and gathers information from different sources, such as the OWASP TOP 10 threats catalogue OWASPTop10 or NIST SP 800-53 r4 NIST . This knowledge database is based on a predefined set of possible risks (here called threats) and a matching set of mitigation actions (here called security controls), which need to be fulfilled by the application designer. Each of these security controls comes with a definition and a measuring technique describing how the security control should be fulfilled.

Once the architecture is ready, the components to be analyzed from a risk perspective are imported into the Kanban. It is important to note that we have added an initial state for those components for which we have not started analyzing risks and a final state for those components for which the risk analysis is finished. Also, we have decided to combine the four steps of our methodology into two steps: identification and evaluation of risks, and selection and evaluation of mitigation actions. Thus, the four states and columns that our Kanban offers are:

  • Components definition, which is the initial step for all the components pending their risk assessment.

  • Risks definition, where the users would move the components to start the risk assessment. In this step, the users are asked to decide the risks that affect the component. Moreover, in this step the users are also asked to evaluate the likelihood and impact of the risks.

  • Security controls definition, where the users are presented with the possible security controls of each risk depending on the CRI. Once the users select the security controls, they are also asked to apply ROAM to the risks.

  • Validation, for all the components that have finalized the risk assessment. Only those components whose risks have been Accepted or Mitigated should be in this state. The last step required from the user is the acceptance of the level of the risk mitigation status. The Validation step provides an overview summary of the choices made in previous steps.
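Combining the tool's rules (i)-(iv) with these four columns, a move check can be sketched in Python as a single function that returns the list of violated rules, where an empty list means the move is approved. The sketch reads rule (i) as forbidding skipped columns, takes the target of rule (iii) to be the Validation column, and enforces rule (iv) when a component enters Validation; these readings and all names are assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Column(IntEnum):
    COMPONENTS = 0   # Components definition
    RISKS = 1        # Risks definition
    CONTROLS = 2     # Security controls definition
    VALIDATION = 3   # Validation


@dataclass
class Risk:
    name: str
    cri: Optional[int] = None              # set once likelihood and impact are evaluated
    controls: list[str] = field(default_factory=list)
    roam: Optional[str] = None             # "Resolved", "Owned", "Accepted" or "Mitigated"


def check_move(current: Column, target: Column, risks: list[Risk]) -> list[str]:
    """Return the violated rules; an empty list means the move is approved."""
    violations = []
    if target > current + 1:                                                # rule (i)
        violations.append("columns cannot be skipped")
    if target > current:                    # moving back (e.g. a new iteration) is always allowed
        if target == Column.CONTROLS and any(r.cri is None for r in risks):
            violations.append("every risk needs a CRI first")                # rule (ii)
        if target == Column.VALIDATION:
            if any(not r.controls for r in risks):                          # rule (iii)
                violations.append("every risk needs a security control")
            if any(r.roam not in ("Accepted", "Mitigated") for r in risks):  # rule (iv)
                violations.append("every risk must be Accepted or Mitigated")
    return violations


risks = [Risk("SQL injection", cri=20, controls=["CTRL-1"], roam="Mitigated"),
         Risk("Data leakage", cri=9)]
print(check_move(Column.RISKS, Column.CONTROLS, risks))       # []
print(check_move(Column.CONTROLS, Column.VALIDATION, risks))  # two violations
```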

5.2.1 Risks definition

In this step, the user chooses the threats that the component under consideration is susceptible to. Once threats are selected, they are automatically classified in the STRIDE STRIDE categories (Spoofing identity, Tampering, Repudiation, Information disclosure, Denial of service and Elevation of privilege).
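Classifying the selected threats into STRIDE categories amounts to a catalogue lookup followed by grouping, as in this Python sketch. The catalogue entries and their category assignments are hypothetical examples, not the tool's actual threat catalogue.

```python
from collections import defaultdict
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing identity"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


# Hypothetical catalogue entries: each threat is stored with its STRIDE category.
CATALOGUE = {
    "Stolen session token": Stride.SPOOFING,
    "Unsigned configuration update": Stride.TAMPERING,
    "Missing audit trail": Stride.REPUDIATION,
    "Unencrypted backups": Stride.INFORMATION_DISCLOSURE,
    "Request flooding": Stride.DENIAL_OF_SERVICE,
    "Over-privileged service account": Stride.ELEVATION_OF_PRIVILEGE,
}


def classify(selected: list[str]) -> dict[Stride, list[str]]:
    """Group the threats chosen by the user under their STRIDE categories."""
    groups: dict[Stride, list[str]] = defaultdict(list)
    for threat in selected:
        groups[CATALOGUE[threat]].append(threat)
    return dict(groups)


groups = classify(["Request flooding", "Unencrypted backups", "Stolen session token"])
for category, threats in groups.items():
    print(category.value, threats)
```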

Regarding the evaluation of the risks, the likelihood and consequence scales chosen are inspired by STRIDE . For simplicity, the tool also allows the user to provide the likelihood and impact used in the CRI for each STRIDE category as a whole; the same scores are then applied to all the threats categorised under each of the 6 STRIDE categories. In our risk assessment process, the likelihood and impact values are further computed from a set of influencing factors taken from the OWASP risk rating approach. These factors simplify the process and include concepts from both a technical perspective (ease of exploit, skill level of the threat agents, etc.) and a business perspective (financial damage, reputation damage, etc.). The factor values are grouped by type of factor and rated on a scale of 0-9, where 0 represents a very unlikely scenario and 9 represents a very high likelihood of the factor occurring. A detailed description of all the factors can be found in OWASP . Most of the impact factors are pre-populated with values based on our threat catalogue.
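The following Python sketch shows the OWASP Risk Rating style of computation referred to above: likelihood is the average of the eight threat agent and vulnerability factors, impact is the average of the technical (or business) impact factors, each rated 0-9, and scores are banded as LOW (below 3), MEDIUM (3 to below 6) or HIGH (6 and above). How the tool then maps these averages onto the CRI scale is not specified, so the sketch stops at the per-category scores, which it applies unchanged to every threat in one STRIDE category; the ratings in the example are made up.

```python
from statistics import mean

# Factor names from the OWASP Risk Rating Methodology, each rated 0-9.
THREAT_AGENT = ("skill_level", "motive", "opportunity", "size")
VULNERABILITY = ("ease_of_discovery", "ease_of_exploit", "awareness", "intrusion_detection")
TECHNICAL_IMPACT = ("loss_of_confidentiality", "loss_of_integrity",
                    "loss_of_availability", "loss_of_accountability")
# Use these instead of TECHNICAL_IMPACT when business information is available.
BUSINESS_IMPACT = ("financial_damage", "reputation_damage", "non_compliance", "privacy_violation")


def average(ratings: dict[str, int], factors: tuple[str, ...]) -> float:
    """Average the 0-9 ratings of the given factors."""
    values = [ratings[f] for f in factors]
    if any(not 0 <= v <= 9 for v in values):
        raise ValueError("factor ratings must be between 0 and 9")
    return mean(values)


def level(score: float) -> str:
    """OWASP bands: 0 to <3 LOW, 3 to <6 MEDIUM, 6 to 9 HIGH."""
    return "LOW" if score < 3 else "MEDIUM" if score < 6 else "HIGH"


def apply_to_category(likelihood: float, impact: float,
                      threats: list[str]) -> dict[str, tuple[float, float]]:
    """Give every threat in one STRIDE category the same (likelihood, impact)."""
    return {t: (likelihood, impact) for t in threats}


# Made-up ratings for the Denial of service category of one component.
ratings = {"skill_level": 6, "motive": 4, "opportunity": 7, "size": 6,
           "ease_of_discovery": 7, "ease_of_exploit": 5, "awareness": 6,
           "intrusion_detection": 8, "loss_of_confidentiality": 2,
           "loss_of_integrity": 3, "loss_of_availability": 9, "loss_of_accountability": 2}
likelihood = average(ratings, THREAT_AGENT + VULNERABILITY)  # 6.125 -> HIGH
impact = average(ratings, TECHNICAL_IMPACT)                  # 4 -> MEDIUM
print(level(likelihood), level(impact))                      # HIGH MEDIUM
scores = apply_to_category(likelihood, impact, ["Request flooding", "Resource exhaustion"])
```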

In our tool, selecting the Details button for a component in the STRIDE-based Risk Assessment column brings up the screen shown in Figure 4. This screen shows the STRIDE risk assessment process and the actions that need to be performed to complete it. Figure 4 shows several rows representing STRIDE threat categories, for example Tampering, Information Disclosure and Denial of Service. It also shows the likelihood and impact specification, which uses the OWASP guidelines to compute likelihood and impact from threat agent factors, vulnerability factors, and both technical and business impact factors. Once the relevant areas have been completed, the user can move the component to the next stage of the risk assessment.

Figure 4: STRIDE based Risk Assessment step

5.2.2 Security controls definition

In the cloud security domain, mitigation consists of selecting the security controls that the provider needs to guarantee in order to mitigate each threat. As indicated before, NIST SP 800-53 r4 NIST maps security controls to threats and indicates the threat levels that require treatment. Based on this mapping, the required controls are obtained for the threats selected by the users. These controls are then presented to the user as suggestions, but, as mentioned before, the user is free to extend the choice to all the available security controls. Selected controls are further mapped to the CCM (Cloud Controls Matrix) controls from the Cloud Security Alliance (CSA) CSA .
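The suggestion and mapping steps can be sketched in Python as two set unions. The control identifiers and both mapping tables are placeholders, not actual NIST SP 800-53 or CCM entries; in the tool these tables would come from the knowledge database.

```python
# Placeholder mappings for illustration only; in the tool they would be loaded from
# the knowledge database (NIST SP 800-53 r4 for threats, CSA CCM for the second step).
THREAT_TO_CONTROLS: dict[str, set[str]] = {
    "Unencrypted backups": {"CTRL-A", "CTRL-B"},
    "Request flooding": {"CTRL-C"},
}
CONTROL_TO_CCM: dict[str, set[str]] = {
    "CTRL-A": {"CCM-1"},
    "CTRL-B": {"CCM-1", "CCM-2"},
    "CTRL-C": {"CCM-3"},
}


def suggested_controls(threats: list[str]) -> set[str]:
    """Controls mapped to the selected threats, presented as suggestions."""
    return set().union(*(THREAT_TO_CONTROLS.get(t, set()) for t in threats))


def to_ccm(selected: set[str]) -> set[str]:
    """CCM controls corresponding to the controls the user finally selected."""
    return set().union(*(CONTROL_TO_CCM.get(c, set()) for c in selected))


suggestions = suggested_controls(["Unencrypted backups"])
selected = suggestions | {"CTRL-C"}  # the user may add controls beyond the suggestions
print(sorted(to_ccm(selected)))      # ['CCM-1', 'CCM-2', 'CCM-3']
```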

Figure 5 shows the contents of a component that is in this stage. As we can see, in addition to the security controls, the user is also asked to apply ROAM to the risks.

Figure 5: Security Controls Definition step

Keep in mind that the whole risk analysis process should be repeated for every risk detected for a given component, and that it should be iterated as many times as dictated by the iterations of the development process. Consequently, components in the Validation state may be moved back to previous states if necessary. Note also that, in any state, the user can ask the tool to output a report detailing the risk assessment status of each component, including the risks identified, their status, and their security controls.
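A plain-text status report of the kind described above could be produced as in the following Python sketch; the `RiskEntry` record and the report layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    name: str
    status: str                                        # ROAM status, e.g. "Mitigated"
    controls: list[str] = field(default_factory=list)


def report(board: dict[str, list[RiskEntry]]) -> str:
    """Plain-text status report: one line per risk, grouped by component."""
    lines = []
    for component in sorted(board):
        lines.append(f"{component}:")
        for risk in board[component]:
            controls = ", ".join(risk.controls) or "none"
            lines.append(f"  - {risk.name} [{risk.status}] controls: {controls}")
    return "\n".join(lines)


board = {
    "database": [RiskEntry("SQL injection", "Mitigated", ["CTRL-A"]),
                 RiskEntry("Data leakage", "Owned")],
}
print(report(board))
```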

6 Results

Research into the adoption of risk management as an integral part of Agile requires both qualitative and quantitative analysis. A survey can help us gather quantitative figures, while a qualitative approach using an intensive case study is an effective way of understanding a team’s approach. Case studies not only deliver insight into the thinking and ideas of a development team but also allow their actions, behaviour and body language to be observed and recorded for analysis. The two approaches complement each other and facilitate further analysis and the development of relevant hypotheses. To this end, we first introduce the case studies and analyze the selected team of evaluators, and then present the results of the evaluation obtained through surveys. Finally, we discuss the opinions of the evaluators on how relevant risk management is compared to other tools used to develop multi-cloud applications.

6.1 Use cases description

For this evaluation, two different real case studies were chosen: an urban smart mobility service and an airline flight scheduling system.

The smart mobility application should provide efficient and optimal route planning by considering road, traffic, energy consumption, and weather conditions. The urban mobility service was proposed to have 4 components: the smart mobility engine, which would serve as orchestrator; the consumption estimator, which would calculate the energy needed for each trip; the multi-modal journey planner, which would offer the optimal trip; and the database.

Airline scheduling is a complex scenario, since each airline must react to the actions of the others in order to keep its schedules up to date. The flight scheduling system was proposed as an application with 5 components: the central gateway, which would serve as entry point; the read module, which would query fleet and airline-related information; the write module, which would update fleet and airline-related information; the web interface that the final user would interact with; and a set of additional cloud services, such as event managers or databases.

In both use cases, the objective was to provide a distributed solution that could reduce the points of failure and offer greater flexibility. Moreover, these applications may also become Platform-as-a-Service solutions that can scale or perform load balancing as needed. Based on these use cases, a user-centered evaluation was performed to assess whether the tool fulfilled the needs of its users.

6.2 Evaluators team

To evaluate our methodology and our tool, we selected a group of evaluators from Lufthansa Systems and Tampere University of Technology. We looked for evaluators who were not familiar with the tool, who were experienced in defining architectures with CAMEL, who had limited expertise in security, and who covered different levels of expertise in cloud development. Figure 6 shows information about the 9 evaluators: the distribution of their job positions, their relation to cloud-based application development, and how familiar they were with risk assessment. Most of the evaluators were partly or very familiar with the development of cloud applications, whereas most of them were only slightly familiar with risk assessment.

According to the evaluators' experience, risk management followed a rather rudimentary approach, without a formal or systematic way to evaluate risks and tackle attack vectors. In the case of Lufthansa Systems, previous developments were for internal use and risks were not a priority. In general, risks were managed during the early planning phases (in most cases only once), as risk management was considered a one-time activity rather than a continuous process. The tool used for this exercise was usually MS Excel, and in most cases only one person was responsible for filling out the sheets, mostly with risks related to non-functional requirements (NFRs). At Tampere University of Technology, the focus was on meeting the functional requirements of the application. In meetings with potential customers, the team gathered their security concerns as a checklist of non-functional requirements, which were then prioritized and tackled with security controls. In this sense, risk management was mostly done on the fly.

Finally, and although not shown in those figures, it is also worth noting that all the evaluators considered themselves as having a good knowledge of Agile.

(a) Distribution of job positions.
(b) Familiarity with cloud-based application development.
(c) Familiarity with risk assessment.
Figure 6: Analysis of the group of evaluators.

6.3 Use cases evaluation results

For the evaluation we divided the evaluation group into 2 teams. Each team was asked to consider a different use case and then we collected their feedback in the form of a survey.

Figure 7 presents the answers given by the evaluators to different questions about the proposed risk analysis tool. According to the evaluators, the tool supports DevOps collaboration and is efficient for defining security risks aligned with the application security requirements. The majority of the evaluators also agree that the tool supports the agile management of multi-cloud applications. This indicates that our proposed tool can help mitigate three of the challenges presented above (C1, C2 and C4), related to the lack of agile tools, the need for continuous risk assessment, and the lack of collaboration. Given that the evaluators consider the tool easy to use and the supported process helpful for risk analysis, we also consider that our tool helps minimize challenge C3, since it can help alleviate the lack of security expertise in the teams. The evaluators considered the output of the tool easy to understand but, although the tool is easy to use, they found the messages and tooltips sometimes confusing. Future work could therefore be dedicated to improving the understandability of the tool's output.

Figure 7: Results of the evaluation.

In general, the evaluators agree that the tool achieves its objectives and that it allows the definition of all the security threats and security controls of a multi-cloud application. Table 1 presents some efficiency figures collected during the evaluation. The timing results reported by the evaluation teams are in line with the values obtained in internal tests performed continuously by the tool developers. Scenario 2 was more complex, as the longer times and the lower number of security controls that could be specified indicate. Estimating the time saved by using the tool is difficult: the evaluators found it hard to give an estimate, because many of them had not performed any risk assessment before, although they do recognize the benefit of using the tool. Further experiments comparing the time required with a traditional risk assessment tool are needed to provide a more accurate answer.

| Efficiency question | Scenario 1 (avg.) | Scenario 1 (median) | Scenario 2 (avg.) | Scenario 2 (median) |
|---|---|---|---|---|
| Time spent defining the risks of one component (minutes) | 23 | 30 | 24.8 | 26 |
| Time spent defining the risks of the whole application (minutes) | 82 | 90 | 143 | 90 |
| Estimated time saved by using the tool when defining the risks of the application (minutes) | 185 | 60 | 247.5 | 195 |
| Number of required security controls that could be specified in the risk analysis | 67 | 86 | 36.5 | 11.5 |
Table 1: Efficiency questions related to risk assessment of a multi-cloud application.

Finally, some evaluators commented that the risk analysis should perhaps be done before modelling the application. Our methodology requires an initial model to offer a first analysis of risks but supports iterating between model definition and risk analysis as many times as needed.

6.4 Risk analysis relevance

Finally, in order to understand the relevance of managing risks when creating a multi-cloud application, we asked evaluators to rank our risk management tool with respect to other tools to build and secure multi-cloud applications, both in terms of importance and innovation. This list of tools is based on the tools proposed in Rios2015TowardsSM , where the authors propose a framework to support the security-intelligent lifecycle management of distributed applications over heterogeneous cloud resources. After gathering the results, the ordered list of tools was:

  1. a risk analysis tool

  2. a security assurance platform for monitoring

  3. a decision support system for Cloud Service Selection

  4. a Service Level Agreement (SLA) Generator

  5. a deployer to support distributed deployment

As we can see, the risk analysis tool is the one valued highest, where 4 out of the 9 evaluators ranked it as their first choice.

7 Conclusions

Risk assessment is often an afterthought, as is security as a whole. It is usually performed in a quick and unstructured manner, or even skipped completely. As a consequence, the resulting analysis is often ineffective and inaccurate.

Properly considering risk management in the agile development process raises a number of challenges. This paper combines information collected from previous works with years of internal experience to describe several open challenges for risk management. These challenges also cover an area that is often neglected: team cultural change. By creating tools that address these challenges, we will enable faster and more comprehensive adoption of agile risk management tools and techniques.

In order to cover the challenges identified in this paper, we have proposed a framework that is based on an online Kanban-like tool that is agile (challenge C1) and fosters collaboration (challenge C4) by offering a visual representation of the proposed risks/threats and their related mitigation actions. This framework proposes using recommendations and rules to offer automation and guidance to the team. This makes risk assessment attainable and usable even by software designers that have good technical skills but may not be security and risk analysis experts (challenge C3). Moreover, since the knowledge database can be tailored to the specific domain of the application, a finer granularity in the risks and mitigation actions suggested can also be achieved. Our proposal allows the natural introduction of new risks and threats and re-evaluation of the level of security related to each system component following continuous software delivery methods (challenge C2).

In this paper, we have also described an implementation of the framework, which we used to perform a user evaluation. From the results of this evaluation, we conclude that an agile risk analysis tool is one of the main tools needed to develop a secure multi-cloud application. Moreover, our tool and the selected methodology received very positive feedback from the evaluators and were able to meet the needs associated with all the challenges.

As future work, apart from fixing some GUI glitches, we would like to improve the scalability of the tool by offering operations that affect multiple components at the same time or that allow the user to add or remove groups of security controls in one operation. We would then like to perform a new evaluation to assess the impact of these changes, to better measure how this methodology improves on past techniques, and to study how cultural change can be pushed via an agile tool. Another important line of future work is adding automation, so that the tool can learn from users’ past behaviour and proactively suggest actions or add risks for components similar to those the user has analysed in the past. In this sense, a tool that could automatically suggest risks from the definition of the application components would boost the impact on teams with little risk-related experience. From a different perspective, we also plan to apply this methodology and develop a similar tool to handle risks associated with software development planning.


This work is supported by the European Commission through the ENACT project under Project ID: 780351 and the PDP4E project under Project ID: 787034.


  • [1] J. McKendrick. Every company now a technology company: Latest round of mergers and acquisitions confirms it.
  • [2] Peter Mell and Tim Grance. The NIST definition of cloud computing. National Institute of Standards and Technology, 53(6):50, 2009.
  • [3] Gartner. Gartner reveals top predictions for IT organizations and users in 2018 and beyond, 2017.
  • [4] Northbridge. 2016 future of cloud computing study, 2016.
  • [5] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union, L119:1–88, May 2016.
  • [6] Zheng Yan, Peng Zhang, and Athanasios V Vasilakos. A survey on trust management for internet of things. Journal of network and computer applications, 42:120–134, 2014.
  • [7] Aida Omerovic. Supporting cloud service selection with a risk-driven cost-benefit analysis. In Antonio Celesti and Philipp Leitner, editors, Advances in Service-Oriented and Cloud Computing, pages 166–174, Cham, 2016. Springer International Publishing.
  • [8] Smrati Gupta, Victor Muntes-Mulero, Peter Matthews, Jacek Dominiak, Aida Omerovic, Jordi Aranda, and Stepan Seycek. Risk-driven framework for decision support in cloud service selection. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 00:545–554, 2015.
  • [9] B. Boehm and R. Turner. Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
  • [10] F. Paetsch, A. Eberlein, and F. Maurer. Requirements engineering and agile software development. In 12th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, pages 308–313, June 2003.
  • [11] J. Medeiros, A. Vasconcelos, M. Goulão, C. Silva, and J. Araújo. An approach based on design practices to specify requirements in agile projects. In 32nd ACM Symposium on Applied Computing, SAC 2017, Marrakesh, Morocco, April 2017.
  • [12] E. Wallmüller. Business continuity. chapter Risk Management for IT and Software Projects, pages 165–178. Springer-Verlag New York, Inc., 2002.
  • [13] L. M. Rao and S. Firdose. Study of existing risk management models and prior research contribution. Adarsh Journal of Information Technology, 4(1):10–20, 2016.
  • [14] B. Karabacak and I. Sogukpinar. ISRAM: Information security risk analysis method. Comput. Secur., 24(2):147–159, March 2005.
  • [15] C. J. Alberts and A. Dorofee. Managing Information Security Risks: The Octave Approach. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
  • [16] M. S. Lund, B. Solhaug, and K. Stølen. Model-driven risk analysis: the CORAS approach. Springer Science & Business Media, 2010.
  • [17] The STRIDE Threat Model. Accessed: 2017-05-09.
  • [18] OWASP Foundation. Technical report, 2013. [Accessed online 05-11-2013].
  • [19] M. Howard and S. Lipner. The Security Development Lifecycle. Microsoft Press, Redmond, WA, USA, 2006.
  • [20] M. Pasha, G. Qaiser, and U. Pasha. A critical analysis of software risk management techniques in large scale systems. IEEE Access, 6:12412–12424, 2018.
  • [21] F. Ramos, A. Costa, M. Perkusich, H. Almeida, and A. Perkusich. A non-functional requirements recommendation system for scrum-based projects. In 30th International Conference on Software Engineering & Knowledge Engineering, SEKE, 2018.
  • [22] D. Domah and F. J. Mitropoulos. The NERV methodology: A lightweight process for addressing non-functional requirements in agile software development. In SoutheastCon 2015, pages 1–7, April 2015.
  • [23] W. M. Farid. The NORMAP methodology: Lightweight engineering of non-functional requirements for agile processes. In 2012 19th Asia-Pacific Software Engineering Conference, volume 1, 2012.
  • [24] A. Moran. Applying Agile Risk Management, pages 61–85. Springer International Publishing, Cham, 2014.
  • [25] A. Moran. Agile Risk Management. Springer International Publishing, 2014.
  • [26] J. Feller, B. Fitzgerald, S. A. Hissam, and K. R. Lakhani, editors. Perspectives on Free and Open Source Software, Cambridge, July 2005. The MIT Press Ltd.
  • [27] M. Michlmayr, B. Fitzgerald, and K. J. Stol. Why and how should open source projects adopt time-based releases? IEEE Software, 32(2):55–63, Mar 2015.
  • [28] Daniel Ståhl and Jan Bosch. Modeling continuous integration practice differences in industry software development. J. Syst. Softw., 87:48–59, January 2014.
  • [29] B. Fitzgerald and K.-J. Stol. Continuous software engineering: A roadmap and agenda. Journal of Systems and Software, 123:176–189, 2017.
  • [30] D. Ameller, C. Farré, X. Franch, D. Valerio, and A. Cassarino. Towards continuous software release planning. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 402–406, Feb 2017.
  • [31] S. Ambler. When does(n’t) agile modeling make sense? 2001. Accessed: 2017-05-06.
  • [32] B. Fitzgerald, G. Hartnett, and K. Conboy. Customising agile methods to software practices at Intel Shannon. Eur. J. Inf. Syst., 15(2):200–213, April 2006.
  • [33] B. Fitzgerald, K. Stol, R. O’Sullivan, and D. O’Brien. Scaling agile methods to regulated environments: An industry case study. In International Conference on Software Engineering, ICSE ’13, pages 863–872. IEEE Press, 2013.
  • [34] V. Mudumba and O. K. Lee. A new perspective on GDSD risk management: Agile risk management. In 2010 5th IEEE International Conference on Global Software Engineering, pages 219–227, Aug 2010.
  • [35] S. V. Shrivastava and U. Rathod. A risk management framework for distributed agile projects. Information and Software Technology, 85:1–15, 2017.
  • [36] A. Aslam, N. Ahmad, T. Saba, A. S. Almazyad, A. Rehman, A. Anjum, and A. Khan. Decision support system for risk assessment and management strategies in distributed software development. IEEE Access, 5:20349–20373, 2017.
  • [37] M. Merkow and L. Raghavan. An ecosystem for continuously secure application software. RUGGED Software, CrossTalk March/April, 2011.
  • [38] Aaron Banerjee. Equivalence of risk: A mathematical approach. In Proceedings of the 29th International System Safety Conference, Las Vegas, NV, pages 8–12, 2011.
  • [39] A. Baah. Agile Quality Assurance. BookBaby, 2017.
  • [40] D. Morán, L. M. Vaquero, and F. Galán. Elastically ruling the cloud: Specifying application’s behavior in federated clouds. In 2011 IEEE 4th International Conference on Cloud Computing, pages 89–96, July 2011.
  • [41] D. Zhou, L. Zhong, T. Wo, and J. Kang. CloudView: Describe and maintain resource view in cloud. In 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pages 151–158, Nov 2010.
  • [42] G. Goncalves, P. Endo, M. Santos, D. Sadok, J. Kelner, B. Melander, and J. E. Mangs. CloudML: An integrated language for resource, service and request description for D-Clouds. In 2011 IEEE Third International Conference on Cloud Computing Technology and Science, pages 399–406, Nov 2011.
  • [43] OASIS. Topology and Orchestration Specification for Cloud Applications (TOSCA).
  • [44] CAMEL. Cloud Application Modelling and Execution Language.
  • [45] OWASP Risk Rating Methodology. Accessed: 2018-05-09.
  • [46] The OWASP Top 10. Accessed: 2018-05-09.
  • [47] Security and Privacy Controls for Federal Information Systems and Organizations NIST Special Publication 800-53 Revision 4.
  • [48] Cloud Security Alliance. Cloud Control Matrix.
  • [49] Erkuden Rios, Eider Iturbe, Leire Orue-Echevarria Arrieta, Massimiliano Rak, and Valentina Casola. Towards self-protective multi-cloud applications: MUSA, a holistic framework to support the security-intelligent lifecycle management of multi-cloud applications. In CLOSER, 2015.