A Formal Development Cycle for Security Engineering in Isabelle

In this paper, we show a security engineering process based on a formal notion of refinement fully formalized in the proof assistant Isabelle. This Refinement-Risk Cycle focuses on attack analysis and security refinement supported by interactive theorem proving. Since we use a fully formalized model of infrastructures with actors and policies, we can support a novel way of formal security refinement for system specifications. This formal process is built as a practical extension of the Isabelle Infrastructure framework with attack trees. We define a formal notion of refinement on infrastructure models. Thanks to the formal foundation of Kripke structures and branching time temporal logic in the Isabelle Infrastructure framework, these stepwise transformations can be interleaved with attack tree analysis, thus providing a fully formal security engineering framework. The process is illustrated on an IoT healthcare case study introducing GDPR requirements and blockchain.


I Introduction

Security is a notoriously difficult property for system development because it is not compositional: given secure components, a system created from those components is not necessarily secure. Therefore, the usual divide-and-conquer approach from system and software design does not apply to security engineering. At the same time, security must be introduced in the early phases of the design of secure systems, since it cannot easily be “plugged in” at later stages. However, even if security is introduced in early phases, a classical stepwise development that refines abstract specifications by concretizing the design does not preserve security properties. Take, for example, the implementation of sending a message from a client A to a server B such that the communication is encrypted to protect its content. In the abstract system specification we consider neither a concrete protocol nor the architecture of the client and server. Common refinement methods from software engineering yield a possible implementation that passes the message from a client system AS to a server system BS over a secure channel. However, this implementation does not exclude that other processes running on either AS or BS eavesdrop on the cleartext message, because the confidentiality protection covers only the secure channel from AS to BS. This example is a simple illustration of what is known as the security refinement paradox [12].

Why is security so hard? A simple explanation is that it talks about negative properties: something (loss or damage of information or functionality) must not happen. Negation is also difficult in logic because it requires excluding possibilities; if the space to consider is large, such a proof can be hard or infeasible. In security, attacks often come from “outside the model”. That is, for a given specification we can prove some security property and yet an attack occurs that uses a fact, observation, or loophole that simply has not been considered in the model. This practical problem is similar to the refinement paradox. Intuitively, the attacker exploits a refinement of the system that has not been taken into account in the specification but is actually part of the real system (an implementation of the specification). In the above example, the real system allows other applications to run on the client within the security boundary. This feature of multi-processing systems has not been taken into account in the abstract specification, where we considered processes and systems (the client and the server) as abstract entities without distinguishing the features of their internal architectures.

Since 100% security is not achievable, the next best thing is to find ways to gradually improve the security of a system. Ideally, this is done at design time, since a specification is cheap to change whereas changing an implemented system is expensive. A pragmatic way of engineering secure systems is to identify security goals as part of a security requirements engineering process and to complement this with an attacker model. To this end, we propose a formal process of system development that focuses on attack analysis and security refinement in one integrated interactive process, interleaving system development with attack analysis using attack trees [34]. To support this ambitious goal, the current paper pulls together strands of previous work, for the first time combining attack analysis and system refinement in one consistent automated framework and illustrating it on a complex case study. The paper presents a generic theory of refinement in Isabelle that links attack trees with state-based system refinement based on Kripke semantics and temporal logic. The security refinement is illustrated on an IoT healthcare example that is entirely formalized in Isabelle on top of the underlying theory.

The contributions of this paper are: (a) we present a fully formalized process of security refinement for infrastructure systems using a notion of refinement and derive useful theory for it; the resulting Refinement-Risk Cycle integrates formal system development with risk analysis by attack trees; (b) we illustrate the process by showing the development of an IoT healthcare system from the CHIST-ERA project SUCCESS [7], exhibiting security attacks and formally refining the system specification step by step. An earlier workshop paper [17] already introduced the Refinement-Risk Cycle but exhibited the IoT healthcare example only informally. The current paper subsumes this workshop paper by defining a formal process of refinement; it thus formalizes a process of security engineering. Interestingly, although the system itself had also been formalised in the informal precursor [17], the earlier specification contains subtle design errors that make a secure refinement impossible. These errors have now been identified by applying the formal refinement that constitutes the core of the Refinement-Risk Cycle. Thus, adding the formal foundation of the Refinement-Risk Cycle not only substantially extends [17] but, by scrutinizing the example, also establishes the validity of the security refinement and demonstrates its value. As a further contribution, we present the analysis of the error correction.

In the remainder of this section we briefly summarize the underlying Isabelle Infrastructure framework including Kripke structures, CTL and Attack Trees that we used as the foundation for the current work. A detailed account is contained in the Appendix. Next, we present the Refinement-Risk-Cycle (RR-Cycle) and in particular the formal notion of refinement that allows security refinement (Section II). We then illustrate this process on the application to the case study first giving an overview and initial model (Section III) before providing technical details of the cycle’s application by the stepwise system refinement steps triggered by attacks (Section IV). We finally present the design errors that could be identified (Section IV-E) before we conclude in Section V. All developments and the application to the case study are formalised in Isabelle. The sources are available online [18].

I-A Kripke structures, CTL, and Attack Trees

Figure 1 gives an overview of the Isabelle Infrastructure framework with its layers of object-logics (each level below embeds the one above); the novel contribution of this paper is shown in colour at the top.

In the course of various extensions (detailed in the Appendix), the Isabelle framework has been restructured such that it is now a general framework for the state-based security analysis of infrastructures with policies and actors. Temporal logic and Kripke structures build the foundation. Meta-theoretical results have been established to show equivalence between attack trees and CTL statements [14]. This foundation provides a generic notion of state transition on which attack trees and temporal logic can be used to express properties. The main notions used in this paper are:

• Kripke structures and state transitions:
Using a generic state transition relation $$\to$$, Kripke structures are defined as the set of states reachable by $$\to^{*}$$ (zero or more transitions) from an initial state set, for example

Kripke {t. $$\exists$$ i $$\in$$ I. i $$\to^{*}$$ t} I


• CTL statements:
For example, we can write

K $$\vdash$$  EF s

to express that in Kripke structure K there is a path on which the property s (a set of states) will eventually hold.

• Attack trees:
The datatype of attack trees has three constructors: $$\oplus_{\vee}$$ creates or-trees and $$\oplus_{\wedge}$$ creates and-trees. And-attack trees and or-attack trees consist of a list of sub-attacks, which are again attack trees. The third constructor creates a base attack as a pair of state sets, written $${\mathcal{N}}_{\texttt{(I,s)}}$$. For example, a two-step and-attack leading from state set I via si to s is expressed as

$$\vdash$$ [$${\mathcal{N}}_{\texttt{(I,si)}}$$,$${\mathcal{N}}_{\texttt{(si,s)}}$$]$$\oplus_{\wedge}^{\texttt{(I,s)}}$$


• Attack tree refinement, validity and adequacy:
Attack trees have their own refinement (not to be confused with the model transformation presented in this paper). An abstract attack tree may be refined by spelling out the attack steps until a valid attack is reached: $$\vdash$$ A :: ($$\sigma$$ :: state) attree. Validity is defined constructively (code is generated from it), and its adequacy with respect to a formal semantics in CTL is proved; this can be used to facilitate actual application verification, as demonstrated here in the stepwise system refinements.
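To make the first two notions concrete outside Isabelle, here is a minimal Python sketch of a finite Kripke structure built from a toy transition function, together with an EF check. It assumes a finite, explicitly enumerable state space and reads K ⊢ EF s as "every initial state can reach a state in s"; the helper names (make_kripke, check_ef) and the toy relation are hypothetical and not part of the Isabelle Infrastructure framework.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, Set

State = Hashable
Step = Callable[[State], Iterable[State]]  # successors of a state under ->


@dataclass(frozen=True)
class Kripke:
    states: FrozenSet[State]  # all states reachable from init
    init: FrozenSet[State]    # initial states I


def reachable(step: Step, start: Iterable[State]) -> Set[State]:
    """States t with i ->* t for some i in start (zero or more steps)."""
    seen = set(start)
    frontier = list(seen)
    while frontier:
        for t in step(frontier.pop()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def make_kripke(step: Step, init: Iterable[State]) -> Kripke:
    """Analogue of: Kripke {t. ∃ i ∈ I. i →* t} I."""
    init = frozenset(init)
    return Kripke(frozenset(reachable(step, init)), init)


def check_ef(k: Kripke, step: Step, target: Set[State]) -> bool:
    """K ⊢ EF target, assumed here to mean: every initial state reaches target."""
    return all(reachable(step, [i]) & set(target) for i in k.init)


# Hypothetical toy transition relation: 0 -> 1 -> 2 and 0 -> 3.
edges = {0: [1, 3], 1: [2]}
toy_step = lambda s: edges.get(s, [])

K = make_kripke(toy_step, {0})
print(sorted(K.states))             # [0, 1, 2, 3]
print(check_ef(K, toy_step, {2}))   # True
print(check_ef(K, toy_step, {4}))   # False
```

Explicit enumeration only works for small models; the point of the Isabelle formalisation is that such statements are proved symbolically instead.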

In this paper, we extend this formal process by introducing refinement of Kripke structures. It refines a system model based on a formal definition that combines trace refinement and structural refinement. The definition allows us to prove property preservation results that are crucial for an iterative development process. The refinements of the system specification can be interleaved with attack analysis while security properties are proved in Isabelle. In each iteration, security qualities accumulate while attack trees continuously scrutinize the design.

II The Refinement-Risk-Cycle for Secure IoT System

We first introduce the iterative process of refinement and attack tree analysis (the “Refinement-Risk-Cycle”) providing an overview followed by the formal definition in Isabelle and the resulting property preservation.

II-A Overview of RR-Cycle

As an initial step, the Fusion/UML method serves to develop a system architecture from early requirements. This system architecture is translated into the Isabelle Infrastructure framework: actors in UML become Isabelle Infrastructure actors, UML system classes are represented by locations in the infrastructure graph, and the class attributes and pre- and postconditions of methods are formalised in the local and global policies. The identification of attacks, using for example invalidation [23], can then reveal paths of state transitions through the system model where the global security policy is violated. In an iteration, these attack paths provide details useful for refining the system specification by adding security controls, for example, access control, privacy preservation, or blockchain. The addition of detail, however, may in turn introduce new vulnerabilities that lead to new iterations of the process. Security properties may be proved at each level of the iteration. They are true for this abstraction level of the system model and remain true in the refined system. However, new attacks may be found despite proved security. The Refinement-Risk Cycle process is graphically depicted in Figure 2.

II-B Refinement

Intuitively, a refinement changes some aspect of the type of the state, for example, it replaces a data type by a richer data type or restricts the behaviour of the actors. The former is expressed directly by a mapping of data types; the latter is incorporated into the state transition relation of the Kripke structure that corresponds to the transformed model. In other words, we can encode a refinement within our framework as a relation on Kripke structures that is additionally parametrized by a function that maps the refined type to the abstract type. The direction “from refined to abstract” of this type mapping may seem counter-intuitive. However, the actual refinement is given by the refinement relation, which takes this function as an input and relates an abstract system specification to a more concrete one.

The additional layer for the refinement can be formalised in Isabelle as a second refinement relation $$\sqsubseteq$$ (the first refinement relation in this framework is on attack trees, summarized in Section A-C). The relation mod_trans is typed as a relation over triples, that is, a function from a threefold Cartesian product to bool, the type containing only true and false. The type variables $$\sigma$$ and $$\sigma$$’ given as input to the type constructor Kripke represent the abstract state type and the concrete state type, respectively. Consequently, the middle element of the triples selected by the relation mod_trans is a function of type $$\sigma$$’ $$\Rightarrow$$ $$\sigma$$ mapping elements of the refined state to the abstract state. The expression in quotation marks after the type is Isabelle’s infix syntax annotation, which allows mathematical notation instead of writing mod_trans in prefix form; this infix syntax is already used in the actual definition. Finally, the arrow $$\Longrightarrow$$ is the implication of Isabelle’s meta-logic, while $$\longrightarrow$$ is the implication of the object logic HOL. They are logically equivalent but of different types: within a HOL formula, for example, the body of the $$\forall$$ quantifier in the definition below, only the implication $$\longrightarrow$$ can be used.

 mod_trans ::  ($$\sigma$$ Kripke $$\times$$ ($$\sigma$$’ $$\Rightarrow$$ $$\sigma$$) $$\times$$ $$\sigma$$’ Kripke)
$$\Rightarrow$$ bool                  ("_ $$\sqsubseteq_{(\_)}$$ _")
K $$\sqsubseteq_{\mathcal{E}}$$ K’ $$\equiv$$ $$\forall$$ s’ $$\in$$ states K’. $$\forall$$ s $$\in$$ init K’.
s $$\to_{\sigma^{\prime}}^{*}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\in$$ init K
$$\land$$ $$\mathcal{E}$$(s) $$\to_{\sigma}^{*}$$ $$\mathcal{E}$$(s’)

The definition of K $$\sqsubseteq_{\mathcal{E}}$$ K’ states that for any state s’ of the refined Kripke structure that can be reached by the state transition in zero or more steps from an initial state s of the refined Kripke structure, the mapping $$\mathcal{E}$$ from the refined to the abstract model’s state must preserve this reachability, i.e., the image $$\mathcal{E}$$(s) of s must also be an initial state, and from there the image $$\mathcal{E}$$(s’) of s’ under $$\mathcal{E}$$ must be reachable in zero or more steps.
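The definition can be checked directly on small finite models. The following Python sketch is a hypothetical, finite-state reading of K ⊑_ℰ K’ (the names reach, refines and from_edges are illustrative, not taken from the Isabelle sources): for every initial refined state s and every s’ reachable from it, ℰ(s) must be initial in K and ℰ(s’) must be reachable from ℰ(s).

```python
from typing import Callable, Hashable, Iterable, Set

State = Hashable
Step = Callable[[State], Iterable[State]]


def reach(step: Step, s: State) -> Set[State]:
    """All t with s ->* t (reflexive-transitive closure from s)."""
    seen, todo = {s}, [s]
    while todo:
        for t in step(todo.pop()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def refines(init_abs: Set[State], step_abs: Step,
            init_ref: Set[State], step_ref: Step,
            emap: Callable[[State], State]) -> bool:
    """Finite check of K ⊑_E K' for Kripke structures given by (init, step)."""
    for s in init_ref:
        if emap(s) not in init_abs:          # E(s) ∈ init K
            return False
        reach_abs = reach(step_abs, emap(s))
        for s2 in reach(step_ref, s):        # s ->* s' in K'
            if emap(s2) not in reach_abs:    # E(s) ->* E(s') in K
                return False
    return True


def from_edges(edges):
    """Turn an adjacency dict into a successor function."""
    return lambda s: edges.get(s, [])


abstract = from_edges({"a": ["b"]})
refined = from_edges({("a", 0): [("a", 1)], ("a", 1): [("b", 0)]})
project = lambda s: s[0]  # hypothetical state map E: refined -> abstract

print(refines({"a"}, abstract, {("a", 0)}, refined, project))  # True
broken = from_edges({("a", 0): [("c", 0)]})
print(refines({"a"}, abstract, {("a", 0)}, broken, project))   # False
```

Note that the refined step ("a", 0) to ("a", 1) maps to a zero-step move in the abstract model, which the definition permits because it only asks for reachability.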

II-C Property Preserving System Refinement

A first direct consequence of this definition is the following lemma, where the operator $$\triangleleft$$ in $$\mathcal{E}$$$$\triangleleft$$(init K’) represents function image, that is, the set {$$\mathcal{E}$$(s). s $$\in$$ init K’}.

lemma init_ref: K $$\sqsubseteq_{\mathcal{E}}$$ K’ $$\Longrightarrow$$ $$\mathcal{E}$$$$\triangleleft$$(init K’) $$\subseteq$$ init K

A more prominent consequence of the definition of refinement is property preservation. Here, we show that refinement preserves CTL properties of the form EF s’, which means that a reachability property true in the refined model K’ is already true, for the image $$\mathcal{E}$$$$\triangleleft$$(s’), in the abstract model. A state set represents a property in the predicate transformer view of properties as sets of states. The additional condition on initial states ensures that we cannot “forget” them.
theorem prop_pres:
K $$\sqsubseteq_{\mathcal{E}}$$ K’  $$\Longrightarrow$$ init K $$\subseteq$$ $$\mathcal{E}$$$$\triangleleft$$(init K’) $$\Longrightarrow$$
$$\forall$$ s’ $$\in$$ Pow(states K’). K’ $$\vdash$$  EF s’
$$\longrightarrow$$ K $$\vdash$$  EF ($$\mathcal{E}$$$$\triangleleft$$(s’))
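As an illustration (not a proof), the following Python sketch enumerates every state set s’ of a small refined model and tests the implication stated by prop_pres, reading K ⊢ EF s as "every initial state can reach s" (an assumption of this sketch). The helper prop_pres_holds and the toy models are hypothetical; in Isabelle the theorem is proved once for all refinements rather than tested per model.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, Set

State = Hashable
Step = Callable[[State], Iterable[State]]


def reach(step: Step, s: State) -> Set[State]:
    """All t with s ->* t."""
    seen, todo = {s}, [s]
    while todo:
        for t in step(todo.pop()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def sat_ef(init: Set[State], step: Step, target: Set[State]) -> bool:
    """K ⊢ EF target under the assumed reading."""
    return all(reach(step, i) & target for i in init)


def prop_pres_holds(init_abs, step_abs, init_ref, step_ref, emap) -> bool:
    """For every s' ⊆ states K': K' ⊢ EF s' implies K ⊢ EF (E ◁ s')."""
    states_ref = set().union(*(reach(step_ref, i) for i in init_ref))
    for r in range(len(states_ref) + 1):
        for subset in combinations(sorted(states_ref), r):
            target = set(subset)
            image = {emap(x) for x in target}
            if sat_ef(init_ref, step_ref, target) and not sat_ef(init_abs, step_abs, image):
                return False
    return True


edges_abs = {"a": ["b"]}
edges_ref = {("a", 0): [("a", 1)], ("a", 1): [("b", 0)]}
edges_bad = {("a", 0): [("c", 0)]}   # not a refinement: "c" is unreachable in K
abs_step = lambda s: edges_abs.get(s, [])
ref_step = lambda s: edges_ref.get(s, [])
bad_step = lambda s: edges_bad.get(s, [])
project = lambda s: s[0]             # hypothetical state map E

print(prop_pres_holds({"a"}, abs_step, {("a", 0)}, ref_step, project))  # True
print(prop_pres_holds({"a"}, abs_step, {("a", 0)}, bad_step, project))  # False
```

The second call shows why the refinement premise matters: without it, a state reachable in K’ can map to an abstract state that K never reaches.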

It is remarkable that our definition of refinement by Kripke structure refinement entails property preservation and makes it possible to prove this once and for all as a theorem in Isabelle, i.e., as a meta-theorem. This is possible because our generic definition of state transition allows us to formalise sophisticated concepts such as reachability explicitly. For practical purposes, however, the proof obligation of showing that a specific transformation is in fact a refinement is rather complex, precisely because of the explicit use of the transitive closure of the state transition relation. In most cases, the refinement will be simpler. Therefore, we offer additional help with the following theorem, which uses a stronger characterisation of Kripke structure refinement and shows that our refinement follows from it.
theorem strong_mt:
$$\mathcal{E}$$$$\triangleleft$$(init K’) $$\subseteq$$ init K $$\land$$ ($$\forall$$ s s’. s $$\to_{\sigma^{\prime}}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\to_{\sigma}$$ $$\mathcal{E}$$(s’))
$$\Longrightarrow$$ K $$\sqsubseteq_{\mathcal{E}}$$ K’

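This simpler characterisation is in fact a stronger one: we could have s $$\to_{\sigma^{\prime}}$$ s’ in the refined Kripke structure K’ but not $$\mathcal{E}$$(s) $$\to_{\sigma}$$ $$\mathcal{E}$$(s’), while neither s nor s’ is reachable from initial states in K’; then K $$\sqsubseteq_{\mathcal{E}}$$ K’ may still hold although the premise of strong_mt fails. For cases where we want the simpler one-step proviso but still need reachability, we provide a slightly weaker version of strong_mt.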
theorem strong_mt’:
$$\mathcal{E}$$$$\triangleleft$$(init K’) $$\subseteq$$ init K $$\land$$ ($$\forall$$ s s’. ($$\exists$$ s0 $$\in$$ init K’. s0 $$\to^{*}$$ s)
$$\land$$ s $$\to_{\sigma^{\prime}}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\to_{\sigma}$$ $$\mathcal{E}$$(s’)) $$\Longrightarrow$$ K $$\sqsubseteq_{\mathcal{E}}$$ K’
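The difference between strong_mt and strong_mt’ is easiest to see on a toy model. The following Python sketch (the helper one_step_ok and the finite state universe are hypothetical assumptions) checks the one-step proviso either over all refined states or only over those reachable from the initial states, for the situation discussed for strong_mt: a transition between unreachable states that has no abstract counterpart.

```python
from typing import Callable, Hashable, Iterable, Set

State = Hashable
Step = Callable[[State], Iterable[State]]


def reach_from(step: Step, starts: Iterable[State]) -> Set[State]:
    """All states reachable from any state in starts (zero or more steps)."""
    seen = set(starts)
    todo = list(seen)
    while todo:
        for t in step(todo.pop()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def one_step_ok(universe: Set[State], init_abs: Set[State], step_abs: Step,
                init_ref: Set[State], step_ref: Step,
                emap: Callable[[State], State], reachable_only: bool) -> bool:
    """Premise of strong_mt (reachable_only=False) or strong_mt' (True)."""
    if not {emap(s) for s in init_ref} <= init_abs:   # E ◁ (init K') ⊆ init K
        return False
    sources = reach_from(step_ref, init_ref) if reachable_only else universe
    return all(emap(t) in step_abs(emap(s))           # s -> s' implies E(s) -> E(s')
               for s in sources for t in step_ref(s))


ref_edges = {0: [1], 8: [9]}   # 8 -> 9 is never reachable from 0
abs_edges = {0: [1]}           # no abstract counterpart for 8 -> 9
args = ({0}, lambda s: abs_edges.get(s, []), {0}, lambda s: ref_edges.get(s, []), lambda s: s)

print(one_step_ok({0, 1, 8, 9}, *args, reachable_only=False))  # False: strong_mt premise fails
print(one_step_ok({0, 1, 8, 9}, *args, reachable_only=True))   # True: strong_mt' premise holds
```

With the identity map, the refinement itself holds here, which is exactly the gap that strong_mt’ closes.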


This idea of property preservation coincides with the classical idea of trace refinement as found in process algebras like CSP. In this view, the properties of a system are given by the set of its traces, and a refinement of the system is another system that has a subset of the traces of the former. Although the principal idea is similar, we extend it considerably: since our refinement relation is parametrized by a state map, it additionally allows structural refinement. The state map generalises the basic idea of trace refinement, in which traces correspond to each other, by additionally allowing an exchange of data types. As we see in the application to the case study, the refinement steps may sometimes just specialise the traces; in this case the state map is simply the identity.
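For comparison, classical trace refinement can be sketched as inclusion of trace sets. This Python sketch is a simplification rather than CSP semantics: traces are finite sequences of states instead of events, they are cut off at a hypothetical bound n, and an optional state map plays the role of ℰ (the identity by default, as in purely trace-specialising steps). Lockstep inclusion like this is stricter than ⊑, which only asks for reachability.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

State = Hashable
Step = Callable[[State], Iterable[State]]
Trace = Tuple[State, ...]


def traces(step: Step, init: Iterable[State], n: int) -> Set[Trace]:
    """All state sequences of length 1..n that start in init and follow step."""
    result: Set[Trace] = set()
    frontier = [(s,) for s in init]
    while frontier:
        tr = frontier.pop()
        result.add(tr)
        if len(tr) < n:
            frontier.extend(tr + (t,) for t in step(tr[-1]))
    return result


def trace_refines(abs_step: Step, abs_init, ref_step: Step, ref_init, n: int,
                  emap: Callable[[State], State] = lambda s: s) -> bool:
    """Mapped bounded traces of the refined model are traces of the abstract one."""
    mapped = {tuple(emap(s) for s in tr) for tr in traces(ref_step, ref_init, n)}
    return mapped <= traces(abs_step, abs_init, n)


abstract = {0: [1, 2]}
narrowed = {0: [1]}   # keeps only one of the abstract behaviours
extended = {0: [3]}   # introduces a behaviour the abstract model lacks
print(trace_refines(lambda s: abstract.get(s, []), {0},
                    lambda s: narrowed.get(s, []), {0}, 3))   # True
print(trace_refines(lambda s: abstract.get(s, []), {0},
                    lambda s: extended.get(s, []), {0}, 3))   # False
```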

III Applying RR-Cycle to IoT Healthcare Example

We first give a tabular overview of the steps taken for the case study. Following the RR-Cycle, we have modelled and analysed the IoT healthcare application in four iterations, summarised in the table in Figure 3.

How each of these models is refined in each iteration, as well as the attack trees that exhibit vulnerabilities, is discussed in the following sections as indicated in the last column of the table in Figure 3.

III-A Initial Step: Fusion/UML for System Architecture

The Fusion/UML process for object-oriented design and analysis has been used to derive a system design for the application scenario. For conciseness, we omit the details here and present just one of the main outcomes of the analysis process: the system class model depicted in Figure 4. Note that only the cloud server and the connected hospital (or other client institutions) are situated within the security perimeter. The smartphone and the home server feature as data upload devices, and the smartphone additionally as a control device that is included in some of the use cases. This is a consequence of the GDPR [15] requirements, which are thus settled immediately in the initial architecture.

Another result of the Fusion/UML analysis along with this system architecture is a set of operation schemas based on the system class model, additional use cases and object collaborations. For details see [22].

III-B Infrastructures, Policies, and Actors

The Isabelle Infrastructure framework supports the representation of infrastructures as graphs with actors and policies attached to nodes. These infrastructures are the states of the Kripke structure.

The transition between states is triggered by non-parametrized actions get, move, eval, and put executed by actors. Actors are given by an abstract type actor and a function Actor that creates elements of that type from identities (of type string written ’’s’’ in Isabelle). Actors are contained in an infrastructure graph constructed by Lgraph – here the IoT healthcare case study example.

 ex_graph $$\equiv$$  Lgraph
{(home,cloud), (sphone,cloud), (cloud,hospital)}
($$\lambda$$ x. if x = home then {’’Patient’’} else
(if x = hospital then {’’Doctor’’} else {}))
ex_creds ex_locs


This graph contains a set of location pairs representing the topology of the infrastructure as a graph of nodes, and a function that assigns a set of actor identities to each node (location) in the graph. The last two graph components, ex_creds and ex_locs, are only abbreviated here (for the definitions see [16]). The function ex_creds associates each actor with a pair of string sets: the first component is the set of credentials in the actor’s possession and the second is the set of roles the actor can take on; ex_locs defines the data residing at each location. Corresponding projection functions for each of the components of an infrastructure graph are provided; they are named gra for the actual set of pairs of locations, agra for the actor map, cgra for the credentials, and lgra for the data at a location.

Infrastructures contain an infrastructure graph and a policy given by a function that assigns local policies over a graph to all locations of the graph.

 datatype infrastructure =
Infrastructure "igraph"
"[igraph, location] $$\Rightarrow$$ policy set"

The projection functions graphI and delta, when applied to an infrastructure, return the graph and the policy, respectively. For our healthcare example, the initial infrastructure contains the above graph ex_graph and the local policies defined shortly.
 hc_scenario $$\equiv$$ Infrastructure
ex_graph local_policies

The function local_policies gives the policy for each location x over an infrastructure graph G as a pair: the first element of this pair is a function specifying the actors y that are entitled to perform the actions specified in the set which is the second element of that pair.
 local_policies G x $$\equiv$$
(case x of
home $$\Rightarrow$$ {($$\lambda$$ y. True, {put,get,move,eval})}
| sphone $$\Rightarrow$$
{(($$\lambda$$ y. has G (y,’’PIN’’)), {put,get,move,eval})}
| cloud $$\Rightarrow$$ {($$\lambda$$ y. True, {put,get,move,eval})}
| hospital $$\Rightarrow$$
{(($$\lambda$$ y. ($$\exists$$ n. (n  $$@_{G}$$ hospital) $$\land$$
Actor n = y $$\land$$ has G (y, ’’skey’’))),
{put,get,move,eval})}
| _ $$\Rightarrow$$  {})

Policies specify the expected behaviour of actors of an infrastructure. They are given by pairs of predicates (conditions) and sets of (enabled) actions. They are defined by the enables predicate: an actor h is enabled to perform an action a in infrastructure I, at location l if there exists a pair (p,e) in the local policy of l (delta I l projects to the local policy) such that the action a is a member of the action set e and the policy predicate p holds for actor h.
enables I l h a $$\equiv$$ $$\exists$$ (p,e) $$\in$$ delta I l. a $$\in$$ e $$\land$$ p h


The global policy is ‘only the patient and the doctor can access the data in the cloud’:

 global_policy I a $$\equiv$$  a $$\notin$$ hc_actors
$$\longrightarrow$$ $$\neg$$(enables I cloud (Actor a) get)
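The policy layer can be mimicked in a few lines. The following Python sketch mirrors local_policies, enables and global_policy for one fixed infrastructure, so the parameter I is dropped. Since ex_creds is only abbreviated in the text, the credential assignment (Patient holds 'PIN', Doctor holds 'skey') and the content of hc_actors are hypothetical assumptions, and the hospital predicate is simplified to "located at hospital and holding skey".

```python
from typing import Callable, Dict, List, Set, Tuple

Policy = Tuple[Callable[[str], bool], Set[str]]  # (predicate on actors, enabled actions)
ALL_ACTIONS = {"put", "get", "move", "eval"}

# Hypothetical stand-ins for agra and cgra of ex_graph.
actors_at: Dict[str, Set[str]] = {"home": {"Patient"}, "hospital": {"Doctor"},
                                  "sphone": set(), "cloud": set()}
creds: Dict[str, Set[str]] = {"Patient": {"PIN"}, "Doctor": {"skey"}}
HC_ACTORS = {"Patient", "Doctor"}  # assumed content of hc_actors


def has(actor: str, cred: str) -> bool:
    return cred in creds.get(actor, set())


def local_policies(loc: str) -> List[Policy]:
    if loc in ("home", "cloud"):
        return [(lambda y: True, ALL_ACTIONS)]
    if loc == "sphone":
        return [(lambda y: has(y, "PIN"), ALL_ACTIONS)]
    if loc == "hospital":
        return [(lambda y: y in actors_at["hospital"] and has(y, "skey"), ALL_ACTIONS)]
    return []


def enables(loc: str, actor: str, action: str) -> bool:
    """∃ (p, e) ∈ delta I l. a ∈ e ∧ p h"""
    return any(action in acts and pred(actor) for pred, acts in local_policies(loc))


def global_policy(a: str) -> bool:
    """a ∉ hc_actors ⟶ ¬ enables I cloud (Actor a) get"""
    return a in HC_ACTORS or not enables("cloud", a, "get")


print(enables("sphone", "Patient", "get"))  # True
print(enables("sphone", "Eve", "get"))      # False
print(global_policy("Eve"))                 # False: the cloud policy admits everyone
```

In this sketch the permissive cloud entry alone is enough to violate the global policy for an outsider, which is the kind of state the invalidation below searches for.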


III-C Infrastructure State Transition

The state transition relation uses the syntactic infix notation I $$\to_{n}$$ I’ to denote that infrastructures I and I’ are in this relation. To give an impression of this definition, we show just one of the several rules that define the state transition, the rule for the action get, because this rule will be adapted in the process of refining the system specification. Initially, this rule expresses that an actor h that resides at a location l (h $$@_{G}$$ l) and is enabled by the local policy at location l’ to “get” can add a data item s stored at l’ to the data at its own location l.

 get_data: G = graphI I $$\Longrightarrow$$ h $$@_{G}$$ l $$\Longrightarrow$$
l $$\in$$ nodes G $$\Longrightarrow$$ l’ $$\in$$ nodes G $$\Longrightarrow$$
enables I l’ (Actor h) get $$\Longrightarrow$$ s $$\in$$ lgra G l’ $$\Longrightarrow$$
I’ = Infrastructure
(Lgraph (gra G)(agra G)(cgra G)
((lgra G)(l := lgra G l $$\cup$$ {s})))
(delta I)
$$\Longrightarrow$$ I $$\to_{n}$$ I’
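Operationally, get_data computes successor states. Below is a hypothetical Python rendering of just this rule: the state is reduced to the actor locations and the data map lgra (all other graph components are assumed unchanged), the enabledness test is passed in as a parameter, and the data item "ecg" is an illustrative placeholder.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Iterator

Data = Dict[str, FrozenSet[str]]  # location -> data stored there (lgra)


def get_steps(actor_loc: Dict[str, str], data: Data, nodes: Iterable[str],
              enabled: Callable[[str, str, str], bool]) -> Iterator[Data]:
    """Yield each data map of an I' with I ->n I' by one application of get_data."""
    nodes = list(nodes)
    for h, l in actor_loc.items():                 # h @G l
        if l not in nodes:                         # l ∈ nodes G
            continue
        for l2 in nodes:                           # l' ∈ nodes G
            if not enabled(l2, h, "get"):          # enables I l' (Actor h) get
                continue
            for s in data.get(l2, frozenset()):    # s ∈ lgra G l'
                new = dict(data)
                new[l] = data.get(l, frozenset()) | {s}  # (lgra G)(l := lgra G l ∪ {s})
                yield new


state = {"home": frozenset(), "cloud": frozenset({"ecg"})}
succ = list(get_steps({"Patient": "home"}, state, ["home", "cloud"],
                      lambda loc, h, a: loc == "cloud"))
print(succ)  # [{'home': frozenset({'ecg'}), 'cloud': frozenset({'ecg'})}]
```

Copying the dictionary before updating keeps the source state intact, matching the functional update in the Isabelle rule.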

Based on this state transition and the above defined
hc_scenario, we define the first Kripke structure.
 hc_Kripke $$\equiv$$
Kripke { I. hc_scenario $$\to^{*}$$ I } {hc_scenario}


III-D Attack: Eve can get data

How do we find attacks? The key is to use invalidation [23] of the security property we want to achieve, here the global policy. Since we consider a predicate transformer semantics, we use sets of states to represent properties. The invalidated global policy is given by the following set shc.

 shc $$\equiv$$ {x. $$\neg$$ (global_policy x ’’Eve’’)}

The attack we are interested in is to see whether for the scenario
 hc_scenario $$\equiv$$  Infrastructure ex_graph local_policies

from the initial state set Ihc $$\equiv$$ {hc_scenario}, the critical state set shc can be reached, that is, is there a valid attack (Ihc,shc)?

For the Kripke structure

 hc_Kripke $$\equiv$$ Kripke { I. hc_scenario $$\to^{*}$$ I } Ihc

we first derive a valid and-attack using the attack tree proof calculus.
$$\vdash$$ [$${\mathcal{N}}_{\texttt{(Ihc,HC)}}$$,$${\mathcal{N}}_{\texttt{(HC,shc)}}$$]$$\oplus_{\wedge}^{\texttt{(Ihc,shc)}}$$

The set HC is an intermediate state where Eve accesses the cloud.

The attack tree calculus [14] exhibits that an attack is possible.

 hc_Kripke $$\vdash$$  EF shc

We can simply apply the Correctness theorem AT_EF to immediately prove this CTL statement. This application of the meta-theorem of Correctness of attack trees saves us proving the CTL formula tediously by exploring the state space in Isabelle proofs. Alternatively, we could use the generated code for the function is_attack_tree in Scala (see Section A-C) to check that a refined attack of the above is valid.
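To illustrate what checking an attack tree involves, here is a deliberately simplified Python analogue of a validity check in the spirit of is_attack_tree. The rules are assumptions of this sketch, not the framework's exact definition: a base attack 𝒩(I,s) is valid if every state in I has a one-step transition into s; an and-attack chains its sub-attacks from I to s; an or-attack requires every sub-attack to reach s and their initial sets to cover I. The three-state model and its state names are hypothetical stand-ins for Ihc, HC and shc.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple, Union

State = Hashable
Step = Callable[[State], Iterable[State]]


@dataclass(frozen=True)
class Base:            # N_(init, goal)
    init: FrozenSet[State]
    goal: FrozenSet[State]


@dataclass(frozen=True)
class And:             # [subs] ⊕∧ (init, goal)
    subs: Tuple["Tree", ...]
    init: FrozenSet[State]
    goal: FrozenSet[State]


@dataclass(frozen=True)
class Or:              # [subs] ⊕∨ (init, goal)
    subs: Tuple["Tree", ...]
    init: FrozenSet[State]
    goal: FrozenSet[State]


Tree = Union[Base, And, Or]


def valid(t: Tree, step: Step) -> bool:
    if isinstance(t, Base):
        return all(any(y in t.goal for y in step(x)) for x in t.init)
    if not t.subs or not all(valid(a, step) for a in t.subs):
        return False
    if isinstance(t, And):
        links = zip(t.subs, t.subs[1:])
        return (t.subs[0].init == t.init and t.subs[-1].goal == t.goal
                and all(a.goal == b.init for a, b in links))
    # Or-attack: every alternative reaches the goal; together they cover init.
    return (all(a.goal == t.goal for a in t.subs)
            and t.init <= frozenset().union(*(a.init for a in t.subs)))


edges = {"start": ["eve_in_cloud"], "eve_in_cloud": ["eve_has_data"]}
step = lambda s: edges.get(s, [])
I, HC, S = frozenset({"start"}), frozenset({"eve_in_cloud"}), frozenset({"eve_has_data"})

two_step = And((Base(I, HC), Base(HC, S)), I, S)
print(valid(two_step, step))     # True
print(valid(Base(I, S), step))   # False: no single step from start to the goal
```

The failing base attack shows why abstract attacks are refined into finer steps until they become valid.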

IV Entering the Cycle

IV-A First Refinement Iteration: Adding DLM Access Control

The Decentralised Label Model (DLM) [32] allows labelling data with owners and readers. We adopt it for our model. Labelled data is given by the type dlm $$\times$$ data, where data can be any data type. We provide functions owns and readers that enable specifying when an actor may access a data item.

 has_access G l a d $$\equiv$$ owns G l a d $$\lor$$ a $$\in$$ readers d
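A minimal Python sketch of labelled data and has_access follows. It simplifies the paper's owns G l a d by storing the owner directly in the label, so the graph and location arguments disappear; the Label class, the payload "ecg" and the actor names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Label:                     # a DLM label: one owner and a set of readers
    owner: str
    readers: FrozenSet[str]


Labelled = Tuple[Label, str]     # dlm × data


def has_access(actor: str, item: Labelled) -> bool:
    """owns a d ∨ a ∈ readers d, with ownership read off the label."""
    label, _data = item
    return actor == label.owner or actor in label.readers


record: Labelled = (Label("Patient", frozenset({"Doctor"})), "ecg")
print([has_access(a, record) for a in ("Patient", "Doctor", "Eve")])  # [True, True, False]
```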

In the first refinement of the model in the RR-Cycle, we thus use labeled data to adapt the infrastructures.

IV-A1 Refinement Map

Isabelle allows overloading of constant names, that is, the same name can be used for different constants if these constants differ in their types or reside in different theories. We use the latter option and redefine the type infrastructure in a new theory RRLoopTwo (the original one is called RRLoopOne). This redefinition of the type includes the redefinition of all involved constructors and projection functions under the same names. To disambiguate these equally named constructors, we can use Isabelle’s name spaces: RRLoopOne.infrastructure and RRLoopTwo.infrastructure refer to the two different types of infrastructures, and likewise RRLoopOne.gra and RRLoopTwo.gra, for example, refer to the two igraph projection functions. The qualified names including the theory name RRLoopOne or RRLoopTwo need only be used if disambiguation is necessary (as below, when we define the refinement map for this first concrete refinement step). As long as Isabelle can disambiguate a name from the context, we can use the short names, for example, infrastructure or gra.

In the refined model RRLoopTwo, the new type infrastructure now keeps dlm $$\times$$ data instead of just data in the igraph.

Additionally, as a preparation for defining the refinement, we now need a function from the new infrastructure type to the old one that projects out the data labels we have just introduced. This is visible in the last input to the Lgraph constructor, where for each location l we apply snd to every dlm $$\times$$ data pair in RRLoopTwo.lgra (graphI I) l, thereby dropping the label (the first component). The function fmap is a “map” function for finite sets that we defined ourselves (see also Section IV-E).

 definition ref_map :: RRLoopTwo.infrastructure $$\Rightarrow$$
RRLoopOne.infrastructure
where ref_map I =
RRLoopOne.Infrastructure
(RRLoopOne.Lgraph
(RRLoopTwo.gra (graphI I))
(RRLoopTwo.agra (graphI I))
(RRLoopTwo.cgra (graphI I))
($$\lambda$$ l. fmap snd (RRLoopTwo.lgra (graphI I) l)))

In the above expression, we deliberately give the theory names RRLoopOne and RRLoopTwo for all constructors and types for clarity. In fact, they are only necessary for the type in the first line. For the constructors and projections, e.g. agra, they can be omitted in the actual definition, since Isabelle is capable of disambiguating them from the context.
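Restricted to the data component, ref_map is an element-wise projection. The Python sketch below is hypothetical: labels are represented as (owner, readers) pairs, the other graph components are assumed to be copied unchanged, and ref_map_data only mirrors the last argument λ l. fmap snd (RRLoopTwo.lgra (graphI I) l).

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Label = Tuple[str, FrozenSet[str]]                   # (owner, readers)
RefinedData = Dict[str, FrozenSet[Tuple[Label, Hashable]]]
AbstractData = Dict[str, FrozenSet[Hashable]]


def ref_map_data(lgra: RefinedData) -> AbstractData:
    """For each location, apply snd to every (label, data) pair."""
    return {loc: frozenset(d for _label, d in items) for loc, items in lgra.items()}


refined = {
    "cloud": frozenset({(("Patient", frozenset({"Doctor"})), "ecg"),
                        (("Patient", frozenset()), "ecg")}),
    "home": frozenset(),
}
print(ref_map_data(refined))  # {'cloud': frozenset({'ecg'}), 'home': frozenset()}
```

Because the image of a set is taken, two items that differ only in their labels collapse into one abstract item, as the cloud entry shows.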

IV-A2 Refined State Transition

The state transition is also redefined for the refined theory RRLoopTwo, keeping the same name and overloading the infix syntax $$\to_{n}$$. This first refinement iteration implements access control in the labelled data type, but we also need to redefine the semantics of the state transition. The refined rule get_data checks the labels of the data item stored in a location l’ and only gives access if, in addition to get being enabled for an actor h, this actor is among the readers or is the owner. In this case, the data item including the label can be copied to the location l where h resides.

 get_data: G = graphI I $$\Longrightarrow$$ h $$@_{G}$$ l $$\Longrightarrow$$
l $$\in$$ nodes G $$\Longrightarrow$$ l’ $$\in$$ nodes G $$\Longrightarrow$$
enables I l (Actor h) get $$\Longrightarrow$$
((Actor h’, hs), n) $$\in$$ (lgra G l’) $$\Longrightarrow$$
Actor h $$\in$$ hs $$\lor$$ h = h’ $$\Longrightarrow$$
I’ = Infrastructure
(Lgraph (gra G)(agra G)(cgra G)
((lgra G)(l := lgra G l
$$\cup$$ {((Actor h’, hs), n)})))
(delta I)
$$\Longrightarrow$$ I $$\to_{n}$$ I’
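The label check added by the refined rule can be sketched in Python as follows. This is a hypothetical rendering: labels are (owner, readers) pairs, only lgra and the actor locations are modelled, and enabledness is passed in. Following the rule as stated, get must be enabled at the actor's own location l.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, Tuple

Item = Tuple[Tuple[str, FrozenSet[str]], str]   # ((owner, readers), data)
Data = Dict[str, FrozenSet[Item]]


def get_steps_labelled(actor_loc: Dict[str, str], data: Data, nodes: Iterable[str],
                       enabled: Callable[[str, str, str], bool]) -> Iterator[Data]:
    """Yield each data map reachable by one application of the refined get_data."""
    nodes = list(nodes)
    for h, l in actor_loc.items():
        if l not in nodes or not enabled(l, h, "get"):  # enables I l (Actor h) get
            continue
        for l2 in nodes:
            for item in data.get(l2, frozenset()):
                (owner, readers), _payload = item
                if h in readers or h == owner:          # Actor h ∈ hs ∨ h = h'
                    new = dict(data)
                    new[l] = data.get(l, frozenset()) | {item}  # label is copied too
                    yield new


ecg: Item = (("Patient", frozenset({"Doctor"})), "ecg")
state = {"hospital": frozenset(), "cloud": frozenset({ecg})}
succ = list(get_steps_labelled({"Doctor": "hospital", "Eve": "cloud"}, state,
                               ["hospital", "cloud"], lambda loc, h, a: True))
print(len(succ), succ[0]["hospital"] == frozenset({ecg}))  # 1 True
```

Even with every action enabled, Eve produces no successor because she is neither owner nor reader of the item.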


IV-A3 Proof of Refinement

We put these extensions together by defining a new Kripke structure hc_KripkeT.

 hc_KripkeT $$\equiv$$
Kripke {I. hc_scenarioT $$\to^{*}$$ I} {hc_scenarioT}

In the above, we also use the redefinition of the involved infrastructure states, for example, hc_scenarioT, where the T stands for Two. Note that these are definitions in a locale; they need to have different names since locales have a flat name space.

These preparations pay off: we can now apply our refinement theory from Section II-B to prove

hc_Kripke ⊑⇘ref_map⇙ hc_KripkeT.

Moreover, we can also use the meta-theory about refinement developed there. Applying the theorem strong_mt' reduces this proof obligation to showing

ref_map ◁ init hc_KripkeT ⊆ init hc_Kripke ∧
  (∀ s s'. (∃ s0 ∈ init hc_KripkeT. s0 →* s) ∧
    s → s' ⟶ ref_map s → ref_map s').
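On finite models, the two premises of strong_mt' can also be checked by enumeration, which may help to show what they demand. The following Python sketch is a hypothetical illustration. It assumes finite state spaces, each given by a set of initial states and a successor function. The names reachable and strong_mt_premises are not part of the Isabelle theories, where these premises are discharged by proof.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def reachable(init: Iterable[Hashable],
              succ: Callable[[Hashable], Iterable[Hashable]]) -> Set[Hashable]:
    """States reachable from init (reflexive-transitive closure of succ)."""
    seen = set(init)
    todo = deque(seen)
    while todo:
        s = todo.popleft()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def strong_mt_premises(init_abs, step_abs, init_ref, succ_ref, E) -> bool:
    """Check the two premises of strong_mt' on a finite model.

    1. E maps every refined initial state to an abstract initial state.
    2. Every step s -> t from a reachable refined state s is mapped
       by E to an abstract step E s -> E t.
    """
    if not {E(s) for s in init_ref} <= set(init_abs):
        return False
    return all(step_abs(E(s), E(t))
               for s in reachable(init_ref, succ_ref)
               for t in succ_ref(s))
```

The second premise is restricted to reachable refined states. This mirrors the existential assumption on s0 in the formula above.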


IV-A4 Attack: Eve can change labels

We can already observe another attack: Eve can also process data using the eval action at the cloud. We can prove that there is a path (EF) in the system leading to the corresponding attack state.

hc_KripkeT ⊢
  EF {I. enables I cloud (Actor ''Eve'') eval}
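For intuition, EF and AG can be evaluated on a small finite Kripke structure by exploring the reachable states. The Python sketch below assumes, as in the framework's CTL embedding, that K ⊢ φ means every initial state satisfies φ. The toy model is hypothetical and far simpler than hc_KripkeT: Eve moves between home and cloud, and eval is enabled at cloud.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def reachable(init: Iterable[Hashable],
              succ: Callable[[Hashable], Iterable[Hashable]]) -> Set[Hashable]:
    """States reachable from init (breadth-first search)."""
    seen = set(init)
    todo = deque(seen)
    while todo:
        s = todo.popleft()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def holds_EF(init, succ, p: Callable[[Hashable], bool]) -> bool:
    """K ⊢ EF p: from every initial state some reachable state satisfies p."""
    return all(any(p(t) for t in reachable([s], succ)) for s in init)


def holds_AG(init, succ, p: Callable[[Hashable], bool]) -> bool:
    """K ⊢ AG p: every state reachable from an initial state satisfies p."""
    return all(p(t) for t in reachable(init, succ))


# Hypothetical toy model: the state is Eve's location.
moves = {"home": ["cloud"], "cloud": ["home"]}
succ = lambda s: moves[s]
eve_can_eval = lambda s: s == "cloud"
```

In this toy model, EF holds for eve_can_eval starting from home, while AG does not. This contrast is the one between finding an attack path and proving a global invariant such as priv_pres.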

Once we have proved this CTL statement, we can use the completeness theorems for the attack tree calculus (see Section A-C) to derive that an attack exists: Eve can tamper with the access control labels by processing labelled data. What we need is privacy preservation, i.e., labels must be preserved under processing. As a countermeasure to this attack, the next iteration of the refinement cycle therefore enforces label-preserving functions.

IV-B Second Iteration: Privacy Preservation

The labels of data must not be changed by processing. This invariant can be formalized in our Isabelle model by a type definition of functions on labeled data that preserve their labels.

typedef label_fun = {f :: dlm × data ⇒ dlm × data.
  ∀ x. fst x = fst (f x)}

We also define a function application operator ⇕ on this new type. We then use this restricted function type in the state transition rules. It specifies implicitly that only label-preserving functions may be applied in the definition of the system behaviour.
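Python cannot express a subset type such as label_fun, but a runtime approximation conveys the idea. In this hypothetical sketch, the class LabelFun wraps an arbitrary function. Its method apply plays the role of the operator ⇕ and rejects any result whose label differs from the label of the input. In Isabelle, by contrast, the obligation is discharged once, when a function is shown to belong to label_fun.

```python
from typing import Callable, FrozenSet, Tuple

Label = Tuple[str, FrozenSet[str]]   # dlm: (owner, readers)
LabeledData = Tuple[Label, str]      # dlm × data


class LabelFun:
    """Hypothetical runtime analogue of the typedef label_fun.

    The invariant ∀ x. fst x = fst (f x) is checked at each application,
    since Python has no subset types.
    """

    def __init__(self, f: Callable[[LabeledData], LabeledData]) -> None:
        self._f = f

    def apply(self, x: LabeledData) -> LabeledData:
        """Counterpart of the application operator ⇕."""
        y = self._f(x)
        if y[0] != x[0]:
            raise ValueError("label_fun must preserve the label")
        return y


label = ("Patient", frozenset({"Doctor"}))
double = LabelFun(lambda ld: (ld[0], str(2 * int(ld[1]))))   # keeps the label
steal = LabelFun(lambda ld: (("Eve", frozenset()), ld[1]))   # relabels as Eve's
```

Here double.apply turns (label, "42") into (label, "84"). The relabelling function steal is rejected as soon as it is applied.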

The type definition label_fun and its accompanying operators form the core of the refined theory RRLoopThree. In this theory, we also redefine the infrastructure type and the corresponding operators and projection functions.

IV-B1 State Transition Refinement

The crucial point for this refinement to RRLoopThree is that the state transition changes to incorporate the new restrictions on label processing. The rule for eval now enforces the use of labelled functions.

The process rule

This rule prescribes how data within the infrastructure may be processed. It ensures that only privacy-preserving functions may be applied to data (see the theorem priv_pres below). This is achieved by the application operator ⇕, which forces the variable f to be of type label_fun. The existing data item ((Actor h', hs), n) is replaced by f ⇕ ((Actor h', hs), n), and the properties of type label_fun guarantee that the label is preserved. The actor h must also have eval enabled at the location l, where the data must reside as well.

process: G = graphI I ⟹ h @⇘G⇙ l ⟹
  l ∈ nodes G ⟹ enables I l (Actor h) eval ⟹
  ((Actor h', hs), n) ∈ lgra G l ⟹
  Actor h ∈ hs ∨ h = h' ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)
      ((lgra G)(l := (lgra G l - {(y, x). x = n})
        ∪ {(f :: label_fun) ⇕ ((Actor h', hs), n)})))
    (delta I)
  ⟹ I →ₙ I'

Processing preserves privacy

Furthermore, we can now prove that only entitled users (owners and readers) can access data: the use of label-preserving functions preserves privacy. We can prove that processing preserves ownership globally on all paths (expressed using the CTL operator AG). That is, in all states of the Kripke structure and at all locations of the infrastructure graph, the ownership given in the initial state hc_scenario persists.

theorem priv_pres: h ∈ hc_actors ⟹
  l ∈ hc_locations ⟹
  owns (graphI hc_scenario) l (Actor h) d ⟹
  hc_KripkeR ⊢ AG {x. ∀ l ∈ hc_locations.
    owns (graphI x) l (Actor h) d}


IV-B2 Refinement Map

As in the previous step, we define the refined Kripke structure hc_KripkeR, now for the refined theory RRLoopThree. It uses the redefined infrastructure states, for example hc_scenarioR as the initial state of the state transition.

hc_KripkeR ≡
  Kripke {I. hc_scenarioR →* I} {hc_scenarioR}

We also define a new refinement map ref_mapR that maps the infrastructure type of RRLoopThree to that of RRLoopTwo. Here we only need to re-embed the constituents of the infrastructure of RRLoopThree with the corresponding constructors of RRLoopTwo. In the previous refinement map, we needed to drop the label, i.e., the first element of each dlm × data pair, at each location. Now the structure of the labelled data stays essentially the same, and only the function type changes to label-preserving functions. The re-embedding handles this change automatically.
definition ref_mapR :: RRLoopThree.infrastructure ⇒
    RRLoopTwo.infrastructure
where ref_mapR I = RRLoopTwo.Infrastructure
  (RRLoopTwo.Lgraph
    (RRLoopTwo.gra (graphI I))
    (RRLoopTwo.agra (graphI I))
    (RRLoopTwo.cgra (graphI I))
    (RRLoopTwo.lgra (graphI I)))

Applying our refinement theory, we now prove

hc_KripkeT ⊑⇘ref_mapR⇙ hc_KripkeR.

By the additional meta-theorem prop_pres, the properties proved for hc_Kripke and hc_KripkeT carry over to hc_KripkeR.

IV-B3 Attack: Eve can simply put data

We try to prove a theorem stating that different occurrences of the same data in the system must have the same label, and the proof fails. The reason is the following attack.

hc_KripkeR ⊢
  EF {I. enables I cloud (Actor ''Eve'') put}

Eve could learn the data by means other than the privacy-preserving functions. She could then use the action put to enter that data into the system as new data labelled as her own. As a countermeasure, we need a mechanism that guarantees consistency across the system: a blockchain.

IV-C Third RR-Cycle Iteration: Blockchain Consistency

One major achievement of a blockchain is that it acts as a distributed ledger, that is, a global accounting book. A distributed ledger is a unique, consistent transcript that keeps track of protected data across a distributed system. In our application, the ledger mainly needs to record where each labelled data item resides. We therefore formalize a ledger as a type of functions that map a labelled data item to a set of locations. In this type, we further constrain each data item to have at most one valid label of type dlm. Either the location set ld (l, d) is empty for all labels l, or there exists a unique label l (note the ∃!) for which ld (l, d) is non-empty.

typedef ledger = {ld :: dlm × data ⇒ location set.
  ∀ d. (∀ l. ld (l, d) = {}) ∨
       (∃! l. ld (l, d) ≠ {})}

Since the range of a ledger is location set, i.e., sets of locations, a labelled data item can be assigned either no location (the empty set) or any number of locations.
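The ledger invariant can be read as follows: for every data item, at most one label has a non-empty location set. The hypothetical Python sketch below checks this condition. It assumes that a ledger is represented as a finite dictionary in which missing keys stand for the empty set. This representation is only a finite approximation of the function type dlm × data ⇒ location set.

```python
from typing import Dict, FrozenSet, Set, Tuple

Label = Tuple[str, FrozenSet[str]]                 # (owner, readers)
Ledger = Dict[Tuple[Label, str], FrozenSet[str]]   # missing key = empty set


def is_ledger(ld: Ledger) -> bool:
    """Typedef invariant: for each data item d, either every label maps to {}
    or exactly one label maps to a non-empty location set."""
    labels_of: Dict[str, Set[Label]] = {}
    for (label, d), locs in ld.items():
        if locs:                                   # only non-empty entries count
            labels_of.setdefault(d, set()).add(label)
    return all(len(labels) == 1 for labels in labels_of.values())


patient = ("Patient", frozenset({"Doctor"}))
eve = ("Eve", frozenset())
ex = {(patient, "42"): frozenset({"cloud"})}
```

Adding an entry that gives "42" a second, non-empty location set under Eve's label violates the invariant. This is exactly the situation that the ledger type rules out.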

IV-C1 Ledger enables Data Protecting State Transition

The rules defining the state transition of infrastructures need to be adapted to the refined model. The ledger is incorporated into the system specification to guarantee consistency across distributed units. The state transition rules have to be adapted yet again. In addition, the type dlm must be refined by replacing actors with identities, since otherwise the uniqueness of the label imposed in the ledger typedef cannot be proved. The abstract models intentionally did not stipulate that Actor is injective, in order to allow for insider attacks. The ledger now enforces the use of identities rather than actor “roles”.

IV-C2 State Transition Refinement

Again, we first illustrate the changes of this refinement step on the rule for get. Since the model is now fairly complete, we also show the other rules.

The get data rule

This rule now requires that the ledger be updated to record that the data item also resides at the new location l. This is achieved by taking the union of the existing set of locations L for this data item with {l}. The existing set of locations L is retrieved by applying the ledger ledgra G to the data item n and its label (h', hs). The update of the ledger at the position ((h', hs), n) of this data item uses the operator := to change the ledger to contain the new set of locations L ∪ {l}.

get_data: G = graphI I ⟹ h @⇘G⇙ l ⟹
  l ∈ nodes G ⟹ l' ∈ nodes G ⟹
  enables I l' (Actor h) get ⟹
  h ∈ hs ∨ h = h' ⟹
  ledgra G ((h', hs), n) = L ⟹ l' ∈ L ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)(lgra G)
      (ledgra G ((h', hs), n) := L ∪ {l}))
    (delta I)
  ⟹ I →ₙ I'
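Under a hypothetical dictionary representation of the ledger, the get_data rule reduces to a single entry update. This Python sketch is illustrative only. It passes the enabledness of get at l' in as a boolean and, as in the rule, requires that the source location is recorded in L.

```python
from typing import Dict, FrozenSet, Tuple

Label = Tuple[str, FrozenSet[str]]           # (owner, readers)
Key = Tuple[Label, str]                      # ((owner, readers), n)
Ledger = Dict[Key, FrozenSet[str]]           # missing key = empty set


def get_data(ld: Ledger, h: str, l: str, l_src: str, key: Key,
             get_enabled: bool) -> Ledger:
    """Record that the item under key now also resides at l (L ∪ {l}).

    get_enabled is assumed to encode 'enables I l' (Actor h) get'.
    """
    (owner, readers), _n = key
    L = ld.get(key, frozenset())
    if not get_enabled or not (h in readers or h == owner):
        raise PermissionError(f"{h} may not get this item")
    if l_src not in L:                       # the premise l' ∈ L
        raise ValueError("item is not recorded at l_src")
    new = dict(ld)                           # the state change is one entry
    new[key] = L | {l}
    return new
```

Because the ledger is the single record of where data resides, no per-location data sets need to be touched. This is the simplification that the ledger brings to the rules.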


The put data rule

This rule assumes an actor h who resides at a location l of the infrastructure graph G and for whom the put action is enabled. If the infrastructure state I fulfils these preconditions, the next state I' is obtained by adding the data item n with label (h, hs) at location l. The addition updates (using :=) the existing ledger ledgra G: the entry for the labelled data item ((h, hs), n) is set to the singleton set {l} containing just this location. The first label component marks h as the owner of this data item. The other components are the set of readers hs and the actual data n.

put: G = graphI I ⟹ h @⇘G⇙ l ⟹
  enables I l (Actor h) put ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)(lgra G)
      (ledgra G ((h, hs), n) := {l}))
    (delta I)
  ⟹ I →ₙ I'


The process rule

This rule is now simplified by the use of the ledger. The update re-assigns, again using :=, the location set L to the new key f ⇕ ((a', as), n) of the ledger ledgra G. Before that, the entry for the old data item ((a', as), n) is cleared by assigning it the empty set {}, which preserves the invariant of the ledger type. Note that with this semantics, processing changes the data consistently in all parts of the distributed system (see the resulting consistency property in Section IV-C4).

process: G = graphI I ⟹ a @⇘G⇙ l ⟹
  enables I l (Actor a) eval ⟹
  a ∈ as ∨ a = a' ⟹ ledgra G ((a', as), n) = L ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)(lgra G)
      ((ledgra G ((a', as), n) := {})
         (f ⇕ ((a', as), n)) := L))
    (delta I)
  ⟹ I →ₙ I'
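The effect of the ledger-based process rule can be sketched in a hypothetical Python representation. The location set L moves from the old key to the key produced by f, and the old entry is cleared. Enabledness is passed in as a boolean, and a runtime label check stands in for the type label_fun.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Label = Tuple[str, FrozenSet[str]]
Key = Tuple[Label, str]
Ledger = Dict[Key, FrozenSet[str]]           # missing key = empty set


def process(ld: Ledger, a: str, key: Key, f: Callable[[Key], Key],
            eval_enabled: bool) -> Ledger:
    """Move the location set L from key to f(key) and clear the old entry."""
    (owner, readers), _n = key
    if not eval_enabled or not (a in readers or a == owner):
        raise PermissionError(f"{a} may not process this item")
    new_key = f(key)
    if new_key[0] != key[0]:                 # stand-in for f :: label_fun
        raise ValueError("f must preserve the label")
    L = ld.get(key, frozenset())
    new = dict(ld)
    new[key] = frozenset()                   # ledgra G ((a', as), n) := {}
    new[new_key] = L                         # then (f ⇕ ((a', as), n)) := L
    return new
```

The second assignment wins if f happens to return the same key, which matches the order of the two updates in the rule. Because the whole location set moves at once, every location holding the item sees the new value.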


The delete rule

The owner of the data may delete his or her data from all locations in the infrastructure graph. Unlike the previous rules, this rule imposes no preconditions on the location of the actor or of the data, other than that they belong to the infrastructure graph. Nor does the actor need any action to be enabled. That is, the owner can delete his or her data anywhere. Note also how the ledger simplifies deleting data throughout the system. It suffices to set the ledger entry to the empty set, and the data is thereby deleted everywhere.

del_data: G = graphI I ⟹ a ∈ actors_graph G ⟹
  l ∈ nodes G ⟹ l ∈ L ⟹
  ledgra G ((a, as), n) = L ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)(lgra G)
      (ledgra G ((a, as), n) := {}))
    (delta I)
  ⟹ I →ₙ I'


The move rule

This rule completes the set of inductive rules defining the semantics of the state transition relation →ₙ. Suppose an actor h resides at a location l of the infrastructure graph G and the local policy of a target location l' entitles this actor to the move action. Then the infrastructure I can transition into the infrastructure I'. Here I' is defined by an auxiliary function move_graph_a (omitted here; for details see [16]).

move: G = graphI I ⟹
  h ∈ actors_graph (graphI I) ⟹
  h' ∈ actors_graph (graphI I) ⟹ l ∈ nodes G ⟹
  l' ∈ nodes G ⟹ enables I l' (Actor h) move ⟹
  I' = Infrastructure
    (move_graph_a h l l' (graphI I))
    (delta I)
  ⟹ I →ₙ I'


IV-C3 Refinement Map

In the refined system, the infrastructure graph is extended by the ledger, so the infrastructure of the refined theory RRLoopFour contains a ledger. The refinement map therefore needs to transform this ledger into a map from locations to sets of labelled data, as used in RRLoopThree.

definition ref_mapF :: RRLoopFour.infrastructure ⇒
    RRLoopThree.infrastructure
where
ref_mapF I = RRLoopThree.Infrastructure
  (RRLoopThree.Lgraph
    (RRLoopThree.gra (graphI I))
    (RRLoopThree.agra (graphI I))
    (RRLoopThree.cgra (graphI I))
    (ledger_to_loc (ledgra (graphI I))))

The projection ledgra simply extracts the ledger. The auxiliary function ledger_to_loc performs the main data type transformation and is defined by the following functions.
dlm_to_dlm ≡ (λ ((s :: string), (sl :: string set)).
  (Actor s, fmap Actor sl))
data_trans ≡
  (λ (l :: (string × string set), d :: string).
    (dlm_to_dlm l, d))
ledger_to_loc ld l ≡
  if l ∈ ⋃ (range (Rep_ledger ld))
  then fmap data_trans {dl. l ∈ (ld dl)} else {}
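A Python reading of ledger_to_loc may clarify the data type transformation. In this hypothetical sketch, a frozen dataclass Actor stands in for the Actor constructor, and the ledger is again a finite dictionary. The variable ex_ledger mirrors the example ledger defined later in this section.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Actor:
    """Hypothetical stand-in for the Actor constructor of RRLoopThree."""
    name: str


Label4 = Tuple[str, FrozenSet[str]]               # RRLoopFour labels: identities
Ledger = Dict[Tuple[Label4, str], FrozenSet[str]]


def dlm_to_dlm(label: Label4) -> Tuple[Actor, FrozenSet[Actor]]:
    s, sl = label
    return (Actor(s), frozenset(Actor(x) for x in sl))


def data_trans(dl: Tuple[Label4, str]):
    l, d = dl
    return (dlm_to_dlm(l), d)


def ledger_to_loc(ld: Ledger, loc: str) -> Set:
    """Labelled data recorded at loc, relabelled with Actor.

    The Isabelle version first tests whether loc occurs in the range of the
    ledger; for a finite dict the comprehension is already empty otherwise.
    """
    return {data_trans(dl) for dl, locs in ld.items() if loc in locs}


ex_ledger = {(("Patient", frozenset({"Doctor"})), "42"): frozenset({"cloud"})}
```

For the location cloud, the sketch yields the single item ((Actor("Patient"), {Actor("Doctor")}), "42"). For any other location it yields the empty set.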

The function Rep_ledger is the injection from elements of the ledger type into the set defining the type. It is automatically created by Isabelle from the type definition.

To make the refinement proofs feasible, we need a set of rather technical lemmas that support the use of this transformation within the refinement map; for details see [16]. A central lemma is the uniqueness of the label of a given data item.

lemma ledger_to_loc_data_unique:
  Rep_ledger ld (dl, d) ≠ {} ⟹
  Rep_ledger ld (dl', d) ≠ {} ⟹ dl = dl'

Central as well is a transformation lemma.
lemma ledgra_ledger_to_loc:
  finite {dl :: (char list × char list set) × char list.
    l ∈ Rep_ledger (ledgra G) dl} ⟹
  l ∈ (ledgra G ((a, as), n)) ⟹
  ((Actor a, fmap Actor as), n) ∈
    ledger_to_loc (ledgra G) l

As before, we show hc_KripkeR ⊑⇘ref_mapF⇙ hc_KripkeF for the corresponding models. The main change between these infrastructure models is due to the use of the ledger. It is visible in the infrastructure graph, where an additional component, here ex_ledger, appears.
ex_graph ≡ Lgraph
  {(home, cloud), (sphone, cloud), (cloud, hospital)}
  (λ x. if x = home then {''Patient''} else
        (if x = hospital then {''Doctor''} else {}))
  ex_creds ex_locs ex_ledger

In our running example, the parameter ex_ledger specifies the status of the data "42", for example, the value of some biomarker. This data is owned by the patient, can be read by the doctor, and currently resides only at the location cloud.
ex_ledger ≡ (λ (l, d).
  if d = ''42'' ∧ l = (''Patient'', {''Doctor''})
  then {cloud} else {})


IV-C4 Ledger Guarantees Consistent Data Ownership

We can now prove that data protection is consistent across the infrastructure. If the same data item n resides at any two locations, then its labels must be the same. That is, the owner and the set of readers are identical.

theorem Ledger_con: h ∈ hc_actors ⟹
  h' ∈ hc_actors ⟹
  l ∈ hc_locations ⟹ l' ∈ hc_locations ⟹
  l ∈ ledgra G ((h, hs), n) ⟹
  l' ∈ ledgra G ((h', hs'), n) ⟹
  (h, hs) = (h', hs')

This property follows immediately from the invariant of the type ledger (see Section IV-C) and from the privacy preservation given by the label function type (see Section IV-B). Consequently, the interactive proofs we have to provide to Isabelle are straightforward and largely handled by its automated tactics (see the Isabelle source code for details).

IV-D Attack and Fourth RR-Cycle: Eve can overwrite blockchain

Despite the theorem proved above, yet another aspect, as usual outside the model, leads to an attack. In the abstract specification of a ledger, we have omitted the implementation of a blockchain. We could have a centrally controlled blockchain in which one party signs the entire blockchain to guarantee consistency. Eve could be an insider impersonating the blockchain controller. In that case, she could simply overwrite the entry made by Bob and add his data as her own. Formally, we can reuse the put attack of the previous level and apply the rule put above to overwrite Bob's entry with Eve's.

As a refinement for the RR-Cycle, we need a consensus algorithm between the participants in the distributed system, like Nakamoto's consensus used in Bitcoin. Such an algorithm chooses a different leader for each blockchain commitment and thus avoids the attack. Adding a refinement with Nakamoto consensus to our model is possible but rather complex. However, we can simply specify the effect of this refinement in the system specification by adding

∀ a as. ledgra G ((a, as), n) = {}

as a precondition to the rule put, that is, the data item must not yet be assigned to anyone in the ledger in order to allow a put action.
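The strengthened put rule can be sketched in a hypothetical Python representation, with the ledger as a finite dictionary. The extra precondition rejects a put whenever some label already has a non-empty location set for n, so Eve can no longer register the patient's data as her own. The function put and its arguments are illustrative assumptions, not the Isabelle rule itself.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Label = Tuple[str, FrozenSet[str]]
Ledger = Dict[Tuple[Label, str], FrozenSet[str]]   # missing key = empty set


def put(ld: Ledger, h: str, hs: Iterable[str], n: str, l: str,
        put_enabled: bool) -> Ledger:
    """put with the extra precondition ∀ a as. ledgra G ((a, as), n) = {}."""
    if not put_enabled:
        raise PermissionError("put is not enabled")
    if any(locs for (_label, d), locs in ld.items() if d == n):
        raise PermissionError(f"data {n!r} is already owned in the ledger")
    new = dict(ld)
    new[((h, frozenset(hs)), n)] = frozenset({l})  # owner h, readers hs
    return new
```

Fresh data can still be put. Only data that is already recorded under some label is protected from being re-registered.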

IV-E Evaluation and Detecting Design Errors

To give a rough estimate of the formalisation and proof effort of applying the Refinement-Risk cycle to the IoT healthcare application in this paper: there are 8 files, forming four pairs with one file for the semantics and one for the example infrastructure. Each file has between 200 and 800 lines of Isabelle code, consisting of definitions and, mostly, proof script lines.

Clearly, an important motivation for going through this rather tedious process of formally refining a system specification in this framework is the property preservation that we have established as a meta-theorem on refinement in Section II-B. It allows us to retain security and privacy properties once they have been proved, while making the specification increasingly secure.

The effort for the refinement proofs is considerable. The refinement proof at each level takes up to 400 lines of Isabelle code and sometimes required proving additional lemmas about the new operators, for example, for label-preserving functions and the ledger type.

As mentioned in the introduction and repeatedly throughout the paper, one main advantage of the formal security refinement approach presented here is that it filters out errors that are easily made in stepwise design. We have found and corrected a number of small errors, such as inconsistencies between the premises of the state transition rules at the four levels of model abstraction. For example, the patient data was positioned in the cloud in RRLoopThree but at home in RRLoopFour. The formal refinement forces out such errors immediately. Simple ones, like this example, are easy to fix. More subtle ones require more work to understand, to find a solution, and to provide the lemmas needed to prove the refinement, as the following example shows.

When we introduce the DLM labels in the first iteration, the corresponding refinement map is based on the function ref_map, which maps the refined data type to the abstract data type. In this first map, we need to eliminate the data label. We therefore apply the function snd to all data items of type dlm × data to map them to type data, formally by applying fmap snd (see Section IV-A1).

In the earlier version of the IoT case study [17], the rule for delete used set difference (-) to delete the labelled data item in RRLoopTwo.

del_data: G = graphI I ⟹ h ∈ actors_graph G ⟹
  l ∈ nodes G ⟹ ((Actor h, hs), n) ∈ lgra G l ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)
      ((lgra G)(l := (lgra G l) -
        {((Actor h, hs), n)})))
    (delta I)
  ⟹ I →ₙ I'

The subtle design error manifests itself when we try to prove that the semantics using the above rule is a refinement of the abstract model hcKripkeOne. In the pre-ledger model, this semantics still allows the data item n to occur with two different labels, say (Actor h, hs) ≠ (Actor h', hs'). We may have two similar traces in which deletion is applied once to ((Actor h, hs), n) and once to ((Actor h', hs'), n). The refinement map maps both traces to one trace in the abstract model hcKripkeOne. In the abstract trace, the state no longer contains the data item n after the deletion. In both refined traces, however, one copy of the data item (with mutually different labels) remains. Both are insecure states that must not implement the abstract specification. A user's data that is believed to be eradicated is still in the database, potentially with an attacker's label. So, for privacy enforcement it is crucial to avoid such design errors.

This design error can be eradicated by making sure that a deletion operation actually deletes all copies of the data item.

del_data': G = graphI I ⟹ h ∈ actors_graph G ⟹
  l ∈ nodes G ⟹ ((Actor h, hs), n) ∈ lgra G l ⟹
  I' = Infrastructure
    (Lgraph (gra G)(agra G)(cgra G)
      ((lgra G)(l := (lgra G l) -
        {(y, x). x = n})))
    (delta I)
  ⟹ I →ₙ I'
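The difference between del_data and del_data' can be replayed in Python on a tiny hypothetical state, using fmap snd as the refinement map. Under the original rule, a copy of the data survives under Eve's label after the patient's copy is deleted. The resulting refined state therefore does not map to the abstract state in which n is gone. Under the fixed rule, it does.

```python
from typing import Dict, FrozenSet, Set, Tuple

Label = Tuple[str, FrozenSet[str]]
LabeledData = Tuple[Label, str]
LocMap = Dict[str, Set[LabeledData]]


def ref_map(lgra: LocMap) -> Dict[str, Set[str]]:
    """fmap snd at every location: forget the labels."""
    return {loc: {d for (_label, d) in items} for loc, items in lgra.items()}


def del_data(lgra: LocMap, l: str, item: LabeledData) -> LocMap:
    """Original rule: remove exactly this labelled item at l."""
    new = {loc: set(items) for loc, items in lgra.items()}
    new[l] -= {item}
    return new


def del_data_fixed(lgra: LocMap, l: str, n: str) -> LocMap:
    """del_data': remove every item at l whose data is n, whatever its label."""
    new = {loc: set(items) for loc, items in lgra.items()}
    new[l] = {(lab, d) for (lab, d) in new[l] if d != n}
    return new


patient = ("Patient", frozenset({"Doctor"}))
eve = ("Eve", frozenset())
state = {"cloud": {(patient, "42"), (eve, "42")}}
```

After del_data, the abstract view of the cloud still contains "42". After del_data_fixed, it is empty, as the abstract deletion requires.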

The same problem occurs in the process rule, and the same solution applies (Section IV-B1 shows the fixed rule). Once this solution has been found, we still need to prove that the fixed semantics now preserves traces. A core lemma needed to this end is the following.
lemma fmap_lem_del_set: finite S ⟹
  ∀ n ∈ S.
    fmap f (S - {y. f y = f n}) = (fmap f S) - {f n}
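The lemma can be sanity-checked by brute force on small finite sets. The Python sketch below is not a proof. It exhaustively tests every subset of a tiny hypothetical universe with f = snd. It also shows that the naive variant, which removes only n itself, fails because snd is not injective.

```python
from itertools import combinations
from typing import Callable, Iterable, Set


def image(f: Callable, S: Iterable) -> Set:
    """fmap f S, i.e. the image of S under f."""
    return {f(x) for x in S}


def lemma_holds(f: Callable, S: Set) -> bool:
    """fmap f (S - {y. f y = f n}) = fmap f S - {f n} for all n in S."""
    return all(image(f, {y for y in S if f(y) != f(n)}) == image(f, S) - {f(n)}
               for n in S)


snd = lambda x: x[1]
universe = [("Patient", "42"), ("Eve", "42"), ("Patient", "7")]
# Check every subset of the small universe.
all_ok = all(lemma_holds(snd, set(S))
             for r in range(len(universe) + 1)
             for S in combinations(universe, r))

# Naive variant: remove only n itself.
S = {("Patient", "42"), ("Eve", "42")}
n = ("Patient", "42")
naive_ok = image(snd, S - {n}) == image(snd, S) - {snd(n)}
```

Here all_ok is true, while naive_ok is false: removing only n leaves "42" in the image. This is the same effect as the design error above.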


V Conclusion and Related Work

In this paper, we have presented a formal integrated framework for a Refinement-Risk Cycle that interleaves formal system specification with attack tree analysis by means of formal refinement. This makes formally proved security engineering of a system possible. The method is particularly useful for IoT systems since it allows modeling physical as well as logical realities. We have illustrated this process on an IoT healthcare example, running four iterations that add access control, privacy preservation, and a ledger for global consistency. Framework and case study are fully formalised and proved in Isabelle.

Formal refinement of system specifications has been investigated for a long time, initially for system refinement in the specification language Z [11], but a dedicated security refinement has not been formalised for some time [31]. The idea of refining a system specification for security has already been addressed in B [5, 35]. The first of these works [5] combines the refinement of B with system security policies given in Organisation based Access Control (OrBAC) and presents a generic example of a system development. B is supported by its own tool, Atelier B. However, it does not provide a formalisation in a theorem prover, unlike our integration, which supports dedicated security concepts like attack trees and enables useful meta-theory over the integration. The paper [35] looks at attacks within the B framework. However, it aims at designing a monitor that catches actions forbidden by the policy rather than at using these attacks to refine the system specification. Dynamic risk assessment using attack formalisms, like attack graphs, has recently received considerable attention, e.g. [9]. However, the focus there usually lies on attack generation and response planning, whereas we address the design of secure systems. Rather than incident response, we intend to use early analysis of system specifications to support the development of secure systems. This includes physical infrastructure, like IoT system architecture, as well as organisational policies with actors.

The additional consideration of structural refinement in our process of security refinement greatly generalises the classical concept of trace refinement, but the latter was designed for safety properties. These are properties that hold along execution paths of a system. Such properties are known to be insufficient in general for security. A security property often concerns implicit information flows, which may let an attacker learn confidential information by observing various runs of a system over time and noticing differences in the outcomes. McLean has already shown in his seminal paper [30] how such implicit information flow properties must be treated. It is necessary to consider a security property as a set of sets of traces rather than a set of traces. This insight led to notions like noninterference, which have been formalised in Isabelle, e.g., [33]. However, we argue that our way of modeling systems and their execution allows a more fine-grained view. It uses a layered model of CTL and Kripke structures underneath the actual infrastructure model, including actors. As the example application shows, the explicit modeling of actors allows reasoning about specific attackers as actors. Thus, rather than trying to establish security properties at the very basis of state-based system modeling, we propose to analyse information flows and their observability by certain actors at the level of the actual infrastructure model. When actors are part of the model, it is more natural to add notions of implicit information flow. Security-critical situations, in which an actor has learned some confidential information, can then be represented as states whose reachability is analysed. A trace-based safety analysis of whether those states are reachable then amounts to a classical noninterference analysis; our layered model merely clarifies the boundaries at a finer scale.

The use of a distributed ledger, such as a blockchain, is new for formal system specification and verification. There are currently many attempts to formalize blockchains, but most of them are very close to technical implementations, e.g. [27]. This precludes a clear specification of legal requirements, which the Isabelle Infrastructure framework supports, as illustrated on GDPR requirements [15]. Moreover, to our knowledge, none of these formal models was produced in Isabelle or similar Higher Order Logic tools until very recently [29]. There, Marmsoler addresses the definition of an Isabelle framework for the verification of dynamic system architectures for blockchain. Our formalization uses a generic notion of a ledger that simply controls consistency in the distributed application. We consider this the way forward because it enables a minimal expression of the crucial properties of a ledger. This minimal expression may serve as a basis for conformance proofs of more refined technical models of a ledger, like a blockchain. It also provides the crucial invariant properties for a ledger. We have used label-preserving functions in our infrastructure model; they guarantee part of the data protection consistency when processing data. An interesting next step is to model smart contracts. We believe this to be a particularly rewarding extension because our model can express locality and policy-based behaviour rules. These naturally lend themselves to modeling dependent action sequences using logical preconditions.

Research on risk assessment and attack trees using formal approaches, including verification, has recently increased, e.g., [1, 2, 4], but not in Higher Order Logic. With respect to system development, the focus is often on the generation of the attack tree rather than the system, e.g. [36]. In [3], the authors build a foundation for system-based attack trees but do not mechanise it in a theorem prover. Model transformation for attack trees has been addressed in [28] as a practical tool to translate between different frameworks, but not to reason about refinement. Data refinement has been addressed in Isabelle [10], but not with respect to security engineering.

The relationship between Higher Order Logic and model checking was first explored by Kobayashi (see [26] for a paper subsuming previous results). Model checking has also been realised in Isabelle [8], but we use a different formalisation of CTL [13]. Isabelle has also been used to develop secure systems: a formalisation of noninterference served to develop an online conference system [25].

The novelty of our approach is the formal integration of refinement with risk assessment by attack trees into a constructive security refinement process. Abstract system specifications can be provably refined, and finally code can be extracted to major programming languages, e.g. Scala.

This material is based upon work supported by the ERA-NET CHIST-ERA under Grant No. 102112. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the European Union.

References

• [1] Z. Aslanyan, F. Nielson, and D. Parker. Quantitative verification and synthesis of attack-defence scenarios. In 29th IEEE Computer Security Foundations Symposium, CSF’16, 2016.
• [2] M. Audinot, S. Pinchinat, and B. Kordy. Is my attack tree correct? In 22nd European Symposium on Research in Computer Security, ESORICS’2017, volume 10492 of LNCS, pages 83–102. Springer, 2017.
• [3] M. Audinot, S. Pinchinat, and B. Kordy. Guided design of attack trees: A system-based approach. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018, pages 61–75. IEEE Computer Society, 2018.
• [4] D. Beaulaton, I. Cristescu, A. Legay, and J. Quilbeuf. A modeling language for security threats of IoT systems. In F. Howar and J. Barnat, editors, Formal Methods for Industrial Critical Systems, pages 258–268, Cham, 2018. Springer International Publishing.
• [5] N. Benaïssa, D. Cansell, and D. Méry. Integration of security policy into system modeling. In J. Julliand and O. Kouchnarenko, editors, B 2007: Formal Specification and Development in B, pages 232–247, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
• [6] D. M. Cappelli, A. P. Moore, and R. F. Trzeciak. The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud). SEI Series in Software Engineering. Addison-Wesley Professional, 1 edition, Feb. 2012.
• [7] CHIST-ERA. Success: Secure accessibility for the internet of things, 2016.
• [8] J. Esparza, P. Lammich, R. Neumann, T. Nipkow, A. Schimpf, and J. Smaus. A fully verified executable LTL model checker. In N. Sharygina and H. Veith, editors, Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings, volume 8044 of Lecture Notes in Computer Science, pages 463–478. Springer, 2013.
• [9] G. Gonzalez-Granadillo, S. Dubus, A. Motzek, J. Garcia-Alfaro, E. Alvarez, M. Merialdo, S. Papillon, and H. Debar. Dynamic risk management response system to handle cyber threats. Future Generation Computer Systems, 83:535–552, 2018.
• [10] F. Haftmann, A. Krauss, O. Kuncar, and T. Nipkow. Data refinement in Isabelle/HOL. In S. Blazy, C. Paulin-Mohring, and D. Pichardie, editors, Interactive Theorem Proving - 4th International Conference, ITP 2013, Rennes, France, July 22-26, 2013. Proceedings, volume 7998 of Lecture Notes in Computer Science, pages 100–115. Springer, 2013.
• [11] J. He, C. A. R. Hoare, and J. W. Sanders. Data refinement refined. In B. Robinet and R. Wilhelm, editors, ESOP, volume 213 of Lecture Notes in Computer Science, pages 187–196. Springer, 1986.
• [12] J. Jacob. On the derivation of secure components. In IEEE Security and Privacy, pages 242–247. IEEE, 1989.
• [13] F. Kammüller. Isabelle modelchecking for insider threats. In Data Privacy Management, DPM’16, 11th Int. Workshop, volume 9963 of LNCS. Springer, 2016. Co-located with ESORICS’16.
• [14] F. Kammüller. Attack trees in Isabelle. In 20th International Conference on Information and Communications Security, ICICS2018, volume 11149 of LNCS. Springer, 2018.
• [15] F. Kammüller. Formal modeling and analysis of data protection for GDPR compliance of IoT healthcare systems. In IEEE Systems, Man and Cybernetics, SMC2018. IEEE, 2018.
• [16] F. Kammüller. Isabelle infrastructure framework with IoT healthcare S&P application, 2018. Available at https://github.com/flokam/IsabelleAT.
• [17] F. Kammüller. Combining secure system design with risk assessment for IoT healthcare systems. In Workshop on Security, Privacy, and Trust in the IoT, SPTIoT’19, colocated with IEEE PerCom. IEEE, 2019.
• [18] F. Kammüller. Isabelle infrastructure framework and RR-cycle with IoT healthcare S&P application, 2019. Available at https://github.com/flokam/IsabelleAT.
• [19] F. Kammüller and M. Kerber. Investigating airplane safety and security against insider threats using logical modeling. In IEEE Security and Privacy Workshops, Workshop on Research in Insider Threats, WRIT’16. IEEE, 2016.
• [20] F. Kammüller, M. Kerber, and C. Probst. Towards formal analysis of insider threats for auctions. In 8th ACM CCS International Workshop on Managing Insider Security Threats, MIST’16. ACM, 2016.
• [21] F. Kammüller, J. R. C. Nurse, and C. W. Probst. Attack tree analysis for insider threats on the IoT using Isabelle. In Human Aspects of Information Security, Privacy, and Trust - Fourth International Conference, HAS 2015, Held as Part of HCI International 2016, Toronto, Lecture Notes in Computer Science. Springer, 2016. Invited paper.
• [22] F. Kammüller, O. O. Ogunyanwo, and C. W. Probst. Using Fusion/UML for IoT architectures for healthcare applications. arXiv, https://arxiv.org/abs/1901.02426, 2018.
• [23] F. Kammüller and C. W. Probst. Combining generated data models with formal invalidation for insider threat analysis. In IEEE Security and Privacy Workshops (SPW). IEEE, 2014.
• [24] F. Kammüller and C. W. Probst. Modeling and verification of insider threats using logical analysis. IEEE Systems Journal, Special issue on Insider Threats to Information Security, Digital Espionage, and Counter Intelligence, 11(2):534–545, 2017.
• [25] S. Kanav, P. Lammich, and A. Popescu. A conference management system with verified document confidentiality. In A. Biere and R. Bloem, editors, Computer Aided Verification, pages 167–183, Cham, 2014. Springer International Publishing.
• [26] N. Kobayashi. Model checking higher-order programs. J. ACM, 60(3):20:1–20:62, 2013.
• [27] A. Kosba, A. Miller, E. Shi, Z. Wen, and C. Papamanthou. Hawk: The blockchain model of cryptography and privacy-preserving smart contracts. In IEEE Symposium on Security and Privacy, pages 839–858. IEEE, 2016.
• [28] R. Kumar, S. Schivo, E. Ruijters, B. M. Yildiz, D. Huistra, J. Brandt, A. Rensink, and M. Stoelinga. Effective analysis of attack trees: A model-driven approach. In A. Russo and A. Schürr, editors, Fundamental Approaches to Software Engineering, 21st International Conference, FASE 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, volume 10802 of Lecture Notes in Computer Science, pages 56–73. Springer, 2018.
• [29] D. Marmsoler. Towards Verified Blockchain Architectures: A Case Study on Interactive Architecture Verification, pages 204–223. 05 2019.
• [30] J. McLean. A general theory of composition for trace sets closed under selective interleaving functions. In Proc. IEEE Symposium on Security and Privacy, pages 79–93, 1994.
• [31] C. Morgan. The shadow knows: Refinement and security in sequential programs. Sci. Comput. Program., 74(8):629–653, 2009.
• [32] A. C. Myers and B. Liskov. Complete, safe information flow with decentralized labels. In Proceedings of the IEEE Symposium on Security and Privacy. IEEE, 1999.
• [33] A. Popescu, J. Hölzl, and T. Nipkow. Formalizing probabilistic noninterference. In G. Gonthier and M. Norrish, editors, Certified Programs and Proofs - Third International Conference, CPP 2013, Melbourne, VIC, Australia, December 11-13, 2013, Proceedings, volume 8307 of Lecture Notes in Computer Science, pages 259–275. Springer, 2013.
• [34] B. Schneier. Secrets and Lies: Digital Security in a Networked World. John Wiley & Sons, 2004.
• [35] N. Stouls and M.-L. Potet. Security policy enforcement through refinement process. In J. Julliand and O. Kouchnarenko, editors, B 2007: Formal Specification and Development in B, pages 216–231, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
• [36] R. Vigo, F. Nielsen, and H. R. Nielsen. Automated generation of attack trees. In 27th Computer Security Foundations Symposium, CSF’14. IEEE, 2014.

Appendix A Background

This section provides an overview of the current extension of the Isabelle Infrastructure framework in relation to previous works and how it integrates the Refinement-Risk cycle (Section A-A). It also summarizes the formalization of the existing theories for Kripke structures and the temporal logic CTL (Section A-B), as well as the attack tree formalisation and Correctness and Completeness theorems (Section A-C). Finally, Section A-D presents the IoT Healthcare system – the case study on which the Refinement-Risk cycle is validated in this paper.

A-A Isabelle Infrastructure Framework

Isabelle is a generic Higher Order Logic (HOL) proof assistant. Its generic aspect allows the embedding of so-called object-logics as new theories on top of HOL. Sophisticated proof tactics are available to support reasoning, among others simplification, first-order resolution, and special macros for arithmetic. Object-logics are added to Isabelle using constant and type definitions, forming a so-called conservative extension. That is, no inconsistency can be introduced: new types are defined as subsets of existing types, and properties of a new type are derived from properties of the existing type via a one-to-one correspondence between the two. The expressiveness of HOL allows even complex application scenarios, conditions, and logical requirements to be stated. Isabelle also supports the analysis of meta-theory, that is, we can prove theorems not only in an object logic but also about it.

This allows building telescope-like structures in which a meta-theory at a lower level embeds a more concrete “application” at a higher level. Properties are proved at each level. Interactive proof is used to establish these properties, and the meta-theory can then be applied to produce results for an application immediately. Figure 1 in Section I-A gives an overview of the Isabelle Infrastructure framework with its layers of object-logics – each level below embeds the one above.

The Isabelle Infrastructure framework was initially created for the modeling and analysis of insider threats [24]. Its use has been validated on the most well-known insider threat patterns identified by the CERT Guide to Insider Threats [6]. More recently, the framework has been successfully applied to realistic case studies of insider attacks in airplane safety [19] and on auction protocols [20]. These larger case studies, as well as complementary work on the analysis of insider attacks on IoT infrastructures, e.g. [21], have motivated extending the original framework with Kripke structures and temporal logic [13] as well as a formalisation of attack trees [14]. Recently, GDPR compliance verification has also been demonstrated [15].

A-B Kripke Structures and CTL

Kripke structures and CTL model state-based systems and enable the analysis of properties under dynamic state changes. A state transition relates snapshots of the system, that is, its states. The temporal logic CTL then enables expressing security and privacy properties over these transitions.

In Isabelle, the system states and their transition relation are defined as a class called state containing an abstract constant state_transition. It introduces the syntactic infix notation I $$\to$$ I’ to denote that system states I and I’ are in this relation over an arbitrary (polymorphic) type $$\sigma$$. The operator :: is a type judgement that constrains the type variable $$\sigma$$ to a type class (here the general class type). The arrow $$\Rightarrow$$ is the type constructor for functions, and bool is the HOL built-in type of the truth values True and False.

 class state =
fixes state_transition :: ($$\sigma$$ :: type) $$\Rightarrow$$ $$\sigma$$ $$\Rightarrow$$ bool
("_  $$\to$$ _")

The above class definition lifts Kripke structures and CTL to a general level. The concrete inductive definition of the relation is given by a set of specific rules, which, however, belong to an application such as infrastructures (Section III-B). Branching time temporal logic CTL is defined in general over Kripke structures with arbitrary state transitions and can later be applied to suitable theories, like infrastructures.
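For readers more familiar with mainstream languages, the role of the type class can be sketched in Python as an abstract base class that only fixes a transition predicate, with a concrete application supplying it later. This is an analogy under our own assumptions, not the framework's code; the names `StateSystem` and `BoundedCounter` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

S = TypeVar("S")


class StateSystem(ABC, Generic[S]):
    """Analogue of the type class `state`: it only *fixes* a transition relation."""

    @abstractmethod
    def transition(self, s: S, t: S) -> bool:
        """Return True iff `s -> t` is a single step of the system."""


class BoundedCounter(StateSystem[int]):
    """Hypothetical application instance: a counter that may step up to a bound."""

    def __init__(self, bound: int) -> None:
        self.bound = bound

    def transition(self, s: int, t: int) -> bool:
        # The application supplies the concrete rule for the abstract relation.
        return 0 <= s < self.bound and t == s + 1


counter = BoundedCounter(3)
print(counter.transition(1, 2))  # True
print(counter.transition(3, 4))  # False: the bound is reached
```

As in Isabelle, generic notions (CTL, Kripke structures) can be written once against `StateSystem`, and each application only provides `transition`.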

Based on the generic state transition $$\to$$ of the type class state, the CTL operators EX and AX express that a property f holds in some or all next states, respectively. The CTL formula AG f means that on all paths branching from a state the formula f is always true (G stands for ‘globally’). It can be defined using Tarski's fixpoint theory by applying the greatest fixpoint operator. The other CTL operators are defined in a similar way. The formal Isabelle definition of what it means that a formula f holds in a Kripke structure M can be stated as: the initial states of the Kripke structure, init M, need to be contained in the set of all states in states M that satisfy f.

 M $$\vdash$$ f $$\equiv$$  init M $$\subseteq$$ { s $$\in$$ states M. s $$\in$$ f }
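On a finite state space these definitions can be executed directly. The following Python sketch, assuming an explicit finite transition relation given as a set of pairs, computes EX, AX, and AG as set transformers (AG by iterating the greatest fixpoint of Z ↦ f ∩ AX Z down from the full state set) and checks M ⊢ f as a subset test. The function names and the example system are hypothetical, and the sketch assumes that states without successors satisfy AX f vacuously; the Isabelle theories themselves work over arbitrary, possibly infinite, state types.

```python
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def ex(rel: Rel, universe: Set, f: Set) -> Set:
    """EX f: states with at least one successor in f."""
    return {s for s in universe if any((s, t) in rel for t in f)}


def ax(rel: Rel, universe: Set, f: Set) -> Set:
    """AX f: states all of whose successors are in f (vacuously true for dead ends)."""
    return {s for s in universe if all(t in f for (u, t) in rel if u == s)}


def ag(rel: Rel, universe: Set, f: Set) -> Set:
    """AG f as the greatest fixpoint of Z |-> f ∩ AX Z, iterated down from the top."""
    z = set(universe)
    while True:
        nz = set(f) & ax(rel, universe, z)
        if nz == z:
            return z
        z = nz


def holds(init: Set, states: Set, f: Set) -> bool:
    """M |- f  iff  init M ⊆ {s ∈ states M. s ∈ f}."""
    return set(init) <= {s for s in states if s in f}


# Hypothetical four-state example: state 3 is "unsafe", and 0 can branch into it.
U = {0, 1, 2, 3}
R = {(0, 1), (1, 2), (2, 2), (0, 3), (3, 3)}
safe = {0, 1, 2}
print(sorted(ag(R, U, safe)))         # [1, 2]
print(holds({1}, U, ag(R, U, safe)))  # True
print(holds({0}, U, ag(R, U, safe)))  # False: the path 0 -> 3 leaves `safe`
```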

In an application, the set of states of the Kripke structure is defined as the set of states reachable by the infrastructure state transition from some initial state, say ex_scenario.
  ex_states $$\equiv$$ { I. ex_scenario $$\to^{*}$$  I }

The relation $$\to^{*}$$ is the reflexive transitive closure – an operator supplied by the Isabelle theory library – applied to the relation $$\to$$.

The Kripke constructor combines the two constituents, the state set and the set of initial states.

 ex_Kripke $$\equiv$$ Kripke ex_states {ex_scenario}
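For a finitely branching toy system, the reachable state set and the Kripke constructor can be sketched as follows, with breadth-first search computing the reflexive transitive closure from ex_scenario. The `step` function and the integer states are hypothetical stand-ins for the infrastructure state transition.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, List


@dataclass(frozen=True)
class Kripke:
    """A Kripke structure given by its state set and its set of initial states."""
    states: FrozenSet[Hashable]
    init: FrozenSet[Hashable]


def reachable(start: Hashable,
              successors: Callable[[Hashable], Iterable[Hashable]]) -> FrozenSet[Hashable]:
    """{ I. start ->* I }: the reflexive transitive closure, computed by breadth-first search."""
    seen = {start}  # reflexive: start reaches itself in zero steps
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return frozenset(seen)


# Hypothetical toy transition: a state n may step to n + 1 until it reaches 3.
def step(n: int) -> List[int]:
    return [n + 1] if n < 3 else []


ex_scenario = 0
ex_states = reachable(ex_scenario, step)
ex_Kripke = Kripke(ex_states, frozenset({ex_scenario}))
print(sorted(ex_Kripke.states))  # [0, 1, 2, 3]
```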

In Isabelle, the concepts of sets and predicates coincide (more precisely, they are isomorphic; in general, this is often referred to as predicate transformer semantics). Thus a property is a predicate over states, which is equal to a set of states. For example, we can then try to prove that there is a path (E) to a state in which the property eventually holds (in the Future) by starting the following proof in Isabelle.
 ex_Kripke $$\vdash$$  EF property

Since property is a set of states, and the temporal operators are predicate transformers, that is, transform sets of states to sets of states, the resulting EF property is also a set of states – and hence again a property.
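The predicate transformer view can be made concrete with a small sketch: EF f is computed here as the least fixpoint of Z ↦ f ∪ EX Z by iterating from the empty set, which terminates on a finite state space. We assume this standard fixpoint characterization of EF; the relation, the states, and the property are made up for illustration.

```python
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def ex(rel: Rel, universe: Set, z: Set) -> Set:
    """EX Z: states with some successor in Z."""
    return {s for s in universe if any((s, t) in rel for t in z)}


def ef(rel: Rel, universe: Set, f: Set) -> Set:
    """EF f as the least fixpoint of Z |-> f ∪ EX Z (Kleene iteration from the empty set)."""
    z: Set = set()
    while True:
        nz = set(f) | ex(rel, universe, z)
        if nz == z:
            return z
        z = nz


U = {0, 1, 2, 3}
R = {(0, 1), (1, 2), (3, 3)}
# A "property" is just a set of states, here obtained from a predicate.
prop = {s for s in U if s == 2}
reach = ef(R, U, prop)  # again a set of states, i.e. again a property
print(sorted(reach))    # [0, 1, 2]
print({0} <= reach)     # True: from 0 some path eventually reaches prop
```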

A-C Attack Trees in Isabelle

Attack trees [34] are a graphical language for the analysis and quantification of attacks. If the root represents an attack, its children represent the sub-attacks. Leaf nodes are the basic attacks; inner nodes represent composite sub-attacks. Sub-attacks can be alternatives for reaching the goal (disjunctive node) or they must all be completed to reach the goal (conjunctive node). Figure 5 is an example of an attack tree taken from a textbook [34] illustrating the attack of opening a safe.

Nodes can be adorned with attributes, for example costs or probabilities of attacks, which allow attacks to be quantified (not used in the example).

The following datatype definition attree defines attack trees. Isabelle allows recursive datatype definitions similar to the programming languages Haskell or ML. A datatype is given by a “|”-separated sequence of possible cases, each of which consists of a constructor name, the types of inputs to this constructor, and optionally a pretty-printing syntax definition. The simplest case of an attack tree is a base attack. The principal idea is that base attacks are defined by a pair of state sets representing the initial states and the attack property – a set of states characterized by the fact that this property holds for them. Attacks can also be combined as the conjunction or disjunction of other attacks. The operator $$\oplus_{\vee}$$ creates or-trees and $$\oplus_{\wedge}$$ creates and-trees. And-attack trees and or-attack trees consist of a list of sub-attacks – again attack trees.

 datatype ($$\sigma$$ :: state)attree =
BaseAttack ($$\sigma$$ set)$$\times$$($$\sigma$$ set) ("$${\mathcal{N}}_{(\_)}$$")
| AndAttack ($$\sigma$$ attree)list ($$\sigma$$ set)$$\times$$($$\sigma$$ set) ("_ $$\oplus_{\wedge}^{(\_)}$$")
| OrAttack  ($$\sigma$$ attree)list ($$\sigma$$ set)$$\times$$($$\sigma$$ set) ("_ $$\oplus_{\vee}^{(\_)}$$")
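To show the shape of the datatype outside Isabelle, here is a Python analogue using frozen dataclasses, with tuples in place of lists so that trees stay hashable. The class names mirror the constructors, but the encoding (goals as pairs of frozensets of states) is our own assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Tuple, Union

StateSet = FrozenSet[Hashable]
Goal = Tuple[StateSet, StateSet]  # (initial states I, attack property s)


@dataclass(frozen=True)
class BaseAttack:
    goal: Goal


@dataclass(frozen=True)
class AndAttack:
    subs: Tuple["AtTree", ...]  # sub-attacks that must all be carried out, in order
    goal: Goal


@dataclass(frozen=True)
class OrAttack:
    subs: Tuple["AtTree", ...]  # alternative sub-attacks for the same goal
    goal: Goal


AtTree = Union[BaseAttack, AndAttack, OrAttack]


def attack(t: AtTree) -> Goal:
    """Projection onto the attack goal, in the spirit of the function `attack`."""
    return t.goal


fs = frozenset
tree = AndAttack((BaseAttack((fs({0}), fs({1}))),
                  BaseAttack((fs({1}), fs({2})))),
                 (fs({0}), fs({2})))
print(attack(tree) == (fs({0}), fs({2})))  # True
```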

The attack goal is given by the pair of state sets annotated on the operators $${\mathcal{N}}$$, $$\oplus_{\wedge}$$, and $$\oplus_{\vee}$$ (as subscript or superscript). A corresponding projection operator is defined as the function attack.

When we develop an attack tree, we proceed from an abstract attack, given by an attack goal, by breaking it down into a series of sub-attacks. This procedure corresponds to a process of refinement. The attack tree calculus [14] provides a notion of attack tree refinement elegantly expressed as the infix operator $$\sqsubseteq$$. Note that this refinement is different from the notion of system refinement that will be presented later in this paper. The intuition of developing an attack tree by refinement from the root to the leaves is illustrated in Figure 6 (the formal definition is in [14]). The example attack tree on the left side has a leaf that is expanded by the refinement into an and-attack with two steps.
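Purely as an illustration of the syntactic expansion in Figure 6, the following sketch replaces a leaf by an and-attack over given steps while keeping the leaf's goal. The helper `expand_leaf` is hypothetical and covers only this one kind of step; the formal refinement relation in [14] is more general, and nothing here checks whether the result is a valid attack.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Tuple, Union

Goal = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


@dataclass(frozen=True)
class BaseAttack:
    goal: Goal


@dataclass(frozen=True)
class AndAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


@dataclass(frozen=True)
class OrAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


AtTree = Union[BaseAttack, AndAttack, OrAttack]


def expand_leaf(t: AtTree, leaf: BaseAttack, steps: Tuple[AtTree, ...]) -> AtTree:
    """Hypothetical helper: replace every occurrence of `leaf` by an and-attack over
    `steps` that keeps the leaf's goal. Purely syntactic: validity is not checked."""
    if t == leaf:
        return AndAttack(steps, leaf.goal)
    if isinstance(t, BaseAttack):
        return t
    # Rebuild the inner node (and- or or-attack) with expanded children.
    return type(t)(tuple(expand_leaf(s, leaf, steps) for s in t.subs), t.goal)


fs = frozenset
leaf = BaseAttack((fs({0}), fs({2})))
abstract = OrAttack((leaf, BaseAttack((fs({0}), fs({3})))), (fs({0}), fs({2, 3})))
refined = expand_leaf(abstract, leaf, (BaseAttack((fs({0}), fs({1}))),
                                       BaseAttack((fs({1}), fs({2})))))
print(type(refined.subs[0]).__name__)  # AndAttack
```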

Refinement of attack trees defines the stepwise process of expanding abstract attacks into more elaborate attacks only syntactically. There is no guarantee that the refined attack is possible if the abstract one is, nor vice-versa. The attack tree calculus [14] formalizes the semantics of attack trees on Kripke structures and CTL enabling rigorous judgement whether such syntactic refinements represent possible attacks.

A valid attack, intuitively, is one which is fully refined into fine-grained attacks that are feasible in a model. The general model provided is a Kripke structure, i.e., a set of states and a generic state transition. Thus, feasible steps in the model are single steps of the state transition. They are called valid base attacks. Composing sequences of valid base attacks into and-attacks again yields valid attacks if the base attacks line up with respect to the states in the state transition. If there are different valid attacks for the same attack goal starting from the same initial state set, these can be summarized in an or-attack. The formal definition [14] is given in the table in Figure 7.

The semantics of attack trees is described by this one recursive function. Since the definition can be given as a recursive function, Isabelle code generation is applicable: an executable decision procedure for attack tree validity can be automatically generated in various programming languages, for example, Scala.
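As an illustration of such a decision procedure, the following Python sketch checks validity over an explicit finite transition relation. It follows only the informal description above (single-step base attacks, and-attacks whose sub-goals line up, or-attacks whose alternatives cover the initial states); the exact rules are those of Figure 7 and [14] and may differ in detail, for example for empty sub-attack lists. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Set, Tuple, Union

Goal = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]
Rel = Set[Tuple[Hashable, Hashable]]


@dataclass(frozen=True)
class BaseAttack:
    goal: Goal


@dataclass(frozen=True)
class AndAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


@dataclass(frozen=True)
class OrAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


AtTree = Union[BaseAttack, AndAttack, OrAttack]


def valid(t: AtTree, rel: Rel) -> bool:
    """Hypothetical validity check `|- t` over a finite transition relation."""
    init, goal = t.goal
    if isinstance(t, BaseAttack):
        # A valid base attack is a single step from every initial state into the goal.
        return all(any((i, g) in rel for g in goal) for i in init)
    if isinstance(t, AndAttack):
        if not t.subs:
            return init <= goal
        # Sub-attacks must line up: start in I, each starts where the previous
        # one ended, and the last one ends in s.
        current = init
        for sub in t.subs:
            sub_init, sub_goal = sub.goal
            if sub_init != current or not valid(sub, rel):
                return False
            current = sub_goal
        return current == goal
    # OrAttack: every alternative is valid and stays within (I, s),
    # and together the alternatives cover all initial states I.
    covered: set = set()
    for sub in t.subs:
        sub_init, sub_goal = sub.goal
        if not (valid(sub, rel) and sub_init <= init and sub_goal <= goal):
            return False
        covered |= sub_init
    return init <= covered


fs = frozenset
R = {(0, 1), (1, 2), (0, 3), (3, 2)}
via1 = AndAttack((BaseAttack((fs({0}), fs({1}))), BaseAttack((fs({1}), fs({2})))),
                 (fs({0}), fs({2})))
via3 = AndAttack((BaseAttack((fs({0}), fs({3}))), BaseAttack((fs({3}), fs({2})))),
                 (fs({0}), fs({2})))
print(valid(via1, R))                                        # True
print(valid(BaseAttack((fs({0}), fs({2}))), R))              # False: no single step 0 -> 2
print(valid(OrAttack((via1, via3), (fs({0}), fs({2}))), R))  # True
```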

Adequacy of the semantics is proved in [14] by proving correctness and completeness. The following correctness theorem shows that if A is a valid attack on property s starting from initial states described by I, then from all states in I there is a path to the set of states fulfilling s in the corresponding Kripke structure.

 theorem AT_EF: $$\vdash$$ (A :: ($$\sigma$$ :: state) attree) $$\Longrightarrow$$
(I, s) = attack A $$\Longrightarrow$$
Kripke {t . $$\exists$$ i $$\in$$ I. i $$\to^{*}$$ t} I $$\vdash$$ EF s

The inverse direction of theorem AT_EF is a completeness theorem: if states described by predicate s can be reached from a finite nonempty set of initial states I in a Kripke structure, then there exists a valid attack tree for the attack (I,s).
 theorem Completeness: I $$\neq$$ {} $$\Longrightarrow$$ finite I $$\Longrightarrow$$
Kripke {t . $$\exists$$ i $$\in$$ I. i $$\to^{*}$$ t} I $$\vdash$$ EF s $$\Longrightarrow$$
$$\exists$$ A :: ($$\sigma$$ :: state) attree. $$\vdash$$ A $$\land$$ (I, s) = attack A
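The idea behind Completeness can be illustrated constructively on a finite system: for each initial state, a shortest path into s yields an and-attack of single-step base attacks, and these chains are collected in an or-attack with goal (I, s). This is a hypothetical sketch, not the proof from [14]; `path_to` and `build_attack` are our own names, and the trees follow a Python analogue rather than the Isabelle datatype.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Optional, Set, Tuple, Union

Goal = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]
Rel = Set[Tuple[Hashable, Hashable]]


@dataclass(frozen=True)
class BaseAttack:
    goal: Goal


@dataclass(frozen=True)
class AndAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


@dataclass(frozen=True)
class OrAttack:
    subs: Tuple["AtTree", ...]
    goal: Goal


AtTree = Union[BaseAttack, AndAttack, OrAttack]


def path_to(rel: Rel, start: Hashable, target: Set) -> Optional[List[Hashable]]:
    """Shortest path from `start` into `target` by breadth-first search (None if unreachable)."""
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s in target:
            path = [s]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for (u, t) in rel:
            if u == s and t not in parent:
                parent[t] = s
                queue.append(t)
    return None


def build_attack(rel: Rel, init: Set, goal: Set) -> Optional[AtTree]:
    """One and-chain of single-step base attacks per initial state, in an or-attack."""
    fs = frozenset
    chains = []
    for i in init:
        path = path_to(rel, i, goal)
        if path is None:
            return None  # EF s fails for i, so no attack tree is expected
        steps = tuple(BaseAttack((fs({a}), fs({b}))) for a, b in zip(path, path[1:]))
        chains.append(AndAttack(steps, (fs({i}), fs({path[-1]}))))
    return OrAttack(tuple(chains), (fs(init), fs(goal)))


R = {(0, 1), (1, 2), (5, 2)}
tree = build_attack(R, {0, 5}, {2})
for chain in tree.subs:
    print(sorted(chain.goal[0]), len(chain.subs))
```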

Correctness and Completeness are proved in Isabelle [14, 16]. They are not merely necessary sanity proofs for the attack tree semantics: the theorems allow easy transfer of properties between the embedded notions of attack tree validity and CTL formulas like EF. This relationship can be applied in case studies. That is, if we apply attack tree refinement to spell out an abstract attack tree for attack s into a valid attack sequence, we can apply theorem AT_EF and immediately infer that EF s holds. Vice versa, theorem Completeness can be applied to infer directly from EF s the existence of a valid attack tree.

A-D Edge Computing: IoT Healthcare System

The example IoT healthcare system is taken from the CHIST-ERA project SUCCESS [7] on monitoring Alzheimer’s patients. Figure 8 illustrates the system architecture, where data collected by sensors in the home or via a smartphone helps to monitor biomarkers of the patient. The data are collected on a cloud-based server that enables hospitals (or scientific institutions) to access them; access to the data is controlled via the smartphone.

It is a typical edge network application: the smartphone and the sensor hub in the home are typical edge devices that are capable of processing data without uploading it to the cloud server.