# Dependability Engineering in Isabelle

In this paper, we introduce a process of formal system development supported by interactive theorem proving in a dedicated Isabelle framework. This Isabelle Infrastructure framework implements specification and verification in a cyclic process supported by attack tree analysis closely inter-connected with formal refinement of the specification. The process is cyclic: in a repeated iteration the refinement adds more detail to the system specification. It is a known hard problem how to find the next refinement step: this problem is addressed by the attack based analysis using Kripke structures and CTL logic. We call this cyclic process the Refinement-Risk cycle (RR-cycle). It has been developed for security and privacy of IoT healthcare systems initially but is more generally applicable for safety as well, that is, dependability in general. In this paper, we present the extensions to the Isabelle Infrastructure framework implementing a formal notion of property preserving refinement interleaved with attack tree analysis for the RR-cycle. The process is illustrated on the specification development and privacy analysis of the mobile Corona-virus warning app.

## Authors

• 9 publications
01/04/2020

### A Formal Development Cycle for Security Engineering in Isabelle

In this paper, we show a security engineering process based on a formal ...
12/29/2021

### Explanation by Automated Reasoning Using the Isabelle Infrastructure Framework

In this paper, we propose the use of interactive theorem proving for exp...
10/20/2018

### Proceedings 18th Refinement Workshop

Refinement is one of the cornerstones of a formal approach to software e...
02/11/2021

### Validation Obligations: A Novel Approach to Check Compliance between Requirements and their Formal Specification

Traditionally, practitioners use formal methods pre-dominately for one h...
03/17/2018

### Attack Trees in Isabelle

In this paper, we present a proof theory for attack trees. Attack trees ...
03/27/2018

### A Formal TLS Handshake Model in LNT

Testing of network services represents one of the biggest challenges in ...
01/23/2020

### SAT-hard Cyclic Logic Obfuscation for Protecting the IP in the Manufacturing Supply Chain

State-of-the-art attacks against cyclic logic obfuscation use satisfiabi...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

### I-a Motivation

Dependability [1] is a notoriously difficult property for system development because already security – which is part of dependability – is not compositional: given secure components, a system created from those components is not necessarily secure. Therefore, the usual divide-and-conquer approach from system and software design does not apply for dependability engineering. At the same time, it is mandatory for the design of dependable systems to introduce security in the early phases of the development since it cannot be easily “plugged in” at later stages. However, even if security is introduced in early phases, a classical stepwise development of refining abstract specifications by concretizing the design does not preserve security properties. Take for example, the implementation of sending a message from a client A to a server B such that the communication is encrypted to protect its content. In the abstract system specification, we do not consider a concrete protocol nor the architecture of the client and server. Using common refinement methods from software engineering provides a possible implementation by passing the message from a client system AS connected by a secure channel to a server system BS. However, this implementation does not exclude that other processes running on either AS or BS can eavesdrop on the plain text message because the confidentiality protection is only on the secure channel from AS to BS. This example is a simple illustration of what is known as the security refinement paradox [2].

Why is security so hard? A simple explanation is that it talks about negative properties: something (loss or damage of information or functionality) must not happen. Negation is also in logic a difficult problem as it needs exclusion of possibilities. If the space to consider is large, this proof can be hard or infeasible. In security, the attacks often come from “outside the model”. That is, for a given specification we can prove some security property and yet an attack occurs which uses a fact or observation or loophole that just has not been considered in the model. This known practical attack problem is similar to the refinement paradox. Intuitively, the attacker exploits a refinement of the system that has not been taken into account in the specification but is actually part of the real system (an implementation of the specification). In the above example, the real system allows that other applications can be run on the client within the security boundary. This additional feature of multi-processing systems has not been taken into account in the abstract specification where we considered processes and systems - the client and the server - as abstract entities without considering crucial features of their internal architectures.

To facilitate the formal stepwise development of dependable systems, we propose to make use of these intricacies by including the attacks into the process. As in common engineering practice, where testing for failures is used for improvement of prototypes, we use attack tree analysis within the same formal framework to provide guidance for the refinement of the system specification. Since the interleaved attack analysis steps are formalized in the same expressive formal framework, the found attack paths lead to system states that provide concrete information about the loopholes. This information allows adding details to data types, system functions, as well as policies thereby providing a next level of a more refined specification closer to an implementation. This is a cyclic process: in turn the newly refined specification can be scrutinized by attack tree analysis since the added detail may provide additional new loopholes. However, there is a stepwise improvement. Since we are using the highly expressive Higher Order Logic of Isabelle as a basis for our formal framework, we can implement the framework so that it allows expressing infrastructures including actors and policies, attack trees based on Kripke structures and temporal logic CTL, as well as a closed notion of dependability refinement that encompasses the idea that a refined specification after attack tree analysis excludes previous attacks.

### I-B Contributions

This paper for the first time completes the RR-cycle by presenting (a) a full formal definition of the RR-cycle including (b) a formal notion of refinement with a property preservation theory, and (c) a formal notion of termination of the RR-cycle. All of those definitions and related theorems are validated on the application example of the Corona-cirus Warning App (CWA) completing a preliminary workshop publication [3]. It thus completes and extends previous works on establishing the RR-cycle. A more detailed account of this completion and how it relates to previous works is given in Section VIII-A.

### I-C Overview of the paper

In this paper, we first give a general overview over different approaches to refinement in formal methods in Section III to provide a better understanding and foundation for our formal framework. In Section IV, we then proceed to present the technical core of the paper, the formal definition of refinement including property preservation and mainly the definition of the RR-Cycle itself by providing a termination condition. Section V introduces the case study, the CWA. Then, we illustrate application of the Isabelle Infrastructure framework for modeling and initially analyzing the application in Section VI. The methodology of the RR-Cycle is illustrated on the CWA case study in Section VII. That is, we repeatedly refine the initial specification based on the attack tree results until the termination condition of the RR-Cycle is met and the global policy is thus valid. Section VIII discusses related work and presents conclusions. An overview over the Isabelle Infrastructure framework that is used as the vehicle for the formalisation of the RR-cycle is given next in Section II.

## Ii Isabelle Infrastructure Framework

The Isabelle Infrastructure framework is implemented as an instance of Higher Order Logic in the interactive generic theorem prover Isabelle/HOL [4]. The framework enables formalizing and proving of systems with physical and logical components, actors and policies. It has been designed for the analysis of insider threats. However, the implemented theory of temporal logic combined with Kripke structures and its generic notion of state transitions are a perfect match to be combined with attack trees into a process for formal security engineering [5] including an accompanying framework [6]. An overview of the layers of object-logics of the Isabelle Infrastructure framework is provided in Figure 1: each level above is embedded into the one below; the novel contribution of the Corona-virus warning app that will be presented in this paper are the highlighted two top boxes of the figure.

#### Ii-1 Kripke structures, CTL and Attack Trees

A number of case studies (see Section VIII) have contributed to shape the Isabelle framework into a general framework for the state-based security analysis of infrastructures with policies and actors. Temporal logic and Kripke structures are deeply embedded into Isabelle’s Higher Order logic thereby enabling meta-theoretical proofs about the foundations: for example, equivalence between attack trees and CTL statements have been established [7] providing sound foundations for applications. This foundation provides a generic notion of state transition on which attack trees and temporal logic can be used to express properties for applications. The logical concepts and related notions thus provided for sound application modeling are:

• Kripke structures and state transitions:
A generic state transition relation is ; Kripke structures over a set of states t reachable by from an initial state set I can be constructed by the Kripke contructor as

Kripke {t. $$\exists$$ i $$\in$$ I. i $$\to_{i}^{*}$$ t} I


• CTL statements:
We can use the Computation Tree Logic (CTL) to specify dependability properties as

K $$\vdash$$  EF s

This formula states that in Kripke structure K there is a path (E) on which the property s (given as the set of states in which the property is true) will eventually (F) hold.

• Attack trees:
attack trees are defined as a recursive datatype in Isabelle having three constructors: creates or-trees and creates and-trees. And-attack trees and or-attack trees consist of a list of sub-attacks which are themselves recursively given as attack trees. The third constructor takes as input a pair of state sets constructing a base attack step between two state sets. For example, for the sets I and s this is written as . As a further example, a two step and-attack leading from state set I via si to s is expressed as

$$\vdash$$ [$${\cal{N}}_{\texttt{(I,si)}}$$,$${\cal{N}}_{\texttt{(si,s)}}$$]$$\oplus_{\wedge}^{\texttt{(I,s)}}$$


• Attack tree refinement, validity and adequacy:
Attack trees can be constructed also by a refinement process but this is different from the system refinement presented in this paper (see Section IV). An abstract attack tree may be refined by spelling out the attack steps until a valid attack is reached:

A :: (:: state) attree).

The validity is defined constructively so that code can be generated from it. Adequacy with respect to a formal semantics in CTL is proved and can be used to facilitate actual application verification. This will be used for the stepwise system refinements central to the methodology presented in this paper.

## Iii Formal Refinement Overview

Refinement is intuitively thought of as a systematic way of developing system designs from some abstract specification to a more detailed one. In the process of doing so, details that are left out in the abstraction are filled in. Refinement is a very natural way of step by step engineering systems starting from an abstract system design. However, whereas in traditional engineering disciplines validation is done through simulation, prototyping and testing, software-based systems (including IoT systems) allow formally specifying abstract system designs with logic. Then the concept of refinement can be explicitly formalized providing means to prove properties of the formal system design (also known as system specification) as well as reasoning in general about refinement, most importantly, how the refinement preserves properties or achieves design goals.

We consider systems as IoT system or cyber-physical systems, that is, systems constituted of hardware and software components. We emphasize that these systems should be designed in a human centric way. Although humans cannot be considered as system parts, their interaction can be specified as well by considering policies that specify the possible behaviour of actors as parts of the system design. If behavioural descriptions of human behaviour exist, they may well be integrated into the system specification. An example are insider threats where a taxonomy and empirical knowledge about human behaviour has been specified to provide a more detailed model of psychological dispositions of actors on when they start being insider attackers in order to analyze attacks and countermeasures. System specifications consist of datatypes for actors, physical system aspects (hardware, credentials, locations) and logical aspects, e.g., data, access control labels, software functions (types of functions, other descriptions of functionality like logical (axiomatic) descriptions) and policies. To summarize this heterogeneous set of entities, we use the simple description “infrastructures with (human) actors and policies”.

Refinement is the process that starts with an initial more or less formal description of such a system adding more detail. This process is iterative, best described as a cycle in which a range of design and analysis activities lead to a more definite, concrete specification and thus a more specific, tangible description of the system.

### Iii-a Refinement methods

The concept of formal refinement has a long standing in software system engineering in communicating and reactive system methods, for example, state charts, CSP [8], CCS [9], the Pi-calculus [10], as well as in data-based system specification methods like Z [11, 12], B [13], VDM. The field is well-explored and the literature is consequently vast so it is beyond the scope of this paper to give a comprehensive survey. Nevertheless, we are providing here an informal overview of a range of major techniques commonly used in these formal methods. This overview serves conveying the conceptual ideas to prepare the way for their implementation in the formal refinement of the RR-cycle and illustrate them on the application. The related work, Section VIII, will, however, contrast in more detail to similar formal refinement approaches.

#### Iii-A1 State based trace refinement of Isabelle Infrastructure framework

The Isabelle Infrastructure framework enables a state-based specification (sometimes called model-driven) of systems where the overall system behaviour is defined (by an inductive definition) as a set of traces of system states (that include all aspects of infrastructures, actors, and policies). The system behaviour is the set of all possible traces. The general idea of refinement is to make the behaviour more concrete, that is, to reduce the set of all possible traces. Then a refinement can be simply defined by trace inclusion: in other words, system specification refines to system specification , formally,

 S0⊑S1≡traces(S0)⊇traces(S1)

(the inversion of the inclusion signs is deliberate not accidental). It turns out that this inclusion based refinement is quite versatile compared to existing forms of refinement simply because it combines a rich state based view with the idea of traces of system changes that include events (or actions) while the events themselves are abstract but can be refined by adding detail to the context conditions in the inductive definitions associated to the events. We now consider the various forms of refinement informally to prepare the ground and to give reference points to later stages when we explain in the application of the Isabelle Infrastructure framework which form of refinement we are looking at in a specific instance.

#### Iii-A2 Event based trace refinement

Formally, the definition of trace refinement is given above: more refined behavioural models have a subset of the traces of the abstract model. The intuition behind the trace inclusion based refinement is that a more abstract specification allows various choices how system states may be reached while a more definite system description reduces those choices by selecting the system traces corresponding to a more specific system implementation. This general notion of reduction of system traces may be a design decision motivated by detected failures or simply motivated by making the behaviour more deterministic and consequently needs to go along with various changes to the design. This need for changes leads to the following category of data refinement.

#### Iii-A3 Data refinement

This refinement is defined by making an abstract data type more concrete in the refinement step. A classical example is refining sets into the data type of lists which leads to making choices on order and replication of elements: since lists are ordered, for example, the set could be represented in the refinement by the list [1,2,3] but also [3,2,1] – besides four other orders. In addition, also [1,1,2,3] could be a possible refinement if replication is deemed a targeted feature. As a consequence of this choice for refining the data, certain traces are excluded in the refinement that could have been possible system traces under different choices for order and replication.

### Iii-B Choices, Failures, and Divergences

To provide a closed representation of process behaviour, event based process calculi like CSP [8, 14] or CCS/Pi-calculus introduce choices between different events that are possible in the course of the execution of a process. In the trace model it is simple to see the extension of those choices: there will be a number of different continuations of a trace , for example and represent the potential continuation of a trace of events by either an event or an event if those two events are provided by a choice at the point of execution of the process after trace has been executed. The syntax represents trace concatenation and creates a trace consisting of just the event . In a purely event-based calculus, like CSP, CCS, or Pi, the internal system behaviour needs to be expressed in the specification purely based on the reaction, since state information is not part of the model by contrast to a model like the one used in the Isabelle Infrastructure framework (see previous subsection). So to distinguish between different ways the specified system may react to the events that the environment offers, CSP, for example, offers different kinds of choices that correspond to two levels of non-determinism. For any processes and the process

 P□Q

is the so-called external choice between and . This process is the process that offers the first events of and to the environment111The notion of environment means all other processes. and then behaves either like or like depending on the choice the environment made. Only if the environment offers events that may be communicated by both processes there is a real choice of the process.

The process

 P⊓Q

is the so-called internal choice or non-deterministic choice. In contrast to the former, it expresses that the choice lies with the system. That is, even if the environment offers only events that only one of and can communicate, the process may refuse to communicate one of the offered events. Hence, it represents proper non-determinism as the decision cannot be determined by the environment.

In a refinement, non-deterministic choices can be replaced by deterministic choices: making a process more deterministic moves it closer to an implementation. However, the traces model is not sufficient to show the difference. In this model both processes above have the same set of traces and are thus semantically equivalent:  .

#### Iii-B1 Failures

In terms of modeling these different types of choices, it turns out that the pure trace model is not sufficient because traces record only what processes can do but not what they must do or in other words what they can refuse to do. In the example above, both processes have the trace with and included in their trace set because for both this continuation is a possibility. However, the traces do not reflect that the process using the non-deterministic choice could have stopped and refused to do anything in most cases. Therefore, to integrate this additional view on choices, the CSP calculus offers a second semantics adding refusals. Refusals are sets of events that a process can refuse at some point of execution, that is, for a certain trace. Failures are the semantic model in which traces are combined with sets of refusals. In this semantics our two processes above become distinguishable because can only refuse what both and can refuse whereas only must accept if neither nor can refuse and thus in the failures model and consequently  .

#### Iii-B2 Divergences and Termination

Undefined behaviour may be represented by a system internal invisible action, for example, in CSP. Nontermination can then be defined by an infinite sequence of such actions. For those traces where a program (or system) may diverge, the system does not communicate anything so there is no definite behaviour. In order to arrive at a meaningful refinement order that augments failures and divergences, all system traces are considered possible from the point of divergence onwards. Reducing non-termination leads to concretizing a program corresponding to extending the range for which the program produces meaningful output. The most general behaviour is represented by the process that diverges from the beginning. The divergences intuitively corresponds more closely to a view of input-output program where classically termination is part of total correctness but is less suitable for the wider system view we have here.

#### Iii-B3 Action refinement

In reactive systems languages as well as human centric system formalisms, events or actions are a central modeling entity. The terms “action” or “event” are mostly used synonymously but in some formalism, like statecharts, actions and events are different entities – one representing pre-requisites of transitions, the other their consequences. Actions can be made more definite by combining them or making their occurrence dependent on conditions (or guards) of the context.

### Iii-C Comparison of Isabelle’s Infrastructure, State-based and Event-based Refinements

The formal notion of refinement for Isabelle infrastructures with actors naturally enables all the different notions of refinement because this refinement is simply based on an Isabelle type map and an inductive definition of behaviour. Isabelle’s datatypes are rich enough to support most data refinements. Termination-based and non-deterministic refinement notions are typical for purely reactive systems where the entire state is abstract and system behaviour is purely described by traces of events. In our case, we might have the possibility of nontermination, for example by a loop in a state that may occur infinitely often for all actions so that the state does not change. Similarly, failures means that in a state some actions may not occur which can be modeled in the inductive definition of the system behaviour.

A language like CSP is an explicit calculus for constructing specific specified communication patterns using “standardized” constructors. However, using Higher Order Logic in Isabelle, we can simulate all choices, failures, etc, explicitly in the logic. Even though we do not have standardized constructors, we can mimic each and any of the CSP constructors. Moreover, we are not limited by a pure event-based view and can additionally simulate those constructors that CSP cannot offer, like action refinement. The latter is possible because our actions are more abstract and a vast portion of the action semantics is implemented in the additional preconditions of the corresponding rule of the semantics for this action (see Section VI-A) which provides fine grained tuning possibilities.

## Iv Formal Definition of Refinement

### Iv-a Dependability Engineering

As described initially, Dependability Engineering consists of two complimentary activities: protection goal specification and attack (or, more generally, risk) analysis. These activities can be defined informally by “natural language” descriptions or using semi-formal modeling languages. Attack trees and Misuse Cases are commonly employed and recommended for this. For the design, UML diagrams can be used. However, we take this general, informal or semi-formal methodology further in our Isabelle Infrastructure framework that embeds attack trees in a fully formal way equipping it with a temporal logic CTL semantics that supports finding attacks based on negating global policies. Moreover, we adopt the system specification refinement idea from classical software engineering whereby we overcome a crucial shortcoming of refinement that has – in our opinion – impeded the success and the adoption of formal notions of refinement: how to find the right refinement relation for a refinement step in a concrete application!

Finding the right refinement relation is in the security engineering process evidently provided by the attacks found through the attack tree analysis. Since the attack properties as well as attack paths are an outcome of the adequacy translation of the attack trees, these properties (in which the global security property is violated) can be used. We want to avoid them. Since refinement is provided by trace sets, we do this simply by excluding the attack traces and thereby we direct the refinement. Whether it is a data refinement, action refinement of another form of refinement (see Section III) may vary but all forms are possible. We will illustrate the most relevant ones by our case study, after introducing the formal notion of Dependability Refinement in the remainder of this section.

### Iv-B Formal Definition of Refinement

Intuitively, a refinement changes some aspect of the type of the state, for example, replaces a data type by a richer datatype or restricts the behaviour of the actors. The former is expressed directly by a mapping of datatypes, the latter is incorporated into the state transition relation of the Kripke structure that corresponds to the transformed model. In other words, we can encode a refinement within our framework as a relation on Kripke structures that is parameterized additionally by a function that maps the refined type to the abstract type. The direction “from refined to abstract” of this type mapping may seem curiously counter-intuitive. However, the actual refinement is given by the refinement that uses this function as an input. The refinement then refines an abstract to a more concrete system specification. The additional layer for the refinement can be formalized in Isabelle as a refinement relation . The relation refinement is typed as a relation over triples – a function from a threefold Cartesian product to bool, the type containing true and false only. The type variables and input to the type constructor Kripke represent the abstract state type and the concrete state type. Consequently, the middle element of the triples selected by the relation refinement is a function of type mapping elements of the refined state to the abstract state. The expression in quotation marks after the type is again the infix syntax in Isabelle that allows the definition of mathematical notation instead of writing refinement in prefix manner. This nicer infix syntax is already used in the actual definition. Finally, the arrow is the implication of Isabelle’s meta-logic while is the one of the object logic HOL. They are logically equivalent but of different types: within a HOL formula , for example, as below , only the implication can be used.

 refinement :: ($$\sigma$$ kripke $$\times$$ ($$\sigma$$’ $$\Rightarrow$$ $$\sigma$$) $$\times$$ $$\sigma$$’ kripke)
$$\Rightarrow$$ bool ("_ $$\sqsubseteq_{(\_)}$$ _")
K $$\sqsubseteq_{\mathcal{E}}$$ K’ $$\equiv$$ $$\forall$$ s’ $$\in$$ states K’. $$\forall$$ s $$\in$$ init K’.
s $$\to_{\sigma^{\prime}}^{*}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\in$$ init K
$$\land$$ $$\mathcal{E}$$(s) $$\to_{\sigma}^{*}$$ $$\mathcal{E}$$(s’)

The definition of K   K’ states that for any state of the refined Kripke structure that can be reached by the state transition in zero or more steps from an initial state of the refined Kripke structure, the mapping from the refined to the abstract model’s state must preserve this reachability, i.e., the image of must also be an initial state and from there the image of under must be reached with or steps.

### Iv-C Property Preserving System Refinement

A first direct consequence of this definition is the following lemma where the operator in (init K’) represents function image, that is the set, .

lemma init_ref: K $$\sqsubseteq_{\mathcal{E}}$$ K’ $$\Longrightarrow$$ $$\mathcal{E}$$‘(init K’) $$\subseteq$$ init K

A more prominent consequence of the definition of refinement is that of property preservation. Here, we show that refinement preserves the CTL property of which means that a reachability property true in the refined model K’ is already true in the abstract model. A state set represents a property in the predicate transformer view of properties as sets of states. The additional condition on initial states ensures that we cannot “forget” them.
theorem prop_pres:
K $$\sqsubseteq_{\mathcal{E}}$$ K’  $$\Longrightarrow$$ init K $$\subseteq$$ $$\mathcal{E}$$‘(init K’) $$\Longrightarrow$$
$$\forall$$ s’ $$\in$$ Pow(states K’).
K’ $$\vdash$$  EF s’ $$\longrightarrow$$ K $$\vdash$$  EF ($$\mathcal{E}$$‘(s’))

It is remarkable, that our definition of refinement by Kripke structure refinement entails property preservation and makes it possible to prove this as a theorem in Isabelle once for all, i.e., as a meta-theorem. However, this is due to the fact that our generic definition of state transition allows to explicitly formalize such sophisticated concepts like reachability. For practical purposes, however, the proof obligation of showing that a specific refinement is in fact a refinement is rather complex justly because of the explicit use of the transitive closure of the state transition relation. In most cases, the refinement will be simpler. Therefore, we offer additional help by the following theorem that uses a stronger characterization of Kripke structure refinement and shows that our refinement follows from this.
theorem strong_mt:
$$\mathcal{E}$$‘(init K’) $$\subseteq$$ init K $$\land$$ s $$\to_{\sigma^{\prime}}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\to_{\sigma}$$ $$\mathcal{E}$$(s’)
$$\Longrightarrow$$ K $$\sqsubseteq_{\mathcal{E}}$$ K’

This simpler characterization is in fact a stronger one: we could have in the refined Kripke structure K’ and but neither nor are reachable from initial states in K’. For cases, where we want to have the simpler one-step proviso but still need reachability we provide a slightly weaker version of strong_mt.
theorem strong_mt’:
$$\mathcal{E}$$‘(init K’) $$\subseteq$$ init K $$\land$$ ($$\exists$$ s0 $$\in$$ init K’. s0  $$\to_{i}^{*}$$ s)
$$\land$$ s $$\to_{\sigma^{\prime}}$$ s’ $$\longrightarrow$$ $$\mathcal{E}$$(s) $$\to_{\sigma}$$ $$\mathcal{E}$$(s’) $$\Longrightarrow$$ K $$\sqsubseteq_{\mathcal{E}}$$ K’


The idea of property preservation coincides with the classical idea of trace refinement as defined for process algebra semantics like CSP’s. In these semantics, the behaviour of a system is defined as the set of its traces, which are all possible sequences of events reflecting each moment of system execution. Consequently, a system S’ is a refinement of another system S if the semantics of S’ is a subset of the traces of the former one. We adopt this idea in principle, but extend it substantially: our notion additionally incorporates structural refinement. Since we include a state map

in our refinement map, we additionally allow structural refinement: the state map generalizes the basic idea of trace refinement by traces corresponding to each other but allows additionally an exchange of data types thus allowing exchange of the structure of the system. As we see in the application to the case study in Section VII-A, the refinement steps may sometimes just specialize the traces: in this case the state map is just identity.

In addition, we also have a simple implicit version of action refinement. In an action refinement, traces may be refined by combining consecutive system events into atomic events thereby reducing traces. We can observe this kind of refinement in the second refinement step of CWA considered in Section VII-B.

### Iv-D Formal Definition of RR-Cycle

Given the closed formal definition of refinement provided in the previous section, we can now formalize the RR-Cycle by the following predicate.

  RR_cycle K K’ s :: $$\sigma$$ kripke $$\Rightarrow$$ $$\sigma$$’ kripke
$$\Rightarrow$$ $$\sigma$$’ set $$\Rightarrow$$ bool
RR_cycle K K’ s $$\equiv$$
$$\exists$$ $$\mathcal{E}$$ :: $$\sigma$$’ $$\Rightarrow$$ $$\sigma$$. K $$\vdash$$  EF($$\mathcal{E}$$ ‘s) $$\land$$ K $$\sqsubseteq_{\mathcal{E}}$$ K’
$$\longrightarrow$$ $$\neg$$(K’ $$\vdash$$  EF s)

This predicate encompasses that the Kripke K’ is a dependable system refinement of the Kripke structure K if there exists a data type map such that the refined Kripke structure K’ avoids the attack.

The RR-cycle is an iterative process. Since refinement is a transitive relation, the individual refinement steps found in each iteration line up to one final refinement step. The formal predicate RR_cycle constitutes the termination condition for the cycle for the specified security goal s and for the lined up individual refinement steps. Only when a sufficient number of interleaved refinement and attack tree iterations have eradicated all attacks on the security goal s will the system specification be secure and this is verified by RR_cycle.

## V Application Example Corona-Virus Warning App

The German Chancellor Angela Merkel has strongly supported the publication of the mobile phone Corona-virus warning app (CWA; [15]) by publicly proclaiming that “this App deserves your trust” [16]. Many millions of mobile phone users in Germany have downloaded the app with 6 million on the first day. CWA is one amongst many similar applications that aim at the very important goal to “break infection chains” by providing timely information to users of whether they have been in close proximity to a person who tested positive for COVID-19.

The app was designed with great attention on privacy: a distributed architecture [17] has been adopted that is based on a very clever application design whereby clients broadcast highly anonymized identifiers (ids) via Bluetooth and store those ids of people in close proximity. Infected persons report their infection by uploading their ids to a central server, providing all clients the means to assess exposure risk locally, hence, stored contact data has never to be shared.

### V-a DP-3T and PEPP-PT

We are mainly concerned with the architecture and protocols proposed by the DP-3T (Decentralized Privacy-Preserving Proximity Tracing) project [18]. The main reason to focus on this particular family of protocols is the Exposure Notification Framework (ENF), jointly published by Apple and Google [19], that adopts core principles of the DP-3T proposal. This API is not only used in CWA but has the potential of being widely adopted in future app developments that might emerge due to the reach of players like Apple and Google.

There are, however, competing architectures noteworthy, namely protocols developed under the roof of the Pan-European Privacy-Preserving Proximity Tracing project (PEPP-PT) [20], e.g. PEPP-PT-ROBERT [21], that might be characterized as centralized architectures.

Neither DP-3T nor PEPP-PT are synonymous for just one single protocol. Each project endorses different protocols with unique properties in terms of privacy and data protection.

There is a variety of noteworthy privacy and security issues. The debate among advocates of centralized architectures and those in favor of a decentralized approach in particular yields a lot of interesting material detailing different attacks and possible mitigation strategies: [22], [23], [24].

In terms of attack scenarios, we focus on, what might be classified as deanonymisation attacks: Tracking a device (see

[24][p9], [22][p8]) and identifying infected individuals (see [24][p5][22][p9]).

#### V-A1 Basic DP-3T architecture

Upon installation, the app generates secret daily seeds to derive so-called Ephemeral Ids (EphIDs) from them. EphIDs are generated locally with cryptographic methods and cannot be connected to one another but only reconstructed from the secret seed they were derived from.

During normal operation each client broadcasts its EphIDs via Bluetooth whilst scanning for EphIDs broadcasted by other devices in the vicinity. Collected EphIDs are stored locally along with associated meta-data such as signal attenuation and date. In DP-3T the contact information gathered is never shared but only evaluated locally.

If patients test positive for the Corona-virus, they are entitled to upload specific data to a central backend server. This data is aggregated by the backend server and redistributed to all clients regularly to provide the means for local risk scoring, i. e., determining whether collected EphIDs match those broadcasted by now-confirmed Corona-virus patients during the last, e. g., 14, days.

In the most simple (and insecure) protocol proposed by DP-3T this basically translates into publishing the daily seeds used to derive EphIDs from. The protocol implemented by ENF and, hence, CWA adopts this low-cost design [17]. DP-3T proposes two other, more sophisticated protocols that improve privacy and data protection properties to different degrees but are more costly to implement. Figure 2 illustrates the basic system architecture along with some of the mitigation measures either implemented in CWA or proposed by DP-3T.

### V-B Instantiation of Framework

The formal model of CWA uses the Isabelle Infrastructure framework instantiating it by reusing its concept of actors for users and smartphones whereby locations correspond to physical locations. The Ephemeral Ids, their sending and change is added to Infrastructures by slightly adapting the basic state type of infrastructure graphs and accordingly the semantic rules for the actions move, get, and put. The details of the newly adapted Infrastructure are presented in Section VI. Technically, an Isabelle theory file Infrastructure.thy builds on top of the theories for Kripke structures and CTL (MC.thy), attack trees (AT.thy), and security refinement (Refinement.thy). Thus all these concepts can be used to specify the formal model for CWA, express relevant and interesting properties, and conduct interactive proofs (with the full support of the powerful and highly automated proof support of Isabelle). All Isabelle sources are available online [25].

## Vi Modeling and analyzing CWA

### Vi-a Infrastructures, Policies, and Actors

The Isabelle Infrastructure framework supports the representation of infrastructures as graphs with actors and policies attached to nodes. These infrastructures are the states of the Kripke structure.

The transition between states is triggered by non-parameterized actions get, move, and put executed by actors. Actors are given by an abstract type actor and a function Actor that creates elements of that type from identities (of type string written ’’s’’ in Isabelle). Actors are contained in an infrastructure graph type igraph constructed by its constructor Lgraph.

 datatype igraph =
Lgraph (location $$\times$$ location)set
location $$\Rightarrow$$ identity set
identity $$\Rightarrow$$ (string set $$\times$$ string set $$\times$$ efid)
location $$\Rightarrow$$ string $$\times$$ (dlm $$\times$$ data) set
location $$\Rightarrow$$ efid set
identity $$\Rightarrow$$ location $$\Rightarrow$$ (identity $$\times$$ efid) set

In the current application of the framework to the CWA case study, this graph contains a set of location pairs representing the topology of the infrastructure as a graph of nodes and a function that assigns a set of actor identities to each node (location) in the graph. The third component of an igraph assigns the credentials to each actor: a triple-valued function whose first range component is a set describing the credentials in the possession of an actor and the second component is a set defining the roles the actor can take on; most prominently the third component is the efid assigned to the actor. This is initially just a natural number but will be refined to actually represent lists of Ephemeral Ids later when refining the specification.
 datatype efid = Efid nat

The fourth component of the type igraph assigns security labeled data to locations, a feature not used in the current application. The second to last component assigns the set of efids of all currently present smart phones to each location of the graph. The last component finally denotes the knowledge set of each actor for each location: a set of pairs of actors and potential ids.

Corresponding projection functions for each of the components of an infrastructure graph are provided; they are named gra for the actual set of pairs of locations, agra for the actor map, cgra for the credentials, lgra for the data at that location, egra for the assignment of current efids to locations, and kgra for the knowledge set for each actor for each location.

To start the CWA case study, we provide an example scenario: the initial values for the igraph components use two locations pub and shop to define the following constants (we omit the data map component ex_locs). The map ex_loc_ass is a function, defined using the -calculus of Isabelle, mapping each location to the set of actor identities currently at this location.

 ex_loc_ass $$\equiv$$
($$\lambda$$ x. if x = pub then {’’Alice’’, ’’Bob’’, ’’Eve’’}
else (if x = shop then {’’Charly’’, ’’David’’}
else {}))

The component ex_creds is a function assigning actor identities to triplets of which the most important part is the third element that contains the actors’ ephemeral ids.
 ex_creds $$\equiv$$
($$\lambda$$ x. if x = ’’Alice’’ then Efid 1 else
(if x = ’’Bob’’ then Efid 2 else
(if x = ’’Charly’’ then Efid 3 else
(if x = ’’David’’ then Efid 4 else
(if x = ’’Eve’’ then Efid 5
else Efid 0)))))

The ephemeral ids of all actors present at a location are collected in the component ex_efids that maps each location to the set of these efids.
 ex_efids $$\equiv$$
($$\lambda$$ x. if x = pub then {Efid 1, Efid 2, Efid 5}
else (if x = shop then {Efid 3, Efid 4} else {}))

The final component ex_knos assigns actors to functions mapping each location to sets of pairs of actor identities and efids: these sets of pairs represent what can be inferred from observing the present actors and all efids at a location; simply all possible combinations of those. Initially, however the empty set of pairs is assigned to all locations because nothing is known yet.
 ex_knos $$\equiv$$
($$\lambda$$ x. (if x = Actor ’’Eve’’ then ($$\lambda$$ l. {})
else ($$\lambda$$ l. {})

These components are wrapped up into the following igraph.
 ex_graph $$\equiv$$
Lgraph {(pub, shop)}
ex_loc_ass ex_creds ex_locs ex_efids ex_knos

Infrastructures are given by the following datatype that contains an infrastructure graph of type igraph and a policy given by a function that assigns local policies over a graph to all locations of the graph.
 datatype infrastructure =
Infrastructure igraph
[igraph, location] $$\Rightarrow$$ policy set

There are projection functions graphI and delta when applied to an infrastructure return the graph and the local policies, respectively. The function local_policies gives the policy for each location x over an infrastructure graph G as a pair: the first element of this pair is a function specifying the actors y that are entitled to perform the actions specified in the set which is the second element of that pair. The local policies definition for CWA, simply permits all actions to all actors in both locations.
 local_policies G $$\equiv$$
($$\lambda$$ x. if x = pub then
{($$\lambda$$ y. True, {get,move,put})}
else (if x = shop
then {($$\lambda$$ y. True, {get,move,put})}
else {}))

For CWA, the initial infrastructure contains the graph ex_graph with its two locations pub and shop and is then wrapped up with the local policies into the infrastructure Corona_scenario that represents the “initial” state for the Kripke structure.
 Corona_scenario $$\equiv$$
Infrastructure ex_graph local_policies


### Vi-B Policies, privacy, and behaviour

Policies specify the expected behaviour of actors of an infrastructure. They are given by pairs of predicates (conditions) and sets of (enabled) actions. They are defined by the enables predicate: an actor h is enabled to perform an action a in infrastructure I, at location l if there exists a pair (p,e) in the local policy of l (delta I l projects to the local policy) such that the action a is a member of the action set e and the policy predicate p holds for actor h.

enables I l h a $$\equiv$$ $$\exists$$ (p,e) $$\in$$ delta I l. a $$\in$$ e $$\land$$ p h


The privacy protection goal is to avoid deanonymization. That is, an attacker should not be able to disambiguate the set of pairs of real ids and EphIDs. This is abstractly expressed in the predicate identifiable. The clue is to filter the set A of all pairs of real ids and ephemeral ids eid for the given eid. If this set happens to have only one element, that is, there is only one pair featuring this specific eid, then the real id of that efid must be the first component, the identity Id.

 identifiable eid A $$\equiv$$
is_singleton{(Id,Eid). (Id, Eid) $$\in$$ A $$\land$$ Eid = eid}

The predicate identifiable is used to express the global policy ‘Eve cannot deanonymize an Ephemeral Id eid using the gathered knowledge’.
 global_policy I eid $$\equiv$$
$$\forall$$ L $$\subseteq$$ nodes(graphI I).
$$\neg$$(identifiable eid
($$\bigcap$$ (kgra(graphI I) (Actor ’’Eve’’)‘ L)
- {(x,y). x = ’’Eve’’}))

Gathering the “observations” of the actor Eve over all locations in the above policy formula is achieved by combining all pairs of ids and efids that Eve could infer on her movements through different locations, then building the intersection over those sets to condense the information and finally substracting all pairs in which Eve’s own identity appears as first element. The resulting set of pairs should not allow identification of any eid, that is, it must not be singleton for the global policy to hold, and thus, to protect privacy of users.

### Vi-C Infrastructure state transition

The state transition relation is defined as a syntactic infix notation I   I’ and denotes that infrastructures I and I’ are in this relation. To give an impression of this definition, we show first the rule defining the state transition for the action get. Initially, this rule expresses that an actor that resides at a location l in graph G (denoted as “a  l”) and is enabled by the local policy in this location to “get”, can combine all ids at the current location (contained in egra G l) with all actors at the current location (contained in agra G l) and add this set of pairs to his knowledge set kgra G using the function update f(l := n) redefining the function f for the input l to have now the new value n.

  get: G = graphI I $$\Longrightarrow$$ a $$@_{G}$$ l $$\Longrightarrow$$
enables I l (Actor a) get $$\Longrightarrow$$
I’ = Infrastructure
(Lgraph (gra G)(agra G)(cgra G)(lgra G)(egra G)
((kgra G)((Actor a) := ((kgra G (Actor a))(l:=
{(x,y). x $$\in$$ agra G l $$\land$$ y $$\in$$ egra G l})))))
(delta I)
$$\Longrightarrow$$ I $$\to_{i}$$ I’

Another interesting rule for the state transition is the one for move whose structure resembles the previous one.
 move: G = graphI I $$\Longrightarrow$$ a $$@_{G}$$ l $$\Longrightarrow$$
a $$\in$$ actors_graph(graphI I) $$\Longrightarrow$$ l $$\in$$ nodes G $$\Longrightarrow$$
l’ $$\in$$ nodes G $$\Longrightarrow$$ enables I l’ (Actor h) move $$\Longrightarrow$$
I’ = Infrastructure
(move_graph_a a l l’ (graphI I))(delta I)
$$\Longrightarrow$$ I $$\to_{i}$$ I’

The semantics of this rule is implemented in the function move_graph_a that adapts the infrastructure state so that the moving actor a is now associated to the target location l’ in the actor map agra and not any more at the previous location l and also the association of efids is updated accordingly.
move_graph_a n l l’ g $$\equiv$$
Lgraph (gra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’) then
((agra g)(l := (agra g l) - {n}))
(l’ := (insert n (agra g l’))) else (agra g))
(cgra g)(lgra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’) then
((egra g)(l := (egra g l) - {cgra g n}))
(l’ := (insert cgra g n)(egra g l’)))
else egra g)
(kgra g)

Based on this state transition and the above defined Corona_scenario, we define the first Kripke structure.
 corona_Kripke $$\equiv$$ Kripke {I. Corona_scenario $$\to_{i}^{*}$$ I}
{Corona_scenario}


### Vi-D Attack analysis

For the analysis of attacks, we negate the security property that we want to achieve, usually expressed as the global policy.

Since we consider a predicate transformer semantics, we use sets of states to represent properties. The invalidated global policy is given by the set scorona.

 scorona $$\equiv$$ {x. $$\exists$$ n. $$\neg$$ global_policy x (Efid n)}

The attack we are interested in is to see whether for the scenario
 Kripke_scenario $$\equiv$$  Infrastructure ex_graph
local_policies

from the initial state Icorona {corona_scenario}, the critical state scorona can be reached, that is, is there a valid attack (Icorona,scorona)?

As an initial step, we derive a valid and-attack for the Kripke structure corona_Kripke using the attack tree proof calculus.

  $$\vdash$$ [$${\cal{N}}_{\texttt{(Icorona,Corona)}}$$,$${\cal{N}}_{\texttt{(Corona,Corona1)}}$$,$${\cal{N}}_{\texttt{(Corona1,Corona2)}}$$,
$${\cal{N}}_{\texttt{(Corona2,Corona3)}}$$,$${\cal{N}}_{\texttt{(Corona3,scorona)}}$$]$$\oplus_{\wedge}^{\texttt{(Icorona,scorona)}}$$

The properties (and thus sets) Corona, Corona1, Corona2, Corona3 are intermediate states given by Bob moving to shop and Eve following him while collecting the Ephemeral Ids in each location. This collected information enables Eve to identify Bob’s Ephemeral Id.

From these preparations, we can exhibit that an attack is possible using the attack tree calculus [7].

 corona_Kripke $$\vdash$$  EF scorona

To this end, the Correctness theorem AT_EF is simply applied to immediately prove the above CTL statement. This application of the meta-theorem of Correctness of attack trees saves us proving the CTL formula tediously by exploring the state space in Isabelle proofs. Alternatively, we could use the generated code for the function is_attack_tree in Scala (see [7]) to check that a refined attack of the above is valid.

## Vii Application of Refinement to CWA

### Vii-a Refining the Specification

Clearly, fixed Ephemeral Ids are not really ephemeral. The model presented in Section VI has deliberately been designed abstractly to allow focusing on the basic system architecture and finding an initial deanonymization attack. We now introduce “proper” Ephemeral Ids and show how the system datatype can be refined to a system that uses those instead of the fixed ones.

For the DP-3T Ephemeral Ids [26], for each day a seed is used to generate a list of length , where is the duration for which the Ephemeral Ids are posted by the smart phone

 EphID1 || ... || EphIDn =

“where PRF is a pseudo-random function (e.g., HMAC-SHA256), “broadcast key” is a fixed, public string, and PRG is a pseudorandom generator (e.g. AES in counter mode)” [26].

From a cryptographic point of view, the crucial properties of the Ephemeral Ids are that they are purely random, therefore, they cannot be guessed, but at the same time if – after the actual encounter between sender and receiver – the seed is published, it is feasible to relate any of the EphIDi to for all i . For a formalization of this crucial cryptographic property in Isabelle it suffices to define a new type of list of Ephemeral Ids efidlist containing the root (the first efid), a current efid indicated by a list pointer of type nat, and the actual list of efids given as a function from natural numbers to efids.

datatype efidlist = Efids "efid" "nat" "nat $$\Rightarrow$$ efid"

We define functions for this datatype: efidsroot returning the first of the three constituents in an efidlist (the root ); efids_index giving the second component, the index of the current efid; efids_inc_ind applied to an efidlist increments the index; efids_cur returning the current efid from the list and efids_list for the entire list (the third component).

The first step of refinement replaces the simple efid in the type igraph of the infrastructure type by the new type efidlist. Note, that in the new datatype igraph this change affects only the third component, the credentials, to become

   identity $$\Rightarrow$$ efidlist

The refined state transition relation implements the possibility of changing the Ephemeral Ids by the rule for the action put that resembles very much the rule for get. The important change to the infrastructure state is implemented in the function put_graph_efid that increases the current index in the efidlist in the credential component cgra g n for the “putting” actor identity n and inserts the current efid from that credential component into the egra component, the set of currently “present” Ephemeral Ids at the location l.
 put_graph_efid n l g  $$\equiv$$
Lgraph (gra g)(agra g)
((cgra g)(n :=  efids_inc_ind(cgra g n)))
(lgra g)
((egra g)
(l := insert (efids_cur(efids_inc_ind(cgra g n)))
((egra g l) - {efids_cur(cgra g n)})))
(kgra g)

We can now apply the refinement by defining a datatype map from the refined infrastructure type InfrastructureOne.infrastructure to the former one Infrastructure.infrastructure.
definition refmap :: InfrastructureOne.infrastructure
$$\Rightarrow$$ Infrastructure.infrastructure
where refmap I =
Infrastructure.Infrastructure
(Infrastructure.Lgraph
(InfrastructureOne.gra (graphI I))
(InfrastructureOne.agra (graphI I))
($$\lambda$$ h. repl_efr
(InfrastructureOne.cgra (graphI I)) h)
(InfrastructureOne.lgra (graphI I))
($$\lambda$$ l. $$\lambda$$ a.
efids_root (InfrastructureOne.cgra (graphI I) a)
‘(InfrastructureOne.agra (graphI I) l)
($$\lambda$$ a. $$\lambda$$ l.
if (a $$\in$$ actors_graph(graphI I) $$\land$$
l $$\in$$ nodes (graphI I))
then ($$\lambda$$ (x,y).
(x, efids_root(
InfrastructureOne.cgra (graphI I)
(anonymous_actor I y)))
‘(InfrastructureOne.kgra (graphI I)) a l)
else )))

The function anonymous_actor maps an efid back to the actor it is assigned to if that exists which can be expressed in Isabelle using the Hilbert-operator SOME.
  anonymous_actor I e = (SOME a :: identity.
a $$\in$$ actors_graph (InfrastructureOne.graphI I) $$\land$$
e $$\in$$ range(efids_list (cgra (graphI I) a)))

This refinemnent map is then plugged into the parameter of the refinement operator allowing to prove
 corona_Kripke $$\sqsubseteq$$refmap corona_KripkeO

where the latter is the refined Kripke structure. The proof of this refinement that adds quite some structure requires a great deal of proving of invariants over the datatype.

Surprisingly, we can still prove

 corona_KripkeO $$\vdash$$ EF scoronaO

by using the same attack tree as in the abstract model: if Bob moves from pub to shop, he is vulnerable to being identifiable as long as he does not change the current efid. So, if Eve moves to the shop as well and performs a get before Bob does a put, then Eve’s knowledge set permits identifying Bob’s current Ephemeral Id as his. Despite removing some attack paths, some remain and thus the identification attack persists.

### Vii-B Second Refinement Step

The persistent attack can be abbreviated informally by the action sequence [get,move,move,get] performed by actors Eve, Bob, Eve, and Eve again, respectively. How can a second refinement step avoid that Eve can get identification information for Bob? Can we impose that after the first move of Bob a put action, increasing Bob’s efid and thus obfuscating his id, must happen before Eve can do another get? A very simple remedy to impose this and thus exclude the persistent attack is to enforce a put action after every move by an action refinement that binds these actions together. That is, when actors move they simultaneously change their efid. We can implement that change by a minimal update to the function move_graph_a (see Section VI) by adding an increment (highlighted as efids_inc_ind in the code snippet) of the currently used Ephemeral Id before updating the egra component of the target location. In other words, if Bob moves his egra component is changed automatically, that is, he gets a new efid, to stay with the example.

move_graph_a n l l’ g $$\equiv$$
Lgraph (gra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’) then
((agra g)(l := (agra g l) - {n}))
(l’ := (insert n (agra g l’))) else (agra g))
(cgra g)(lgra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’) then
((egra g)(l := (egra g l) - {cgra g n}))
(l’ := insert (efids_cur( efids_inc_ind
(cgra g n)))(egra g l’)))
else egra g)
(kgra g)

This is an action refinement because the move action is changed. It is a refinement, since any trace of the refined model can still be mapped to a trace in the more abstract model just omitting a few steps (the refinement relation is defined using the reflexive transitive closure ): The refinement map refmapI is trivial since the datatypes do not change. Because of intermediate dangerous states in the abstract may be swallowed in the refined model. For example, the state , where Bob has moved but not yet put the new efid, does not exist in the refined model any more.

### Vii-C Finalizing RR-Cycle Analysis

As we have seen in Section IV-D and as has been illustrated by the application example, the RR-cycle is an iterative process. The formal predicate constitutes the termination condition for the cycle for the specified security goal: in our case study this has been “non-identifiability”. Applying the formal characterization of the predicate RR-cycle to the case study after the two steps detailed in this section, we see that the second refinement step still does not avoid the deanonymization attack. Trying to verify the predicate RR_cycle fails. We find another attack possibility: actor Alice is on her own with the attacker in a location. As a remedy, the final refinement steps ensure that action move is constrained by (a) no actor can move to a location where there are less than three actors and (b) also no actor leaves a location when there are not at least four actors. Step (a) is achieved by refining the specification such that it is only permitted to move if there are three or more actors at the destination location l’. Isabelle’s mathematical library provides the necessary arithmetic and set theory to express this constraint as card (agra g l’)  3. The constraint can be added as an additional precondition to the function move_graph_a for the field agra (and similarly for the egra field, see previous Section VII-B). Similarly, Step (b) can be achieved by refining the specification further to allow an actor to move if the source destination l has at least three actors by adding card (agra g l’)  4 to the agra field and similarly for egra. Summarizing, these two steps lead to a third refinement step that does not change the datatypes (as in the previous refinement step) but restricts the set of possible traces of state transition relations as is manifested in the refined definition of move_graph_a. The additional conditions are highlighted.

move_graph_a n l l’ g $$\equiv$$
Lgraph (gra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’)
$$\land$$ card (agra g l’) $$\geq$$ 3 $$\land$$ card (agra g l) $$\geq$$ 4
then ((agra g)(l := (agra g l) - {n}))
(l’ := (insert n (agra g l’))) else (agra g))
(cgra g)(lgra g)
(if n $$\in$$ ((agra g) l) $$\land$$  n $$\notin$$ ((agra g) l’)
$$\land$$ card (agra g l’) $$\geq$$ 3 $$\land$$ card (agra g l) $$\geq$$ 4
then ((egra g)(l := (egra g l) - {cgra g n}))
(l’ := insert (efids_cur(efids_inc_ind
(cgra g n)))(egra g l’)))
else egra g)
(kgra g)

After this refinement, the RR_cycle verification succeeds: we can finally prove the following theorem.
  corona_KripkeR $$\vdash$$
$$\neg$$  EF {x. $$\exists$$ n. $$\neg$$ global_policy x (Efid n))}

The proved meta-theory for attack trees can be applied to make this theorem more intuitive: the contraposition of the Correctness property grants that if there is no attack on (I,f), then EF f does not hold in the Kripke structure. This yields the theorem in the following more understandable form since the AG f statement corresponds to EF f.
 corona_KripkeR $$\vdash$$
AG {x. $$\forall$$ n. global_policy x (Efid n)}


## Viii Related Work

### Viii-a Isabelle Insider and Infrastructure framework

A whole range of publications have documented the development of the Isabelle Insider framework. The publications [27, 28, 29] first define the fundamental notions of insiderness, policies, and behaviour showing how these concepts are able to express the classical insider threat patterns identified in the seminal CERT guide on insider threats [30]. This Isabelle Insider framework has been applied to auction protocols [31, 32] illustrating that the Insider framework can embed the inductive approach to protocol verification [33]. An Airplane case study [34, 35] revealed the need for dynamic state verification leading to the extension of adding a mutable state. Meanwhile, the embedding of Kripke structures and CTL into Isabelle have enabled the emulation of Modelchecking and to provide a semantics for attack trees [36, 37, 38, 7, 6]. Attack trees have provided the leverage to integrate Isabelle formal reasoning for IoT systems as has been illustrated in the CHIST-ERA project SUCCESS [5] where attack trees have been used in combination with the Behaviour Interaction Priority (BIP) component architecture model to develop security and privacy enhanced IoT solutions. This development has emphasized the technical rather than the psychological side of the framework development and thus branched off the development of the Isabelle Insider framework into the Isabelle Infrastructure framework. Since the strong expressiveness of Isabelle allows to formalize the IoT scenarios as well as actors and policies, the latter framework can also be applied to evaluate IoT scenarios with respect to policies like the European data privacy regulation GDPR [39]. Application to security protocols first pioneered in the auction protocol application [31, 32]

has further motivated the analysis of Quantum Cryptography which in turn necessitated the extension by probabilities

[40, 41, 42].

Requirements raised by these various security and privacy case studies have shown the need for a cyclic engineering process for developing specifications and refining them towards implementations. A first case study takes the IoT healthcare application and exemplifies a step-by-step refinement interspersed with attack analysis using attack trees to increase privacy by ultimately introducing a blockchain for access control [6]. First ideas to support a dedicated security refinement process are available in a preliminary arxive paper [43] but the current publication is the first to fully formalize the RR-cycle and illustrate its application completely on the Corona-virus Warn App (CWA). The earlier workshop publication [3] provided the formalisation of the CWA illustrating the first two steps but it did not introduce the fully formalised RR-cycle nor did it apply it to arrive at a solution satisfying the global privacy policy.

### Viii-B Refinement in Formal Methods

As has been illustrated on a large scale view in Section III, the concept of refinement features in various other formal methods. The most closely related one is the CSP failure/divergences-refinement as we have seen in the earlier section. Compared to the RR-cycle approach in the Isabelle Infrastructure framework, CSP offers a range of standardized constructors providing a calculus for event-based specification refinements. Clearly, CSP has thus a constructive advantage but as has already been discussed in Section III-C, Isabelle is expressive enough to emulate those constructors.

In general in CSP, the refinement is focused on events whereas in data-oriented specification formalism, for example, the B-method [44, 13], Z and its object-oriented exension Object-Z [45], refinement means data refinement, that is, is represented as a relation between abstract and concrete datatypes. In early works, automated techniques have already been used to derive a concrete specification from an abstract one provided the concrete data types and refinement relation are given [46, 47, 48]

. To find a refinement, that is, design the refined specification and define the refinement relation is a creative and ambigious process; it reflects the development skill of a programmer. An approach to derive refinement relations and refined specification for Object-Z specifications based on heuristics is

[49]. By contrast to these earlier work, we make a substantial leap forward as we have a constructive way of finding a next refinement step guiding this process by attack tree analysis. The resulting RR-cycle thus provides a systematic way to construct refinements.

Statecharts [50] are a formal method focused on graphically modeling system states and their state transitions and allowing a structured view on the state and its data. Statecharts thus also combine data in states as well as dynamic behaviour; their semantics resembles our model. The transitions may be annotated with events that trigger a transition, conditions which guard it, and actions that are executed when the transition “fires”. Statecharts may contain data and this data has various layers of abstraction but this is not a refinement rather a view of the same system at different levels of granularity. Concerning verification, an embedding of Extended Hierarchical Automata in Isabelle/HOL [51, 52] is related to our work but the focus is on finding property preserving abstractions of the data within the Statechart [53] to enable model checking. Even though model checking is also involved there, abstraction is precisely the opposite of refinement. The relationship of Statecharts with CSP and model checking is also investigated [54] albeit in a dedicated extension of the FDR model checker thus not admitting explicit expression of data refinement and reasoning on system property preserving specification refinement and general security properties as we offer here.

### Viii-C Dependability and Security Verification

In software engineering, dependability aggregates the security attributes confidentiality, integrity, and availability with safety, reliability, and maintainability [1]. In addition to security, we explicitly address safety through formal methods and verification with temporal logic since we can explicitly formalize relevant system properties and prove them on the system specification. However, maintainability (ability of easy maintenance (repair)) and reliability (continuity of correct service) are provided only indirectly because specification and formal proof imply higher quality.

Security verification is quite advanced when it comes to the automated analysis of security (or authentication) protocols. In terms of verification of security using process calculi, the Pi-calculus offers the dedicated model checker Proverif [55]. This work is relevant for our approach since it mechanizes a process calculus with actor-based communication and provides model checking verification but it does not address the idea of refinement let alone providing systematic support for dependability engineering. Security protocols have also been formalized in Isabelle [33] and a similar approach has been shown to integrate well with the Isabelle Insider framework [31].

## Ix Conclusions

In this paper we have presented a formalization of the Refinement-Risk cycle (RR-cycle) within the Isabelle Infrastructure framework which encompasses a notion of formal refinement with a property preservation theory as well as attack trees based on Kripke structures and CTL. The RR-cycle iterates the activities of refinement and attack tree analysis until the termination condition is reached which shows that the global policy is true in the refined model. The application of this formalization of the RR-cycle is illustrated on the case study of the Corona-virus Warning App (CWA). To contrast the development of the RR-cycle we present an account of related refinement techniques in other formal methods. The comparison helps to make some of the used concepts clearer for a larger audience as well as showing up avenues for future work, for example, emulating some of the ready-made refinement constructors available in process calculi like CSP in the Isabelle Infrastructure framework.

The real novelty of the RR-cycle approach presented in this paper is guiding the dependability refinement process by attack trees. It is the expressivity of Isabelle’s Higher Order logic that permits constructing – within the logic – a theory for refinement and attack trees. Thereby, the actual mechanism used in the framework for the analysis of security applications can be explicitly defined from first principles and its properties can be mathematically proved in the framework. At the same time, this generic meta-theory can be applied to application examples. That is, the defined constructs and proved theorems of the Infrastructure framework can now be instantiated to represent a concrete application example, verify security policies on it, find attacks using attack trees, and define refinement relations to new models and apply the RR-cycle constructs to verify – all this within the same framework and thus with the highest consistency and correctness guarantees due to both.

## References

• [1] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004.
• [2] J. Jacob, “On the derivation of secure components,” in IEEE Security and Privacy.   IEEE, 1989, pp. 242–247.
• [3] F. Kammüller and B. Lutz, “Modeling and analyzing the corona-virus warning app with the isabelle infrastructure framework,” in 20th International Workshop of Data Privacy Management, DPM’20, ser. LNCS, vol. 12484.   Springer, 2020, co-located with ESORICS’20.
• [4] T. Nipkow, L. C. Paulson, and M. Wenzel, Isabelle/HOL – A Proof Assistant for Higher-Order Logic, ser. LNCS.   Springer-Verlag, 2002, vol. 2283.
• [5] CHIST-ERA, “Success: Secure accessibility for the internet of things,” 2016, http://www.chistera.eu/projects/success.
• [6] F. Kammüller, “Combining secure system design with risk assessment for iot healthcare systems,” in Workshop on Security, Privacy, and Trust in the IoT, SPTIoT’19, colocated with IEEE PerCom.   IEEE, 2019.
• [7] ——, “Attack trees in isabelle,” in 20th International Conference on Information and Communications Security, ICICS2018, ser. LNCS, vol. 11149.   Springer, 2018.
• [8] C. A. R. Hoare, Communicating Sequential Processes.   Prentice Hall, 1985.
• [9] R. Milner, A Calculus of Communicating Systems, ser. Lecture Notes in Computer Science.   Springer, 1980, vol. 92. [Online]. Available: https://doi.org/10.1007/3-540-10235-3
• [10] ——, Communicating and mobile systems - the Pi-calculus.   Cambridge University Press, 1999.
• [11] J. Abrial, “Data semantics,” in Data Base Management, Proceeding of the IFIP Working Conference Data Base Management, Cargèse, Corsica, France, April 1-5, 1974, J. W. Klimbie and K. L. Koffeman, Eds.   North-Holland, 1974, pp. 1–60.
• [12] J. M. Spivey, The Z notation - a reference manual, ser. Prentice Hall International Series in Computer Science.   Prentice Hall, 1989.
• [13] J. Abrial, The B-book - assigning programs to meanings.   Cambridge University Press, 1996. [Online]. Available: https://doi.org/10.1017/CBO9780511624162
• [14] A. W. Roscoe, The Theory and Practice of Concurrency, ser. Prentice Hall Series in Computer Science.   Prentice Hall, 1998.
• [15] The Corona-Warn-App Project, 2020, https://github.com/corona-warn-app.
• [16] D. Bundesregierung, “Die Corona-Warn-App: Unterstützt uns im Kampf gegen Corona,” 2020, german government announcement and support of Coronavirus warning app, https://www.bundesregierung.de/breg-de/themen/corona-warn-app.
• [17] The Corona-Warn-App Project, “Corona-warn-app solution architecture,” 2020, https://github.com/corona-warn-app/cwa-documentation/blob/master/solution_architecture.md.
• [18] The DP-3T Project, “Decentralized Privacy-Preserving Proximity Tracing,” 2020, https://github.com/DP-3T.
• [20] The PEPP-PT Project, “Pan-European Privacy-Preserving Proximity Tracing,” 2020, https://github.com/PEPP-PT.
• [21] The ROBERT Project, “ROBust and privacy-presERving proximity Tracing protocol,” 2020, https://github.com/ROBERT-proximity-tracing.
• [22] S. Vaudenay, “Analysis of DP3T: Between Scylla and Charybdis,” 2020, https://eprint.iacr.org/2020/399.pdf.
• [23] The DP-3T Project, “Response to ’Analysis of DP3T: Between Scylla and Charybdis’,” 2020, https://github.com/DP-3T/documents/blob/master/Security analysis/Response to ’Analysis of DP3T’.pdf.
• [24] ——, “Privacy and security risk evaluation of digital proximity tracing systems,” 2020, https://github.com/DP-3T/documents/blob/master/Security analysis/Privacy and Security Attacks on Digital Proximity Tracing Systems.pdf.
• [25] F. Kammüller, “Isabelle infrastructure framework for ibc,” 2020, isabelle sources for IBC formalisation. [Online]. Available: https://github.com/flokam/IsabelleSC
• [26] The DP-3T Project, “Decentralized privacy-preserving proximity tracing - White Paper,” 2020, https://github.com/DP-3T/documents/blob/master/DP3T White Paper.pdf.
• [27] F. Kammüller and C. W. Probst, “Invalidating policies using structural information,” in IEEE Security and Privacy Workshops, Workshop on Research in Insider Threats, WRIT’13, 2013.
• [28] ——, “Combining generated data models with formal invalidation for insider threat analysis,” in IEEE Security and Privacy Workshops, Workshop on Research in Insider Threats, WRIT’14, 2014.
• [29] ——, “Modeling and verification of insider threats using logical analysis,” IEEE Systems Journal, Special issue on Insider Threats to Information Security, Digital Espionage, and Counter Intelligence, vol. 11, no. 2, pp. 534–545, 2017. [Online]. Available: http://dx.doi.org/10.1109/JSYST.2015.2453215
• [30] D. M. Cappelli, A. P. Moore, and R. F. Trzeciak, The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud), 1st ed., ser. SEI Series in Software Engineering.   Addison-Wesley Professional, Feb. 2012. [Online]. Available: http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0321812573
• [31] F. Kammüller, M. Kerber, and C. Probst, “Towards formal analysis of insider threats for auctions,” in 8th ACM CCS International Workshop on Managing Insider Security Threats, MIST’16.   ACM, 2016.
• [32] ——, “Insider threats for auctions: Formal modeling, proof, and certified code,” Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), vol. 8, no. 1, 2017. [Online]. Available: http://doi.org/10.22667/JOWUA.2017.03.31.044
• [33] L. C. Paulson, “The inductive approach to verifying cryptographic protocols,” Journal of Computer Security, vol. 6, no. 1-2, pp. 85–128, 1998.
• [34] F. Kammüller and M. Kerber, “Investigating airplane safety and security against insider threats using logical modeling,” in IEEE Security and Privacy Workshops, Workshop on Research in Insider Threats, WRIT’16.   IEEE, 2016.
• [35] ——, “Applying the isabelle insider framework to airplane security,” Science of Computer Programming, vol. 206, 2021. [Online]. Available: https://doi.org/10.1016/j.scico.2021.102623
• [36] F. Kammüller, “A proof calculus for attack trees,” in Data Privacy Management, DPM’17, 12th Int. Workshop, ser. LNCS, vol. 10436.   Springer, 2017, co-located with ESORICS’17.
• [37] ——, “Human centric security and privacy for the iot using formal techniques,” in 3d International Conference on Human Factors in Cybersecurity, ser. Advances in Intelligent Systems and Computing, vol. 593.   Springer, 2017, pp. 106–116, affiliated with AHFE’2017.
• [38] ——, “Formal models of human factors for security and privacy,” in 5th International Conference on Human Aspects of Security, Privacy and Trust, HCII-HAS 2017, ser. LNCS, vol. 10292.   Springer, 2017, pp. 339–352, affiliated with HCII 2017.
• [39] ——, “Formal modeling and analysis of data protection for gdpr compliance of iot healthcare systems,” in IEEE Systems, Man and Cybernetics, SMC2018.   IEEE, 2018.
• [40] ——, “Qkd in isabelle – bayesian calculation,” arXiv, vol. cs.CR, 2019. [Online]. Available: https://arxiv.org/abs/1905.00325
• [41] ——, “Attack trees in isabelle extended with probabilities for quantum cryptography,” Computer & Security, vol. 87, 2019. [Online]. Available: //doi.org/10.1016/j.cose.2019.101572
• [42] ——, “Formalizing probabilistic quantum security protocols in the isabelle infrastructure framework,” informal Presentation at Computability in Europe, CiE 2019. [Online]. Available: https://www.aemea.org/CIE2019/CIE_2019_Abstracts.pdf#page=35
• [43] F. Kammüller, “A formal development cycle for security engineering in isabelle,” 2020.
• [44] J. Abrial, “The B tool (abstract),” in VDM ’88, VDM - The Way Ahead, 2nd VDM-Europe Symposium, Dublin, Ireland, September 11-16, 1988, Proceedings, ser. Lecture Notes in Computer Science, R. E. Bloomfield, L. S. Marshall, and R. B. Jones, Eds., vol. 328.   Springer, 1988, pp. 86–87. [Online]. Available: https://doi.org/10.1007/3-540-50214-9_8
• [45] G. Smith, The Object-Z Specification Language.   Kluwer Academic Publishers, 2000.
• [46] M. J. Butler and T. Långbacka, “Program derivation using the refinement calculator,” in Theorem Proving in Higher Order Logics, TPHOLs’96, ser. LNCS, J. von Wright, J. Grundy, and J. Harrison, Eds., vol. 1125.   Springer, 1996, pp. 93–108.
• [47] C. C. Morgan and P. H. B. Gardiner, “Data refinement by calculation,” Acta Informatica, vol. 27, no. 6, pp. 481–503, 1989.
• [48] R. J. R. Back, “A calculus of refinements for program derivation,” Acta Informatica, vol. 25, no. 6, pp. 593–624, 1988.
• [49] F. Kammüller and J. W. Sanders, “Heuristics for refinement relations,” in Second Interational Conference of Software Engineering and Formal Methods, SEFM’04.   IEEE, 2004.
• [50] D. Harel, “Statecharts: A visual formalism for complex systems,” Science of Computer Programming, vol. 8, pp. 231–274, 1987.
• [51] S. Helke and F. Kammüller, “Representing hierarchical automata in interactive theorem provers,” in Theorem Proving in Higher Order Logics, 14th International Conference, TPHOLs 2001, Edinburgh, Scotland, UK, September 3-6, 2001, Proceedings, ser. Lecture Notes in Computer Science, R. J. Boulton and P. B. Jackson, Eds., vol. 2152.   Springer, 2001, pp. 233–248. [Online]. Available: https://doi.org/10.1007/3-540-44755-5_17
• [52] ——, “Formalizing statecharts using hierarchical automata,” Arch. Formal Proofs, vol. 2010, 2010. [Online]. Available: https://www.isa-afp.org/entries/Statecharts.shtml
• [53] ——, “Structure preserving data abstractions for statecharts,” in Formal Techniques for Networked and Distributed Systems - FORTE 2005, 25th IFIP WG 6.1 International Conference, Taipei, Taiwan, October 2-5, 2005, Proceedings, ser. Lecture Notes in Computer Science, F. Wang, Ed., vol. 3731.   Springer, 2005, pp. 305–319. [Online]. Available: https://doi.org/10.1007/11562436_23
• [54] B. Roscoe and Z. Wu, “Verifying statemate statecharts using CSP and FDR,” in Proceedings of International Conference on Formal Engineering Methods (ICFEM), ser. LNCS, Z. Liu and J. He, Eds., vol. 4260.   Springer-Verlag, 2006.
• [55] B. Blanchet, “Automatic verification of security protocols in the symbolic model: the verifier ProVerif,” in Foundations of Security Analysis and Design VII, FOSAD Tutorial Lectures, ser. Lecture Notes on Computer Science, A. Aldini, J. Lopez, and F. Martinelli, Eds.   Springer Verlag, 2014, vol. 8604, pp. 54–87.