Cloud computing has attracted interest in both the scientific and industrial computing communities due to its ability to provide flexible, configurable, and cost-effective computing resources delivered over the internet. In cloud systems, computing resources are shared among multiple clients. This is achieved by virtualisation, in which a collection of virtual machines (VMs) run on the same platform under the management of a hypervisor. Virtualisation allows computing service providers to maximise device utilisation and minimise costs by creating multiple VMs over a shared physical infrastructure. However, such services can also introduce side-channel threats, leaking information between unrelated entities through unintended covert channels during resource sharing and communication. To address this concern, this paper proposes formal approaches to specifying, modelling and analysing flow security properties in virtualised computing networks, which underpin key contemporary services such as cloud computing.
Specifically, information leakage can arise from observations of both program executions and cache usage in the virtualised environment. On the one hand, for processes running on a particular VM instance, viewing the processes as communication channels, sensitive inputs can be partially inferred by observing the public outputs of the processes for chosen public inputs. On the other hand, shared caches enable competing VM instances to extract sensitive information from each other. The CPU cache, a small section of memory built into the CPU for fast memory access, is one of the highest-rate measurable resources shared by multiple processes [WuXW12]. Cache-based side-channel attacks have therefore become one of the major attacks on VMs and have received the most attention in the cloud environment [RistenpartTSS09]. Consider a malicious VM that repeatedly accesses shared cache memory to perform cache-based side-channel attacks. Such attacks allow one virtual machine to effectively steal secrets from another machine hosted in the same cloud environment. More precisely, for processes running on different VM instances, cache usage can be considered a communication channel. Viewing the cache lines accessed by the victim instance and by the malicious instance as high-level input and low-level input respectively, the malicious observer can, by observing the usage (such as access time) of victim cache lines (low output), learn some information about the victim instance. In summary, this paper considers a distributed virtualised computing environment where an attacker VM steals information from a target VM by observing the executions of victim processes, and by probing and measuring the usage (timing) of the shared cache.
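The probing pattern described above can be illustrated with a toy prime+probe sketch; the cache model, sizes and abstract timings below are invented for illustration and stand in for real hardware behaviour:

```python
# Toy prime+probe simulation: the attacker fills (primes) a small
# direct-mapped cache, the victim touches a secret-dependent line,
# and the attacker's probe latency reveals which line was evicted.
# All names, sizes and timings here are illustrative, not a real attack.

CACHE_LINES = 8
HIT_TIME, MISS_TIME = 1, 10  # abstract cycle counts

def new_cache():
    # owner tag per line: "A" (attacker) or "V" (victim)
    return ["A"] * CACHE_LINES  # attacker has primed every line

def victim_access(cache, secret):
    # victim touches the line indexed by its secret (mod cache size)
    cache[secret % CACHE_LINES] = "V"

def probe(cache):
    # attacker re-reads its own lines and records per-line latency:
    # a line evicted by the victim costs a miss
    times = []
    for line in range(CACHE_LINES):
        times.append(HIT_TIME if cache[line] == "A" else MISS_TIME)
        cache[line] = "A"  # refill for the next round
    return times

def recover_secret(times):
    # the slow line's index leaks the secret mod CACHE_LINES
    return times.index(max(times))

cache = new_cache()
victim_access(cache, secret=13)
print(recover_secret(probe(cache)))  # 13 mod 8 == 5
```

The attacker never reads the victim's data; the secret leaks purely through which of the attacker's own lines became slow.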
In particular, we develop an approach at the software language-based level to force applications to access the shared cache, and to introduce interference, only in a predictable way. As a result, we aim to prevent the leakage introduced by such cache timing channels and the interference between security objects caused by executions. First, we propose a CSP-like language for modelling communicating processes running on VMs with mobility in the computing environment. Second, we formalise a cache flow policy to specify the security condition for the threat model we focus on. Finally, we describe a type system for the language to enforce the flow policy and control the leakage introduced by observing system behaviours. More specifically, we give an identifying label to each VM instance, and partition the cache into a set of page-sized blocks; each block is mapped to a set of cache lines and is dynamically granted the label of its owner (the VM instance it is allocated to). The label of a block is revoked when its owner terminates. In addition, programming variables and cache lines are assigned security labels to denote the security levels of the data they store. VM instances and hosts are assigned to different categories for the purpose of controlling the information flow introduced by the movement of processes and VM instances. When a guest VM is launched, the VM manager allocates cache pages to it according to its requests. During the running time of a VM process, an operation on a cache line is allowed only if it satisfies the specified flow policy. Processes are allowed to move from one VM to another, and VM instances are allowed to move from one host to another; however, movements from a carrier in a lower-ordered category to one in a higher-ordered category are not allowed. Furthermore, we only allow a communicating process to access the cache within a certain time, in order to prevent timing leaks from observations of cache usage.
These regulations are enforced by the semantics and the typing rules we formalise. When a guest VM terminates, the relevant cache pages are initialised and released to the VM manager, preventing a malicious VM from flushing the pages.
This paper is organised as follows. Section 2 describes our language for modelling basic virtualised computing networks. Section 3 describes the threat model we focus on. Section 4 specifies the cache flow policy which the processes should satisfy in our model. Section 5 presents a type system to enforce the cache flow policy. Section 6 briefly reviews literature in the related areas.
2 The Modelling Language CSP
This section presents a dialect of the Communicating Sequential Processes (CSP) language [Hoare78] for the formal modelling of, and reasoning about, the virtualised computing network systems considered in this paper. Such an environment can be considered a distributed computing system, in which a group of inter-connected and virtualised computers are dynamically allocated to serve requests. Processes can move from one VM to another and communicate with each other by sending and receiving messages. Applications and data are stored and processed in the network but can be accessed from any location using a client. It is natural to specify and describe the system as a set of communicating processes in a network, with resource sharing considered in a predictable way. A CSP-like language is therefore a good choice for modelling such systems.
2.1 Terminology and notation
We consider an infrastructure consisting of a set of virtual private networks (VPNs) upon which a set of virtual machines (VMs) can communicate with each other. A VPN may include one or more VMs, and the location of a VM can be viewed as a node (host) of the VPN it belongs to. An instance is a VM upon which a number of processes are located and run; we fix a set of instances. Processes can be constructed from a set of atomic actions called events, or composed using operators to create more complex behaviours. The full set of actions that a system may perform is called the alphabet. The operators are required to obey algebraic laws which can be used for formal reasoning. Interactions carrying data values between processes take place through “channels”. From an information-theoretic point of view, a storage device such as a cache, which can be received from (reading) and sent to (writing), is also a kind of communication channel. CPU data caches are located between the processor cores and the main memory. We assume the clients (including the attacker) know the map between memory locations and cache sets, so we omit the details of the mapping and focus on cache organisation and operation here. The cache can be viewed as a set of cache lines, whose number gives the size of the cache.
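The map between memory locations and cache sets that clients are assumed to know can be sketched for a conventional set-associative cache; the line size and set count below are illustrative parameters, not ones fixed by this paper:

```python
# Sketch of the standard memory-address-to-cache-set mapping for a
# set-associative cache. The sizes below are illustrative and not
# tied to any particular CPU.

LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 64    # number of sets in the cache

def cache_set(addr):
    # discard the byte-offset bits, then take the set-index bits
    return (addr // LINE_SIZE) % NUM_SETS

# addresses that differ by LINE_SIZE * NUM_SETS collide in one set,
# which is exactly what a probing attacker exploits
print(cache_set(0x1040), cache_set(0x1040 + LINE_SIZE * NUM_SETS))
```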
In addition, in order to encode the desired features of the language for flow-secure virtualised computing systems, we assign security labels to variables and cache lines (channels), and allocate cache lines to instances, via mappings:
where the labels are drawn from a security lattice; we write the unsubscripted forms where no confusion arises. Furthermore, we consider VM hosts to be assigned to different categories, with an ordering given by subset relations:
For instance, a host whose category is a subset of another host's category is the lower of the two in the ordering. Similarly, we assign VM instances to different sub-categories, again with an ordering given by subset relations: a process running on a VM whose sub-category is a subset of another VM's sub-category is ordered below a process running on the latter.
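The subset ordering on categories and sub-categories, and the resulting movement restriction, can be sketched as follows; the category contents and function names are invented for illustration:

```python
# Illustrative check of the subset ordering on host categories and
# instance sub-categories: moving from a lower-ordered carrier to a
# strictly higher one is banned. Category contents are made up.

def leq(cat_a, cat_b):
    # cat_a is below (or equal to) cat_b iff cat_a is a subset of cat_b
    return set(cat_a) <= set(cat_b)

def move_allowed(src_cat, dst_cat):
    # a move is forbidden exactly when the destination is strictly
    # higher in the subset ordering than the source
    return not (leq(src_cat, dst_cat) and set(src_cat) != set(dst_cat))

low, high = {"web"}, {"web", "db"}
print(move_allowed(high, low), move_allowed(low, high))  # True False
```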
Table 1 presents the syntax of the language CSP.
Expressions can be variables, channels, integers and arithmetic operations between expressions.
The assignment operator assigns the value of an expression to a process variable. The inactive process does nothing and indicates a failure to terminate, and the delay operator allows a process to do nothing and wait for a number of time units. The moving operator allows a process to move from the current VM to another; we require that processes running on a VM instance with a lower category order are not allowed to move to a VM instance with a higher category order. The sequential composition operator runs two processes in sequence. The branch operator specifies that if a boolean expression is true the process behaves like its first branch, and otherwise like its second. The sending operator outputs the value of an expression over a channel to an agent, and the receiving operator inputs a data value during an interaction over a channel and writes it into a variable of an agent. Synchronous parallel composition runs two processes in parallel. The loop operator repeats a process while its boolean guard is true.
An instance is a virtual machine (VM) hosted on a network infrastructure. The instance operator defines the VM upon which a process runs. The migration operator allows a VM instance (with all processes running on it) to migrate from the current host machine to another host, keeping the instance running even when an event such as an infrastructure upgrade or a hardware failure occurs. Similarly to process movement, we require that VM instances running on a host with a lower category order are not allowed to move to a host with a higher category order. The host operator defines the host machine upon which an instance runs, together with the cache pages allocated to the instance. To ensure that no cache is shared among different VM instances, we require that the cache pages of any two instances running on the same host are disjoint, and that when an instance terminates its pages are returned for future allocation to other instances. A virtual private network provides connectivity for VM hosts. It can be viewed as a virtual network consisting of a set of hosts where VM instances can run and communicate with each other. The location of a host indicates the network node of G at which the VM host machine is located.
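The disjointness requirement on cache pages, and the return of pages when an instance terminates, can be sketched as follows; the instance names and page identifiers are illustrative:

```python
# Sketch of the no-shared-cache requirement: on any one host the cache
# pages of distinct instances must be pairwise disjoint, and a
# terminating instance's pages are returned to a free pool. The
# instance names and page ids below are invented for illustration.

def disjoint_allocation(alloc):
    # alloc maps instance id -> set of cache page ids on one host
    pages = list(alloc.values())
    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if pages[i] & pages[j]:   # overlap => sharing
                return False
    return True

def release(alloc, pool, inst):
    # on termination, return the instance's (initialised) pages
    pool |= alloc.pop(inst)

alloc = {"vm1": {0, 1}, "vm2": {2, 3}}
pool = {4, 5}
assert disjoint_allocation(alloc)
release(alloc, pool, "vm1")
print(sorted(pool))  # [0, 1, 4, 5]
```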
2.3 Operational semantics
In order to incorporate as many parallel executions of events within different nodes as possible, we transform the network into a finite parallel composition of the form:
where the indices denote the identifier of a host and the VM instance located at it. Each component is considered a decomposition of the network. We argue that such a decomposition is well defined by applying the following rules of structural equivalence:
We now define the operational semantics of CSP in terms of a multiset labelled transition system, where:
is a vector of configurations of a VPN regarding the vector of decompositions of the network. A configuration, regarding a single component (say a process) of the decomposition, is defined as a tuple, where:
denotes the store;
defines the possible world regarding cache;
specifies the owner (the VM instance identifier) of process ;
specifies the host (location) of the VM instance of process .
is a set of operating events which the processes can perform;
is the multiset transition relation, associating a vector of operating events with the vector of components.
The action rules of the operational semantics of CSP are presented in Table 2.
The notations denote cache addressing, cache allocation, and evaluation respectively. For instance, the evaluation judgement means that under a configuration an expression evaluates to a value; the allocation judgement means that the cache allocated for an expression is located at a given address; and the addressing judgement gives the cache address of a channel. A further notation denotes the set of cache addresses allocated for a process.
A store is defined as a mapping from variables to values. The cache is considered a mapping from addresses (of cache lines) to integers (cached) or to a distinguished flushed value.
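A minimal encoding of these two semantic domains, assuming Python's None as the flushed marker (the paper's actual symbol for flushed lines is not reproduced here):

```python
# Minimal encoding of the semantic domains: a store maps variables to
# values, and a cache maps line addresses to cached integers or to a
# distinguished "flushed" marker (None here). This is a sketch of the
# domains only, not the full configuration tuple.

FLUSHED = None

store = {"x": 0, "key": 42}                    # variables -> values
cache = {addr: FLUSHED for addr in range(8)}   # all lines start flushed

def write_line(cache, addr, value):
    # caching a value at a line address
    cache[addr] = value

def flush_line(cache, addr):
    # flushing returns the line to the distinguished marker
    cache[addr] = FLUSHED

write_line(cache, 3, store["key"])
print(cache[3], cache[4])  # 42 None
```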
The action rule for assignment updates the configuration so that the state of the variable is the value of the expression after the execution. The action rule for the process moving operator updates the configuration so that the identifier of the instance accommodating the process is changed after the execution of the movement; a similar action rule applies to the movement of a VM instance from one host to another. The action rule for the sending operator updates the configuration so that the value stored at the cache address allocated for the communicating channel is the value to which the expression evaluates under the configuration before the execution. The rule for the receiving operator updates the configuration so that the state of the receiving variable becomes the value stored at the cache address of the communicating channel under the configuration before receiving the data. In cross-VM communications over today's common virtualised platforms, the cache transmission scheme requires, for security reasons, that the sender and receiver communicate only by interleaving their executions. In order to capture the timing behaviour of cache-related operations, we consider cache-related operations, such as communicating, to be time-sensitive behaviours whose duration is recorded. Events are therefore considered either non-delayable (time-insensitive: time does not progress) actions or delayable (time-sensitive: involving the passing of time) behaviours. We record the time duration for which each event lasts.
The behaviour of a process component can now be viewed as a set of timed runs.
Definition 1 (Timed run)
A timed run of a component is a sequence of timed configuration/event pairs leading to a final configuration:
where the first and last configurations denote the initial and final configuration respectively, and each step records an event taking place with a passing time under a configuration; if the time is zero the event is considered an immediate event, and if the time is positive it is considered a time-sensitive event with that duration.
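A timed run can be encoded directly as a list of (configuration, event, duration) steps; the dict-based configuration below is a deliberate simplification of the tuple defined above:

```python
# A timed run as in Definition 1, encoded as a list of
# (configuration, event, duration) steps; duration 0 marks an
# immediate event, a positive duration a time-sensitive one.
# The configuration is abbreviated to a plain dict for illustration.

def is_immediate(step):
    _, _, duration = step
    return duration == 0

def total_time(run):
    # the elapsed time of a run is the sum of its step durations
    return sum(d for _, _, d in run)

run = [
    ({"x": 0}, "assign", 0),   # immediate event
    ({"x": 1}, "send",   3),   # cache-related, time-sensitive
    ({"x": 1}, "recv",   2),
]
print(is_immediate(run[0]), total_time(run))  # True 5
```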
3 Attacker Model
We consider a computing environment where malicious tenants can use observations of process executions and of the usage of a shared cache to infer information about victim tenants. We assume the service provider and the applications running on the victim's VM are trusted. Consider an attacker who owns a VM and runs a program on the system, while the victim is a co-resident VM that shares the host machine with the attacker's VM. In particular, there are two ways in which an attacker may learn secrets from a victim process: by probing the cache sets and measuring the time to access cache lines (through the cache timing channel), and by observing how its own executions are influenced by the executions of victim processes.
Cache timing side channel (C1).
In the virtualised computing environment, different VMs may be launched on the same CPU core, so the CPU cache can be shared between the malicious VM and the target ones. Consider a malicious VM program that repeatedly accesses and monitors the shared cache to learn information about the sensitive input of the victim VM. Specifically, timings can be observed from caches and are the most common side channel through which the attacker can infer the sensitive information of the victim VM. Intuitively, such a cache side channel can be viewed as a communication channel, with the victim and the attacker as the sender and the receiver respectively. We assume that the attacker knows the map between memory locations and cache sets, and is able to perform repeated measurements to record when the victim process accesses the target cache line. We focus on the cache timing side channel; all other side channels are outside our scope.
Consider the scenario presented in Fig. 1, where a victim VM and an attacker VM are two instances running on the same host. Two victim processes, running on the victim VM, communicate with each other: one generates a key and sends it to the other, which encrypts a message using the received key and sends the encrypted message back, where it is received and decrypted. Meanwhile an attacker process, running on the attacker VM, keeps probing the cache address of the key.
Let functions denote key generation, encryption and decryption respectively, and assume a probe function over a cache address that returns 1 if the address is available and a different value otherwise. We present the model in our language as follows.
The communication time between the two victim processes is affected by the value of the generated key, so there is an information flow from the victim VM to the malicious VM through the cache side channel.
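The leak can be made concrete with a toy simulation in which the transmission time is key-dependent; the timing function below is invented for illustration and stands in for the real cache occupancy:

```python
# Toy illustration of the leak in Fig. 1: the time the victim's
# communication occupies the cache depends on the generated key, so
# an attacker timing its probes can distinguish keys. The timing
# function is invented for illustration.

def transmit_time(key):
    # pretend the cache channel is busy for a key-dependent duration
    return 1 + key % 4

def attacker_observes(key):
    # the attacker only sees elapsed time, never the key itself,
    # yet different keys yield different observations
    return transmit_time(key)

obs = {attacker_observes(k) for k in range(8)}
print(sorted(obs))  # more than one value => information flows
```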
Leakage through observations on process executions (C2).
Next let us focus on processes running on a particular VM instance. Consider the VM instance, upon which processes are running, as a communication channel with inputs and outputs, corresponding to an information flow from a victim user to a malicious one. The victim user controls a set of higher-level inputs containing confidential data. The attacker is an observer who may control lower-level inputs; he has a partial observation of the executions of the process, but does not have any access to the confidential data. Specifically, a weak attacker can observe the public result, i.e., the final public output of the program, while a strong attacker can observe the low state after each execution step of the process. We consider a process that has access to confidential data via higher-level inputs. The attacker tries to collect and deduce some of the secret information about the higher-level inputs by varying his own inputs and observing the execution of the process.
Consider two processes running on a VM instance. The first process inputs a password through a channel into a high-level variable, and updates a low-level variable to one value if the password is odd and to another if it is even. The second process outputs the low-level variable through a channel:
Clearly there is an implicit flow from the high-level password to the low-level output of the process.
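Executing the example makes the implicit flow visible; the function below is a hypothetical rendering of the two processes in ordinary code, with the branch on high data made explicit:

```python
# The implicit flow of the example, executed: the low output reveals
# the parity of the high password even though the password itself is
# never written to a low variable. Names mirror the prose and are
# illustrative.

def run(password_high):
    # branch on high data ...
    if password_high % 2 == 1:
        low = 1
    else:
        low = 0
    return low  # ... then output the low variable on a public channel

# observing the public output distinguishes odd from even passwords
print(run(7041), run(7042))  # 1 0
```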
4 Information Flow Policy
Information flow is controlled by means of security labels and a flow policy integrated into the language. Each identifier, an information container, is associated with a security label. Identifiers can refer to variables and communication channels, and at a concrete level to entities such as files and devices. The set of security labels forms a security lattice under their ordering. We study a system flow policy which prevents information leakage from high-level objects to lower-level ones, and from a target instance to a malicious one via observations of process executions and cache usage (by measuring the time taken to access cache lines during communications).
In general, information flow policies are proposed to ensure that secret information does not influence publicly observable information. An ideal flow policy, non-interference (NI) [GoguenMeseguer82], guarantees that no information about the sensitive inputs can be obtained by observing a program's public outputs, for any choice of its public inputs. Intuitively, the NI policy requires that low-security users should not be aware of the activity of high-security users and thus should not be able to deduce any information about the behaviours of the high users. On the one hand, for processes running on a particular VM instance (regarding C2), the NI policy can be applied to control information flow from high-level input to low-level output, where the state of sensitive information containers (such as high-level variables) and observations of the behaviours of public information containers (such as low-level variables) are viewed as the high input and low output respectively. On the other hand, for processes running on different VM instances (regarding C1), we adapt the NI policy to control the information flow from processes running on a victim instance to a malicious one through the cache side channel. Considering the cache side channel as a communication channel, the cache lines accessed by the victim instance and by the malicious instance are viewed as high-level input and low-level input respectively, and the observations of the victim's cache usage (such as hits/misses) are considered low-level outputs. Cache flow non-interference demands that changing the cache lines accessed by the victim process (high inputs) does not affect the public observations of cache usage (low outputs). Informally, cache flow interference happens if the usage (we focus on the access time) of the cache lines accessed by a victim process affects the usage of the cache lines accessed by attacker VM processes. Fig. 2 presents some intuition for the cache NI policy discussed above.
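For small programs, NI in the C2 sense can be checked by brute force: fix each public input, vary the secret input, and require the public output to stay constant. A sketch under the assumption of finite input ranges (real NI quantifies over all inputs):

```python
# Brute-force check of non-interference for a small program: for each
# fixed low input, varying the high input over a test range must not
# change the low output. A sketch only; real NI needs all inputs.

def noninterferent(prog, highs, lows):
    return all(
        len({prog(h, l) for h in highs}) == 1  # low output constant in h
        for l in lows
    )

leaky  = lambda h, l: l + (h % 2)   # the parity of h leaks
secure = lambda h, l: l * 2         # ignores h entirely

print(noninterferent(leaky,  range(4), range(3)),
      noninterferent(secure, range(4), range(3)))  # False True
```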
Formally, the policy of flow non-interference can be stated in terms of equivalence relations on the system behaviours from the observer's point of view, covering both the state evolution of information containers and the timing behaviour of cache accesses. This is possible because, in our model, system behaviours are modelled as timed runs with a security classification of identifiers and with timing taken into account when caches are accessed during process communications.
Definition 2 (Flow security environment)
Let a finite flow lattice with its ordering relation be given, together with the set of VM instances running on any host, and sets of categories for hosts and of sub-categories for instances respectively. The flow security environment is considered as:
where , , , , . Furthermore, we say iff , , and :
and for , , and , we say iff:
where we abuse notation , , and to denote , , and in environment respectively.
Definition 3 (-equivalent configuration )
Consider processes running on hosts of a VPN, and let a security environment, a security level, a category and a sub-category be given. For any process, assume the instance and host are those to which it belongs; we define store equivalence under the environment, which holds iff:
and cache line -equivalence under as follows: iff:
Furthermore, given two configurations and , we say iff:
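Store equivalence at an observation level can be sketched over a two-point lattice; the labels, ordering and variable names below are illustrative, not the paper's full environment:

```python
# Sketch of store equivalence at an observation level: two stores look
# the same to an observer at level l iff they agree on every variable
# whose label flows to l. The two-point lattice here is illustrative.

ORDER = {("L", "L"), ("L", "H"), ("H", "H")}  # pairs (a, b) with a <= b

def leq(a, b):
    return (a, b) in ORDER

def store_equiv(s1, s2, labels, l):
    # agree on all variables visible at observation level l
    return all(s1[x] == s2[x] for x in labels if leq(labels[x], l))

labels = {"pub": "L", "sec": "H"}
s1 = {"pub": 1, "sec": 10}
s2 = {"pub": 1, "sec": 99}
print(store_equiv(s1, s2, labels, "L"),   # low observer: equivalent
      store_equiv(s1, s2, labels, "H"))   # high observer: not
```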
Definition 4 (Strong bisimulation and weak bisimulation )
Consider two timed runs running upon host under security environment :
such that , we say and are strong -bisimilar to each other, i.e., , iff:
and say and are weak -bisimilar to each other, i.e., , iff:
Definition 5 (Cache flow security policy)
Given a security level , , and a VPN under security environment is considered strong cache flow secure iff:
where and denote the initial configuration of and respectively, denotes all runs of components of . Similarly, the definition of weak cache flow secure can be given.
Consider the model presented in Example 1, with security levels and categories assigned to the hosts, instances and variables, and with the communicating cache channels assigned security labels according to the data they transmit. Note that the state of the attacker's variable depends on the state of the cache address of the key, and is affected by the communication time of the data transmission between the victim processes. Therefore the model does not satisfy the cache flow security policy: for any two given runs, neither the timing condition nor the configuration-equivalence condition is guaranteed to hold.
In order to close the cache timing channel, we consider the communication as a scenario of sending and receiving processes running in parallel under a data-transmission scheme with fixed time interleaving:
The communicating procedure must complete within a fixed number of time units (including sleeping time) and then continues as the subsequent process; the value is sent to the receiving variable via the channel, and the channel and the relevant cache lines are then tagged accordingly. The communication is considered failed if the time units have passed but the communication has not yet completed. A fixed completion time prevents the timing leakage introduced by the cache channel communication.
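The fixed-completion-time discipline can be sketched as follows; time is measured in abstract units and the function names are invented for illustration:

```python
# Sketch of the fixed-completion-time discipline: a send either
# finishes within the deadline and then sleeps out the remainder, or
# is reported failed. The constant observable duration hides the
# key-dependent work time. Durations are abstract units, not seconds.

def timed_send(work_time, deadline):
    if work_time > deadline:
        return "fail", deadline        # timed out: communication failed
    sleep = deadline - work_time       # pad to the fixed deadline
    return "ok", work_time + sleep     # observable time == deadline

# different work times, identical observable completion time
print(timed_send(2, 10), timed_send(7, 10), timed_send(12, 10))
```

An observer timing the channel sees the same duration whether the underlying work took 2 or 7 units, which is exactly why the padding closes the timing channel.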
5 Flow Security Type System for CSP
In order to make the low observations and the cache access times of executions independent of high inputs, variables and cache lines are associated with security labels, VMs and hosts are assigned to categories, and rules (semantic and typing) are required to ensure that: no information flows to lower-level objects; no cache is shared among different VM instances; processes (resp. instances) are not allowed to move from a lower-ordered instance (resp. host) to a higher-ordered one; and cache-related operations in communications between processes are forced to complete within a certain time. The cache flow policy is thereby enforced.
For a process (w.r.t. a component of the network), we consider type judgements of the form:
where the type records the counter security levels of the communication channels/variables and the counter categories of the VMs/hosts participating in the branch events being executed, for the purpose of eliminating implicit flows from guards. The type environments describe what holds before and after the execution of the process. In general, the notation:
describes that under the type environment, an expression and (the address of) a cache line have their respective types, the cache line is allocated to a VM instance, the instance is assigned to a category, and the host is assigned to a category. The type of an expression, including a boolean expression, is defined as standard by taking the least upper bound of the types of its free variables:
All memory, caches and channels written by a t-level expression become tagged as t-level. We write sets for the variables defined in, and the cache lines allocated to, a process. Typing rules for processes with security configurations are presented in Table 3.
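The least-upper-bound typing of expressions, and the resulting rejection of high-to-low assignments, can be sketched over a two-point lattice; the lattice, variable names and helper functions are illustrative, not the paper's full type system:

```python
# Toy version of the expression-typing rule: the type of an expression
# is the least upper bound of the labels of its free variables, and an
# assignment x := e is rejected when type(e) does not flow to the
# label of x. Two-point lattice and names are illustrative.

LUB = {("L", "L"): "L", ("L", "H"): "H",
       ("H", "L"): "H", ("H", "H"): "H"}

def expr_type(free_vars, labels):
    # least upper bound over the labels of the free variables
    t = "L"
    for v in free_vars:
        t = LUB[(t, labels[v])]
    return t

def assign_ok(target, free_vars, labels):
    # permit x := e only if type(e) flows to the label of x
    t = expr_type(free_vars, labels)
    return LUB[(t, labels[target])] == labels[target]

labels = {"pub": "L", "sec": "H", "out": "L", "log": "H"}
print(assign_ok("out", ["pub"], labels),         # L := L accepted
      assign_ok("out", ["sec"], labels),         # L := H rejected
      assign_ok("log", ["pub", "sec"], labels))  # H := lub(L,H) accepted
```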