A Game-Theoretic Framework for the Virtual Machines Migration Timing Problem

03/14/2018
by   Ahmed H. Anwar, et al.

In a multi-tenant cloud, a number of Virtual Machines (VMs) are collocated on the same physical machine to optimize performance, power consumption and maximize profit. This, however, increases the risk of a malicious VM performing side-channel attacks and leaking sensitive information from neighboring VMs. To this end, this paper develops and analyzes a game-theoretic framework for the VM migration timing problem in which the cloud provider decides when to migrate a VM to a different physical machine to reduce the risk of being compromised by a collocated malicious VM. The adversary decides the rate at which she launches new VMs to collocate with the victim VMs. Our formulation captures a data leakage model in which the cost incurred by the cloud provider depends on the duration of collocation with malicious VMs. It also captures costs incurred by the adversary in launching new VMs and by the defender in migrating VMs. We establish sufficient conditions for the existence of Nash equilibria for general cost functions, as well as for specific instantiations, and characterize the best response for both players. Furthermore, we extend our model to characterize its impact on the attacker's payoff when the cloud utilizes intrusion detection systems that detect side-channel attacks. Our theoretical findings are corroborated with extensive numerical results in various settings.

I Introduction

One of the main characteristics of the Cloud that allows scalable and cost-effective operation is multi-tenancy. Multi-tenancy is achieved through virtualization to enable cloud providers to host multiple virtual machines (VMs) on the same physical machine while providing isolation between them. Recent attacks, however, have been shown to bypass such isolation [1]. A malicious VM collocating on the same physical machine with a victim VM can seek unauthorized access to sensitive and private data and/or intellectual property, or can render some of its computational functionality unusable.

This has prompted cloud providers to develop various strategies for VM placement, migration and reconfiguration to mitigate some of these attacks. Moving target defense (MTD) strategies aim to dynamically shift the attack surface, making it more difficult for attackers to launch potent attacks [2]. When developing an MTD strategy, two main questions generally arise: which targets should be moved, and when should they be moved? The answers to these questions are highly dependent on the context of the problem and the nature of the attack. For example, if an attacker seeks to infer the underlying topology of the cloud, then the connectivity between machines is the target that should be changed over time. In a different setting, if the attacker is interested in cracking the system credentials that protect the users’ databases, then the keys are the target that should be constantly reconfigured (i.e., moved). In this paper, we consider collocation attacks whereby an attacker can leak sensitive data from a targeted victim by running a VM on the same physical node (e.g., by launching a side-channel attack). Thus, to secure such a system, VMs should be periodically migrated (i.e., moved to a different physical machine). This paper is primarily focused on the second question, that is, when to move the identified targets.

In the MTD literature, this question is usually referred to as the timing problem of the MTD strategy. In this paper, we study this question using a game-theoretic framework, seeking an understanding of the interplay between the actions of the cloud provider (i.e., the defender) and the adversary. In our formulation, the adversary seeks to prolong the collocation time with the victim VMs to maximize information leakage. The adversary has no guarantee of being collocated on the same physical machine as the victim, because different cloud providers implement different placement algorithms according to criteria over which the attacker has no control. Her best effort is therefore to increase the number of VMs she launches (which is a cost metric we capture). The adversary can then check, after being placed, whether the collocation was successful [3]. The cloud provider, on the other hand, seeks to migrate VMs between physical machines to minimize the collocation times between VMs. VM live migration, while efficient, is not free [4], and thus the question of when to migrate is crucial: the goal is to mitigate collocation attack threats without burdening the system with a large overhead that may not be justified.

Contributions: While VM migration strategies have been proposed as defense mechanisms against collocation attacks in various studies, such work focused on the VM assignment problem (mapping VMs to physical nodes) as a single player scheduling problem. In this paper, however, we consider the timing problem of the MTD as a game between the attacker and the cloud provider. Our work contributes to the theory of timing games [5, 6], which is largely unexplored in cloud computing settings. We leverage the results of the leakage model in the FlipIt game considered previously in [7, 8, 9, 10, 11, 12] to develop a novel formulation to study the VM collocation problem in an extended FlipIt game-theoretic framework. To the best of our knowledge, this is the first work to investigate the following aspects of the timing games.

  • We provide a new game-theoretic formulation for the VM collocation timing problem.

  • Unlike [13, 14, 15], we do not assume the defender has prior knowledge of the exact location of the attacker, thereby allowing for realistic threat and defense models. The defender has to migrate the VMs at the right time(s) to defend against malicious collocating users.

  • We analytically characterize the Nash equilibrium (NE) for the studied game model and derive sufficient existence conditions.

  • We study the behavior of the adversary when the defender adopts an intrusion detection system (IDS). In this case, the adversary not only takes attack actions, but also decides when to stop her attack to reduce the risk of being detected.

  • We provide extensive numerical experiments to support our theoretical findings and compare our proposed defense policies against other defense policies. In our numerical evaluation, we consider several reward functions to reflect the severity of the attack and different degrees of information leakage.

This paper is organized as follows. Section II reviews related work. In Section III, we provide the system model and game formulation. In Section IV, we provide theoretical analysis and establish existence conditions of NE for the formulated game. Section V extends the model to a cloud equipped with an intrusion detection system. Our numerical results are presented in Section VI and we conclude the paper in Section VII.

II Related work

This work is at the intersection of two areas focused on securing cloud computing: Cross-VM side-channel attacks and mitigation, and the use of game theory in modeling the interplay between the cloud provider and the adversary. In this section we put our work in context within these two areas.

II-A Cross-VM side channel attacks and mitigation strategies

Cloud security has received considerable attention recently [1, 16]. Various studies have investigated the impact of cross-VM side-channel attacks [17, 18, 19, 3, 20, 21, 22]. Users’ cryptographic keys have been shown to be vulnerable to exfiltration attacks when adversaries perform Prime+Probe attacks on the square-and-multiply implementation of GnuPG [20]. The authors in [3, 22, 21] have shown that some side-channel attacks can extract cryptographic keys by exploiting the last-level shared caches of the memory. Other attacks have identified pages that a VM shares with its collocated neighboring VMs, revealing information about the victim’s applications [18] and OS [19].

To combat cross-VM side-channel attacks, various approaches have been proposed at the hypervisor [23, 24, 20, 25, 26], the guest OS [27], the hardware level [28, 29], and the application layer [30]. These techniques, however, suffer from two fundamental limitations. First, they cannot be generalized to different types of side-channel attacks [31]. Second, they require major changes to the hypervisor, OS, hardware, and applications [32]. VM live migration, on the other hand, has been proposed as an effective mechanism to combat side-channel attacks [4, 33]. The authors in [34] provided a detection mechanism known as CloudRadar that works as a real-time side-channel attack detector based on monitoring hardware performance counters. The authors in [35] proposed another detection system that can differentiate between benign and malicious activities of neighboring tenants. The authors in [36] showed that by controlling the placement process, a defense mechanism can mitigate the effect of cross-VM attacks through reducing the co-run probability between users. The approach, however, is only effective in the case of time-sensitive attacks and when the number of assigned virtual CPUs is large. Motivated by the moving target defense (MTD) concept, the authors in [37] presented a migration engine in which VMs are migrated to balance the load between different nodes in the cloud. Although MTD is a well-known defense methodology, the authors in [38] demonstrated that in certain scenarios the migrated VMs can be tracked by adversaries. Hence, they proposed a stealthy approach to migrate VMs that can hide them on the network. In [39], the authors study an MTD migration strategy against an attacker solving a multi-armed bandit problem seeking to collocate VMs with high rewards.

II-B Cloud Security using Game-Theoretic Techniques

The use of game theory has largely focused on the VM allocation problem in the presence of adversaries [13, 14, 40, 15, 41]. A common assumption in such formulations is that the adversary is known, which does not typically hold in practice. Additionally, existing formulations do not consider the timing question for the VM migration problem, which is a critical one for the cloud provider wishing to migrate VMs for security. A more practical leakage model was considered in [42, 43], based on the FlipIt game model. FlipIt is a two-player game in which a defender and an attacker compete over the control of a given resource which can only be held by one player at a time. A flip is an action taken by a player to gain control of the resource. The goal is to hold the resource for as long as possible with the least number of flips (i.e., flips are costly). Over time, the resource generates rewards for the player holding the resource. The state of the resource is obscured from each player until they “flip”. Several variants of the FlipIt game model were considered to study different security situations [7, 8, 44, 45, 9, 10, 11, 12]. In [8], the authors studied different strategies for each player and calculated dominant strategies and Nash equilibria. In [44], the game model was extended under the assumption that the players know the state of the resource before taking actions. In [45, 9] the game was extended to the case of a system where insiders can work in favor of external adversaries. The authors in [10] considered the game with both players having limited budgets. Pawlick et al. investigated the game model with characteristics of signaling games [11]. In [12], Farhang et al. studied a variant of the FlipIt game with an associated data leakage model in which the defender can partially eliminate the foothold of the attacker. The attacker exploits the system vulnerabilities that appear based on a periodic process. The authors assume that the attacker’s strategy is fixed since she always starts to attack right after the defender takes his action. This, however, requires the attacker to fully observe the defender’s strategy, which we do not assume here.

In this work, we consider a significantly different and realistic threat model that captures data leakage due to cross-VM side-channel attacks, and we develop defense strategies for identifying the best time(s) to migrate VMs. We do this through a game-theoretic framework in which the attacker only controls the attack rate and does not fully observe the defender’s strategy. In addition, we assume that the attacker controls the probability of a successful attack by choosing the attack rate as opposed to the time to launch the attack.

III System Model

III-A The cloud

Fig. 1: System model illustration

We model the cloud as a set of physical machines whereby each machine can host a number of VMs from different users. The cloud provider uses a placement strategy to initially assign VMs to physical machines. The details of the placement strategy do not affect our analysis and we assume that the adversary (or any user) has no control over it. We assume the adversary is interested in targeting a set of victim VMs by collocating with them on the same physical machine. We study the interaction between the cloud provider (defender) and the adversary through a game-theoretic framework in which the rewards are time-dependent. In particular, the defender’s strategy is to choose the time to re-assign VMs to different machines to defend against collocation attacks. The adversary, on the other hand, chooses an attack rate to launch more VMs to increase her collocation duration to maximize information leakage from her victims as described in Fig. 1. We define the game next.

III-B The Game

A game is defined as a tuple , where

  • is the set of players. Here, , denoting the defender (player 1) and the adversary (player 2).

  • is the action space for the defender and adversary.

  • is the reward function, .

III-B1 Defender’s action space

Since we are investigating the timing factor, the cloud provider (referred to as the system defender) is assumed to control the re-allocation period. Let denote the time instant at which the defender migrates a running VM to a new physical node, such that , where is a system parameter at which the credentials are reset and the smallest reconfiguration time. Since we assume a leakage model, at time when the system credentials are reset, the attacker can no longer benefit from the side-channel attack. Therefore, the whole game will be reset every . The defender seeks to optimize the value of to minimize chances for information leakage and avoid loading the system with unnecessary migrations. Thus, the defender’s goal is to optimize the tradeoff between security and stability. In particular, a smaller ensures the system is more secure since the co-residency times between any two VMs will be small. However, the system’s overhead increases due to the frequent migration of the VMs between the physical nodes. On the other hand, a larger leads to a more stable system. However, the co-residency times between VMs on the same node will be large making the system more susceptible to information leakage through collocation attacks.

III-B2 Attacker’s action space

Here, we assume that the attacker does not know the system placement algorithms, hence only tries to increase her co-residency chances via increasing the number of requests submitted to the cloud provider. Let denote the rate of requests (rate of attack) submitted to the cloud, where is an interval of non-negative attack rates. The game is assumed to start at time , and let denote the actual time at which the attacker successfully collocates with her targeted victim. Hence,

is a non-negative random variable with a probability density function (pdf)

parametrized by . Since the attacker pays a cost for each submitted job, she needs to optimize over the attack rate . Hence, the attacker’s tradeoff can be summarized as follows. When is very small, it is less probable for the attacker to successfully co-reside with her victim and in turn leak any information before it is migrated. When is very large, the attacker increases her chances of successful collocation at the expense of a higher attack cost. Therefore, the pdf should be such that yields a higher probability of early collocation than , when . Mathematically, this requirement is expressed in the following assumption.

Assumption 1.

for , where

denotes the cumulative distribution function (CDF) of the collocation time.
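As a quick illustration of the ordering required by Assumption 1, the sketch below checks numerically that an exponential collocation time (the choice made later in (8)) gives a higher probability of collocating by any fixed time when the attack rate is larger. The names `exp_cdf`, `cdf_is_ordered`, `lam` and `t` are illustrative and are not taken from the paper, and treating a zero rate as "never collocates" is this sketch's assumption.

```python
import math


def exp_cdf(t: float, lam: float) -> float:
    """P(collocation time <= t) for an exponential collocation time with rate lam."""
    if lam <= 0.0:
        # Assumption of this sketch: a zero rate models backing off,
        # so collocation never happens.
        return 0.0
    return 1.0 - math.exp(-lam * t)


def cdf_is_ordered(rates, times) -> bool:
    """True if the CDF is non-decreasing in the attack rate at every time in `times`."""
    rates = sorted(rates)
    return all(
        exp_cdf(t, low) <= exp_cdf(t, high)
        for t in times
        for low, high in zip(rates, rates[1:])
    )


if __name__ == "__main__":
    # Compare a few hypothetical attack rates at a few hypothetical times.
    print(cdf_is_ordered([0.0, 0.5, 1.0, 2.0], [0.1, 1.0, 5.0]))
```

The check is only a sanity test on a handful of points; it does not replace the analytical argument.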

If , then the attacker can choose to back off (i.e., not attack). In such a case, is a degenerate deterministic distribution such that since the probability of collocation is 0. Since the game ends at and is then repeated, we consider the reward per unit time. We focus only on the timing factor of the problem, and the mapping of VMs to physical nodes is carried out through the placement engine. Any newly arriving VM or existing VMs that are being migrated can be passed to the placement engine for allocation, hence no system hardware modification is required. Next, we define the players’ reward (payoff) functions. We assume a nonzero-sum two-person game.

III-B3 Attacker’s reward

Once the attacker is successfully placed on the same node where the victim VM resides, she immediately starts accumulating rewards by leaking information. Let denote the reward accumulated by the attacker.

Assumption 2.

is a stationary function and monotonically non-decreasing in the collocation duration , where .

Stationarity signifies that the attacker’s accumulated reward depends on the collocation and migration times only through their difference, i.e., the duration of collocation. The accumulated reward is assumed to be zero if . The attacker incurs a cost for launching this attack. Hence, the total cost is scaled by the rate of attack . Therefore, the attacker’s payoff is given by

(1)

where is an indicator function, and the tilde notation signifies the payoff for a given realization of , which is a random variable. Hence, the expected payoff is

(2)

III-B4 Defender’s reward

The defender, on the other hand, incurs a loss due to the collocation of a victim VM with the attacker equal in magnitude to the gain of the attacker. In addition, the defender pays a cost per migration, which increases the system overhead and overloads the placement engine. The cost of migration is denoted by . Accordingly, the defender’s payoff can be written as

(3)

Averaging over , the expected payoff for the defender can be calculated as

(4)

The probability of successful collocation (i.e, ) is computed as

(5)

IV Theoretical Analysis

In this section, we derive sufficient conditions for the existence of Nash equilibria for the formulated game. Existence of Nash equilibria depends on the properties of the payoff functions. First, we derive existence conditions for a general accumulated reward function and pdf of the collocation time , then we provide analysis for a specific instantiation of the payoff functions. We also characterize the best response curves for both players and derive conditions for Nash equilibrium strategies if they exist. First, we restate a general theorem from [46] that provides sufficient conditions for -person nonzero-sum games to admit a pure strategy Nash equilibrium.

Theorem 1.

[46] For each player in the set of players, let the action space of player be a closed, bounded and convex subset of a finite-dimensional Euclidean space, and the cost functional be jointly continuous in all its arguments and strictly convex in , for every . Then, the associated -person nonzero-sum game admits a Nash equilibrium in pure strategy.

IV-A General reward functions

For the general payoff formulation described in equations (2) and (4), the following lemma proved in the appendix establishes sufficient conditions for the concavity of the payoff functions.

Lemma 2.

For the 2-person nonzero-sum game defined in Section III-B with payoff functions defined in equations (2) and (4) under Assumptions 1 and 2, if is strictly concave in , then is strictly concave in for any , and if is convex in , then is strictly concave in for any .

Therefore, we can readily state sufficient conditions for our game to admit a pure strategy Nash equilibrium.

Theorem 3.

The 2-person nonzero-sum game defined in Section III-B under Assumptions 1 and 2 with the payoff functions in (2) and (4) admits a Nash equilibrium in pure strategy if is continuous and strictly concave in , and is convex and is continuous in .

The proof of Theorem 3 follows directly from Lemma 2, which establishes strict concavity of the payoff functions under the conditions in the statement of the theorem, and Theorem 1 from [46].

Proposition 4.

For the game defined in Section III-B with , there exists an equilibrium in which the attacker backs off (i.e., does not attack) and the defender does not migrate if the reward function satisfies

(6)

for every , where denotes the expectation w.r.t. the measure induced by .

Proof.

If the attacker backs off, i.e., chooses , then the defender’s payoff in (4) becomes

which attains its maximum at for any . Hence, the defender’s best response is to not migrate over the game interval. Also, if the condition (6) in the statement of Proposition 4 is satisfied, then the attacker’s best response to the defender’s action is . To see that note that if

then,

since is monotonically non-decreasing in per Assumption 2. Recalling the attacker’s payoff function in (2), the attacker’s decision to back off is at least as good as launching an attack at an alternative non-vanishing rate since the cost of the attack upper bounds the leakage reward for any .

Definition 1.

In an N-person nonzero sum game, let be the reward function of player . For each player , assume that the maximum reward of with respect to can be attained for any players’ action profile , where and . Then, the set defined by

is called the optimal (or best) response of player . If is a singleton for every , then it is called the reaction curve [46].

Accordingly, it follows from the definition of a Nash equilibrium (in that no player can gain by a unilateral change of strategy if the strategies of the other players remain unchanged) that the intersection points of the best responses are Nash equilibria. In the following theorem, we characterize the best response for both players.

Theorem 5.

For the 2-person nonzero sum game defined in Section III-B, if the attacker’s payoff function in (2) is strictly concave in , then the attacker’s best response to any defense strategy can be described as

  • , if

  • , if

  • if , for any .

Also, if the defender’s payoff function in (4) is strictly concave in , then the best response can be described as

  • , if

  • , if

  • if , for any .

Proof.

Given the concavity of the payoff function in , the derivative is monotone. Hence, there exist three possibilities for the behavior of : if , then is strictly increasing in for all , thus the payoff is maximized by . If , then is strictly decreasing in for all , thus the payoff is maximum at . Otherwise, attains its maximum when , hence the best response belongs to the set at which . The second part of Theorem 5, which characterizes the defender’s best response can be proven similarly. ∎
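The three cases in Theorem 5 can be sketched numerically. For a strictly concave payoff the derivative is strictly decreasing, so it is enough to inspect its sign at the two endpoints of the action interval and, if it changes sign, to locate the stationary point by bisection. The function names below and the quadratic toy payoffs in the usage example are hypothetical and stand in for the payoffs in (2) and (4).

```python
def slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def best_response(payoff, lo, hi, tol=1e-9):
    """Maximizer of a strictly concave, differentiable payoff on [lo, hi].

    Assumes `payoff` can be evaluated a tiny step outside [lo, hi],
    which the endpoint slope estimates need.
    """
    if slope(payoff, hi) >= 0.0:
        return hi  # payoff is increasing on the whole interval
    if slope(payoff, lo) <= 0.0:
        return lo  # payoff is decreasing on the whole interval
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        if slope(payoff, mid) > 0.0:
            a = mid  # stationary point lies to the right of mid
        else:
            b = mid  # stationary point lies at or to the left of mid
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Toy concave payoffs with peaks inside, above and below the interval [0, 1].
    for peak in (0.3, 2.0, -1.0):
        print(peak, best_response(lambda x: -(x - peak) ** 2, 0.0, 1.0))
```

Applying `best_response` alternately to each player's payoff, with the other player's action held fixed, traces the reaction curves whose intersection points are the equilibria discussed above.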

Next, we study the effect of the attack cost and the moving cost and state bounds on the costs beyond which no player is interested in the game. When the cost exceeds a certain threshold, the cost of the attack dominates the attacker’s tradeoff, i.e., the attacker is better off backing off over attempting to leak information. Similarly, if is too high, the defender incurs a cost for migration that exceeds any benefit he would get at any migration rate.

In the following lemma, we derive a lower bound on the attack cost beyond which the attacker is always better off attacking with the minimum rate . If , then the attacker will back off.

Lemma 6.

For the two person nonzero-sum game defined in Section III-B, if is strictly concave in , and , where , then the attacker’s best response to any defense strategy is to attack at the minimum permissible rate .

Proof.

We argue that under the condition stated in the lemma, the attacker’s payoff is monotonically decreasing in . Hence, is the attacker’s best response to any . To show that is the unique best response, assume for contradiction that there exists such that . If , then is monotonically decreasing, therefore since . Hence, is not in the best response set. The details of the proof are deferred to the Appendix. ∎

Similarly, the following lemma establishes a lower bound on the migration cost of the defender, beyond which it is more advantageous not to migrate before the system reconfiguration cycle .

Lemma 7.

For the two person nonzero-sum game defined in Section III-B, if is strictly convex and is continuous in , and , then the action of not migrating any VM before is the defender’s unique best response regardless of the attacker’s strategy , where is the expectation with respect to and .

Proof.

By an argument similar to the proof of Lemma 6, under the condition in the statement of the lemma, the defender’s payoff is monotonically increasing in . Hence, for any . Establishing the uniqueness of as a best response action follows the same argument used in the proof of Lemma 6. The details are deferred to the Appendix. ∎

IV-B Specific instantiation analysis

In Section IV-A, we provided conditions for the existence of an equilibrium for generic reward functions. The conditions imposed were the strict concavity of in addition to the non-negativity, monotonicity and stationarity of (stationarity in that the accumulated reward depends on the collocation and migration times only through their difference, i.e., the duration of collocation). In this section, we study existence conditions for equilibrium and characterize the best response sets of both players for specific choices of the reward function and the collocation pdf . Specifically, we provide an analysis for the case where increases linearly in the collocation duration . Hence, we analyze the formulated timing game for the following choice of ,

(7)

In Section VI-D, we provide numerical results on the best response for other (non-linear) functions, including when scales quadratically in . Without loss of generality, we always consider . The case corresponds to the case with the migration cost replaced by .

Since the attacker controls the rate of attacks , in our numerical evaluation we consider an exponential pdf for the collocation time, i.e.,

(8)

This choice of is motivated by the interpretation of as the rate of attacks launched by the adversary where .
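To make the exponential model concrete, the following sketch compares Monte Carlo estimates with the standard closed forms for an exponential collocation time: the probability of collocating before a migration time, and the expected collocation duration, which is the quantity a linear reward accumulates. The variable names (`tau` for the migration time, `lam` for the attack rate, `zeta` for the collocation time) and the sample parameters are this sketch's own choices.

```python
import math
import random


def collocation_probability(tau: float, lam: float) -> float:
    """Closed form P(zeta < tau) for zeta ~ Exponential(lam)."""
    return 1.0 - math.exp(-lam * tau)


def expected_linear_leakage(tau: float, lam: float) -> float:
    """Closed form E[max(tau - zeta, 0)] for zeta ~ Exponential(lam), lam > 0."""
    return tau - (1.0 - math.exp(-lam * tau)) / lam


def simulate(tau: float, lam: float, n: int = 200_000, seed: int = 0):
    """Monte Carlo estimates of the two quantities above."""
    rng = random.Random(seed)
    hits = 0
    total_duration = 0.0
    for _ in range(n):
        zeta = rng.expovariate(lam)  # time of successful collocation
        if zeta < tau:               # collocated before the migration
            hits += 1
            total_duration += tau - zeta
    return hits / n, total_duration / n


if __name__ == "__main__":
    tau, lam = 2.0, 0.8  # hypothetical values
    print(simulate(tau, lam))
    print(collocation_probability(tau, lam), expected_linear_leakage(tau, lam))
```

The closed form for the expected duration follows from integrating the duration against the exponential density over the interval before the migration time.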

Next, we derive sufficient conditions for the existence of a Nash equilibrium for the choice of functions in (7) and (8).

Theorem 8.

Consider the 2-person nonzero-sum game defined in Section III-B with and defined in (7) and (8). If , then the game admits a pure strategy Nash equilibrium.

The proof of Theorem 8 provided in the appendix rests upon establishing sufficient conditions for the strict concavity for and , which translate into existence of a Nash equilibrium in pure strategy from [46, Theorem 1]. In Fig. 2, the region of intersection of the vertically and horizontally hashed areas represents the region of concavity of and in and , respectively, for all actions of the other player for . The figure also shows two different games with their corresponding action spaces satisfying the existence condition of Nash equilibrium of Theorem 8. Later in Section VI, we verify that games played on these action spaces admit a Nash equilibrium in pure strategies. The next results follow directly from Theorem 8.

Fig. 2: Games with different action spaces satisfying the existence condition for NE.

Corollary 9.

The 2-person nonzero-sum game defined in Section III-B with and admits a Nash equilibrium in pure strategies for any , and .

Corollary 10.

The 2-person nonzero-sum game defined in Section III-B with and admits a Nash equilibrium in pure strategies for any , and .

The proof of Corollary 10 follows from the monotonicity of the RHS of the inequality in the statement of Theorem 8 in and the bound for .

To characterize Nash equilibria for both players, we start off by characterizing the best response set for each player in the following lemma whose proof follows the same argument used in the proof of Theorem 5.

Lemma 11.

For the 2-person game defined in Section III-B with the reward function and the probability density function defined in (7) and (8), the attacker’s best response pure strategy is characterized as

  • , if  

  • , if  

  • , otherwise,

for any action by the defender.

The best response strategy for the defender can be characterized as

  • ,   if

  • ,  if

  • otherwise,

for any action by the attacker.

The following two theorems, whose proofs are provided in Appendix B, establish bounds on both the attack cost and the migration cost beyond which the players’ best response strategies are on the boundaries of their action intervals.

Theorem 12.

For the two person nonzero-sum game defined in Section III-B with the reward function in (7) and the exponentially distributed collocation time in (8), if

then the attacker’s best response to the action of the defender is .

Theorem 13.

For the two person nonzero-sum game defined in Section III-B with the reward function in (7) and the exponentially distributed collocation time in (8), if

then the defender’s best response to the action of the attacker is to stop migrations, i.e, .

V Extended Game Model

In the aforementioned model, the attacker’s goal is to be collocated with her victim as soon as possible before the victim is migrated. Evidently, upon collocation with her victim, the attacker will choose to reside there until since no detection mechanism is in place to urge her to evade. In this section, we extend the existing system model and consider the case in which the cloud data center is equipped with an intrusion detection system (IDS). The IDS monitors suspicious activities and captures malicious behavior of any user after a sufficient period of time . For useful detection, . Hence, the attacker may need to stop her collocation attacks before being detected. This introduces another control variable to be optimized by the attacker, namely how long she should continue to carry on the attack after successful collocation. The attacker does not know the operating threshold for the IDS; otherwise, she would choose to stop right before the IDS threshold. However, the attacker is assumed to have prior knowledge of the distribution of , in that she knows the pdf .

Next, we modify the attacker’s payoff function in order to account for the probability of detection. In the event of detection, the attacker incurs a cost (since this user will be black-listed), but her gain is in the data leaked until detection. Therefore, we redefine the attacker’s expected reward by averaging over both the detection threshold and the collocation time as,

(9)

The first term in (9) accounts for the attacker’s expected payoff in the event of no detection, in which the attacker stops her malicious activities before the IDS alarm, i.e., , as illustrated in Fig. 3. The second term represents the event of detection, in which collocation ends at , i.e., after a collocation duration of , as shown in Fig. 4. In that event, the attacker incurs a detection loss . The third and fourth terms account for the events in which the attacker escapes detection because of the migration mechanism. In other words, the attacker is not identified because . The last term accounts for the cost of launching the attack.

Fig. 3: Attacker evades IDS by early stopping of malicious activity.
Fig. 4: Attacker detected by the IDS.
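The event decomposition described for (9) can be illustrated with a small Monte Carlo sketch. It is not the closed-form expression of the paper: the exponential distribution of the detection threshold, the unit-slope linear leakage reward, the attack cost proportional to the attack rate, and all function and parameter names below are assumptions made purely for illustration.

```python
import random


def simulate_attacker_payoff(attack_rate, migration_time, stop_after,
                             attack_cost, detection_cost, mean_threshold,
                             n_runs=20000, seed=0):
    """Monte Carlo estimate of the attacker's average payoff per migration cycle.

    Toy assumptions (not taken from the paper's equations):
      * time until collocation ~ Exponential(rate = attack_rate),
      * IDS detection threshold ~ Exponential(mean = mean_threshold),
      * leakage reward equal to the collocation duration (linear, unit slope),
      * attack cost equal to attack_cost * attack_rate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        t_colloc = rng.expovariate(attack_rate)
        if t_colloc >= migration_time:
            continue  # victim migrated before collocation: nothing leaked
        remaining = migration_time - t_colloc
        threshold = rng.expovariate(1.0 / mean_threshold)
        # Collocation ends at the earliest of: voluntary stop, detection, migration.
        total += min(stop_after, threshold, remaining)
        if threshold < stop_after and threshold < remaining:
            total -= detection_cost  # IDS fired first: attacker is black-listed
    return total / n_runs - attack_cost * attack_rate


if __name__ == "__main__":
    # Sweep the attacker's stopping time for one illustrative parameter set.
    for stop_after in (0.0, 0.1, 0.3, 0.6, 1.0):
        payoff = simulate_attacker_payoff(2.0, 1.0, stop_after, 0.2, 1.0, 0.3)
        print(f"stop_after={stop_after:.1f}  estimated payoff={payoff:.4f}")
```

Sweeping `stop_after` in this way shows, under the stated assumptions, how the additional control variable trades extra leakage against the risk of incurring the detection loss.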

Similarly, we redefine the defender’s expected payoff function,

(10)

VI Numerical Analysis

In this section, we provide numerical analysis of the proposed game model. To characterize the payoff functions for both players, we need to specify and . For the linear reward function and the exponential density function described in (7) and (8), the reward functions can be readily expressed as

(11)
(12)

for . In the following analysis, we study the behavior of the payoff functions for both players. We illustrate the reward of the defender as a function of the migration time for a range of attack rates . For the attacker, we plot her reward as a function of for different . Afterwards, we investigate the effect of the migration cost and the attack cost on the reward functions and the existence of Nash equilibria. We also examine the best response curves for both players and show the regions of strict concavity, which suffice for the existence of a Nash equilibrium. Finally, we generalize our analysis to investigate different scaling regimes of the reward function.

Fig. 5: NE existence region. Game admits a Nash equilibrium in pure strategies.
Fig. 6: At , the attacker payoff is monotonically decreasing, but not for , in agreement with the bound on in Theorem 12.

We start our numerical analysis by reflecting on the theoretical analysis in Section IV-B. In Fig. 5, we plot the NE existence region for . An NE exists if the conditions in the statement of Theorem 8 are satisfied to ensure strict concavity of both and . Per Theorem 8, at and as marked with the hashed rectangle, the game admits a Nash equilibrium in pure strategies. For the illustrated action space , the game satisfies the sufficient existence condition of Theorem 8, i.e., the inequality holds . The figure illustrates the best response curves along with the game action space at and . The horizontally dashed region is the region of concavity of in for all . Similarly, the vertically dashed region designates the region in which is concave in for every . As per Theorem 8, any game defined within the region of intersection admits an NE in pure strategies. In this setting, the Nash equilibrium is unique – shown as the unique intersection point of the best response curves for both players at and . Theorems 12 and 13 established lower bounds on and beyond which and are monotone. In Fig. 6, we numerically verify the monotonicity of for and . Let and , hence according to Theorem 12, is monotonically decreasing when when . However, at , the attack cost ensures that is monotonically decreasing in . In Fig. 6 where , it is shown that the corresponding is monotonically decreasing for all for . At when the condition on is not satisfied, the payoff is not monotonically decreasing. Next, we study and discuss the effect of different system parameters on the players’ payoff and best response in comparison to other defense and attack policies.

VI-A Payoff functions

Fig. 7(a) shows the payoff function of the defender versus the migration time for and . The figure highlights the tradeoff faced by the defender as he seeks to optimize to secure the system through VM migration while avoiding a large migration overhead. Evidently, the optimal migration time depends on the attacker’s strategy . The tradeoff shown in Fig. 7(a) agrees with our intuition based on the game model. Specifically, a very small , signifying a high VM migration rate, is associated with a high migration cost that dominates the payoff function . On the other hand, with a larger , the VMs dwell for a longer period of time on the same physical node, giving the attacker more room to collocate and leak data from her targeted VM. In Fig. 7(a), we compare the defender’s reward at different attack rates . It is clear that when the attack is not too aggressive, the defender is able to maximize his payoff by reducing the migration time at the expense of a higher migration cost. Therefore, when increases from to , the optimal reduces from to . However, when the attacker is very aggressive (using a high attack rate), the defender is better off avoiding the migration cost by increasing to its maximum permissible value, i.e., . This means that no migration rate would limit the damage of the attacker, thus the best course of action in this case is to migrate at the end of the cycle to avoid the migration cost.

In Fig. 7(b), we plot the attacker’s expected payoff versus the attack rate for different defense actions for an attack cost . As shown, the optimal attack rate depends on the defender’s action. As the attack rate increases, the cost of attack increases and becomes the dominant term in the payoff function. Moreover, as the defender reduces his time to migrate , the attacker’s reward decreases. This is due to the fact that when is small (a higher migration rate), there is a shorter time window for the attacker to successfully collocate with her victim. Conversely, when the migration rate is not too high (i.e., is fairly large), the attacker can maximize her reward by increasing the attack rate . This can be seen in Fig. 7(b) where is reduced from to , and the optimal attack rate that maximizes the payoff increases from to . However, if the defender is migrating the VMs at a very high rate, i.e., is very small, the attacker’s best response is to attack at the minimum possible rate or completely back off since the attack is useless. To better understand the effect of the migration (attack) cost on the optimal migration (attack) rate for the defender (attacker), in the following two subsections we study the behavior of the payoff functions at different values of the cost. We also study the behavior of the best response curves to gain more insight into the tradeoffs associated with this game.

(a)
(b)
Fig. 7: (a) Defender’s reward versus migration time ; (b) Attacker’s reward versus attack rate .

VI-B Cost effect and monotonicity

To show the effect of the migration and attack costs and , we plot the players’ reward functions for different values of the cost. In Fig. 8(a), we plot the defender’s payoff versus for different attack strategies for a fairly small migration cost . At this small migration cost, the defender’s best response is to always migrate at the highest permissible rate, i.e., regardless of the attack rate . Hence, the leakage loss term dominates the defender’s payoff function at this small migration cost. Indeed, referring to (12), is monotonically decreasing in when . On the other hand, when the migration cost is high as shown in Fig. 8(b) where , the defender’s best response is to reduce the associated migration cost. We remark that the reward function is monotonically increasing in for such a high migration cost, a fact which was established analytically in Theorem 13.

(a)
(b)
Fig. 8: Defender’s reward versus migration time for (a) , and (b) .

Similarly, the effect of the attack cost is shown in Fig. 9. At a very small attack cost, , as shown in Fig. 9(a), the attacker’s best attack strategy is to attack aggressively at the maximum permissible rate to maximize the chances of successful collocation regardless of the defender’s action. Recalling the attacker’s payoff function in (11), is monotonically increasing in when . In the case of a high attack cost, the behavior of the payoff function is reversed as shown in Fig. 9(b) where . In this case, the attack cost term dominates the payoff function. Therefore, the best action for the attacker is regardless of the action of the defender. This behavior is confirmed by the analysis in Theorem 12.

(a)
(b)
Fig. 9: Attacker’s reward versus attack rate for (a) , and (b) .
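The kind of monotonicity check discussed above can be sketched numerically on a grid of attack rates. The payoff used here is a toy stand-in rather than the paper's (11): it assumes a unit-slope linear reward in the collocation duration, an exponentially distributed collocation time, and an attack cost proportional to the attack rate. The function names and the cost values 0.1 and 0.6 are hypothetical.

```python
import math


def expected_leakage(migration_time, attack_rate):
    """E[max(migration_time - t, 0)] for t ~ Exponential(attack_rate).

    Closed form: migration_time - (1 - exp(-rate * migration_time)) / rate.
    """
    return migration_time + math.expm1(-attack_rate * migration_time) / attack_rate


def attacker_payoff(attack_rate, migration_time, attack_cost):
    """Toy payoff: expected leakage minus a cost proportional to the attack rate."""
    return expected_leakage(migration_time, attack_rate) - attack_cost * attack_rate


def is_decreasing(values):
    """True if every value is strictly smaller than its predecessor."""
    return all(b < a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    rates = [0.01 * k for k in range(1, 501)]  # attack rates 0.01 .. 5.00
    for cost in (0.1, 0.6):
        payoffs = [attacker_payoff(r, 1.0, cost) for r in rates]
        print(f"attack cost {cost}: monotonically decreasing = {is_decreasing(payoffs)}")
```

In this toy model the marginal leakage at a vanishing attack rate equals half the squared migration time, so for a migration time of 1 the two costs fall on either side of 0.5: the payoff is decreasing over the whole grid for the cost 0.6 but not for 0.1. This mirrors the qualitative role of the cost bound in Theorem 12 without reproducing its actual condition.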

VI-C Best response curves

In this section, we study the best response curves for both players based on Definition 1 to provide more insight into the optimal action of a player as a function of the action of the opponent. The solid blue line in Fig. 10(a) shows the defender’s best response curve as a function of . The attacker’s best response curve as a function of the defender’s action is shown as a dashed red line. In this scenario, we set , , , and . In Fig. 10(a), the intersection point of the two response curves is the unique Nash equilibrium. The point(s) of equilibria depend on the values of and as detailed next. The best response curves also underscore the tradeoff for each player. For example, at equilibrium the defender migrates with while the attacker uses rate for the attack. Clearly, at a low attack rate, VM migration at a very small migration rate, i.e., larger , is more favorable. As the attack rate increases, the defender is urged to migrate the VMs at a faster rate, so decreases as increases, but only until a certain point where faster migration becomes futile. Indeed, when the attack rate is overwhelming, it is more rewarding for the defender to use a large to alleviate high migration costs. On the attacker’s side, a similar tradeoff is observed. The attacker attacks the system at the minimum rate as long as the VM stays on the same physical node for a duration 0.4 since it is very hard to collocate when migration is taking place at such high rates. If the defender increases the time before migrating, i.e., , the attacker is enticed to attack the system at higher rates to increase leakage. However, the maximum attack rate the attacker will select is which is strictly smaller than since the resultant attack cost yields a smaller overall payoff. The best response curves also demonstrate the monotonicity of the payoff functions with respect to and as explained earlier in Section VI-B. To show this, Figs. 10(b), 10(c) and 10(d) illustrate the best response curves at extreme cost values.
In particular, in Fig. 10(b), both and are set to zero. It is obvious that the defender is migrating with the highest permissible frequency such that, for any attack rate. In response, the attacker’s best action is regardless of the defender’s action. Hence, when the costs of migration and attack are zero, neither player faces any tradeoff and the game is zero-sum. Fig. 10(c) shows another extreme scenario where only the defender faces a very high cost for migration. His best response is , which corresponds to the lowest migration rate possible. In Fig. 10(d), the attack cost while the defender incurs zero cost for migration. Hence, the defender adopts the highest migration rate at against any attack rate. In response, it is more rewarding for the attacker to attack at unless the defender does not migrate before , in which case the attacker would increase the attack rate, i.e., .
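Locating the intersection of two best response curves can be approximated on a grid by searching for the action pair with the smallest gain from a unilateral deviation, i.e., an approximate (epsilon) Nash equilibrium. The following is a minimal sketch with toy payoffs that are assumptions rather than the paper's (11) and (12): unit-slope linear leakage under an exponential collocation time, a migration cost inversely proportional to the migration time, and an attack cost proportional to the attack rate. All names and numerical values are hypothetical.

```python
import math


def expected_leakage(tau, lam):
    """E[max(tau - t, 0)] for t ~ Exponential(lam) (linear reward, unit slope)."""
    return tau + math.expm1(-lam * tau) / lam


def defender_payoff(tau, lam, migration_cost):
    """Toy defender payoff: leakage loss plus a migration cost growing as tau shrinks."""
    return -expected_leakage(tau, lam) - migration_cost / tau


def attacker_payoff(tau, lam, attack_cost):
    """Toy attacker payoff: leakage gain minus a cost proportional to the attack rate."""
    return expected_leakage(tau, lam) - attack_cost * lam


def best_response(payoff, own_grid, opponent_action):
    """Grid action maximizing payoff(own_action, opponent_action)."""
    return max(own_grid, key=lambda a: payoff(a, opponent_action))


def approximate_nash(taus, lams, migration_cost, attack_cost):
    """Return (epsilon, tau, lam) minimizing the largest unilateral-deviation gain."""
    best_d = {lam: max(defender_payoff(t, lam, migration_cost) for t in taus)
              for lam in lams}
    best_a = {tau: max(attacker_payoff(tau, l, attack_cost) for l in lams)
              for tau in taus}
    best = None
    for tau in taus:
        for lam in lams:
            gain = max(best_d[lam] - defender_payoff(tau, lam, migration_cost),
                       best_a[tau] - attacker_payoff(tau, lam, attack_cost))
            if best is None or gain < best[0]:
                best = (gain, tau, lam)
    return best


if __name__ == "__main__":
    taus = [0.1 + 1.9 * i / 39 for i in range(40)]  # migration times in [0.1, 2]
    lams = [0.1 + 4.9 * i / 39 for i in range(40)]  # attack rates in [0.1, 5]
    eps, tau, lam = approximate_nash(taus, lams, 0.2, 0.05)
    print(f"approximate NE: tau={tau:.3f}, lam={lam:.3f}, epsilon={eps:.5f}")
    # One point of the defender's best response curve on the same grid.
    print(best_response(lambda t, l: defender_payoff(t, l, 0.2), taus, 1.0))
```

Minimizing the deviation gain is used here instead of testing for an exact mutual best response because a discretized grid need not contain an exact fixed point even when the continuous game admits one.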

(a)
(b)
(c)
(d)
Fig. 10: Players’ best response curves for different cost values.
(a)
(b)
Fig. 11: Defender’s (a) and Attacker’s (b) best response curves for different reward scaling regimes.

VI-D Different reward scaling regimes

In the numerical analysis above, we considered the reward function to be linearly increasing in the collocation duration. In this section, we study other scaling regimes. In particular, we consider the scenario where the reward scales quadratically or cubically with the collocation duration. In Fig. 11(a), we plot the defender’s best response curves for linear, quadratic, and cubic reward functions. Intuitively, higher order reward functions are more disposed to dominate the payoff functions than the linear scaling. In Fig. 11(a), the migration cost is set to . For the linear regime, the defender faces exactly the same tradeoff discussed earlier in Section VI-C. However, for higher order reward regimes, the reward term dominates the payoff over the entire range of attack rates in this case. Therefore, the defender is consistently urged to increase the migration rate as the attacker increases her attack rate. With the quadratic and cubic reward functions the defender’s best response is shown to exhibit a similar behavior, but conceivably the cubic reward yields a higher increase in the rate of migration.

In Fig. 11(b), the attacker’s best response curves are plotted for different reward functions. The higher the order of the reward regime, the more the attacker is enticed to attack. In the linear regime, the attacker’s best response rate is non-vanishing and increasing in for , but saturates at as soon as the cost of the attack starts to dominate the attacker’s payoff. For both the quadratic and cubic regimes, the higher reward from data leakage entices the attacker to attack at higher rates as increases.

As shown in Fig. 11(b), the cubic regime is extremely rewarding to the attacker, and as a result the attacker can afford to attack at the maximum permissible rate as the reward term dominates her payoff function.
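The effect of the scaling regime on the leakage term can be explored numerically. The sketch below assumes, for illustration only, that the reward is the collocation duration raised to the given order with a unit coefficient, that the collocation duration runs from the collocation instant until migration, and that the collocation time is exponentially distributed. The function names are hypothetical, and the midpoint rule is simply a convenient integration choice.

```python
import math


def expected_reward(migration_time, attack_rate, order, steps=2000):
    """Midpoint-rule estimate of the expected leakage reward.

    Approximates the integral over t in [0, migration_time] of
        (migration_time - t) ** order * attack_rate * exp(-attack_rate * t),
    i.e., a polynomial reward in the collocation duration averaged over an
    exponentially distributed collocation time (assumed model).
    """
    h = migration_time / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (migration_time - t) ** order * attack_rate * math.exp(-attack_rate * t)
    return total * h


def linear_closed_form(migration_time, attack_rate):
    """Closed form of the order-1 case, used as a sanity check."""
    return migration_time + math.expm1(-attack_rate * migration_time) / attack_rate


if __name__ == "__main__":
    for tau in (0.5, 2.0):
        values = [expected_reward(tau, 2.0, order) for order in (1, 2, 3)]
        print(f"migration time {tau}: linear, quadratic, cubic =",
              [round(v, 4) for v in values])
```

Under these unit-coefficient assumptions, the higher order rewards exceed the linear one only when collocation durations longer than one time unit carry enough probability mass; for short migration times the ordering is reversed.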

VI-E Simulation of the game

In this section, we compare the payoff of both players playing NE strategies to the payoffs of other defense and attack strategies. As per our theoretical analysis in Section IV, the players’ optimal (Nash equilibrium) policies depend on the values of the associated costs and . Table I presents the results of a simulation of the game for the linear reward regime in which