CAMIG: Concurrency-Aware Live Migration Management of Multiple Virtual Machines in SDN-enabled Clouds

By integrating Software-Defined Networking and cloud computing, virtualized networking and computing resources can be dynamically reallocated through live migration of Virtual Machines (VMs). Dynamic resource management such as load balancing and energy-saving policies can request multiple migrations when the algorithms are triggered periodically. There exist notable research efforts in dynamic resource management that alleviate single migration overheads, such as single migration time and co-location interference, while selecting the potential VMs and migration destinations. However, by neglecting the resource dependency among potential migration requests, existing dynamic resource management solutions can result in Quality of Service (QoS) degradation and Service Level Agreement (SLA) violations during migration scheduling. Therefore, it is essential to integrate both single and multiple migration overheads into VM reallocation planning. In this paper, we propose a concurrency-aware multiple migration selector that operates based on the maximal cliques and independent sets of the resource dependency graph of multiple migration requests. Our proposed method can be integrated with existing dynamic resource management policies. The experimental results demonstrate that our solution efficiently minimizes migration interference and shortens the convergence time of reallocation by maximizing the multiple migration performance while achieving the objective of dynamic resource management.

1 Introduction

With the rapid adoption of cloud computing for hosting applications and always-on services, it is critical to provide Quality of Service (QoS) guarantees through the Service Level Agreements (SLAs) between cloud providers and users. In this direction, many research works have investigated various aspects of dynamic resource management, such as delay-aware Virtual Network Function (VNF) placement [10], load balancing [36, 34, 21], energy saving [3], flow consolidation, scheduled maintenance, as well as emergency migration, in terms of the accessibility, quality, efficiency, and robustness of cloud services. The Virtual Machine (VM) is one of the major virtualization technologies used to host computing and networking resources in cloud data centers. As a dynamic resource management tool, live VM migration realizes these objectives by relocating VMs between physical hosts without disrupting the accessibility of cloud services [7].

Cloud infrastructure and service providers, such as AWS, Azure, and Google, have been integrating live VM and container migration [8, 11, 33, 22, 26] for purposes such as higher-priority task preemption, kernel and firmware software updates, hardware updates, and reallocation for performance and availability. For example, the Google cluster manager Borg controls all computing tasks and container clusters of up to tens of thousands of physical machines. In Google production fleets, a lower bound of 1,000,000 migrations can be performed monthly [26]. These facts show the critical importance of migration management techniques in dynamic resource reallocation.

Fig. 1: A general migration management framework

Figure 1 illustrates the general migration management workflow. Based on various objectives, resource management algorithms [34, 36, 3, 21, 39, 4, 37, 25, 16] find the optimal placement and generate multiple live migration requests. With the generated migration requests as input, the migration planning and scheduling algorithm [13, 2, 35, 14] optimizes the performance of multiple migrations, such as total and individual migration time and downtime, while minimizing the migration cost and overheads, such as the migration impact on application QoS. Meanwhile, the computing and networking resources are both reallocated by and affected by the multiple migrations.

As a resource-intensive operation, live migration consumes both computing and networking resources when transmitting the memory dirty pages from the source to the destination host. It puts stress on both the migrating services and other services in the cloud data center. Thus, it is crucial to minimize migration interference during dynamic resource management. There are continuous efforts to take migration overheads into consideration during dynamic resource management [34, 36, 3, 39]. Currently, most migration cost models consider the overheads of a single migration [1, 17, 15], such as migration time (single execution time), downtime, and transferred data with respect to memory size, dirty page rate, data compression rate, and available bandwidth, while still allowing multiple migrations in dynamic resource management. For the migration selection, existing resource management algorithms utilize the linear cost model of a single migration to minimize the overheads. Then, with the generated migration requests as input, multiple migration planning and scheduling algorithms [13, 2, 35, 14] decide the sequence of migration requests to achieve the maximal scheduling performance.

There are obvious gaps regarding multiple migration performance between existing dynamic resource management policies, the migration cost model, and multiple migration scheduling. The total migration time, i.e., the interval between the start of the first migration and the end of the last migration, is the convergence time of the resource management solution. The real-time demands on live migration should therefore be met by improving the total migration time. For example, with highly variable workloads, SLA violations occur as the resource demand surpasses the provisioned amount. In this case, a faster convergence of live migrations means fewer SLA violations.

Resource dependency between two migrations, such as sharing source and destination hosts or network paths, can largely affect the performance of multiple migration scheduling. With the network as the bottleneck, two resource-dependent migrations can only be scheduled sequentially, while independent ones can be scheduled concurrently [2, 35, 15]. If a large number of resource dependencies among migrations is generated by dynamic resource management, the performance of multiple migration scheduling suffers a significant degradation. Since single migration overheads relate to only one migration, it is critical to also consider multiple migration overheads in order to generate migration requests with fewer resource dependencies.

Therefore, we incorporate the resource dependency of multiple migrations into the cost model to bridge the gaps. Based on the maximal cliques and independent sets of the dependency graph of potential migrations, we propose a concurrency-aware migration (CAMIG) selection strategy for migrating VMs and destination hosts of the dynamic resource management. The contributions of this paper are summarized as follows:

  • We propose and model the multiple migration selection problem to minimize interference due to resource dependency among multiple migrations while achieving the objective of dynamic resource management.

  • We introduce the resource dependency graph to model migration concurrency.

  • We propose a flexible concurrency-aware migration selection strategy for dynamic resource management.

  • We conduct extensive experiments in an event-driven simulation to show the performance improvement in terms of total migration time while meeting the resource management objective.

The rest of the paper is organized as follows. Related works of migration cost management and multiple migration scheduling are reviewed in Section 2. The system framework and migration overheads are discussed in Section 3. The problem model is described in Section 4. Section 5 proposes the concurrency-aware migration selection algorithm. Section 6 compares our proposed algorithm with other dynamic resource management algorithms in load-balancing and energy-saving scenarios. Finally, Section 7 summarizes the paper.

2 Related Work

algorithm | resource management | single migration overhead | dependency aware | migration performance | migration scheduling
FFD [34] | load/energy | memory size | - | sum of migration cost | -
HARMONY [27] | load | CPU, network | - | single exe. time | one-by-one
Sandpiper [36] | load | memory size | - | single exe. time, migration number | -
Xiao et al. [38] | load/energy | migration number | - | migration number | -
LR-MMT [3] | load/energy | memory size | - | migration number | -
iAware [39] | flexible | single exe. time, computing | - | sum of normalized cost | one-by-one
Our work (CAMIG) | flexible | migration model, computing | computing, network sharing | total mig. time, downtime | multiple scheduling
TABLE I: Comparison of approaches on dynamic resource management through live migration

Many dynamic resource management solutions utilize live migration as a tool to achieve objectives, such as load balancing [34, 27, 36, 21], energy efficiency [4, 37], network delay [25], and communication cost [16]. Among these solutions, some resource management algorithms consider a linear model of the total migration overheads as the sum of the individual migration overheads [34, 27, 36, 21, 3, 39, 37]. However, existing research only considers the objectives of resource management while neglecting the multiple migration overheads and migration scheduling performance. Generally, during dynamic resource management, there are three steps to generate migration requests: source host selection, VM selection, and destination host selection. The overhead or interference model of a single migration [1, 17, 15] is considered during the VM and destination selections.

For the VM and destination host selection, many dynamic resource management policies consider single migration overheads in terms of the memory size of the migrating VM, the single migration time, and the impact of one migration on other VMs located in the source or destination host, such as CPU, host network interface bandwidth, and application bandwidth. In the load balancing scenario, Verma et al. [34] estimated the migration cost based on the reduction in application throughput. Their approach selects the smallest memory size VMs from the over-utilized hosts and assigns them to the under-utilized hosts in First Fit Decreasing (FFD) order. Singh et al. [27] proposed a multi-layer virtualization system, HARMONY, which migrates VMs and data away from hotspots on servers, network devices, and storage nodes. Its load balancing algorithm is a variant of the Toyoda multi-dimensional knapsack problem based on the evenness indicator Extended Vector Product (EVP), and it considers the impact of a single live migration on application performance based on CPU congestion and network overheads. Wood et al. [36] proposed the load balancing algorithm Sandpiper, which selects the smallest memory size VM from one of the most overloaded hosts to minimize the migration overheads. Mann et al. [21] focused on the VM and destination selection for load balancing of application network flows by considering a single migration cost model based on the dirty page rate, memory size, and available bandwidth.

In the energy-saving scenario, Xiao et al. [38] investigated dynamic resource allocation through live migration. The proposed algorithm avoids over-subscription while satisfying the resource needs of all VMs, using an exponentially weighted moving average to predict future loads. It also minimizes the number of active physical machines to reduce energy consumption. Similarly, LR-MMT [3] focused on energy saving with local regression based on historical utilization to avoid over-subscription. It chooses the VM with the least memory size from the over-utilized host and the most energy-saving destination. Wu et al. [37] also studied the same problem of maximizing power saving through VM consolidation by limiting individual migration costs. With the input of candidate VMs and destinations provided by other resource management algorithms, iAware [39] is a migration selector that minimizes the single migration cost in terms of single migration execution time and host co-location interference. It considers the dirty page rate, memory size, and available bandwidth for the single migration time. The authors argue that the co-location interference of a single live migration on other VMs in the destination host, in terms of performance degradation, is linear in the number of VMs hosted by a physical machine in Xen. However, it only considers one-by-one migration scheduling.

Taking the migration task list generated by resource management algorithms as input, migration scheduling algorithms focus on minimizing the migration time by scheduling the tasks efficiently. To find a feasible sequence of migrations, one-by-one scheduling [13] focused on avoiding deadlocks on the available resources of physical hosts. The multiple migration planning and scheduling algorithms [2, 35, 14] focused on the migration performance in terms of minimizing the total migration time by scheduling given migration tasks concurrently when possible. Table I summarizes representative related works and compares them with our proposed generic solution in terms of management target, migration overhead, interference awareness, migration performance, and migration scheduling method.

3 Live Migration in Resource Management

We first introduce the background of live migration management including system overview and single cost model. Then, we discuss the resource dependency problem.

3.1 System Overview

Fig. 2: System Overview

By integrating Software-Defined Networking (SDN) [23], SDN-enabled cloud data centers have a centralized solution for the monitoring, planning, and scheduling of virtualized computing and networking resources [28]. Fig. 2 illustrates the migration framework in the orchestration layer. The dynamic resource manager is integrated with the migration selector and the multiple migration scheduler, and relies on monitoring of both computing and networking resources. VMs are hosted on physical machines to provide various cloud services. Computing resources are controlled by the VM Manager (VMM), such as OpenStack Nova, while the networking resources (such as available bandwidth and routing) are managed in a centralized way by the SDN controller and the VM networking service, such as OpenStack Neutron. The SDN controller can dynamically manage the routing of migration elephant flows to avoid congestion and alleviate the impact on cloud services. We can predict the cost of live migration based on the available bandwidth between the source and destination hosts.

3.2 Single Migration Cost Model

To better understand the impact of multiple migrations on performance in dynamic resource management settings, we first introduce the mathematical model of a single live migration [15]. Live migration can be categorized into two types: post-copy and pre-copy migration. Since pre-copy migration [7] is the most widely used approach in hypervisors (KVM, VMware, Xen, etc.), we consider it as the base model. During pre-copy live migration of VMs or containers, the hypervisor or the Checkpoint/Restore In Userspace (CRIU) agent [9] iteratively copies the memory pages dirtied during the previous transmission round from the source host to the destination host.

The most important aspect of single migration overheads is the migration time, or single migration execution time. According to the live migration process [7], pre-copy live migration consists of several phases (see Fig. 3), including pre-migration, initialization, reservation, iterative memory copy, stop-and-copy, commitment, and post-migration. Thus, live migration consumes both computing resources (pre-/post-migration overheads) and networking resources (memory copy and dirty page transmission) [15]. The total single migration time can be categorized into three parts: pre-migration computing overheads, memory-copy networking overheads, and post-migration computing overheads:

T_mig = T_pre + T_mem + T_post    (1)
Fig. 3: Pre-copy Live Migration

Based on the iterative pre-copy illustrated in Fig. 3, the migration performance in terms of memory copy can be represented as [15]:

T_mem = (M * rho / L) * (1 - sigma^(n+1)) / (1 - sigma),  with  sigma = rho * R / L    (2)

n = min( ceil( log_sigma( T_dthd * L / (M * rho) ) ), Theta )    (3)

where the ratio sigma = rho * R / L, rho is the compression rate of dirty memory, M is the memory size, L is the available bandwidth, R is the dirty page rate, n is the total number of migration rounds, Theta denotes the maximum allowed number of iteration rounds, V_n = M * rho * sigma^n is the remaining dirty page data that needs to be transferred in the stop-and-copy phase (so the downtime is V_n / L), and T_dthd is the configured downtime threshold.
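To make the cost model concrete, the following is a minimal Python sketch of Equations (1)-(3) under the notation reconstructed above (M, R, L, rho, Theta, T_dthd). The function name, defaults, and unit conversions are illustrative assumptions, not the authors' implementation.

```python
import math

def estimate_migration_time(mem_gb, dirty_rate_mbps, bw_mbps,
                            rho=0.8, max_rounds=30, downtime_thd=0.5,
                            t_pre=0.0, t_post=0.0):
    """Rough pre-copy migration time following Eq. (1)-(3) (sketch only)."""
    m_mbit = mem_gb * 8 * 1024                 # memory size M in Mbit
    sigma = rho * dirty_rate_mbps / bw_mbps    # data dirtied per unit of data sent
    if sigma >= 1.0:
        n = max_rounds                         # copy rounds never converge on their own
    else:
        # rounds needed until the remaining dirty data fits in the downtime budget (Eq. (3))
        target = downtime_thd * bw_mbps / (m_mbit * rho)
        n = min(max_rounds, max(0, math.ceil(math.log(target, sigma))))
    if abs(1.0 - sigma) < 1e-9:
        t_mem = (n + 1) * m_mbit * rho / bw_mbps
    else:
        # geometric series of per-round transfers (Eq. (2))
        t_mem = (m_mbit * rho / bw_mbps) * (1 - sigma ** (n + 1)) / (1 - sigma)
    return t_pre + t_mem + t_post              # Eq. (1)

# e.g. a 16 GB VM with a 128 Mbps dirty page rate over a 1 Gbps path
print(round(estimate_migration_time(16, 128, 1000), 2))
```

As the formula shows, a larger available bandwidth L both shortens each copy round and reduces the number of rounds needed before the downtime threshold is met.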

3.3 Resource Dependency

Not only the overheads of a single migration but also the resource dependencies among multiple migrations can heavily affect the performance of dynamic resource management.

For dynamic resource management policies, there are three selection steps: (1) selection of the source physical hosts that need to be adjusted based on the management objective; (2) selection of the VM(s) that need to be migrated from the selected host(s); and (3) selection of the destination hosts of the live VM migrations among potential candidates. With the input of candidate VMs and available destination hosts, different combinations of sources and destinations can achieve the same objective of dynamic resource management. However, these combinations differ considerably in the scheduling performance of multiple migrations due to the resource dependencies among migrations. Two live migrations are resource-dependent if they share the same source or destination host, or part of the network routing.

Two resource-dependent migrations should not be scheduled at the same time [2, 15], because, according to equation (2), a larger bandwidth allocation means a smaller migration execution time and downtime. Thus, the networking resources are the bottleneck that needs to be optimized during multiple migrations. For example, consider several migrations that partially or entirely share network paths. If they are scheduled at the same time, experimental results [15] show that the total migration time will be longer than the sum of the individual execution times. Thus, sequential scheduling of dependent migrations is the most efficient way to optimize the migration performance [2, 15]. Meanwhile, migrations that are resource-independent can be scheduled concurrently to reduce the total migration time. Therefore, it is essential to exclusively allocate one network path to only one migration until it finishes in order to achieve the optimal total migration time, average execution time, and downtime.
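The dependency test described above can be expressed as a small predicate. The sketch below is illustrative; the `route_links` helper, which maps a source-destination pair to the set of physical links its migration flow would use, is a hypothetical stand-in for the routing information held by the SDN controller.

```python
def is_dependent(mig_a, mig_b, route_links):
    """Return True if two migrations cannot be scheduled concurrently.

    mig_a, mig_b: (source_host, destination_host) pairs.
    route_links: hypothetical helper mapping a (src, dst) pair to the set of
    physical links used by that migration's flow.
    """
    (sa, da), (sb, db) = mig_a, mig_b
    if {sa, da} & {sb, db}:          # shared source or destination host
        return True
    # shared links on the (possibly multi-hop) migration routes
    return bool(route_links(sa, da) & route_links(sb, db))
```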

3.4 Illustrative Example

Fig. 4: Scenario of Resource Dependencies during Migration Selections: (a) Initial Placement and (b) Virtual Connections between VMs with Memory Size and Dirty Page Rate

Fig. 4(a) shows the initial VM placement of the illustrative example along with the resource dependency among possible migration selections. Fig. 4(b) illustrates the virtual connections between VMs and the memory size (GB) and dirty page rate (Mbps) of each VM. Moreover, the threshold of iteration rounds is 30 and the downtime threshold is 0.5 seconds. The objective of the management policy is to reduce the communication cost by VM consolidation. Several potential migration combinations, denoted M1 to M4 and each consisting of two migrations, can fulfill the objective. Two resource-independent migrations can be scheduled concurrently (M2 and M3). On the other hand, a migration can only be scheduled after the completion of another migration it depends on (M1 and M4).

We use Mininet [18] to emulate the iterative network transmission of live migration. The execution times of the four potential migrations based on the available bandwidth are 6.2791, 15.0889, 29.1980, and 12.5143 seconds, respectively. The total migration times of combinations M1-M4 are 34.8858, 12.4334, 28.4711, and 27.6032 seconds. Moreover, when the service network and the migration (control) network run separately [32], the available bandwidth for each live migration is the same (10 Gbps). Based on multiple migration planning and scheduling algorithms [2, 35, 14], the total migration times of the four combinations M1-M4 are 28.1936, 12.1227, 22.6056, and 26.8893 seconds, respectively. Comparing M2 with M1 and M4, since there is no resource-dependent migration in M2, its total migration time is significantly shorter. Comparing M2 with M3, although there is no network resource sharing in either combination, the single live migration overheads of M2 are smaller due to the memory size, dirty page rate, and available bandwidth. In summary, although all the potential combinations can achieve the desired objective, the scheduling performance of multiple migrations varies considerably. Thus, it is essential to minimize both the resource dependencies among migration requests and the single live migration overheads during dynamic resource management.

4 Problem Modeling

In this section, we model the problem of multiple migration selection to minimize the migration dependency while achieving the objective of dynamic resource management as a Mixed Integer Programming (MIP) problem.

In the model, H is the set of all candidate destination physical hosts, while V denotes the set of candidate VMs for migration. H_v is the set of candidate hosts for VM v. Let the binary variable x_{v,h} indicate both the initial and the final placement of VM v in host h: x_{v,h0} = 1 when h0 is the initial host of VM v, x_{v,h} = 1 when VM v is placed in host h in the final placement, and x_{v,h} = 0 otherwise. Let the binary variable y_{v,h} indicate whether VM v is placed in host h in the final placement. In other words, if VM v is migrated to host h, then y_{v,h} = 1 and x_{v,h} = 1. If VM v is not migrated, then y_{v,h0} = 1 and x_{v,h0} = 1. Otherwise, y_{v,h} = 0, which indicates that VM v is not in host h in the final placement determined by the dynamic resource management policy.

To generalize the problem, we can omit the VM index in H_v by adding extra constraints on x_{v,h} when some destination hosts are not available for a specific VM v:

x_{v,h'} = 0,  for all h' in H \ H_v    (4)

where h' indicates an unavailable host for VM v.

The migration execution time of a VM from its source to a candidate destination host can be calculated according to equations (1)-(3). Furthermore, we normalize the migration execution time based on the largest and smallest execution times among the different source and destination pairs over all VMs.

As there can be only one destination and the VM must be allocated to one and only one host at the same time, we add the following constraint on the binary variable y_{v,h}:

sum over h in H of y_{v,h} = 1,  for all v in V    (5)

A VM v can only be migrated from the source host h0 of the initial placement (where x_{v,h0} = 1) to a destination host h of the final placement (where y_{v,h} = 1 and x_{v,h} = 1), or not be migrated at all (y_{v,h0} = 1). Thus, we have the following constraint:

(6)

The constraints on the placement binary variable x_{v,h} are:

(7)

where the sum of x_{v,h} over all hosts equals 2 when VM v is migrated to another host in the final placement, and equals 1 when VM v remains in its initial host in the final placement.

Let z denote the binary variable indicating whether VMs v and v' are migrated to destinations h and h', respectively:

(8)

where z = 1 if y_{v,h} = 1 and y_{v',h'} = 1 and both VMs are actually migrated (h and h' differ from the respective initial hosts); otherwise, z = 0.

There is a resource dependency graph G for all possible migrations. Let a node (h_s, h_d) denote a migration with source host h_s and destination host h_d. If nodes (h_s, h_d) and (h_s', h_d') are connected in graph G, then the edge between them belongs to E(G). This indicates that a potential migration of a VM from host h_s to h_d and a potential migration of another VM from host h_s' to h_d' are resource-dependent and can only be scheduled in a sequential manner. Thus, the resource dependency between two potential migrations can be represented as:

(9)

Let S_0 and S_t denote the initial score and the target score of dynamic resource management, and let epsilon represent the tolerance of the accepted range. Let S denote the objective score achieved after all migrations based on the chosen indicator. Thus, the constraint of the final placement for dynamic resource management can be represented as:

|S - S_t| <= epsilon    (10)

In practice, we can replace (10) with a specific placement score function. For example, in load balancing policies, let u_v and u_{v'} denote the loads of VMs v and v'. We can represent the constraint of the dynamic resource target for the final placement as:

(11)

where the load of each host is the sum of the loads of the VMs placed on it in the final placement, and epsilon is the tolerated load difference among the physical hosts.

In addition, let C_h = (C_h^mem, C_h^cpu, C_h^disk, C_h^wl) denote the normalized computing resource capacities of physical host h for memory, CPU, storage disk, and total workload. Therefore, the constraints on computing resources, such as workload, can be represented by:

(12)

The single and multiple migration overheads, denoted W_single and W_multi, are calculated as:

(13)
(14)

where the subscripts of the decision variables are omitted for concise equations.

Therefore, the objective of the problem, in terms of minimizing both the single migration overheads and the resource dependencies among multiple migration requests, can be formulated as:

(15)

subject to constraints (4) - (12).

The objective function contains two parts. The first part is the sum of the single migration overheads, where the overhead term indicates the single migration time of a VM from its source host to its destination host. Note that, although only the migration time is modeled, it can be extended to other interference, such as CPU congestion, heterogeneous links, bandwidth overheads on other applications, and the number of co-located VMs in the destination host. The second part captures the multiple migration overheads incurred during multiple migration scheduling; namely, it indicates how much overhead is caused by resource dependencies. The fewer the dependencies among migration requests and the smaller the individual overheads, the greater the possibility of larger concurrent migration groups during scheduling, which results in a shorter total migration time.
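To illustrate how such a selection model can be encoded, the following is a simplified sketch using the Python-MIP package (the same toolchain mentioned later in the evaluation). It is not the paper's full model: it only covers final-placement variables, a capacity constraint in the spirit of (12), and a pairwise dependency penalty in the spirit of (14); the helper names (`dependent`, `mig_cost`, etc.) are illustrative assumptions.

```python
from itertools import combinations
from mip import Model, xsum, minimize, BINARY

def build_selection_mip(vms, hosts, init_host, mig_cost, dependent, capacity, load):
    """Simplified sketch of the migration selection MIP (not the full model (4)-(15)).

    vms, hosts: lists of VM / host ids; init_host[v]: initial host of v;
    mig_cost[v][h]: normalized single-migration cost of moving v to h;
    dependent((s1, d1), (s2, d2)): True if the two migrations share resources;
    capacity[h], load[v]: normalized host capacities and VM loads.
    """
    m = Model()
    y = {(v, h): m.add_var(var_type=BINARY) for v in vms for h in hosts}
    # each VM is placed on exactly one host in the final placement (cf. Eq. (5))
    for v in vms:
        m += xsum(y[v, h] for h in hosts) == 1
    # host capacity constraint (cf. Eq. (12))
    for h in hosts:
        m += xsum(load[v] * y[v, h] for v in vms) <= capacity[h]
    # pairwise dependency penalty between two actual migrations (cf. Eq. (14))
    pen = []
    for v1, v2 in combinations(vms, 2):
        for h1 in hosts:
            for h2 in hosts:
                if h1 == init_host[v1] or h2 == init_host[v2]:
                    continue                     # at least one VM is not migrated
                if dependent((init_host[v1], h1), (init_host[v2], h2)):
                    z = m.add_var(var_type=BINARY)
                    m += z >= y[v1, h1] + y[v2, h2] - 1   # z = 1 if both migrations selected
                    pen.append(z)
    single = xsum(mig_cost[v][h] * y[v, h]
                  for v in vms for h in hosts if h != init_host[v])
    m.objective = minimize(single + xsum(pen))    # cf. Eq. (15)
    return m, y
```

Even in this reduced form, the number of dependency-penalty variables grows quadratically with the candidate migrations, which hints at why solving the full MIP directly does not scale.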

5 Concurrency-Aware Selection

Solving the MIP model in Equation (15) is NP-hard, so it is not practical to use an MIP solver to obtain the solution for large instances. In this section, we introduce the Concurrency-Aware Migration (CAMIG) selection algorithm for minimizing the resource dependencies and overheads among migrations during dynamic resource management. Based on the three selection steps of a resource management policy, CAMIG has the flexibility to integrate with existing algorithms. If the VMs are already selected by the policy, CAMIG selects migration destinations to minimize resource dependency. Moreover, if only the management objective and source host selection criteria are given, CAMIG selects both the VMs and the migration destinations.

The rationale behind CAMIG is to select, in each round, the migration with the least resource dependency and single migration overhead with respect to the currently selected migrations, and to minimize the dependency for future selections based on the maximal cliques and independent sets of the resource dependency graph. Graph theory concepts, such as maximal cliques and independent sets, are explained in Section 5.2. There are three main steps: (1) build the migration dependency graph; (2) get all maximal cliques and independent sets of a migration from the dependency graph; and (3) calculate the single migration interference and the migration concurrency metric (MIGC) of candidate migrations.

5.1 Migration Dependency Graph Build

Fig. 5: A Resource Dependency Graph with Two of its Maximal Cliques Marked by Color

We first explain how to generate the resource dependency graph based on the potential migrating VMs and destinations. For the undirected graph G, let a source-destination pair (src-dst) node or vertex represent one potential migration. Migrations with the same src-dst node are categorized in a list Q. An edge denotes the dependency between two migrations with different src-dst nodes. As shown in Algorithm 1, with the input of potential migrating VMs and the corresponding destination candidates, we first add src-dst nodes and classify the potential migrations into the corresponding lists in Q. Then, we add edges to G based on the source and destination of each node. Fig. 5 demonstrates an illustrative example of a resource dependency graph based on a given list of potential migrations in a specific dynamic resource management scenario, which involves 9 src-dst pairs in the same physical network topology shown in Fig. 4(a) (four hosts connected through one switch). Each vertex indicates the pair of source and destination hosts for a group of potential migrations; for conciseness, the nine src-dst pair nodes are referred to by short labels.

Input: potential VMs V, candidate destinations D_v for each v in V
Result: migration depGraph G, src-dst pair lists Q
1 foreach v in V do
2       s <- current host of v;
3       foreach d in D_v do
4             AddNode (G, (s, d));
5             Q[(s, d)].append(v);
6 foreach node (s, d) in G do
7       foreach node (s', d') in G do
8             if (s, d) != (s', d') then
9                   if IsDependent ((s, d), (s', d')) then
10                         AddEdge (G, ((s, d), (s', d')));
return G, Q
Algorithm 1 Create depGraph G and migration queues Q

Regardless of the number of potential migrations, the scale of G only depends on the source and destination hosts involved. Given a list of potential migrations, the dependency graph G can be constructed accordingly. As migrations with the same source and destination are always resource-dependent, we categorize migrations into different lists Q, one per src-dst pair; all potential migrations are then represented by the union of these lists. The number of nodes in the migration dependency graph is the total number of source and destination host combinations. Through this pre-processing, the total number of nodes in G is reduced from the number of potential migrations to the number of participating src-dst pairs. Therefore, the upper bound on the total number of nodes in graph G is the product of the number of potential source hosts and the number of potential destination hosts.
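The sketch below mirrors Algorithm 1 using networkx; it is a minimal illustration, assuming `candidates` maps each VM to its current host and candidate destinations, and `is_dependent` is a two-argument predicate over src-dst pairs (for example, a wrapper around the check sketched in Section 3.3).

```python
from collections import defaultdict
from itertools import combinations
import networkx as nx

def build_dep_graph(candidates, is_dependent):
    """Sketch of Algorithm 1. candidates: {vm: (current_host, [candidate destinations])}."""
    G = nx.Graph()
    Q = defaultdict(list)                      # src-dst pair -> potential migrations of that pair
    for vm, (src, dests) in candidates.items():
        for dst in dests:
            G.add_node((src, dst))
            Q[(src, dst)].append(vm)
    for a, b in combinations(list(G.nodes), 2):   # dependency edges between src-dst pairs
        if is_dependent(a, b):
            G.add_edge(a, b)
    return G, Q
```

Grouping migrations by src-dst pair keeps the graph size bounded by the number of host combinations rather than the number of candidate migrations, which is what makes the later clique enumeration tractable.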

Note that the dependency graph supports multi-path transmission and dynamic migration routing based on the current network status. In certain data center networks, multi-path transmission and multiple network interfaces per physical host are supported. Thus, a vertex in G can be extended to indicate the network paths for migrations from a specified set of network interfaces of the source host to a set of interfaces of the destination host. Given two pairs of src-dst interface sets and their corresponding network paths, the two vertices are resource-independent when statement (16) holds for their interface sets and network paths:

(16)

where the terms indicate the network capacities of the interface sets and the available bandwidth of the network paths. Otherwise, the two vertices are resource-dependent. The upper bound on the total number of nodes in G is then the total number of src-dst interface-set pairs.

5.2 Maximal Cliques and Independent Sets

Fig. 6: All Maximal Cliques and MISs of in Fig. 5

Before discussing how to obtain the maximal cliques and maximal independent sets (MISs) that include a certain node, we first review some basic concepts: clique, independent set, and degeneracy. A clique is a subset of vertices of an undirected graph such that every two distinct vertices in the subset are adjacent [5]. A maximal clique is a clique that cannot be extended by including one more adjacent vertex. An independent set of a graph is the opposite of a clique: no two nodes in the set are adjacent. Fig. 6 shows all maximal cliques and MISs of the dependency graph in Fig. 5; for example, two of its maximal cliques are marked by color in Fig. 5. The problems of finding all maximal independent sets and all maximal cliques are complementary and NP-hard [5, 19]; finding all maximal independent sets of a graph is equal to finding all maximal cliques of its complement graph [31]. As a robust metric of graph density or sparseness, the degeneracy of a graph is the smallest value d such that every nonempty subgraph contains a vertex of degree at most d [20].

A clique of G is a set of src-dst nodes whose migrations cannot be scheduled at the same time. In contrast, the migrations from the src-dst nodes within an independent set can be scheduled concurrently. To check and evaluate the resource dependency or concurrency of each migration with src-dst pair node n, we need to generate all maximal cliques and MISs of G that include node n. Let C(G) and I(G) be the sets of all maximal cliques and all maximal independent sets of G. Let C_n and I_n denote the maximal cliques and the MISs that include node n; then C_n is a subset of C(G) and I_n is a subset of I(G).
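For small graphs, C_n and I_n can be obtained directly with networkx, as in the minimal sketch below. The MIS part goes through the complement graph, which, as discussed later, is exactly what becomes impractical when the complement is dense and what motivates Algorithm 2.

```python
import networkx as nx

def cliques_and_mis_of_node(G, n):
    """All maximal cliques (C_n) and MISs (I_n) of G that contain node n.

    Brute-force version for small graphs only: nx.find_cliques implements
    Bron-Kerbosch with pivoting, and MIS enumeration uses the complement graph.
    """
    c_n = [set(c) for c in nx.find_cliques(G) if n in c]
    i_n = [set(c) for c in nx.find_cliques(nx.complement(G)) if n in c]
    return c_n, i_n
```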

Input: dependency graph G, node n, and the neighbors of n in G
Result: all MISs of node n, I_n
1 Function Cliques (, ):
2       GetNodeMaxDegree ();
3       foreach  do
4             ;
5             ;
6             if  then
7                  ;
8            else
9                   if  then
10                         Cliques (, );
11                  
12            ;
13            
14      
15End Function
16 ; ;
17 ;
18 ;
return Cliques (, );
Algorithm 2 Get all MISs of node n in G

We propose an algorithm for listing C_n and I_n based on the degeneracy of the dependency graph. For obtaining all maximal cliques of a graph, the general-purpose listing algorithms [6, 31] based on the Bron-Kerbosch algorithm [5] take exponential time due to the maximum possible number of cliques, and they are not sensitive to the density of the graph. Therefore, parametrized by degeneracy, we use the variant Bron-Kerbosch Degeneracy [12] to generate all maximal cliques of the original resource dependency graph without duplication. All maximal cliques are generated in a tree-like structure by employing pruning with pivoting to allow quick backtracking during the search. Based on the Bron-Kerbosch algorithm with pivoting, Bron-Kerbosch Degeneracy uses a degeneracy ordering to order the sequence of recursive calls without pivoting at the outer level of the original Bron-Kerbosch algorithm [12]. Applied to an n-vertex graph with degeneracy d, it lists all maximal cliques in time O(d*n*3^(d/3)).

As shown in the dependency graph property analysis (Appendix A) and the time analysis in the performance evaluation (Section 6.2), it is not practical to generate all maximal independent sets due to the density of the complement of G. Thus, we propose a clique-based maximal independent set algorithm to calculate I_n. As shown in Algorithm 2, it first excludes all adjacent nodes of n in the resource dependency graph G. Then, it recursively chooses the node with the maximum degree from each connected candidate set of the remaining graph in a branch-and-bound manner until no vertex is left. Algorithm 2 achieves the worst-case optimal time complexity of finding all MISs of a node, O(3^(n'/3)) [6], where n' is the number of vertices not adjacent to n.
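The following sketch produces the same output as a routine like Algorithm 2 (all maximal independent sets of G containing a given node n), but note the implementation choice: it runs Bron-Kerbosch with pivoting over the complement relation (non-adjacency in G) rather than the paper's max-degree branch-and-bound, so it is an assumption-laden illustration, not the authors' algorithm.

```python
def mis_containing(G, n):
    """All maximal independent sets of G that contain node n (illustrative sketch).

    G is assumed to behave like a networkx Graph (iterating G yields nodes,
    G[v] gives the neighbors of v).
    """
    adj = {v: set(G[v]) for v in G}

    def compatible(v, s):
        """Vertices in s that are NOT adjacent to v, i.e. can join an independent set with v."""
        return {u for u in s if u != v and u not in adj[v]}

    results = []

    def expand(chosen, cand, excluded):
        # Bron-Kerbosch with pivoting on the complement relation of G
        if not cand and not excluded:
            results.append(chosen)            # no extension possible: maximal set found
            return
        pivot = max(cand | excluded, key=lambda u: len(compatible(u, cand)))
        for v in list(cand - compatible(pivot, cand)):
            expand(chosen | {v}, compatible(v, cand), compatible(v, excluded))
            cand = cand - {v}
            excluded = excluded | {v}

    expand({n}, compatible(n, set(G)), set())
    return results
```

Restricting the search to the non-neighbors of n up front is the same pruning idea the paper uses: any MIS containing n can only draw further vertices from the nodes not adjacent to n.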

5.3 Concurrency for Migration Candidates

In this section, we introduce the migration concurrency metric (MIGC) to indicate the resource dependency level of a potential migration. It is based on the maximal cliques and independent sets of a src-dst pair node. Let M_sel be the list of currently selected migrations and N_sel be the list of src-dst pair nodes of the migrations in M_sel. For the first round, when the list of selected VM migrations is empty, MIGC can be calculated as:

(17)

where the coefficient is used for value normalization. When the selected migration list is not empty, the MIGC of a migration with src-dst pair node n can be represented as:

(18)

The migration independence score of the testing node n with respect to the selected migration list can be calculated as:

(19)

where the numerator indicates how many times the src-dst nodes of migrations from the currently selected list appear in all MISs of the testing node n, and the denominator is the product of the total number of MISs of n and the number of selected migrations.

Similarly, the migration clique score for src-dst pair node n, with respect to the node list of currently selected migrations, is represented as:

(20)

where the numerator indicates how many times the src-dst pair nodes of currently selected migrations are included in the maximal cliques of node n.

The migration clique score and the independent set score both range over [0, 1]. The clique score is 1 when all src-dst pair nodes of the selected migrations appear in every maximal clique of the testing node, and 0 when no pair node is included. If no src-dst pair from the existing migration list is included in the MISs of node n, we set the second part of MIGC to the current minimum value. Thus, the smaller the MIGC of a potential migration, the fewer migration dependencies it introduces for the selected migration list and for future selections. Note that we do not need to compare the MIGC of two migrations with the same src-dst node, as the result will be the same.
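The sketch below implements the two scores as described in the text (hit counts normalized by the number of cliques or MISs times the number of selected migrations). Since the exact bodies of Equations (17)-(20) were not recoverable here, the way the two scores are combined into MIGC (a simple difference) is an illustrative assumption only.

```python
def clique_score(cliques_n, selected_nodes):
    """Eq. (20)-style score: how often selected src-dst nodes appear in the
    maximal cliques of the candidate node (0 = never, 1 = in every clique)."""
    if not cliques_n or not selected_nodes:
        return 0.0
    hits = sum(1 for c in cliques_n for s in selected_nodes if s in c)
    return hits / (len(cliques_n) * len(selected_nodes))

def independence_score(mis_n, selected_nodes):
    """Eq. (19)-style score over the MISs of the candidate node."""
    if not mis_n or not selected_nodes:
        return 0.0
    hits = sum(1 for i in mis_n for s in selected_nodes if s in i)
    return hits / (len(mis_n) * len(selected_nodes))

def migc(cliques_n, mis_n, selected_nodes):
    """Illustrative combination only: sharing cliques with already selected
    migrations raises the value, sharing MISs lowers it (smaller is better)."""
    return clique_score(cliques_n, selected_nodes) - independence_score(mis_n, selected_nodes)
```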

5.4 Concurrency-Aware Migration Selector

In this section, we explain the details of the proposed concurrency-aware migration selector (CAMIG) in Algorithm 3. It minimizes resource dependency and migration overheads while achieving the objective of resource management. Given the objective of the dynamic resource management, the objective function, the available VMs, the candidate source and destination hosts, the networking information monitored by the SDN controller, and the VM and host information, CAMIG generates the live migration list that consists of the selected VMs and the corresponding destinations.

Input: performance objective, potential VMs, source hosts, destination hosts
Result: selected migration list
1 Step 1. get node clique matrix
2 , CreatedepGraph ();
3 allCliques ();
4 ;
5 do
6       Step 2. get candidate VMs
7       UpdateMigInterference (VM, , );
8       , , GetMigCandidates (, , , , );
9       Step 3. select the optimal migration
10       ;
11       if  then
12             foreach  do
13                   = allCliques (, );
14                   = allIndepSet (, , );
15                   if  then
16                         ;
17                         ;
18                        
19                  
20            
21      ;   ;
22       UpdatedepGraph (, , , )
23      
24while  and and ;
return
Algorithm 3 CAMIG

In Step 1, the dependency graph G and the pair lists Q are generated according to Algorithm 1. In line 3, we find all maximal cliques of G. From lines 5-18, at each round, we select the optimal migration based on both MIGC and the single migration overhead. As a result, it obtains the overall minimal dependencies and single overheads of the total migrations that satisfy the objective of the dynamic resource management. For Step 2, in each round, it first updates the single migration interference of each candidate VM for its potential destinations. According to the migrations selected in previous rounds and the current placement, it obtains the newest VM-to-host mapping. Then, it obtains the candidate migrations and the corresponding src-dst pairs of this round with the same objective score. It can generate more potential migrations by enlarging the score tolerance of the optimal objective in each round. For Step 3, the optimal migration with the minimum total migration interference is selected. It first calculates the clique score based on all maximal cliques generated by the Bron-Kerbosch Degeneracy algorithm, and the independence score according to Algorithm 2. Then, based on the pair list of already selected migrations, the migration overhead of a migration with a given src-dst pair can be calculated as:

(21)

where the coefficient is used for value normalization of the single migration overheads. The single migration overhead and MIGC can be calculated based on Equations (1)-(3) and (17)-(19), respectively. In line 17, the optimal migration of this round and its pair node are added to the currently selected migration list and the corresponding node list.

In line 19, the UpdatedepGraph procedure updates the dependency graph and all maximal cliques according to the selected migration. Certain potential migrations related to the selected optimal migration are deleted from the pair lists; for instance, in the example of Section 3.4, once a migration is chosen, the alternative potential migration of the same VM is excluded from future selection. Note that we do not need to rerun Bron-Kerbosch Degeneracy to recalculate all maximal cliques of the new subgraph (Theorem 1). If a pair list is empty after the update, the corresponding node is removed from the graph and the cliques. If an updated clique has size 1 and its only remaining vertex still has a connected edge, that clique is removed. Duplicated cliques are also removed.

The stop conditions of CAMIG are: (1) at the current round, the selected VM migrations achieve the objective of dynamic resource management; (2) the objective was not improved in the last round; or (3) the round number equals the total number of potential VMs.
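The control flow of Algorithm 3 can be summarized by the skeleton below. All helper functions (`get_candidates`, `single_cost`, `migc_of`, `update_graph`, `objective_met`) and the `.pair` attribute are hypothetical stand-ins for the policy-specific steps described above, so this is a sketch of the loop structure, not the authors' implementation.

```python
def camig(dep_graph, cliques, get_candidates, single_cost, migc_of, update_graph,
          objective_met, max_rounds, beta=1.0):
    """Skeleton of the CAMIG selection loop (Algorithm 3), with hypothetical helpers.

    get_candidates(selected): migrations reaching the round's best objective score;
    single_cost(mig): normalized single-migration overhead (Eqs. (1)-(3));
    migc_of(mig, selected_nodes): concurrency metric of Section 5.3;
    update_graph(...): prunes pair lists and maximal cliques (UpdatedepGraph).
    """
    selected, selected_nodes = [], []
    for _ in range(max_rounds):
        candidates = get_candidates(selected)
        if not candidates:
            break
        best = min(candidates,
                   key=lambda mig: beta * single_cost(mig) + migc_of(mig, selected_nodes))
        selected.append(best)
        selected_nodes.append(best.pair)        # (source, destination) node of the chosen migration
        update_graph(dep_graph, cliques, best)  # keep cliques consistent without re-enumeration
        if objective_met(selected):
            break
    return selected
```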

Theorem 1 (Correctness of UpdatedepGraph).

Given a graph G, the set of all its maximal cliques, and the subgraph obtained by removing a set of vertices from G, the result of the UpdatedepGraph algorithm and the set of all maximal cliques of the subgraph are the same.

Proof.

Bron-Kerbosch Degeneracy generates all and only maximal cliques of [12]. (1) For , , and . Because the . Thus, . (2) For , , and . For the sake of prove, we assume that . Then, , , where , , part of remaining vertices , part of removing vertices . Then, we have . If , because , then . We have a contradiction, as is a maximal clique of . If or , as the UpdatedepGraph removes all , we have a contradiction . Thus, . Similarly, we can prove . Therefore, . ∎

The worst-case running time of Bron-Kerbosch Degeneracy is O(d*n*3^(d/3)) [12] for a graph with n vertices and degeneracy d. The upper bound on the number of maximal cliques/independent sets of a graph is 3^(n/3). Thus, given the maximal cliques, the time complexity of calculating MIGC grows with the number of cliques and MISs of the candidate node, and the worst-case running time of CAMIG is dominated by the clique and MIS enumeration. We perform an extensive computational evaluation of the time complexity in Section 6.2, which demonstrates that the CAMIG algorithm is very fast in practice.

6 Performance Evaluation

In this section, we evaluate the performance of our proposed concurrency-aware migration selection (CAMIG) algorithm for dynamic resource management in terms of metrics such as the total migration time, the total number of migrations, and the corresponding dynamic resource management performance in load balancing and energy-saving scenarios. We used both real-world workload traces from PlanetLab [24] and synthetic workloads for the evaluation. We also performed extensive computational experiments for the time analysis. The results show that the proposed algorithm can significantly improve the multiple migration performance [14] while achieving the target of resource management.

The scalability of Mininet is limited by its resource usage and the operating system, which prevents cloud-scale simulations. Furthermore, it cannot simulate the computing resources needed for dynamic resource management and multiple migration scheduling. Thus, we implemented components for multiple migration scheduling simulations [14] based on CloudSimSDN [30]. The accuracy of the network processing of CloudSimSDN compared to Mininet is validated in [29]. Based on the phases of pre-copy migration, the event-driven simulator (CloudSimMig, https://github.com/hetianzhang/CloudSimMig) can evaluate the performance of multiple migrations in terms of the total migration time, migration execution time, total transferred data, and downtime.

6.1 Load Balancing Scenario

In this section, we evaluate the impact of migration concurrency during dynamic resource management on the performance of multiple migration scheduling in load balancing scenarios. The target of the resource management policy in this experiment is to keep the total CPU utilization of each physical host at 50%. For the solutions other than the optimal, we set the target range of the total CPU utilization from 45% to 55%. We compare our algorithm CAMIG with the optimal solution and other load-balancing algorithms: Sandpiper [36], FFD [34], and iAware [39]. We first evaluate the algorithms in small-scale experiments with 8 physical hosts in a Fat Tree. Then, we extend the experimental scale to more complex scenarios with more resource dependencies. In the extensive experiments, by integrating the proposed concurrency-aware algorithm with existing dynamic resource management algorithms, we directly evaluate and illustrate the scheduling performance improvement achieved by the multiple migration planning and scheduling algorithm [14].

Fig. 7: Initial mapping for 8 different physical hosts with CPU utilization (%) / requested memory (GB)

6.1.1 Experimental Setup

In order to focus on the performance of multiple migrations for different migration requests generated by various resource management algorithms, we controlled variables of single migration overheads, such as dirty page rate, that other comparison algorithms ignore. In the load-balancing scenario, we use the same source selection as Sandpiper to choose over-utilized source hosts for potential migration.

The actual locations of physical hosts with different resource utilizations in the Fat Tree topology are generated randomly, which causes different source and destination selections and resource dependencies in each random setup. Unless stated otherwise, each result is the average of 10 experiments. The initial placement of VMs on each machine, with different CPU utilizations and memory sizes causing utilization differences among hosts, is shown in Fig. 7. To differentiate the migration value in the management objective and the migration schedule, we create VMs with different combinations of high, medium, and low resource utilization and memory size. The CPU utilization of each VM ranges from 4% to 20% of the total host CPU resource. As a result, the CPU utilization of each host ranges from 10% to 90%. The memory size of each VM ranges from 2 GB to 16 GB, which results in various migration overheads.

Other parameters of pre-copy migration are set identically for each VM. The dirty page rate factor is 0.001 per second; for example, with this factor, the dirty page rate of a VM with 16 GB memory is 128 Mbps. The data compression ratio is 0.8. The iteration and downtime thresholds are 30 and 0.5 seconds, respectively. We create a k-8 FatTree data center network (128 hosts) with 1 Gbps bandwidth between switches. To exclude irrelevant parameters from the experiments, each physical host has 16 CPUs with 10000 MIPS each, 10 GB RAM, 1 TB storage, and a 1 Gbps network interface. Note that hosts are not required to be identical in the proposed algorithm.

Dual simplex (Gurobi optimizer 9.0, https://www.gurobi.com/, with Python-MIP 1.6.7, https://github.com/coin-or/python-mip) was used to obtain the optimal solution of the MIP model. We also propose a baseline algorithm called HostHits (hht). As observed in the CAMIG selections, several potential destinations can achieve the same objective of dynamic resource management; HostHits chooses the least selected (hit) host as the destination of the VM migration in each migration selection iteration.

For the original Sandpiper, FFD, and iAware without multiple migration scheduling, the sum of migration execution times is the actual total migration time, because these algorithms only consider one-by-one migration scheduling. However, given the generated multiple migration requests, we also apply the multiple migration planning and scheduling algorithm [14] to all resource management algorithms in the experiments and report the corresponding performance under multiple migration scheduling.

approach | multi1 | multi2 | multi3 | multi4
optimal | 71.5313 / 172.9520 | 71.5313 / 345.9040 | 71.5313 / 518.8560 | 71.5313 / 691.8080
camig | 86.5060 / 189.5725 | 86.5060 / 379.1451 | 86.5060 / 568.7177 | 86.5060 / 758.2903
sandpiper | 86.5060 / 189.5725 | 86.5060 / 379.1451 | 99.4928 / 594.7547 | 99.4860 / 784.4188
optimal+sandpiper | 86.5329 / 189.6183 | 86.5329 / 379.2367 | 86.5094 / 568.8412 | 86.5329 / 758.4734
ffd | 73.2070 / 133.0450 | 88.1817 / 266.1101 | 73.2203 / 399.2128 | 88.1949 / 532.3334
iaware | 86.5158 / 174.6271 | 578.5142 / 969.6401 | 374.0354 / 1448.9137 | 419.1750 / 1941.2873
TABLE II: Total migration time / sum of migration execution times (seconds) in the extended mapping scenarios
approach | multi1 | multi2 | multi3 | multi4
optimal | 5 / 3.1648 / 0 | 10 / 8.9682 / 0 | 15 / 10.2091 / 0 | 20 / 14.3697 / 0
camig | 10 / 6.2048 / 7.4286 | 20 / 13.0928 / 6.9333 | 30 / 31.2534 / 6.7826 | 40 / 36.4625 / 6.7097
sandpiper | 10 / 6.2048 / 7.1428 | 34 / 22.9404 / 6.6667 | 55 / 58.0650 / 6.6087 | 76 / 70.0414 / 6.5161
optimal+sandpiper | 10 / 6.8879 / 14.2857 | 20 / 13.9321 / 10 | 30 / 21.4943 / 8.7826 | 40 / 32.6992 / 9.7419
ffd | 11 / 6.3697 / 84.5714 | 21 / 19.2937 / 78.9333 | 33 / 23.1770 / 77.2173 | 54 / 45.3416 / 76.3870
iaware | 15 / 9.0528 / 35.7142 | 53 / 49.4754 / 210.8 | 48 / 38.6271 / 235.9130 | 79 / 68.3587 / 248.25801
TABLE III: Number of dependent migrations / multiple migration interference / standard deviation of CPU utilization

The rationale is that Sandpiper chooses the VM with the largest volume-to-memory-size ratio from one of the most overloaded physical hosts to minimize live migration overheads. The volume, as a multi-dimensional load indicator, is defined as vol = 1 / ((1 - cpu)(1 - net)(1 - mem)) [36], where cpu, net, and mem are the normalized utilizations of the corresponding resources. The FFD (First-Fit Decreasing) algorithm selects the smallest size VMs from over-utilized hosts and assigns them, in FFD order of the spare resources, to under-utilized hosts. iAware considers both co-located VM interference and single live migration overheads; the co-location interference is linear in the number of VMs a physical machine hosts in Xen. The migration selection in iAware is decided sequentially in each round of its greedy algorithm.
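For reference, a minimal sketch of the Sandpiper selection metric follows; the formula is the volume definition commonly cited for Sandpiper [36], and the helper names are illustrative.

```python
def sandpiper_volume(cpu, net, mem):
    """Sandpiper volume from normalized utilizations in [0, 1)."""
    return 1.0 / ((1.0 - cpu) * (1.0 - net) * (1.0 - mem))

def volume_to_size_ratio(cpu, net, mem, mem_size_gb):
    """VMs with the highest volume-to-memory-size ratio move the most load per byte migrated."""
    return sandpiper_volume(cpu, net, mem) / mem_size_gb
```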

6.1.2 Scalability Evaluation

We extend the scale of the experiments (multi2, multi3, and multi4) by replicating the same mapping 2, 3, and 4 times. The hosts are randomly placed among the locations of the Fat Tree topology with 128 hosts. The scenarios have 16, 24, and 32 candidate destination hosts with a total of 76, 114, and 152 potential migration VMs, respectively. For example, physical Host 16, Host 8, and Host 0 have the same initial VM allocation. However, for each scenario, the placement of each physical host in the FatTree is generated randomly. As the resource management algorithms do not have prior knowledge of the initial placement, the number of combinations of sources, destinations, and instances during migration selection increases exponentially. As a result, with increasing experiment scale, more random source and destination combinations of potential migrations are generated for each experiment. We conducted 10 experiments in each scenario and report the average results.

Tables II and III show the results of the optimal solution, CAMIG, the optimal solution with Sandpiper VM selection, Sandpiper, FFD, and iAware in terms of total migration time with multiple migration scheduling, total migration execution time (one-by-one scheduling), the number of dependent migration tasks, the multiple migration interference value, and the load-balancing performance (standard deviation of CPU utilization). The multiple migration interference value is the sum of the normalized single overheads of dependent migrations. Although all physical hosts are arranged randomly, the optimal result is the same as in scenario multi1.

Analysis: Tables II and III show that the MIP model achieves the optimal result in all scenarios. With the source host selection from Sandpiper, comparing CAMIG with the optimal solution as the problem scale increases, CAMIG maintains the optimal performance in multiple migration scheduling as well as in the number of resource-dependent migrations. In multi3 and multi4, CAMIG over-satisfies the load-balancing requirement at the cost of a higher multiple migration interference value. For Sandpiper and iAware, as the scale of the problem increases, the number of dependent migrations and the multiple migration interference value increase dramatically, which leads to a larger total migration time under both multiple and one-by-one scheduling. FFD cannot satisfy the load-balancing requirement of the system.

The total migration time of Sandpiper is increased by 15.01% in multi3 and multi4. In Table III, although FFD has the lowest total migration time and migration execution time, it cannot achieve the desired load-balancing performance: its standard deviation of CPU utilization is the largest among the compared algorithms. Moreover, its largest total migration time is 21.33% higher than its lowest. For iAware, the actual total migration time equals the total migration execution time because it only allows one-by-one scheduling. Even with multiple migration scheduling, iAware has the worst performance in total migration time and load balancing due to the trade-off between migration execution time and co-location interference; its total migration time varies widely across scenarios, increasing by up to 568.68%.

Fig. 8: Performance comparison with one-by-one scheduling, direct multiple scheduling, CAMIG, and HostHits: (a) iAware, (b) FFD, (c) Sandpiper

6.1.3 Extensive Evaluation

As every load-balancing policy has its own logic for VM selection, it is difficult to evaluate the improvement of multiple migration scheduling directly. Thus, in this section, we extend the experiments by integrating the HostHits and CAMIG algorithms with the existing policies: iAware, FFD, and Sandpiper. With the benefit of flexibility, CAMIG can be adapted to other existing dynamic resource management algorithms. We randomly generated VM memory sizes from 8 to 14 GB in the same scenarios (Fig. 7). Each result is the average value of 10 experiments in each scenario. Fig. 8 illustrates the multiple migration performance in total migration time based on the migration requests of these policies with one-by-one scheduling and multiple migration scheduling (+sch), and the multiple migration scheduling performance based on the migration requests of CAMIG (+camig) and HostHits (+hht) in 4 different scenarios.

Analysis: Fig. 8(a) indicates that iAware with CAMIG achieves the best performance with the multiple migration scheduler in all 4 scenarios. The performance is improved by 20.55%, 57.57%, 70.02%, and 77.93% when the migration requests are scheduled by the multiple migration scheduler, respectively. With CAMIG, the performance is improved by 48.54%, 72.63%, 73.52%, and 86.48% compared to the original iAware, and by 35.29%, 35.50%, 11.89%, and 38.68% compared to iAware with only the multiple migration scheduler. Moreover, although iAware with HostHits generally performs better than iAware+scheduler, as shown in scenario multi3 it can result in a worse total migration time due to creating a larger clique in the dependency graph. For FFD, CAMIG improves the performance by up to 91.90%, 57.82%, and 26.42% compared to FFD with the one-by-one scheduler, the multiple migration scheduler, and HostHits, respectively (Fig. 8(b)). Moreover, Fig. 8(c) shows that the performance of Sandpiper with CAMIG in total migration time is improved by up to 87.87% and 24.68% over Sandpiper with the one-by-one scheduler and the multiple migration scheduler, respectively.

6.1.4 Summary

In summary, CAMIG can efficiently improve the multiple migration performance while achieving the target of load-balancing resource management. The performance of the compared load-balancing policies can be improved by up to 91.90%, 57.82%, and 28.89% relative to the one-by-one scheduler, the multiple migration scheduler, and HostHits, respectively. CAMIG outperforms both the original policies and HostHits. The round-robin-style HostHits algorithm cannot guarantee the multiple migration performance, although it generally decreases the total migration time.

6.2 Processing Time Analysis

Fig. 9: Runtime comparison between optimal and CAMIG
Fig. 10: Average and maximum degree and degeneracy
Fig. 11: Runtime of CAMIG, all maximal cliques, and all maximal cliques/independent sets of nodes

In this section, we analyze the time complexity of the proposed CAMIG algorithm. The experiments were run on a computer with an i7-7500U CPU at 2.70 GHz and 15.9 GB RAM, running 64-bit Windows 10. Fig. 9 illustrates that the runtime of the optimal solution solved by the MIP solver increases exponentially against the linear growth of the problem size. The runtime of the optimal solution is on average 3.07 s, 251.51 s, 5373.35 s, and 42388.0 s in the 4 scenarios, respectively. Thus, it is impractical to generate the optimal result for real-life problem sizes.

Fig. 10 illustrates the connectivity properties of the dependency graph and its complement in terms of average degree, maximum degree, and degeneracy. The number of maximal cliques is 12, 28, 42, and 56, with the degeneracy (a measure of graph sparseness) of the dependency graph being 6, 14, 22, and 30, respectively. Therefore, it is much easier to generate all maximal cliques when the degeneracy is small. However, the degeneracy of the complement dependency graph increases dramatically to 16, 85, 211, and 393. Thus, it becomes impractical to generate all maximal cliques of the complement graph as the problem size grows significantly large. In other words, the Bron-Kerbosch algorithm with degeneracy ordering can reach its worst-case runtime when the graph becomes considerably dense. As a result, it can only generate all 661 maximal independent sets in the smallest-scale scenario (multi1). Fig. 11 shows the runtime comparison of CAMIG in total processing time, finding all maximal cliques, and generating all maximal cliques and independent sets for every node. As shown in Algorithm 2, we do not need to calculate all maximal cliques and independent sets of every node in the graph; the all_nodes_cliques/indep curve therefore illustrates an upper bound on the runtime. As shown in Fig. 11, the processing time of CAMIG increases linearly against the total number of src-dst nodes in the resource dependency graph and the average degree or degeneracy of the complement of the dependency graph.
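The graph measures discussed above can be reproduced on any dependency graph. The sketch below, which assumes networkx and uses a random stand-in graph rather than an actual dependency graph, computes the average degree, maximum degree, and degeneracy (equal to the maximum core number), and counts the maximal cliques of the graph and of its complement; the latter are exactly the maximal independent sets. This illustrates only the measurements, not Algorithm 2 itself.

```python
# Sketch: connectivity measures and maximal-clique counts (illustrative only).
import networkx as nx

def graph_stats(g: nx.Graph) -> dict:
    degrees = [d for _, d in g.degree()]
    return {
        "avg_degree": sum(degrees) / max(len(degrees), 1),
        "max_degree": max(degrees, default=0),
        # Degeneracy of a graph equals its maximum core number.
        "degeneracy": max(nx.core_number(g).values(), default=0),
    }

dep = nx.gnp_random_graph(30, 0.2, seed=1)   # stand-in for a dependency graph
comp = nx.complement(dep)

print(graph_stats(dep), graph_stats(comp))
# nx.find_cliques enumerates maximal cliques (Bron-Kerbosch with pivoting);
# maximal cliques of the complement are maximal independent sets of `dep`.
n_cliques = sum(1 for _ in nx.find_cliques(dep))
n_indep_sets = sum(1 for _ in nx.find_cliques(comp))
print(n_cliques, n_indep_sets)
```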

algorithm | mig. num | total mig. time (s) | downtime (s) | workload num (total / timeout) | serve time (s) (total excl. / avg. excl. / avg. incl. timeout) | energy cost (Wh) (total / host / switch)
NoMig | - | - | - | 1506464 / 0 | 11214923.24 / 7.44 / - | 1733432.22 / 1733432.22 / 0
LR-MMT | 3741 | 28038.66 | 355.079 | 1399857 / 106497 | 8700783.51 / 6.21 / 1105.63 | 470492.05 / 465412.23 / 5079.82
HostHits | 3680 | 25872.79 | 359.032 | 1416806 / 89550 | 9028858.54 / 6.37 / 447.61 | 487254.15 / 481810.21 / 5443.94
CAMIG | 2534 | 7453.37 | 178.071 | 1458906 / 47522 | 9945354.17 / 6.82 / 80.76 | 450966.81 / 447817.74 / 3149.07
TABLE IV: Performance comparison among NoMig, LR-MMT, HostHits, and CAMIG in the energy-saving scenario
Fig. 12: Migration number within each interval
Fig. 13: Total migration time within each interval

6.3 Long-term Energy Saving Scenario

To evaluate the proposed algorithm with real-world long-term workloads [24], we compared CAMIG with LR-MMT [3] in the energy-saving scenario in terms of total migration time, migration number, downtime, total/average CPU serve time with and without timed-out workloads, and the energy (power) cost of both hosts and switches.

6.3.1 Evaluation Configuration

For the long-term experiments, we created a k=16 FatTree topology (1024 hosts) with 1 Gbps physical links between switches to simulate an environment with limited network resources for live migrations. Each physical host has 8 CPUs of 4000 MIPS each, 1024 GB of memory, 1000 GB of storage, and a 1 Gbps network interface. The real-world CPU-utilization workload trace from PlanetLab [24] was used for the experiments, which run for 24 hours. There are 1052 CPU utilization files, mapping to the same number of VMs. We generated the workloads based on the MIPS requirement, with the CPU utilization varying over time. In order to isolate the influence of the multiple migration performance, there is no application traffic between VMs other than the migration flows. There are 4 VM flavors, each with 2 vCPUs, [2500, 2000, 1000, 1000] MIPS, [2, 4, 4, 2] GB RAM, 100 Mbps virtual bandwidth, and a 4 GB disk. The initial placement of VMs is determined based on the optimization criteria defined by LR-MMT [3].
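Purely for illustration, the setup above can be summarized as the following parameter sketch; the names HOST, VM_FLAVORS, and FAT_TREE_K are hypothetical and only restate the numbers listed in this subsection, not the configuration format of the simulator used in the experiments.

```python
# Illustrative encoding of the long-term experiment configuration (a sketch only).
HOST = {"cpus": 8, "mips_per_cpu": 4000, "ram_gb": 1024,
        "storage_gb": 1000, "bw_gbps": 1}

# Four VM flavors, each with 2 vCPUs and per-flavor MIPS/RAM.
VM_FLAVORS = [
    {"vcpus": 2, "mips": 2500, "ram_gb": 2, "bw_mbps": 100, "disk_gb": 4},
    {"vcpus": 2, "mips": 2000, "ram_gb": 4, "bw_mbps": 100, "disk_gb": 4},
    {"vcpus": 2, "mips": 1000, "ram_gb": 4, "bw_mbps": 100, "disk_gb": 4},
    {"vcpus": 2, "mips": 1000, "ram_gb": 2, "bw_mbps": 100, "disk_gb": 4},
]

FAT_TREE_K = 16                      # k=16 FatTree
N_HOSTS = FAT_TREE_K ** 3 // 4       # (k^3)/4 = 1024 hosts
```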

The LR-MMT algorithm utilizes the Local Regression (LR) method to predict overloaded hosts in the upcoming monitoring interval, and the Minimum Migration Time (MMT) policy to select VMs so that migration overheads are minimized. During each monitoring interval of dynamic resource management, CAMIG, as a flexible algorithm, utilizes the same local regression to detect over- and under-utilized hosts. Although LR-MMT may encounter many equivalent optimal destinations, it only chooses the first fit. For the sake of a fair comparison, the destination candidates used in CAMIG are provided by the same energy-saving policy as in LR-MMT.
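As a rough illustration of the two steps described above, the sketch below uses an ordinary least-squares line over the recent CPU-utilization history as a stand-in for the local regression predictor, and applies the MMT rule by picking the VM with the smallest memory-to-bandwidth ratio. The function names, thresholds, and sample values are our own assumptions, not the LR-MMT implementation.

```python
# Sketch of overload prediction and MMT VM selection (illustrative only).
import numpy as np

def predicted_overloaded(cpu_history, safety=1.0, threshold=1.0):
    """Fit a line to the utilization history and extrapolate one interval ahead."""
    x = np.arange(len(cpu_history))
    slope, intercept = np.polyfit(x, cpu_history, 1)
    predicted = intercept + slope * len(cpu_history)   # next interval
    return predicted * safety >= threshold

def select_vm_mmt(vms, available_bw_bps):
    """Minimum Migration Time: the VM with the smallest RAM/bandwidth ratio."""
    return min(vms, key=lambda vm: vm["ram_bytes"] / available_bw_bps)

host_util = [0.62, 0.71, 0.78, 0.86, 0.93]              # hypothetical history
vms = [{"id": "vm1", "ram_bytes": 2 * 2**30},
       {"id": "vm2", "ram_bytes": 4 * 2**30}]
if predicted_overloaded(host_util):
    print(select_vm_mmt(vms, available_bw_bps=10**9)["id"])   # -> vm1
```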

6.3.2 Evaluation Results

As shown in Table IV, the CAMIG algorithm outperforms both LR-MMT and HostHits. The total energy consumption under no dynamic resource management is 1733432.22 Wh, of which the LR-MMT algorithm saves 72.86%. Compared with LR-MMT, CAMIG consumes 3.78% and 38.01% less host and switch energy, respectively. Its total migration number is 32.26% smaller, the sum of the total migration time over the monitoring intervals is 73.42% smaller, and the total downtime is 49.85% smaller than those of LR-MMT. The improvement in total migration time also results in fewer workload timeouts and CPU resource shortages. For VM processing, the average CPU serve time is 92.70% less when there is no timeout mechanism. With the timeout mechanism, CAMIG also reduces the workload timeouts by 14.30% compared to LR-MMT.

As shown by the sum of the total migration time in Table IV and the total migration time within each monitoring interval in Fig. 13, the multiple migration scheduling performance of CAMIG over the 24-hour experiment is substantially better than that of LR-MMT. A shorter total migration time during each monitoring interval means a quicker convergence to the target state, minimizing the over-utilization period and maximizing the energy saving obtained through VM consolidation of under-utilized hosts. In other words, minimizing the dependencies among multiple migrations is critical not only for the migration scheduling, but also for the dynamic resource management that provides the migration list.

During the experiments, we found that there is a relatively large number of equivalent destination candidates in terms of energy saving. Therefore, by exploring the concurrency score among these candidates, we can minimize the resource dependencies among the migrations. As shown in Fig. 12, CAMIG performs more migrations than LR-MMT from 1200 s to 3600 s. This is because, in LR-MMT, once a candidate is used it is excluded from the remaining destinations. By choosing among equivalent hosts during destination selection, the CAMIG algorithm instead keeps more destinations available for the VMs that need to be migrated from both under- and over-utilized hosts. Thus, the CAMIG algorithm actually produces fewer migrations in the remaining monitoring intervals. This also illustrates that, in some cases, even when the total migration number of CAMIG is larger, the total migration time is much smaller due to the minimal dependency among the migrations. Fig. 13 shows that, under certain circumstances (the peak of migration time at around 20000 seconds), the total migration time can still be very large even when there is only a small number of migration tasks. Due to the nature of the consolidation algorithm, many migration tasks share the same destination or source hosts. Therefore, in traditional architectures, such as FatTree, or even with a dedicated migration network, slower convergence of multiple migrations is inevitable. As a result, the performance of multiple migration scheduling may be limited by this inherent resource competition among the consolidating VM migrations.
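The intuition of exploiting equivalent destination candidates can be sketched as follows: among candidates that the energy-saving policy considers equally good, prefer the one that shares the fewest source or destination hosts with migrations already planned in the current interval. This is a simplified illustration of the idea under that assumption, not CAMIG's actual concurrency score; the function and host names are hypothetical.

```python
# Sketch: pick the energy-equivalent destination with the fewest dependencies.
def pick_destination(src, candidates, planned):
    """planned: list of (source_host, destination_host) pairs already selected."""
    def dependency_count(dst):
        # Count planned migrations that share a host with the (src, dst) pair.
        return sum(1 for s, d in planned if {src, dst} & {s, d})
    return min(candidates, key=dependency_count)

planned = [("h1", "h3"), ("h2", "h3")]
# Candidates assumed equivalent from the energy-saving policy's point of view.
print(pick_destination("h4", candidates=["h3", "h5"], planned=planned))  # -> h5
```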

In summary, the evaluation demonstrates that CAMIG can efficiently minimize the resource dependency among multiple migration tasks while achieving the objective of dynamic resource management in the long run. Thus, it also improves the performance of dynamic resource management algorithms in terms of QoS and energy consumption.

7 Conclusions

To the best of our knowledge, we are the first to consider the problem of minimizing the resource dependency of migration requests in dynamic resource management. We formally established a MIP model for the problem and proposed a generic concurrency-aware migration selection algorithm (CAMIG). We conducted experiments comparing our proposed algorithm with existing dynamic resource management policies in load-balancing and energy-saving scenarios, using both random synthetic setups and real trace data. Without changing the framework of existing policies, CAMIG can largely improve the performance of multiple migrations, by up to 91.90%, while efficiently achieving the target of dynamic resource management with near-linear computation growth in practice. In the long-term experiments, it also reduces the total migration number, the service downtime, and the management target of host and switch energy consumption.

Acknowledgments

This work is partially supported by an Australian Research Council (ARC) Discovery Project (ID: DP160102414) and a China Scholarship Council - University of Melbourne PhD Scholarship. We thank the Editor-in-Chief, the Associate Editor, the anonymous reviewers, Redowan Mahmud, Linnan Ruan, Tawfiqul Islam, and Shashikant Ilager for their valuable comments and suggestions to help improve the paper.

References

  • [1] S. Akoush, R. Sohan, A. Rice, A. W. Moore, and A. Hopper (2010) Predicting the performance of virtual machine migration. In Proceedings of the 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 37–46.
  • [2] M. F. Bari, M. F. Zhani, Q. Zhang, R. Ahmed, and R. Boutaba (2014) CQNCR: optimal VM migration planning in cloud data centers. In Proceedings of the Networking Conference (IFIP), pp. 1–9.
  • [3] A. Beloglazov and R. Buyya (2012) Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurrency and Computation: Practice and Experience 24 (13), pp. 1397–1420.
  • [4] J. Bi, H. Yuan, W. Tan, M. Zhou, Y. Fan, J. Zhang, and J. Li (2015) Application-aware dynamic fine-grained resource provisioning in a virtualized cloud data center. IEEE Transactions on Automation Science and Engineering 14 (2), pp. 1172–1184.
  • [5] C. Bron and J. Kerbosch (1973) Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM 16 (9), pp. 575–577.
  • [6] F. Cazals and C. Karande (2008) A note on the problem of reporting maximal cliques. Theoretical Computer Science 407 (1-3), pp. 564–568.
  • [7] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield (2005) Live migration of virtual machines. In Proceedings of the 2nd Conference on Symposium on Networked Systems Design and Implementation, Volume 2, pp. 273–286.
  • [8] Container migration with Podman on RHEL (accessed 22 Jan 2020).
  • [9] CRIU: iterative migration. https://criu.org/Iterative_migration (accessed 22 Feb 2020).
  • [10] R. Cziva, C. Anagnostopoulos, and D. P. Pezaros (2018) Dynamic, latency-optimal VNF placement at the network edge. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), pp. 693–701.
  • [11] Dynamic resource management in E2 VMs (accessed 29 June 2021).
  • [12] D. Eppstein, M. Löffler, and D. Strash (2010) Listing all maximal cliques in sparse graphs in near-optimal time. In Proceedings of the International Symposium on Algorithms and Computation, pp. 403–414.
  • [13] S. Ghorbani and M. Caesar (2012) Walk the line: consistent network updates with bandwidth guarantees. In Proceedings of the First Workshop on Hot Topics in Software Defined Networks, pp. 67–72.
  • [14] T. He, A. N. Toosi, and R. Buyya (2021) SLA-aware multiple migration planning and scheduling in SDN-NFV-enabled clouds. Journal of Systems and Software 176, pp. 110943.
  • [15] T. He, A. N. Toosi, and R. Buyya (2019) Performance evaluation of live virtual machine migration in SDN-enabled cloud data centers. Journal of Parallel and Distributed Computing 131, pp. 55–68.
  • [16] Y. Jiang, J. Wang, J. Shi, J. Zhu, and L. Teng (2020) Network-aware virtual machine migration based on gene aggregation genetic algorithm. Mobile Networks and Applications 25, pp. 1457–1468.
  • [17] C. Jo, Y. Cho, and B. Egger (2017) A machine learning approach to live migration modeling. In Proceedings of the 2017 Symposium on Cloud Computing, pp. 351–364.
  • [18] B. Lantz, B. Heller, and N. McKeown (2010) A network in a laptop: rapid prototyping for software-defined networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, pp. 1–6.
  • [19] E. L. Lawler, J. K. Lenstra, and A. Rinnooy Kan (1980) Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM Journal on Computing 9 (3), pp. 558–565.
  • [20] D. R. Lick and A. T. White (1970) K-degenerate graphs. Canadian Journal of Mathematics 22 (5), pp. 1082–1096.
  • [21] V. Mann, A. Gupta, P. Dutta, A. Vishnoi, P. Bhattacharya, R. Poddar, and A. Iyer (2012) Remedy: network-aware steady state VM management for data centers. In Proceedings of the International Conference on Research in Networking, pp. 190–204.
  • [22] V. Marmol and A. Tucker (2018) Task migration at scale using CRIU. In Linux Plumbers Conference.
  • [23] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner (2008) OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review 38 (2), pp. 69–74.
  • [24] K. Park and V. S. Pai (2006) CoMon: a mostly-scalable monitoring system for PlanetLab. ACM SIGOPS Operating Systems Review 40 (1), pp. 65–74.
  • [25] Q. Peng, Y. Xia, Z. Feng, J. Lee, C. Wu, X. Luo, W. Zheng, S. Pang, H. Liu, Y. Qin, et al. (2019) Mobility-aware and migration-enabled online edge user allocation in mobile edge computing. In Proceedings of the 2019 IEEE International Conference on Web Services (ICWS), pp. 91–98.
  • [26] A. Ruprecht, D. Jones, D. Shiraev, G. Harmon, M. Spivak, M. Krebs, M. Baker-Harvey, and T. Sanderson (2018) VM live migration at scale. ACM SIGPLAN Notices 53 (3), pp. 45–56.
  • [27] A. Singh, M. Korupolu, and D. Mohapatra (2008) Server-storage virtualization: integration and load balancing in data centers. In Proceedings of the ACM/IEEE Conference on Supercomputing, pp. 1–12.
  • [28] J. Son and R. Buyya (2018) A taxonomy of software-defined networking (SDN)-enabled cloud computing. ACM Computing Surveys 51 (3), pp. 59:1–59:36.
  • [29] J. Son, A. V. Dastjerdi, R. N. Calheiros, X. Ji, Y. Yoon, and R. Buyya (2015) CloudSimSDN: modeling and simulation of software-defined cloud data centers. In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 475–484.
  • [30] J. Son, T. He, and R. Buyya (2019) CloudSimSDN-NFV: modeling and simulation of network function virtualization and service function chaining in edge computing environments. Software: Practice and Experience 49 (12), pp. 1748–1764.
  • [31] E. Tomita, A. Tanaka, and H. Takahashi (2006) The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science 363 (1), pp. 28–42.
  • [32] K. Tsakalozos, V. Verroios, M. Roussopoulos, and A. Delis (2017) Live VM migration under time-constraints in share-nothing IaaS-clouds. IEEE Transactions on Parallel and Distributed Systems 28 (8), pp. 2285–2298.
  • [33] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes (2015) Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems, pp. 1–17.
  • [34] A. Verma, P. Ahuja, and A. Neogi (2008) pMapper: power and migration cost aware application placement in virtualized systems. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware (Middleware), pp. 243–264.
  • [35] H. Wang, Y. Li, Y. Zhang, and D. Jin (2019) Virtual machine migration planning in software-defined networks. IEEE Transactions on Cloud Computing 7 (4), pp. 1168–1182.
  • [36] T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif (2009) Sandpiper: black-box and gray-box resource management for virtual machines. Computer Networks 53 (17), pp. 2923–2938.
  • [37] Q. Wu, F. Ishikawa, Q. Zhu, and Y. Xia (2016) Energy and migration cost-aware dynamic virtual machine consolidation in heterogeneous cloud datacenters. IEEE Transactions on Services Computing 12 (4), pp. 550–563.
  • [38] Z. Xiao, W. Song, and Q. Chen (2012) Dynamic resource allocation using virtual machines for cloud computing environment. IEEE Transactions on Parallel and Distributed Systems 24 (6), pp. 1107–1117.
  • [39] F. Xu, F. Liu, L. Liu, H. Jin, B. Li, and B. Li (2014) iAware: making live migration of virtual machines interference-aware in the cloud. IEEE Transactions on Computers 63 (12), pp. 3012–3025.
  • [39] F. Xu, F. Liu, L. Liu, H. Jin, B. Li, and B. Li (2014) Iaware: making live migration of virtual machines interference-aware in the cloud. IEEE Transaction on Computers 63 (12), pp. 3012–3025. Cited by: §1, §1, TABLE I, §2, §2, §6.1.