Today’s networks offer an unprecedented level of resource virtualization, available as a continuum from the edge to the cloud [FollowMe_J, MoveWithMe, Justifies_path_to_root_n_CLP_vehs, SFC_mig, Justify_CLP_tree_ISPs, micado_orchestrator, orch_cloud2edge_survey]. These virtual resources are embodied as a collection of datacenters that host service function chains. These service chains provide a plethora of applications, including infotainment [MoveWithMe], road safety [Justifies_path_to_root_n_CLP_vehs] and virtual network functions [Justify_CLP_tree_ISPs, CPVNF_proactive_place_in_CDN, mig_or_reinstall, APSR]. These applications have diverse service requirements; for instance, a road safety application requires low latency, which may dictate processing it in an edge datacenter, close to the user. On the other hand, infotainment tasks are more computation-intensive but less latency-sensitive, and therefore may be offloaded to the cloud, where computation resources are abundant and cheap [tong2016, SFC_mig]. Deploying service function chains is even more challenging when traffic conditions are dynamic and/or some of the users are mobile. In such cases, service chains may need to be migrated in order to follow the mobile user and thus reduce latency [FollowMe_J, MoveWithMe, mig_correlated_VMs]. However, when the system is highly loaded, there may not be enough available resources at the migration’s destination. Hence, providing reliable service may require some over-provisioning, or resource augmentation, at the cost of increasing the system’s capital expenses. Existing schemes [CPVNF_proactive_place_in_CDN, MoveWithMe, mig_correlated_VMs, Companion_Fog, dynamic_sched_and_reconf_t, SFC_mig, Dynamic_SFC_by_rtng_Dijkstra, mig_or_reinstall, Avatar] perform well when the system load is not too high, but fail to provide a feasible solution under a high load of service requests.
To the best of our knowledge, no previous work provides guarantees of finding a feasible solution for the problem whenever such a solution exists. In this work, we study the combined service Deployment and Migration Problem (DMP) in a multi-tier network, where the service orchestrator [micado_orchestrator] has to decide: (i) where to deploy a service chain across the cloud-edge continuum, (ii) which resources to allocate for each part of every service chain, and (iii) which chains to migrate, and to which datacenter, to fulfill the service requirements while minimizing the overall deployment and migration costs. Our main contributions are as follows:
We first formalize the DMP, and show that even finding a feasible solution to the problem – regardless of its cost – is NP-hard.
We take latency as the main Key Performance Indicator (KPI) [Okpi], as specified by the Service Level Agreement (SLA), and show how to calculate the minimal amount of CPU resources required for placing every service chain on any datacenter, while satisfying the latency requirements.
We develop a placement algorithm that, leveraging some bounded amount of resource augmentation, is guaranteed to provide a feasible solution whenever such a solution exists for the case with no resource augmentation.
We present an algorithm that, given a feasible solution, greedily decreases its cost, while keeping the required resource augmentation minimal.
We compare the performance of our proposed solution to those of existing alternatives using two large-scale vehicular scenarios and real-world antenna locations. Our results show that our algorithm can provide a feasible solution using half the computing resources required by existing alternatives. Our evaluation further highlights several system trade-offs, such as the preferred decision period between subsequent runs of the algorithm.
The rest of the paper is organized as follows. After introducing the system model in Sec. II, we formalize the optimal deployment and migration problem in Sec. LABEL:sec:problem, and overview our solution concept in Sec. LABEL:sec:alg_concept. The problem is decomposed into a computational resource allocation problem, studied and solved in Sec. LABEL:sec:alloc, and a placement problem, characterized and solved in Sec. LABEL:sec:bu. Our overall algorithmic solution is described in Sec. LABEL:sec:top_lvl and its performance is assessed in Sec. LABEL:Sec:sim. Finally, Sec. LABEL:sec:related discusses related work, and Sec. LABEL:sec:conc draws some conclusions.
II Modeling the edge-cloud architecture
This section introduces the model for the network infrastructure and the services offered to mobile users and describes how we compute the service delay.
II-A Network model
We consider a fat-tree edge-cloud hierarchical network architecture. As described in [tong2016], the network comprises: (i) datacenters (denoting generic computing resources), (ii) switches (generic switching nodes, such as routers, switches, or multiple switches associated with Multi-Chassis Link-Aggregation (MCLA) [mcla]), and (iii) radio Points of Access (PoA). Datacenters are connected through switches, and PoAs may have a co-located datacenter [Mig_in_Mobile_Edge_Clouds]. Each user is connected to the network through a PoA, which may vary as the user moves. An example of such a system is depicted in Fig. 1. We denote by the set of datacenters, and model the logical multi-tier network as a directed graph where the vertices are the datacenters, while the edges are the directed virtual links connecting them, i.e., with . Let denote the diameter of , and the root of the fat-tree topology. For any two datacenters , denotes the directed path from to , with referring to a sequence of physical links, or vertices, depending on the context. We consider that such a path is loop-free and uniquely predetermined between any two vertices.
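As an illustration of the unique, loop-free path between two datacenters, the following minimal sketch represents the logical network as a rooted tree stored in a parent map. This representation, the datacenter names, and the helper names `path_to_root` and `tree_path` are assumptions made for the example and are not part of the model above.

```python
from typing import Dict, List, Optional

# Hypothetical logical topology: each datacenter maps to its parent;
# the root of the tree has no parent.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "agg1": "root",
    "agg2": "root",
    "edge1": "agg1",
    "edge2": "agg1",
    "edge3": "agg2",
}


def path_to_root(parent: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the vertices from `node` up to the root, both included."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_path(parent: Dict[str, Optional[str]], src: str, dst: str) -> List[str]:
    """Return the unique loop-free vertex sequence from `src` to `dst`."""
    up_src = path_to_root(parent, src)
    up_dst = path_to_root(parent, dst)
    ancestors_of_dst = set(up_dst)

    # Climb from src until the first vertex that is also an ancestor of dst
    # (the lowest common ancestor).
    climb: List[str] = []
    for node in up_src:
        climb.append(node)
        if node in ancestors_of_dst:
            break
    lca = climb[-1]

    # Descend from the common ancestor down to dst.
    below_lca = up_dst[: up_dst.index(lca)]
    return climb + below_lca[::-1]
```

In a tree, the path obtained by climbing to the lowest common ancestor and then descending visits no vertex twice, which matches the loop-free, predetermined path assumed in the model.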
II-B Services and chain deployment
Consider a generic user generating a service request , originating at the PoA to which the user is currently connected. Each service request is addressed through an instance of VNF chains, where each VNF is deployed on a dedicated virtual machine (VM) or container in a datacenter. For the convenience of presentation, hereinafter we refer to VMs only. We refer to the instance of the chain for service request as , where indicates the number of VMs in . Let denote the set of service requests, and the set of corresponding chains that are currently deployed, or need to be deployed, in the network. Furthermore, for every subset of requests , we let denote the subset of chains corresponding to . For simplicity of notation, while referring to VMs and datacenters hosting them, we will drop superscripts and subscripts whenever clear from the context. To successfully serve chain , the chain should be fully deployed on one of the datacenters on the path from its PoA to root [Justifies_path_to_root_n_CLP_IEICE, Justifies_path_to_root_n_CLP_vehs, Justify_CLP_tree_ISPs]. We denote that path by . Distinct deployment decisions incur distinct costs that we detail in Sec. LABEL:sec:problem. Each service is associated with an SLA, which specifies the requirements in terms of KPIs [Okpi], and with a maximum amount of resources, e.g., for which the user is willing to pay the network provider. We consider latency as the most relevant KPI, although our model could be extended to others, like throughput and energy consumption. We thus associate with each chain a target delay .
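The constraint that a chain must be hosted on the path from its PoA to the root can be sketched as follows. The example assumes, for illustration only, that every PoA has a co-located leaf datacenter and that the topology is stored as a parent map; the names `candidate_hosts` and `must_migrate` are hypothetical. It also shows one way user mobility may force a migration: once the user attaches to a new PoA, a host that is not on the new path is no longer admissible under this constraint.

```python
from typing import Dict, List, Optional

# Hypothetical topology: each datacenter maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "agg1": "root",
    "agg2": "root",
    "edge1": "agg1",
    "edge2": "agg1",
    "edge3": "agg2",
}


def candidate_hosts(parent: Dict[str, Optional[str]], poa_datacenter: str) -> List[str]:
    """Datacenters on the path from the PoA's datacenter up to the root."""
    hosts = [poa_datacenter]
    while parent[hosts[-1]] is not None:
        hosts.append(parent[hosts[-1]])
    return hosts


def must_migrate(
    parent: Dict[str, Optional[str]], current_host: str, new_poa_datacenter: str
) -> bool:
    """True if the current host is off the path from the new PoA to the root."""
    return current_host not in candidate_hosts(parent, new_poa_datacenter)
```

For instance, a chain hosted on `agg1` can stay in place when its user moves from `edge1` to `edge2`, but not when the user moves to `edge3`; a chain hosted on `root` is never forced to move by this constraint alone, although the latency target may still call for a migration.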
II-C Service delay
The service delay comprises the computational and the network delays, as detailed below. Computational delay. Given chain , each VM has some input traffic load, , expressed in bit/s, which is known a priori [mig_correlated_VMs, SFC_mig]. In particular, denotes the input traffic to the chain at the PoA associated with the request. We let represent the processing capacity required to handle a single unit of traffic corresponding to , expressed in CPU cycles/bit. Thus, represents the CPU cycles/s required to process the incoming traffic. The computation is defined in terms of single data units to be processed. Let be the number of bits per data unit, then is the number of CPU cycles required to process each data unit. (As an example, a single data unit could be a video frame to process in an object-recognition VNF. Different image resolutions will result in different values of and . In a DPI application, a single data unit would instead be a single data packet.) Let chain be placed on datacenter . For each VM , denotes the processing capacity allocated to such VM on , expressed in number of CPU cycles/s. As often done in the literature [Okpi, Joint_VNF_placement_n_cpu_alloc_Carla_Francesco, OsS_aware_VNF_placement_using_MM1_model, mm1jsac], CPU processing at the VM is modeled through an M/M/1 queue. The average computational delay at VM to process one data unit is given by the standard M/M/1 expression, i.e., the reciprocal of the difference between the service rate and the arrival rate, both expressed in data units/s. Here the service rate is the allocated processing capacity divided by the CPU cycles per data unit, the arrival rate is the input traffic load divided by the bits per data unit, and the service rate must exceed the arrival rate for the queue to be stable. The overall computational delay of chain is thus given by: