Sec. I Introduction
5G networks are envisioned to support a variety of services belonging to vertical industries (e.g., autonomous driving, media, and entertainment) with a diverse set of requirements. Services are defined as a directed graph of virtual network functions (VNFs) with specific and varying key performance indicators (KPIs), e.g., throughput and delay. Requests for these services arrive over time, and mobile network operators (MNOs) are responsible for efficiently satisfying such demand by fulfilling the associated KPIs while minimizing their own costs.
As a result of the softwarization of 5G-and-beyond networks, enabled by software-defined networking (SDN) and network function virtualization (NFV), it is now feasible to use general-purpose resources (e.g., virtual machines) to implement the VNFs required by the different services. The decision on which resources to associate with which VNF and service is made by a network component called the orchestrator, as standardized by ETSI . Without loss of generality, we focus only on computational and communication resources (e.g., virtual machines and the links connecting them); notice, however, that our proposed framework is applicable to other resource types (e.g., storage).
The network orchestrator makes the following decisions:
admission of requests;
activation/deactivation of VMs;
placement of VNF instances therein;
assignment of CPU to VMs for running the hosted VNF instances;
routing of traffic through physical links.
These decisions are clearly mutually dependent and should therefore be made jointly, in order to account for the often nontrivial ways in which they influence one another. The focus of this paper is thus the joint request admission, VM activation/deactivation, VNF placement, CPU assignment, and traffic routing problem, with the goal of maximizing the MNO profit while considering:
the properties of each VNF,
the KPI requirements of each service,
the capabilities of VMs and PoPs (points of presence, e.g., datacenters) and their latency,
the capacity and latency of physical links,
the VM setup times,
the arrival and departure times of service requests.
As discussed in Sec. II, some of these factors are simplified, or even neglected, in existing works on 5G orchestration. Notably, we account for the VM setup time, which becomes a significant factor in (for example) IoT applications, where requests are often short-lived. Ignoring setup (and tear-down) times can reduce the optimality of existing solutions.
Furthermore, we account for the fact that different VNFs may have different levels of complexity, hence different quantities of computational resources may be needed to attain the same KPI target. Inspired by several works in the literature , we model individual VNFs as queues and services as queuing networks. Critically, unlike in traditional queuing networks, the quantity of traffic (i.e., the number of clients in the queues) can change across queues, as VNFs can drop some packets (e.g., firewalls) or change the quantity thereof (e.g., video transcoders). Our model accounts for this important aspect by replacing the traditional flow conservation constraints with a generalized flow conservation law, allowing us to describe arbitrary services with arbitrary VNF graphs.
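To make the generalized flow conservation idea concrete, the sketch below computes per-VNF traffic rates for a service graph whose VNFs scale traffic (scale factor = outgoing/incoming traffic), by iterating the flow equations to a fixed point. All names (`vnf_traffic`, `ingress_p`, `trans_p`, `scale`) are illustrative, not from the paper.

```python
def vnf_traffic(total_new, ingress_p, trans_p, scale, iters=200, tol=1e-12):
    """Iteratively compute each VNF's incoming traffic rate.

    total_new  -- total new traffic entering the service's ingress VNFs
    ingress_p  -- ingress_p[v]: probability a new packet starts at VNF v
    trans_p    -- trans_p[u][v]: probability a packet leaving u goes to v
    scale      -- scale[u]: outgoing/incoming traffic ratio of VNF u
                  (generalized flow conservation: VNFs may drop or add traffic)
    """
    n = len(scale)
    lam = [total_new * ingress_p[v] for v in range(n)]
    for _ in range(iters):
        new = [total_new * ingress_p[v] +
               sum(lam[u] * scale[u] * trans_p[u][v] for u in range(n))
               for v in range(n)]
        if all(abs(new[v] - lam[v]) < tol for v in range(n)):
            return new
        lam = new
    return lam
```

For a two-VNF chain where the first VNF halves the traffic (e.g., a firewall dropping packets), the second VNF sees half the ingress rate.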
Given this model, we formulate a one-shot optimization problem which, assuming perfect knowledge of future requests, allows us to maximize the MNO profit. Given the NP-hardness of such a problem and the fact that knowledge of future requests is usually not available, we propose MaxSR, an efficient heuristic algorithm that is invoked periodically, leveraging only the knowledge of the requests within each time period. The proposed method achieves near-optimal solutions in large-scale network scenarios. We evaluate MaxSR against the optimum and other benchmarks using real-world services and different network scenarios.
In summary, the main contributions of this paper are as follows:
we propose a complete model for the main components of 5G, both in terms of vertical services (dynamic requests, VNFs, and service KPIs) and in terms of resources (e.g., VMs and links);
our model accounts for the time variations of service requests and dynamically allocates the computational and network resources while considering VM setup times. It can also accommodate a diverse set of VNFs in terms of computational complexity and KPI requirements, multiple VNF instances, and arbitrary VNF graphs with several ingress and egress VNFs, rather than a simple chain or directed acyclic graph (DAG);
we formulate a one-shot optimization problem as a Mixed-Integer Programming (MIP) to make a joint decision on VM state, VNF placement, CPU assignment, and traffic routing based on the complete requests statistics over the entire system lifespan;
we propose MaxSR, an efficient, near-optimal heuristic algorithm that solves the aforementioned problem based on the knowledge of the near future, even in large-scale network scenarios;
finally, we compare MaxSR with the optimum and the online Best-Fit approach, through extensive experiments using synthetic services and requests, and different network scenarios.
The rest of the paper is organized as follows. Sec. II reviews related works. Sec. III describes the system model and problem formulation, while Sec. IV clarifies our solution strategy. Finally, Sec. V presents our numerical evaluation under different network scenarios, and Sec. VI concludes the paper.
Sec. II Related Work
Several works have addressed VNF placement and traffic routing, as exemplified by the survey in . In most of these works, the problem is formulated as a Mixed-Integer Linear Program (MILP) with a different set of objectives and constraints. Such an approach can yield exact solutions, but only works for small instances; therefore, heuristic algorithms that offer near-optimal solutions have also been presented.
In particular, a first body of works provides a one-time VNF placement, given the incoming service requests. Since this method leaves already-placed VNFs intact, it can lead to a sub-optimal solution when the traffic varies over time. Examples of such an approach can be found in [4, 5, 6, 7, 8, 9], which aim at minimizing a cost function, e.g., operational cost, QoS degradation cost, server utilization, or a combination thereof, and assume that there are always enough resources to serve the incoming requests. Among them, Cohen et al.  propose an approximation algorithm to place sets of VNFs in an optimal manner, while approximating the constraints by a constant factor. Pham et al.  introduce a distributed solution based on a Markov approximation technique to place chains of VNFs, where the cost enfolds the delay cost in addition to the traffic and server costs. The work in , instead, addresses the same problem but aims at minimizing the energy consumption, given constraints on end-to-end latency for each flow and on server utilization. Pei et al.  propose an online heuristic for this problem, by which VNF instances are deployed and connected using the shortest-path algorithm, in order to minimize the number of VNF instances and satisfy their end-to-end delay constraint.
Another thread of works focuses on an efficient admission policy that maximizes the throughput or revenue of admitted requests [10, 11, 12, 13]. In particular, Sallam et al.  formulate the joint VNF placement and resource allocation problem to maximize the number of fully served flows under budget and capacity constraints. They leverage the sub-modularity property of a relaxed version of the problem and propose two heuristics with a constant approximation ratio. The work in  studies the joint VNF placement and service chain embedding problem, so as to maximize the revenue from admitted requests. A similar problem is tackled in  and , but for an online setting where requests should be admitted and served upon their arrival. Zhou et al. , on the other hand, first formulate a one-shot optimization problem over the entire system lifespan and then leverage the primal-dual method to design an online solution with a theoretically proven upper bound on the competitive ratio.
Other works  propose algorithms that properly scale over-utilized or under-utilized VNF instances based on the estimation of future service demands. Jia et al.  propose an online algorithm with a bounded competitive ratio that dynamically deploys delay-constrained service function chains across geo-distributed datacenters, minimizing operational costs.
Request admission control has instead been considered in [17, 18, 19, 20]. More in detail, Li et al.  propose a proactive algorithm that dynamically provisions resources to admit as many requests as possible with a timing guarantee. Similarly, the work in  admits requests and places their VNFs in the peak interval, but minimizes the energy cost of VNF instances by migrating them and turning off empty ones in the off-peak interval. Liu et al.  envision an algorithm that maximizes the service provider's profit by periodically admitting new requests and rearranging the currently served ones, while accounting for the operational overhead of migration. Finally, leveraging VNF migration and sharing, the work in  proposes an online algorithm to maximize throughput while minimizing service cost and meeting latency constraints.
Also relevant to our work are studies that specifically target 5G systems, although they only consider the link delay and neglect processing delays at the servers. An example can be found in , which models VMs as M/M/1 PS queues and proposes a MILP formulation and a heuristic solution to minimize the average service delay, while meeting the constraints on link and host capacities. The works in  and  aim instead to minimize, respectively, the operational cost and the energy consumption of VMs and links, while ensuring the end-to-end delay KPI. The work in  also allows for VNF sharing and studies the impact of applying priorities to different services within a shared VNF. Zhang et al.  tackle the request admission problem to maximize the total throughput, neglecting instead the queuing delay at VMs.
We remark that most of the above works present proactive approaches, and only deal with either cost minimization or request admission. On the contrary, we focus on dynamic resource activation, VNF placement, and CPU assignment to maximize the revenue from admitted requests over the entire system lifespan, while minimizing the deployment costs and accounting for some practical issues. Our proactive MILP formulation of the problem extends existing models by accounting for the maximum end-to-end delay as the main KPI, while our heuristic is a practical and scalable solution, which periodically admits new requests and readjusts the existing VNF deployment. To the best of our knowledge, this is the first dynamic solution for service orchestration in 5G networks.
Sec. III System Model and Problem Formulation
In this section, we first describe our system model, supported by a simple example. Then, we formulate the joint request admission, VM activation, VNF placement, CPU assignment, and traffic routing problem; a discussion of the problem's time complexity follows. The frequently used notation is summarized in Table I.
Sec. III-A System Model
Physical infrastructure. Let be a directed graph representing the physical infrastructure network, where each node is either a VM or a network node (i.e., a router or a switch). A VM has maximum computational capacity . Set denotes the physical links connecting the network nodes. We define and as, respectively, the bandwidth and delay of physical link . Time is discretized into steps, , and we assume that at every time step a VM is in one of the following states: terminated, turning-on, or active. Specifically, VMs can only be used when they are active, and they need to be turned on one time step before becoming active. Based on the measurements reported in , we also consider the traffic flow migration time to be negligible with respect to the VM setup time.
Each VM can host one VNF and belongs to a datacenter ; we denote the available amount of computational resources in datacenter by , and the set of VMs within it by . In the physical graph , physical links within datacenters are assumed to be ideal, i.e., they have no capacity limit and zero delay. Let a logical link be a sequence of physical links connecting two VMs, a source and a destination ; we then define an end-to-end path as a sequence of logical links.
Services. We represent each service with a VNF Forwarding Graph (VNFFG), where the nodes are VNFs , and the directed edges show how traffic traverses the VNFs. VNFFG can be any general graph with possibly several ingress and egress VNFs. We denote the total new traffic, entering the ingress VNFs of service , by . A traffic packet of service , processed in VNF , is forwarded to VNF with probability of . Similarly, is the probability that a new traffic packet of service starts getting service in ingress VNF , and is the probability that a traffic packet of service , already served at egress VNF , departs service . For each service , we consider its target delay, , as the most critical KPI, specifying the maximum tolerable end-to-end delay for the traffic packets of .
VNFs can have different processing requirements depending on their computational complexity. We denote by the computational capability that VNF needs to process one unit of traffic. Some VNFs may not find sufficient resources on a single VM to completely serve the traffic while satisfying the target delay. Thus, multiple instances can be created, with being the maximum number of instances of VNF at any point in time. Instances of the same VNF can be deployed either within the same datacenter or at different datacenters; in the latter case, the traffic between each pair of VNFs must be split across the different logical links that connect the VMs running the corresponding VNF instances.
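As a back-of-the-envelope illustration of why multiple instances may be needed, the sketch below computes the minimum number of instances required to carry a given traffic load, assuming each unit of traffic demands a fixed amount of CPU and each VM caps the CPU available to one instance. The function name and parameters are hypothetical.

```python
import math

def min_instances(traffic, cpu_per_unit, vm_capacity, max_instances):
    """Smallest number of VNF instances able to process `traffic` units,
    given `cpu_per_unit` CPU demand per traffic unit and a per-VM cap.
    Returns None if the instance limit makes the deployment infeasible."""
    needed = math.ceil(traffic * cpu_per_unit / vm_capacity)
    return needed if needed <= max_instances else None
```

For example, 100 traffic units at 0.05 CPU each require 5 CPU units in total; with 2-CPU VMs this means 3 instances.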
Different requests for the same service may arrive over time; we denote with the set of all service requests for service , and characterize the generic service request with its arrival time and departure time . Due to slice isolation requirements , we assume that VNF instances are not shared among different service requests.
Example. Fig. 1 represents a possible deployment of two sample services, vehicle collision detection (VCD) and video on-demand (VoD), on the physical graph (Fig. 1(c)) in a single time step. VCD is a low-latency service with a very low target delay , and VoD is a traffic-intensive service with a high . Fig. 1(a) and Fig. 1(b) depict the VNFFGs of the VCD and VoD services, respectively, where the numbers on the edges represent the transition probability of traffic packets between the corresponding VNFs. The physical graph contains a set of datacenters with computational capability . Datacenters are connected to each other through a switch and physical links with bandwidth and latency . VMs within each datacenter are denoted by sets , , and , each with computational capability . As depicted in Fig. 1(c), service VCD is deployed within datacenter to avoid inter-datacenter network latency. Service VoD is deployed across datacenter and third-party datacenter . The transcoder VNF, having a high computational complexity , requires two instances in datacenters to fully serve the traffic.
Sec. III-B Problem Formulation
In this section, we first describe the decisions that have to be made to map the service requests onto network resources. Then we formalize the system constraints and the objective using the model presented in Sec. III-A, along with the decision variables we define. In general, given the knowledge of the future arrival and departure times of service requests, we should make the following decisions:
service request activation, i.e., when service requests get served;
VM activation/deactivation, i.e., when VMs are set up or terminated;
VNF instance placement, i.e., which VMs have to run VNF instances;
CPU assignment, i.e., how much computational capability shall be assigned to a VM to run the deployed VNF;
traffic routing, i.e., how traffic between VNFs is routed through physical links.
Service request activation.
Let a binary variable denote whether service request is being served at time . Once admitted, a service request has to be served for its entire lifetime. Given service request arrival time and departure time , this translates into:
VNF instances. The following constraint limits the number of deployed instances of VNF of any service request to be less than at any point in time:
where binary variable represents whether VNF of service request is placed on VM at time . The network slice isolation property of 5G networks prevents VNF sharing among requests for different services. In addition, at most one VNF instance can be deployed on any VM, i.e.,
VM states. We define two binary variables and to represent whether VM is turning-on or active at time , respectively. We formulate a simple constraint to prevent VMs from being concurrently turning-on and active at any time, i.e.,
The following constraint enforces that VM can be active at time only if it has been turning-on or active in the previous time step:
VMs are able to run VNFs only when they are active, i.e.,
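The VM lifecycle constraints above (no VM is both turning-on and active, and a VM can be active at t only if it was turning-on or active at t-1) can be sketched as a simple validity check on a per-time-step state sequence. This is an illustrative helper, not part of the formulation itself.

```python
TERMINATED, TURNING_ON, ACTIVE = "terminated", "turning-on", "active"

def valid_schedule(states):
    """Check a VM's state sequence (one entry per time step) against the
    model's transition rule: active at t requires turning-on or active
    at t-1. A VM starting from scratch must spend one step turning on."""
    for t, s in enumerate(states):
        if s == ACTIVE:
            if t == 0 or states[t - 1] not in (TURNING_ON, ACTIVE):
                return False
    return True
```

A VM that jumps straight from terminated to active violates the setup-time constraint and is rejected.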
Computational capacity. Let real variable represent the service rate assigned to VM to run VNF of service request at time . Multiplying it by , we obtain the amount of computational capability assigned to VM to run VNF at time . The limited computational capability of datacenters and VMs, denoted respectively by and , should not be exceeded at any point in time. We describe such a limitation by imposing:
where the sum on the left-hand side of the inequality is over all VMs within datacenter . Similarly, for the VMs we have
where on the right-hand side of the inequality enforces zero service rate for VM when no VNF is placed therein.
KPI target fulfillment. Whenever a service request is being served, i.e., , all the traffic in the corresponding VNFFG should be carried by the underlying physical links. The following constraint ensures this condition for the traffic between each pair of VNFs at any point in time:
Real variable denotes the fraction of traffic from VNF to of service request that is routed through logical link at time . As mentioned, the traffic flow from VNF to VNF may be split across several logical links (see Eq. (2)). Moreover, since we consider multi-path routing, there may be multiple logical links between each pair of VNF instances. Therefore, constraint (9) implies that for any service request requesting traffic from VNF to (i.e., ), the sum of all fractional traffic going through any logical link should be equal to at any time when the service request is being served.
The above constraint does not include ingress and egress traffic. To account for such contributions, we need to introduce dummy nodes in the VNFFG and the physical graph. We add an end-point dummy VNF, in every VNFFG, which is directly connected to all ingress and egress VNFs and a dummy VM in the physical graph which is directly connected to all VMs. We define as the set of dummy logical links which start from or end at the dummy VM. We assume that dummy logical links are ideal, i.e., they have no capacity limit and zero delay and cost. We can now formulate the associated traffic constraints as:
where and are the fraction of new traffic entering ingress VNF and the fraction of traffic departing from egress VNF , respectively, going through logical link at time .
Placement. We can now correlate the routing decisions and the placement decisions as
The above constraint implies that whenever there is an incoming traffic to VNF through logical link whose destination is VM , i.e., , VNF is deployed at VM . Similarly, whenever there is an outgoing traffic from VNF through logical link whose source is VM , i.e., , VNF is deployed at VM :
System stability. Let denote the total incoming traffic of VNF of service . equals the sum of ingress traffic and the traffic coming from other VNFs to VNF of service :
Using , the amount of traffic from VNF to VNF of service can be represented as:
We can now define an auxiliary variable to represent the incoming traffic of VNF of service request , which enters VM at time :
where the summation is over all logical links ending at VM . Finally, we describe the system stability requirement, which imposes the incoming traffic not to exceed the assigned service rate for each VNF of service request on VM , at any point in time:
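The stability requirement just stated can be sketched as a per-VM check: the incoming traffic of each deployed VNF must not exceed the service rate assigned to it. Names (`stable`, `incoming`, `service_rate`) are illustrative.

```python
def stable(incoming, service_rate, eps=1e-9):
    """System-stability check: for each VM hosting a VNF instance, the
    incoming traffic rate must not exceed the assigned service rate.
    `incoming` and `service_rate` map VM ids to rates; a VM missing
    from `service_rate` has rate zero (no VNF placed there)."""
    return all(incoming[vm] <= service_rate.get(vm, 0.0) + eps
               for vm in incoming)
```

A VM receiving traffic without a matching service-rate assignment (e.g., no VNF placed) immediately fails the check.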
Generalized flow conservation. Our model captures the possibility of having VNFs for which, due to processing, the amount of incoming and that of outgoing traffic are different. We define the scaling factor as the ratio of outgoing traffic to incoming traffic for VNF of service :
We also define auxiliary variable to represent the outgoing traffic of VNF of service request departing VM at time :
where the right-hand side enfolds all traffic flowing through logical links starting from VM . We can then formulate the generalized flow conservation law for each VNF of service request on VM at time :
which implies that for each VNF of service request on VM , at any time, the outgoing traffic is equal to the incoming traffic multiplied by the scaling factor .
Latency. The end-to-end network latency for a traffic packet of a service request is the time it takes for the packet to be served by all VNFs along the path from the ingress to the egress VNFs. Such a latency includes two contributions, namely, the network delay between pairs of VMs on which subsequent VNFs are deployed, and the processing time at the VNFs themselves. The former can be defined based on the delay of the logical links , denoted by . Such a delay is the sum of the delays of the underlying physical links:
We also introduce binary variable to represent whether logical link is used for routing the traffic from VNF to of service request at time . can be described as
The traffic packets in the VNFFG follow a path of logical links in the underlying physical graph, which connect all VNFs in the VNFFG. Let be the sequence of VNFs, from an ingress VNF to an egress VNF in the VNFFG of service . The network delay of traffic packets of service request , which traverse the VNFs as specified by and go through the links belonging to , is given by:
The processing time of VM , denoted by , is the time it takes for a traffic packet to be completely processed in the VM. Modeling each VM as a queue with PS discipline (or, equivalently, FIFO), the processing time of VM at time is:
Then, the processing time incurred by the traffic packets following the VNF sequence , is given by:
Finally, the experienced delay must be less than the target delay, i.e.,
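Putting the two latency contributions together: under the M/M/1 model above, a VM with service rate mu and incoming rate lambda has mean sojourn time 1/(mu - lambda), so the KPI check along a VNF sequence can be sketched as follows (helper names are illustrative, and stability mu > lambda is assumed).

```python
def end_to_end_delay(link_delays, lambdas, mus):
    """Sum of logical-link delays plus per-VM M/M/1 processing times
    W = 1 / (mu - lambda) along one ingress-to-egress VNF sequence.
    Requires mu > lambda for every VM (system stability)."""
    processing = sum(1.0 / (mu - lam) for lam, mu in zip(lambdas, mus))
    return sum(link_delays) + processing

def meets_kpi(link_delays, lambdas, mus, target_delay):
    """KPI fulfillment: experienced delay must not exceed the target."""
    return end_to_end_delay(link_delays, lambdas, mus) <= target_delay
```

With two 1 ms links and one VM at lambda = 1, mu = 2 (sojourn 1 ms), a 3 ms target is met while a 2.9 ms target is not.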
Link capacity. The traffic on any physical link should not exceed the maximum link capacity, . To formalize this constraint, we define the auxiliary variable to represent the traffic on physical link at time . This variable is equal to the total traffic between each pair of VNFs which goes through the logical link containing the physical link :
The link capacity constraint is expressed as
Objective. The goal of the optimization problem is to maximize the service revenue while minimizing the total cost. The revenue obtained by serving one unit of traffic of service is indicated as ; we assume such a quantity to be inversely proportional to the target delay of service , i.e., . This implies that serving services with lower target delay yields higher revenue for the MNO. The total revenue is expressed as
The total cost is the sum of the transmission cost over physical links and the computational and idle costs of VMs, which are expressed, respectively, as:
The above costs are expressed per unit of time and depend, respectively, on a proportional cost paid for each physical link per unit of traffic, a proportional cost for each VM paid per unit of computation, and a fixed cost for each VM paid if VM is turning-on or active. Finally, we write our objective as:
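The objective just described can be sketched numerically for a single time step: revenue per unit of served traffic is inversely proportional to the service's target delay, and the three cost terms are charged per unit of link traffic, per unit of CPU, and per powered-on VM. The cost coefficients below are illustrative placeholders.

```python
def profit(served_traffic, target_delay, link_traffic, vm_cpu, vm_on,
           c_link=1.0, c_cpu=1.0, c_idle=1.0):
    """One-time-step profit sketch: revenue minus cost.

    served_traffic -- per-service served traffic units
    target_delay   -- per-service delay KPI (revenue ~ 1/target_delay)
    link_traffic   -- traffic carried on each physical link
    vm_cpu         -- CPU assigned to each used VM
    vm_on          -- 1 for each VM that is turning-on or active, else 0
    """
    revenue = sum(tr / target_delay[s] for s, tr in served_traffic.items())
    cost = (c_link * sum(link_traffic) +
            c_cpu * sum(vm_cpu) +
            c_idle * sum(vm_on))
    return revenue - cost
```

Serving 10 traffic units of a service with target delay 2 yields revenue 5; with one unit each of link traffic, CPU, and one powered VM, the profit is 2.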
Sec. III-C Problem Complexity
The problem of jointly making decisions about VM activation, VNF placement, CPU assignment, and traffic routing formulated above contains both integer and real decision variables, hence it is non-convex. In the following, we prove that the problem is NP-hard, through a reduction from the weight-constrained shortest path problem (WCSPP) to a simplified version of our problem.
The problem formulated in Sec. III-B is NP-hard when the objective value is greater than zero.
We reduce a known NP-hard problem, the weight-constrained shortest path problem (WCSPP) , to our problem. Given a graph , and the cost and weight associated with its edges, the WCSPP asks to find the minimum-cost route between two specified nodes while ensuring that the total weight is less than a given value. We consider a special case of our problem where only one service request with a chain of two VNFs arrives at and departs in the next time step. We set the maximum number of instances for both VNFs to one. There are only two VMs in the physical infrastructure, with and ; the remaining nodes are network nodes. We set . Then, it is easy to see that WCSPP is equivalent to this special case of our problem when the objective value is greater than zero.
Besides its complexity, solving the problem formulated in Sec. III-B requires full knowledge of the arrival and departure times of all service requests, which is not realistic in many scenarios. As detailed below, to cope with this issue, our strategy is to solve the problem periodically, with each problem instance leveraging only the information about past and current service requests.
Sec. IV The MaxSR Solution
In light of the problem complexity discussed above, we propose a heuristic solution called MaxSR, which makes decisions (i) concerning only a time interval encompassing the present and the near future, which can be predicted with high accuracy , and (ii) based on the knowledge of the service requests occurring within such a time interval. More precisely, starting from time step , MaxSR makes decisions concerning the current service requests and accounting for a time horizon , i.e., extending until . After time steps, where , MaxSR is executed again, accounting for the next time interval, i.e., . Note that, although decisions are made accounting for a time horizon equal to , they are enacted only until the next execution of MaxSR, i.e., they hold, in practice, only for . Even with such a limited time horizon, directly solving the problem defined in Sec. III-B is still NP-hard. To work around this limitation, at every execution, MaxSR processes the service requests received in the last time steps sequentially, i.e., one request at a time. In the following, we provide an overview of MaxSR in Sec. IV-A, and we detail the algorithms composing our heuristic in Sec. IV-B.
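The periodic execution pattern just described can be sketched as a driver loop: every `period` steps, the solver is invoked over a `horizon`-step window and fed only the requests active in that window. The `solve` callback stands in for one MaxSR execution; its signature is hypothetical.

```python
def run_periodically(requests, t_end, horizon, period, solve):
    """Invoke `solve(t, window)` every `period` time steps, where
    `window` holds only the requests overlapping [t, t + horizon).
    `requests` are dicts with "arrival" and "departure" time steps."""
    decisions = []
    t = 0
    while t < t_end:
        window = [r for r in requests
                  if r["arrival"] < t + horizon and r["departure"] > t]
        decisions.append((t, solve(t, window)))
        t += period
    return decisions
```

Note that decisions from one invocation hold only until the next one, matching the text: the horizon may exceed the period, so later invocations can revise earlier tentative decisions.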
At every execution, MaxSR first sorts the service requests in decreasing order of the corresponding service revenue. It then activates the necessary VMs for serving the first service request, trying to map the VNF sequence onto a path connecting the VMs deemed to host the required VNFs. While doing so, more than one instance can be created for a VNF, if necessary, to meet the service target delay. To this end, we associate with each VNF a delay budget, which is proportional to the VNF computational complexity . Such a budget, however, is flexible: the delay contribution of a VNF exceeding its budget may be compensated for by a subsequent VNF on , deployed in a VM able to process traffic faster than indicated by its own budget. Additionally, MaxSR exploits a backtracking approach: in case of insufficient resources at a certain point of the current path , the algorithm can go back to the last successfully deployed VNF and look for an alternative deployment (hence path), leaving more spare budget for the subsequent VNFs. Nonetheless, it may prove impossible to find enough resources to accommodate the traffic and delay constraints of a given VNF instance; in this case, the service request is rejected.
The decisions that MaxSR makes are summarized below.
Placement. MaxSR aims to minimize the placement cost. This implies that the number of deployed VNF instances should be low, and the selected VMs should have a low cost. The algorithm thus starts from one instance and chooses the lowest-cost VM among the available ones. If this placement is not feasible, it tries the highest capacity VM to avoid the use of an extra instance. If the latter strategy is also infeasible, it increases the number of instances and repeats the process until a successful deployment is possible, or the limit on the maximum number of instances is reached (Alg. 2 and Alg. 3).
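The placement strategy described above (cheapest VM first, then highest-capacity VM, then more instances) can be sketched as follows. The `fits` callback abstracts the feasibility check (delay budget, capacity); all names are illustrative rather than the paper's actual Alg. 2/3 interface.

```python
def place(vms, instances_needed, fits, max_instances):
    """Placement sketch: for a growing instance count k, first try the
    k cheapest VMs, then the k largest-capacity VMs; stop at the first
    feasible choice, or give up when k exceeds max_instances.
    `vms` are dicts with "cost" and "capacity"; `fits(chosen, k)` says
    whether deploying k instances on `chosen` is feasible."""
    k = instances_needed
    while k <= max_instances:
        for key in (lambda v: v["cost"], lambda v: -v["capacity"]):
            chosen = sorted(vms, key=key)[:k]
            if len(chosen) == k and fits(chosen, k):
                return chosen
        k += 1
    return None  # deployment failed; the request may be rejected
```

With one cheap 1-CPU VM and one pricier 3-CPU VM and a 3-CPU demand, the cheapest strategy fails and the largest strategy succeeds without adding an instance.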
Routing. Recall that each VNF may have several instances and that such instances may be deployed on VMs connected through multiple logical links. MaxSR adopts a water-filling approach to route the traffic between each pair of VNFs through different logical links between a pair of VMs. To limit the processing time at each VM, the traffic entering each VM is properly set based on the VM available capacity (Alg. 3).
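A water-filling split over parallel logical links can be sketched as raising a common per-link "water level" until the demand is covered, capping each link at its spare capacity. This is a generic water-filling sketch, not the paper's exact Alg. 3.

```python
def water_fill(demand, capacities, iters=60):
    """Split `demand` traffic units across parallel links by water-filling:
    each link carries min(capacity, level) for a common level chosen so
    the allocations sum to `demand`. Returns None if total capacity is
    insufficient. The level is found by bisection."""
    if demand > sum(capacities) + 1e-9:
        return None
    lo, hi = 0.0, max(capacities)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(min(c, mid) for c in capacities) < demand:
            lo = mid
        else:
            hi = mid
    return [min(c, hi) for c in capacities]
```

For a 4-unit demand over links with spare capacities 1 and 5, the level settles at 3: the small link is saturated at 1 and the large link carries 3, keeping the load as even as the caps allow.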
CPU assignment. MaxSR aims to keep the service rate of the used VMs as low as possible, in order to reduce the consumption of computing resources, hence the cost. This means setting the lowest service rate compatible with the per-VNF delay budget, except when we have to compensate for a VNF exceeding its delay budget; in the latter case, the algorithm opts for the maximum service rate on the VM (Alg. 4).
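Under the M/M/1 model of Sec. III, a delay budget d requires sojourn time 1/(mu - lambda) <= d, so the cheapest feasible service rate is mu = lambda + 1/d. The sketch below applies this rule, falling back to the VM's maximum rate in compensation mode; function and parameter names are illustrative.

```python
def assign_rate(incoming, delay_budget, vm_max_rate):
    """CPU-assignment sketch: the smallest M/M/1 service rate meeting a
    per-VNF delay budget is mu = lambda + 1/budget. If that exceeds the
    VM's maximum rate, use the maximum (the budget is overshot and a
    later VNF must compensate, as in the flexible-budget scheme)."""
    mu = incoming + 1.0 / delay_budget
    return mu if mu <= vm_max_rate else vm_max_rate
```

With 1 unit of incoming traffic and a 0.5-time-unit budget, the minimal rate is 3; if the VM caps at 2, the VNF overshoots its budget and downstream VNFs must absorb the excess delay.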
Alg. 1. It is the main body of the MaxSR heuristic, taking as input the time horizon , the current time step , and the set of service requests to be served in the time horizon . Line 1 calculates the service revenue for each request , based on the expected traffic to be served in the time horizon and the expected revenue, i.e., for service . The algorithm sorts the service requests in Line 1 in descending order, according to . It then calls BSRD for each request, in order to determine whether and how to serve it within the time horizon. If the request can be served, the resulting VNF placement/CPU assignment and routing decisions are stored in and in , respectively. For each served request, will then contain a tuple for each VNF instance, specifying the allocated VM and its assigned service rate, while will contain a tuple for each pair of VNF instances, determining the amount of traffic on their connecting logical link(s). Finally, the VMs required for running the service request are activated if not already active; we recall that it takes one time step to activate them (turning-on state), and they will remain up until the service departure time.
Alg. 2. Given service request for service as an input, the goal of Alg. 2 is to check whether all VNFs of can be deployed with the available resources. If so, the request is served and the result sets and are returned. The global boolean variables status and can-backtrack represent the deployment status and the possibility of backtracking, respectively. status is critical if the last VNF deployment has failed, and normal otherwise. The global cache is a set of results that facilitates the backtracking operation (see Alg. 3). The algorithm starts in normal mode; clearly, backtracking is not allowed for the first VNF in the VNFFG, and cache is empty (Line 2). The algorithm then assigns a delay budget to each VNF of the service, proportional to the VNF computational complexity (Line 2), where denotes the -th VNF in the VNFFG. Then, it walks through the sequence of VNFs starting from the ingress VNF and deploys them one by one.
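The proportional delay-budget split used by Alg. 2 can be sketched in one line: each VNF receives a share of the service's target delay proportional to its computational complexity. The helper name is illustrative.

```python
def delay_budgets(target_delay, complexities):
    """Split a service's target delay into per-VNF budgets proportional
    to each VNF's computational complexity, so heavier VNFs get more
    slack. Budgets sum to the service target delay."""
    total = sum(complexities)
    return [target_delay * c / total for c in complexities]
```

For a 10-unit target delay over two VNFs with complexities 1 and 3, the budgets are 2.5 and 7.5; recall that these budgets are flexible, since a later VNF may compensate for an earlier overshoot.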
For each VNF, Lines 2-2 decide on the number of required instances and the VM selection strategy, based on the deployment status. The strategy can be cheapest or largest: the algorithm selects VMs with the lowest cost when the strategy is cheapest, and with the highest capacity when the strategy is largest. The first part (Lines 2-2) deploys the VNF in the normal mode. Since the algorithm aims to keep the number of required VNF instances as low as possible, it starts with one instance and the cheapest strategy and calls VPTR to determine placement and routing, and CA to determine the CPU assignment. The deployment is successful if neither of these algorithms fails. If the cheapest strategy does not yield a successful deployment for the VNF, the algorithm keeps the number of instances fixed and tries the largest strategy. If both strategies fail, the number of instances is increased by one and the process is repeated. The algorithm ends whenever a successful deployment is found (Line 2), or the maximum number of instances is reached.
Lines 2-2 decide how to proceed along the VNF sequence according to the result of the deployment, status, and can-backtrack. If the deployment is successful (Line 2), the algorithm updates the result set, sets status to normal, and proceeds to the next VNF in the VNFFG (Line 2). can-backtrack is also updated in Line 2: backtracking is allowed for the next VNFs only when the current VNF has been successfully deployed in normal mode, which prevents the algorithm from backtracking again to a VNF that has already been deployed in critical mode. Otherwise (Line 2), status is set to critical and the algorithm proceeds as follows. As a first attempt, it tries to refine the placement made in the previous step: if backtracking is allowed, it reverts the result sets related to the previous VNF in the VNFFG and goes back to deploy it again (Line 2). When the deployment fails due to a violation of the delay budget but backtracking is not possible, the algorithm preserves the current deployment in the result set and proceeds to the next VNF, hoping to compensate for the exceeded delay budget (Line 2). If neither option is viable, the algorithm decides not to serve the current service request and reverts all result sets related to its deployment (Line 2).
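The four possible outcomes described above can be condensed into a small decision helper (hypothetical names; `can_compensate` stands for the possibility of proceeding to the next VNF despite the exceeded delay budget):

```python
def next_step(deployed, can_backtrack, can_compensate):
    """Decide how BSRD proceeds after one VNF deployment attempt.

    Returns a (move, status) pair mirroring the four cases in the text:
    advance in normal mode, backtrack to the previous VNF, push forward
    in critical mode, or reject the request entirely.
    """
    if deployed:
        return ("advance", "normal")       # proceed to the next VNF
    if can_backtrack:
        return ("backtrack", "critical")   # revert and redo the previous VNF
    if can_compensate:
        return ("advance", "critical")     # keep results, compensate later
    return ("reject", None)                # revert all results, drop request
```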
Lines 2-2 deploy the VNF when status is critical, i.e., when the previous VNF deployment attempt has failed. This VNF is either the previous VNF in the VNFFG, when the algorithm is in the backtracking phase, or the next one, when the algorithm tries to compensate with the current deployment for the exceeded delay budget. In either case, the algorithm chooses the fastest option to deploy the VNF, regardless of the cost, using the maximum number of instances and the largest strategy. Finally, after each VNF deployment, Line 2 checks the feasibility of the decisions made with respect to the datacenter capacity and the service target delay. For the former, it is enough to check that the total computational capability assigned to VMs within each datacenter does not exceed its maximum capacity.
Traffic packets belonging to a service may traverse different end-to-end paths in the physical network and thus experience different end-to-end delays. We define as the maximum end-to-end delay that traffic packets belonging to service request experience from the ingress VNF until they depart the VM hosting an instance of VNF . Thus, after deploying VNF of service request , it is enough to check that this delay, for any VM hosting an instance of , does not exceed the service target delay.
Alg. 3. It determines the placement and traffic routing for the -th VNF of request of service , using instances and the given strategy. Line 3 initializes to the -th pair of VNFs in the VNFFG of service , the routing result set to , and the remaining unserved traffic between and , i.e., , to . The first pair of VNFs is with the assumption that the dummy VNF is placed on the dummy VM. In Lines 3-3, the remaining capacity of each logical link is first calculated and stored in ; then, the links that have a remaining capacity greater than zero, host VNF on their source VM, and host no VNF on their destination VM are picked and stored in the set . The links in and their destination VMs are the only potential candidates this algorithm considers for placing instances of the -th VNF and accommodating its incoming traffic . In other words, in the rest of the algorithm, we consider each logical link and its destination VM jointly as one entity and pick the best ones according to the strategy and . If the selected entities cannot fit the incoming traffic, the placement fails; nonetheless, we still preserve the amount of satisfied traffic in the cache and exploit this information in the backtracking phase.
Caching greatly speeds up the backtracking operation. Specifically, when Alg. 3 is called in the backtracking phase to refine the placement of the -th VNF, the cache contains results determining the routing of the portion of the outgoing traffic of the -th VNF to the -th VNF that was satisfied by the previous deployment of the -th VNF in the VNFFG. Lines 3-3 exploit the cached results and accommodate the unserved portion of the incoming traffic using different instances, which helps the next deployment of the -th VNF fully serve its traffic. For instance, assuming and that the placement of the -th VNF has failed with unserved traffic, the backtracking step will only have to accommodate traffic on extra VMs, i.e., the routing and placement results for the served traffic portion, , will not change.
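The caching idea can be illustrated with a toy helper (hypothetical names; the cache maps a VNF to the outgoing traffic already routed in the previous, partially successful attempt, so only the residual must be placed during backtracking):

```python
def remaining_after_cache(total_traffic, cache, vnf_id):
    """Traffic still to be placed when backtracking to VNF `vnf_id`.

    The routing decisions for the cached (already served) portion are
    kept unchanged; only the residual is placed on extra VMs.
    """
    served = cache.get(vnf_id, 0.0)
    return max(total_traffic - served, 0.0)
```

For example, if 10 units of traffic were to be routed and 7 were served before the failure, the backtracking step only needs to accommodate the remaining 3.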
The pairs of logical links and their connected VMs are selected for placement and routing based on the given strategy. If the strategy is cheapest, they are sorted in ascending order of the cost of the logical link plus the VM CPU cost (Line 3). If the strategy is largest, they are sorted in descending order of the minimum between the remaining capacity of the logical link and that of the VM (Line 3). Line 3 picks the biggest set of top logical links such that the number of unique destination VMs equals the number of instances, i.e., , and stores them in . Note that this set may contain multiple logical links with the same destination VM, and therefore we should pick the largest such set to increase the chance of fitting the traffic. If the number of unique destination VMs is less than , is empty and the placement fails. Otherwise, we store the destination VMs corresponding to the logical links in set (Line 3).
To avoid an exceedingly high processing time, Line 3 introduces a limit on the amount of traffic entering a given VM , proportional to the VM maximum computational capacity. Notice that all logical links ending at the same destination VM have the same limit. The remaining computational capacity of each selected VM, , is initialized to its maximum (Line 3). The algorithm adopts a water-filling approach to fill the logical links in Lines 3-3. First, for each logical link and its connected VM , the remaining capacity, i.e., the minimum of the remaining capacities of and dst(l), is stored in (Line 3). Then, logical link is filled with the remaining unserved outgoing traffic of VNF on VM , so that neither the capacity limit of logical link nor the incoming-traffic limit of VM is violated. Line 3 updates the remaining unserved traffic from to (), the remaining capacity of logical link (), the remaining capacity of the destination VM (), and the routing result set (). If some traffic from VNF to remains unserved (i.e., not all the traffic can be served), the algorithm returns fail (Lines 3-3), and Line 3 preserves the satisfied outgoing traffic of each VM hosting an instance of VNF , i.e., , in the cache, so as to use it later in case of backtracking. Otherwise, the algorithm returns success with the placement result set .
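The water-filling step can be sketched as follows, under the simplifying assumption that each candidate entry carries its own link and destination-VM residual capacities (in the actual algorithm, links sharing a destination VM also share its capacity; field names are illustrative):

```python
def water_fill(links, demand):
    """Water-filling over candidate (logical link, destination VM) pairs.

    `links` is an ordered list of dicts holding the remaining capacity of
    the link (`cap_link`) and of its destination VM (`cap_vm`); the joint
    capacity of an entry is their minimum, as in the text. Returns the
    per-link allocation and the unserved residual (fail if residual > 0).
    """
    allocation = []
    for link in links:
        room = min(link["cap_link"], link["cap_vm"])  # joint remaining capacity
        x = min(room, demand)        # fill without violating either limit
        allocation.append(x)
        demand -= x
        link["cap_link"] -= x        # update link residual capacity
        link["cap_vm"] -= x          # update destination VM residual capacity
        if demand == 0:
            break
    return allocation, demand  # residual > 0 means the placement fails
```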
Alg. 4. It is called in Line 2 and Line 2 of Alg. 2 when the deployment of VNF in Alg. 3 is successful. Given the result set , this algorithm is responsible for assigning the service rates to the VMs running the deployed instances of VNF . After initialization, in Line 4, defines the set of logical links used for routing a part of the traffic from any instance of VNF to any instance of VNF . We store the VMs on which VNF is already deployed in the set (Line 4). Then, for each , we calculate the incoming traffic as the sum of the traffic from all logical links ending in VM , and store it in (Line 4).
represents the maximum end-to-end delay that traffic packets experience from the ingress VM to VM , which hosts an instance of VNF , but before being processed by . For each logical link where , this delay is equal to the sum of the maximum end-to-end delay of traffic packets after being processed by VNF on VM , i.e., , and the delay of logical link , i.e., . Taking the maximum over all such logical links, we have in Line 4.
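The delay propagation rule above, i.e., taking the maximum over all incoming logical links of the upstream delay plus the link delay, can be written compactly as follows (illustrative field names, not from the paper):

```python
def ingress_delay(incoming_links, delay_after, link_delay):
    """Maximum end-to-end delay up to a VM, before processing by its VNF.

    `incoming_links` lists the logical links ending at the VM; for each,
    `delay_after[src]` is the maximum delay after processing by the
    previous VNF on the source VM, and `link_delay[id]` is the delay of
    the logical link itself.
    """
    return max(delay_after[l["src"]] + link_delay[l["id"]]
               for l in incoming_links)
```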
Similar to the VNF deployment in Alg. 3, the algorithm assigns service rates to VMs based on the deployment status. In critical mode, the algorithm aims to reduce the delay contribution, which depends on the logical links' delay and the processing time on the VMs. Since the logical links have already been selected by the VPTR algorithm, here we assign the maximum possible service rate to the VM so as to reduce the processing time (Line 4). Instead, when the algorithm is in normal mode, it chooses the minimum possible service rate for VM (Line 4) such that the VNFs' delay budgets are not violated, i.e.,
In the above equation, the right- and left-hand sides represent the processing time of VM and the remaining delay budget of the VNFs, respectively. To compute the latter, we first calculate the total delay budget of the VNFs up to the -th (i.e., the current) one, and then subtract from it the maximum end-to-end delay of traffic packets before they are processed by VNF on VM , i.e., .
The computed service rate for VM may be invalid because (i) no delay budget is left to process the current VNF on VM , i.e., the left-hand side of the inequality in (36) becomes non-positive, or (ii) the assigned service rate exceeds the maximum capability of the VM. In both cases, the CA algorithm fails; however, the VM is assigned its maximum computational capability to process the VNF (Line 4). Recall that, although the CPU assignment failed for the current VNF, the algorithm keeps the results to be used in Alg. 2 (Line 2) when backtracking is not allowed; in this case, the algorithm continues with the next VNF and tries to compensate for the exceeded delay budget. Line 4 stores the results, and Line 4 updates for this VM, which represents the maximum end-to-end delay after the packets are processed by VM . Finally, when all service rates have been assigned, the algorithm returns fail if the remaining delay budget is violated for at least one VM (Line 4), and success otherwise.
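The per-VM rate assignment can be sketched as follows, assuming an M/M/1-style processing delay of 1/(mu - lambda); the paper's exact delay model may differ, and all names are hypothetical:

```python
def assign_rate(arrival_rate, remaining_budget, mu_max, critical):
    """Pick a service rate for one VM, mirroring the CA step above.

    In critical mode, the VM gets its maximum rate; in normal mode, the
    minimum rate meeting the remaining delay budget is chosen.
    Returns (rate, ok): ok is False when no budget is left or the
    required rate exceeds the VM's maximum capability; in either failure
    case the VM still keeps its maximum rate, as in the text.
    """
    if critical:
        return mu_max, True
    if remaining_budget <= 0:
        return mu_max, False      # case (i): no delay budget left
    # Minimum rate such that 1 / (mu - lambda) <= remaining budget.
    mu = arrival_rate + 1.0 / remaining_budget
    if mu > mu_max:
        return mu_max, False      # case (ii): exceeds VM capability
    return mu, True
```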
IV-C Computational Complexity
The MaxSR heuristic takes as inputs the set of physical links , the service requests , their VNFFGs , the VMs , and the logical links . Note that is considered an input since it is computed once for all executions of the MaxSR algorithm. Below, we prove that this algorithm has a worst-case complexity that is polynomial in the input parameters.
The MaxSR algorithm has a worst-case polynomial computational complexity.
First, we determine the complexity of the VPTR and CA algorithms. VPTR constructs and sorts the set in and adopts water filling to fill the logical links in ; thus, the total time complexity of this algorithm is . CA also has complexity; hence, the total computational complexity of VPTR and CA remains equal to that of VPTR. Alg. 1 sorts the service requests in and calls BSRD for each service request. In the worst case, BSRD tries every possible number of instances and every strategy for all VNFs in the VNFFG of the given service request. Let and be upper bounds on the maximum number of instances, i.e., , and on the number of VNFs in a VNFFG, i.e., , respectively. Thus, the total time complexity of BSRD is , and the total time complexity of Alg. 1 is . Therefore, the worst-case total time complexity is polynomial in the input parameters. In other words, the complexity of the heuristic depends primarily on the number of service requests, the number of VNFs in each VNFFG, the number of deployment attempts for each VNF, and the number of logical links.
Sec. V Numerical Results
We now present the results of the numerical experiments we conducted, and show that our proposed scheme consistently performs better than state-of-the-art approaches and close to the optimum. We compare our heuristic algorithm against the following benchmarks:
Global optimum. The solution of the optimization problem defined in Sec. III-B obtained by brute-force search, assuming exact knowledge of arrival and departure times of all service requests.
Best-Fit. It is an online algorithm that decides on each service request upon its arrival, without any information about future service requests. Best-Fit deploys the VNFs of a service request one by one, using a single instance of each VNF and the cheapest strategy. If the request can be served, the selected resources remain dedicated to the service request until its departure.
In our performance evaluation, we use the following performance metrics:
Service revenue, defined as the sum of revenues achieved by serving service requests. For a single request of service , this metric equals the amount of served traffic multiplied by .
Cost/traffic, which reflects the average cost incurred to serve a unit of traffic.
In the following, we first consider a small-scale network scenario, for which the optimum solution can be obtained in a reasonable time. This scenario provides interesting and easy-to-interpret insights into how each service type impacts the revenue and the cost/traffic ratio. Then, we run MaxSR and Best-Fit in a large-scale real network scenario, where obtaining the optimum solution is impractical. Table II summarizes the services we consider for our performance evaluation, inspired by real-world 5G applications. The revenue gained from serving one unit of traffic of service , i.e., , is set inversely proportional to the service target delay. We assume that service requests arrive according to a Poisson process, and that the duration of requests follows an exponential distribution.