Online VNF Chaining and Predictive Scheduling: Optimality and Trade-offs

08/01/2020
by   Xi Huang, et al.
The Chinese University of Hong Kong

For NFV systems, the key design space includes the chaining of virtual network functions for network requests and the scheduling of resources on servers. The problem is challenging because NFV systems usually must balance multiple (often conflicting) design objectives while making real-time decisions efficiently with only limited information. Furthermore, the benefits of predictive scheduling to NFV systems remain unexplored. In this paper, we propose POSCARS, an efficient predictive and online service chaining and resource scheduling scheme that achieves tunable trade-offs among various system metrics with a queue stability guarantee. Through a careful choice of granularity in system modeling, we acquire a better understanding of the trade-offs in our design space. By a non-trivial transformation, we decouple the complex optimization problem into a series of online sub-problems and achieve near-optimality with only limited information. By employing randomized load balancing techniques, we propose three variants of POSCARS to reduce the overheads of decision making. Theoretical analysis and simulations show that POSCARS and its variants require only a mild amount of future information to achieve near-optimal system cost with ultra-low request response times.


I Introduction

Network function virtualization (NFV) is shifting the way of network service deployment and delivery by virtualizing and scaling network functions (NFs) on commodity servers in an on-demand fashion [33]. As a revolutionary technique, NFV paves the way for operators towards better manageability and quality-of-service of network services.

In NFV systems, each network service is implemented as an ordered chain of virtual network functions (VNFs) deployed on commodity servers, a.k.a. a service chain. Along the chain, every VNF performs some particular treatment on the received requests, then hands over the output to the next VNF in a pipeline fashion. To enable a network service, one needs to place, activate, and chain VNFs deployed on various servers. Considering the high cost of VNF migration and instantiation [24], VNF re-placement can only be performed infrequently; in other words, at the time scale of flow-level or request-level operations, function placement can be viewed as static. Given this fact, a natural practice is to place multiple VNFs on one server in advance; however, due to hardware resource constraints (e.g., CPU, memory, and storage) [19], a server must carefully schedule resources among a subset of such VNFs at any particular time (i.e., only a subset of VNF instances can be activated on a server at a particular time). Therefore, with a fixed VNF placement, the activation and chaining of VNFs refer to: 1) for each server, the allocation of resources to a subset of deployed VNFs subject to resource constraints; and 2) for each network service, the selection of activated instances for its VNFs, so as to determine the sequence of instances through which requests will be treated, a.k.a. service chaining.

Given that VNF placement is considered static at the time scale of flow or request operations [27], a natural question for service chaining and resource scheduling is: should they also be static, or dynamic? Static schemes have been implemented in some scenarios [18], but request traffic often fluctuates heavily in both temporal and spatial dimensions [25]. In such cases, static schemes may lead to workload imbalance among instances, leaving some instances overloaded and others under-utilized. Hence, there is a strong demand for an efficient and dynamic scheme that performs service chaining and resource scheduling, adapts to traffic variations, and achieves load balancing in real time. As for implementability, recent advances (e.g., temporal and spatial processor sharing [26]) have enabled real-time adjustment of resource allocation among various functions on the same server.

However, such dynamic design is non-trivial, especially in face of the complex interplay between successive VNFs and the resource contention among VNF instances on servers. In particular, we would like to address the following challenges:

Fig. 1: The system evolution in one time slot under different service chaining decisions, with or without pre-service. Basic settings: there is one network service with two VNFs, VNF 1 and VNF 2. VNF 1 has one instance, while VNF 2 has two instances. Every instance maintains one queue to buffer untreated requests. All instances are readily deployed, with VNF 1's instance on server I and the instances of VNF 2 on servers II and III, respectively. VNF 1's instance is potentially connected to both instances of VNF 2. Initial state: one new request has arrived in the current time slot and another will arrive in the next. Besides, VNF 1's instance holds one request that was processed in the previous time slot and is to be sent to one of VNF 2's instances in the current slot.
  1. Characterization of the tunable trade-offs among various performance metrics: NFV systems often have multiple optimization objectives, e.g., maximizing resource utilization, minimizing energy consumption, and reducing request response time. Different stakeholders may have different preferences over these objectives, which often conflict with each other [43]. It is important to characterize their trade-offs to acquire a comprehensive understanding of the design space and tune the system towards the particular state that we desire.

  2. Efficient online decision making: VNF request processing often requires low latency and high throughput. Hence, an effective dynamic scheme must also be computationally efficient and adaptive to request changes. This is challenging not only because of the high complexity of the problem, but also because service requests arrive in an online manner while the underlying traffic statistics are often unknown a priori. All these uncertainties make it more challenging to optimize system objectives through a series of online decisions, not to mention that a distributed design is often preferred.

  3. Understanding the benefits of predictive scheduling: A natural enhancement of online decision making is to leverage recently developed machine learning techniques [6][46] to predict future traffic information, so as to reduce response time and improve quality-of-service. There is no free lunch, though. For example, in NFV-based multimedia delivery systems, multimedia service providers can predict potential requests based on the popularity of streaming contents and subscribers' preferences [7]. Based on such predictions, service providers can carry out pre-rendering or compression to optimize the quality of their services with faster responses [5]. Despite the wide adoption of such prediction-based approaches [62][36][22][23][15], it remains open what the fundamental benefits of predictive scheduling to NFV systems are, even in the presence of prediction errors. Answers to these questions are key to understanding whether the endeavor of predictive VNF scheduling is worthwhile, and whether one can tolerate the worst case that may occur.

Despite recent headway on VNF scheduling [19][18], as far as we are aware, there is still no fundamental understanding of the above questions, nor is there any strategy that achieves the design objectives simultaneously in a fully online fashion. One important reason lies in the difficulty of problem formulation and modeling, especially in choosing the granularity. If one models the system state and strategy at a flow-level abstraction [35], it may fall short in accurately characterizing the interplay between successive VNF instances and the system dynamics over time; however, if one applies fine-grained control to each request [62], then the decision making inevitably incurs a rather high computational overhead. This issue not only prohibits a deep understanding of system dynamics, but also prevents us from obtaining an efficient and accurate strategy design.

In this paper, we overcome such difficulties by applying a number of novel techniques. Our contributions include:

Modeling and formulation: We propose a novel model that separates the granularity of system state characterization and strategy making. In particular, we develop a queueing model at the request granularity to characterize system dynamics. Unlike flow-level abstractions, our model requires no prior knowledge of underlying flows, yet accurately captures the interplay between successive instances, i.e., the real-time dynamics of how requests are received, processed, and forwarded. Strategy making, in contrast, is conducted at the granularity of request batches in a per-time-slot fashion to avoid the high overheads of per-request optimization. Such a careful choice makes it possible to characterize the system dynamics and performance in a clear yet accurate way.

Algorithm design: To enable online and efficient decision making, we transform the long-term stochastic optimization problem into a series of sub-problems over time slots. By exploiting their unique structure, we propose POSCARS, a Predictive Online Service Chaining And Resource Scheduling scheme. POSCARS comprises two coupled parts: one for the predictive scheduling of requests, and the other for service chaining and resource allocation. The former takes advantage of predicted information to effectively reduce request delays, while the latter achieves a near-optimal system cost while stabilizing all queues in the system. Furthermore, POSCARS achieves a tunable trade-off between system cost optimization and queue stability.

Predictive scheduling: To the best of our knowledge, this paper is the first to address the dynamic service chaining and scheduling problem in NFV systems by jointly considering resource utilization, energy efficiency, and request latency. This paper is also the first to study the fundamental benefits of predictive scheduling with future information in NFV systems, which opens a new dimension for NFV system design.

Experimental verification and investigation: We conduct trace-driven simulations; the results show the effectiveness of POSCARS and its variants under various settings against baseline schemes, as well as the benefits of predictive scheduling in achieving ultra-low request response times.

The rest of this paper is organized as follows. In Section II, we show a motivating example of predictive scheduling in NFV systems. Section III presents our model and formulation, followed by the design and performance analysis of POSCARS and its variants in Section IV. We show simulation results and analysis in Section V, then review related work in Section VI. Finally, Section VII concludes the paper.

II Motivating Example

In this section, we first present a motivating example that exhibits the potential trade-off among different system metrics in multi-objective optimization, including reducing energy and communication costs as well as shortening response times (which are mainly due to queueing delay). The example also illustrates the value of future information and the potential benefit of predictive scheduling.

We consider a time slotted NFV system, where predictive scheduling is viable, i.e., a request arriving in the next time slot can be perfectly predicted, pre-generated, and pre-served by the system. (An example of predictive scheduling in practical systems is that Netflix predicts its users' behaviors and preloads video onto their devices [7].) Figure 1(a) shows the basic settings and the initial system state. All VNF instances are readily deployed on servers with a fixed placement. Each instance maintains a queue to buffer any untreated requests. Every server has a service capacity of two requests per time slot, and processing a request incurs one unit of energy cost. Note that 1) any requests processed by VNF 1's instance are not counted in the queues, but are ready to be sent to VNF 2's instances in the next time slot; 2) requests that have been processed by VNF 2's instances are considered finished.

In this case, there are two possible service chaining decisions: forwarding the processed request from the instance of VNF 1 to either VNF 2's instance on server II (Decision #1) or the one on server III (Decision #2). Forwarding the request to VNF 2's instance on server II incurs a lower communication cost than forwarding it to the other instance of VNF 2 on server III.

Our goal is to choose a service chaining decision that jointly minimizes the total energy cost, total communication cost, and total residual backlog size at the end of the time slot. (By applying Little's law, a short queue length implies a short queueing delay, i.e., a short response time.) Figures 1(b) - 1(d) compare the scheduling processes under different service chaining decisions.

In Figure 1(b), the new request of the current slot is admitted, while the processed request is forwarded to the instance of VNF 2 on server II. Although it incurs a low communication cost, this decision also leads to imbalanced queue loads among VNF 2's instances. Recall that every server can serve at most two requests per time slot. Hence, the servers process four requests in total, including the new request on server I, two requests on server II, and another one on server III, incurring an energy cost of four units. After processing, VNF 2's instance on server II still has one untreated request in its backlog. Thus Decision #1 incurs a lower total cost on energy and communication, but leaves a residual backlog of one request.

On the other hand, when the processed request is forwarded to the instance of VNF 2 on server III, the decision incurs a higher communication cost but results in balanced queue loads among VNF 2's instances. The servers process five requests in total, including the new request on server I and the rest from servers II and III, incurring an energy cost of five units. After processing, no untreated requests are left in the backlogs. Decision #2 thus incurs a higher total cost on energy and communication, but with no residual backlog.
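To make the comparison concrete, the following Python sketch recomputes the two decisions' outcomes. It is purely illustrative: the unit energy cost of one per processed request follows the example above, but the communication cost values and the initial backlogs of VNF 2's instances (two requests on server II, one on server III, consistent with the processing counts described above) are assumed for illustration only.

# Illustrative sketch of the Decision #1 vs. Decision #2 comparison above.
# Assumed values: energy cost of 1 per processed request (as in the example),
# communication costs of 1 (to server II) and 2 (to server III), and initial
# backlogs of 2 and 1 requests on VNF 2's instances -- chosen only to
# reproduce the qualitative trade-off, not taken from the paper.
ENERGY_PER_REQUEST = 1
COMM_COST = {"II": 1, "III": 2}
CAPACITY = 2  # requests per server per time slot

def evaluate(decision, backlogs):
    """Return (total cost, residual backlog) for one slot under `decision`,
    the server hosting the chosen instance of VNF 2."""
    queues = dict(backlogs)
    queues[decision] += 1  # VNF 1's processed request is forwarded here
    processed = 1          # the newly admitted request handled on server I
    processed += sum(min(q, CAPACITY) for q in queues.values())
    residual = sum(max(q - CAPACITY, 0) for q in queues.values())
    total_cost = COMM_COST[decision] + ENERGY_PER_REQUEST * processed
    return total_cost, residual

initial = {"II": 2, "III": 1}
print("Decision #1:", evaluate("II", initial))   # lower cost, leftover backlog
print("Decision #2:", evaluate("III", initial))  # higher cost, empty backlogs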

Insight 1: Figures 1(b) and 1(c) show that we cannot achieve the optimal values for different system metrics simultaneously, i.e., there is a potential trade-off between optimizing the total system cost and reducing the total queue length.

Additionally, we find that server I is under-utilized in both Figures 1(b) and 1(c), because VNF 1's instance only receives and handles the new request in the current slot. In fact, Figure 1(d) shows that we can exploit the spare processing power on server I by pre-admitting and pre-serving the future request. Consequently, we can shorten the response time of the future request by incurring one more unit of energy cost in the current slot. Note that pre-service does not introduce extra energy cost; it merely pays the cost beforehand. The reason is that, even without pre-service, we would still pay one unit of energy cost in a subsequent time slot after the future request arrives.

Insight 2: By utilizing servers’ spare processing power and paying system cost in advance, predictive scheduling can effectively shorten response times of future requests.

To characterize the non-trivial trade-off and exploit the power of predictive scheduling in NFV systems, we present our formulation in the next section.

Fig. 2: An instance of our system model. There are two network services (NS 1 and NS 2) with their VNF instances deployed on servers I, II, and III. At the beginning of each time slot, the traffic classifier admits and pushes requests to the queues corresponding to their requested services. Based on its server's resource scheduling, each instance serves requests from its processing queue and forwards them to its next VNF's instances, e.g., from the instance of one VNF on server I to the instance of its next VNF on server III.

III Problem Formulation

We consider a time slotted NFV system, where virtual network functions (VNFs) are instantiated, deployed over a substrate network, and chained together to deliver a number of network services. Upon the arrival of new network service requests, each VNF processes requests and hands them over to its following VNF in a pipeline fashion. All requests are assumed homogeneous, i.e., each request has an equal size and requires the same amount of computation to be processed. We show an instance of our system model in Figure 2 and summarize the main notations in Table I. More details of service chaining can be found in IETF RFC 7665 [16].

TABLE I: Main Notations
Substrate Network Model:
The set of servers that host VNFs
Communication cost of sending one request from one server to another
The capacity of each resource type on a server
The unit cost of each resource type on a server
Network Service Model:
Number of network services
Chain length of a network service
The set of virtual network functions (VNFs)
The set of all ingress VNFs
The set of all non-terminal VNFs
The VNF at a given position in a network service chain
The network service that contains a given VNF
The previous VNF of a given VNF in its service chain
The next VNF of a given VNF in its service chain
Number of requests processed per unit of each resource type
Deployment Model:
The set of servers that host a VNF's instances
The set of VNFs with instances residing on a server
System Dynamics:
Number of new requests for a network service arriving in a time slot
Number of untreated requests for a network service in each upcoming slot of the prediction window
The prediction queue length for a network service in a time slot
Queue length of a VNF's instance on a server in a time slot
Number of requests processed by a VNF's instance on a server in a time slot, to be sent to the next VNF in the following slot
Total number of admitted requests for a network service
Number of admitted requests from the prediction queue
Scheduling Decisions:
Number of requests for a service admitted onto a server
Indicator of whether the instance of a VNF on a server is selected to receive the processed requests from its preceding VNF's instance on another server
The vector of resources allocated on a server to a VNF's instance
System Objectives:
Total communication cost in a time slot
Total computation cost in a time slot
Weighted total queue length in a time slot

III-A Substrate Network Model

We consider a substrate network with a set of heterogeneous servers. On each server, we consider multiple types of resources, e.g., GPU [59], CPU cores [26], and cache [47]. Each resource type has a given capacity and unit cost; we collect these into a resource capacity vector and a resource unit cost vector for each server.

For every server pair, we use a time-varying communication cost to denote the cost of transferring a request between the two servers in a time slot, e.g., the number of hops or the round-trip time. If two servers are not reachable from each other in a time slot, the corresponding communication cost is set to infinity. We collect all pairwise communication costs of a time slot into a single set.

III-B Network Service Model

There are multiple network services and a set of VNFs. Each network service is represented by a chain of ordered VNFs. To avoid triviality, we assume that every network service's chain contains at least two VNFs. Note that the chain length is a constant and usually not very large [17]. We regard the same VNF that appears in different service chains as distinct VNFs. In practice, one can set up multiple queues on one VNF instance to buffer requests for different services and map each queue to one VNF instance in our model.

Next, we define the set of ingress VNFs of all network services and the set of non-terminal VNFs of all network services. For every VNF, we identify its network service; if it is not the first VNF of its service, we refer to its previous VNF, and if it is not a terminal VNF, we refer to its next VNF.

III-C Deployment Model

In practice, due to request workload changes, it is common to provide multiple instances for every VNF, encapsulate the instances into containers, and distribute them on servers for better load balancing and fault tolerance [51]. We assume that each VNF has at most one instance on each server but can have multiple instances on different servers. The placement of VNF instances is assumed to be pre-determined by adopting VNF placement schemes similar to existing ones [58, 61, 10, 41]. Depending on the placement, the instances required by each service are not necessarily available on the same server. Note that our model can be further extended to the case where a VNF has multiple instances on the same server.

For each VNF, we consider the set of servers that host its instances. Correspondingly, each server hosts a subset of VNFs. Every instance maintains one queue to buffer its relevant requests; for example, if a VNF has one instance on a server, then that instance has a queue of a certain length at the beginning of each time slot. Instead of individual queues, one can also implement a shared public queue among instances of the same VNF: all requests from the preceding VNF's instances are first forwarded to and buffered in the public queue, and then rescheduled to one or more idle or least-loaded instances. This approach brings more flexibility, since requests can avoid potentially long queueing delays on individual instances; however, it requires additional physical storage and incurs extra communication cost due to the rescheduling. The choice depends on the trade-off made by system designers. Here we adopt the queueing model with one queue per individual instance.

III-D Predictive Request Arrival Model

For each network service, we use a bounded random variable to denote the number of its new requests that arrive in each time slot, assumed independent across time slots. In practice, considering the statefulness of VNFs, requests may be aggregated and scheduled at the unit of flows. Our model captures the system dynamics at a finer granularity than the flow-level abstraction, and can be further extended to the case with correlations between requests.

Next, we consider a system which can predict and pre-serve future request arrivals for network services a finite number of time slots ahead. Though the techniques and analysis of prediction are still under active development [62, 39, 36], we do not assume any particular prediction technique in this paper. Instead, we take the prediction as the output of a standalone predictive module, and investigate the fundamental benefits of acquiring and leveraging such future information, as well as the risks induced by mis-prediction. Note that such an assumption is a valid approximation of practical scenarios where short-term prediction is viable. For example, Netflix improves its quality-of-experience (QoE) by predicting user demand and network conditions, then prefetching video frames onto user devices [7].

We assume that for each network service, the system has perfect access to its future requests within a prediction window of finite size. In practice, however, such prediction may be error-prone; we evaluate the impact of mis-prediction in the simulation. With pre-service, some future requests may have been admitted or even pre-served before their arrival; thus we track the number of untreated requests of each slot within the window, such that

(1)

Note that the first entry of the window denotes the number of untreated requests that arrive in the current slot. Therefore, the total number of untreated requests of a service is the sum over its window. Here we can treat the window as a virtual prediction queue that buffers untreated future requests for the network service. In practice, the prediction queues can be hosted on servers or storage systems in proximity to the request traffic classifier [44]. To simplify notation, we stack all queue lengths into a single vector.

III-E System Workflow and Scheduling Decisions

System Workflow: At the beginning of each time slot, system components (including the traffic classifier, VNF instances, and servers) collect relevant system dynamics to decide request admission, service chaining, and resource allocation. According to these decisions, the traffic classifier admits new requests for different network services. VNF instances steer the requests processed in the previous time slot to their next VNFs' instances. Meanwhile, every server allocates its resources to its resident VNF instances [26]. The instances then process the requests from their respective queues. At the end of the time slot, the prediction window moves one slot ahead.

In the above process, we need to consider three kinds of scheduling decisions.

i) Admission Decision: For every network service, the traffic classifier decides how many untreated newly arriving and future requests to admit into the system. Particularly, for a network service and its respective ingress VNF, the classifier decides the number of requests admitted to the ingress VNF's instance on each server, as well as the total number of requests admitted from the prediction queue. These admitted requests should include at least all the untreated requests that actually arrive, while not exceeding the total number of untreated requests in the window, i.e., in each time slot,

(2)

Note that requests are admitted in a fully-efficient manner [21]. In other words, the per-server admission decisions should sum up to the total number of requests admitted from the prediction queue, i.e.,

(3)

The untreated request backlog evolves as follows,

(4)

subject to the appropriate boundary condition at the end of the prediction window. We denote all admission decisions collectively.

ii) Service Chaining Decision: For each non-terminal VNF, we define a binary service chaining decision in each time slot. Consider the case where the VNF and its next VNF have instances on two servers, respectively; the decision indicates whether the processed requests from the VNF's instance on the first server will be sent to the next VNF's instance on the second server. To ensure that every instance has a target instance to which it can send its requests, we have

(5)

On the other hand, if the VNF (or its next VNF) has no instance on the corresponding server, then the associated decision is fixed to zero in each time slot. Note that dynamic request steering can be implemented by adopting VNF-enabled SDN switches [20]. We denote all chaining decisions collectively.

iii) Resource Scheduling Decision: For each server and each VNF it hosts, we define the vector of resources allocated to the VNF's instance. To ensure that every allocation either includes at least one CPU core together with other resources, or allocates no resources at all, we restrict the choice to a finite set of options. Note that the zero allocation is always available as an option. Besides, the total allocated resources must not exceed the server's resource capacity, i.e.,

(6)

Note that the allocation is fixed to zero at all times for VNFs without an instance on the server. Given a resource allocation, the instance can process and forward at most a corresponding number of requests, which is assumed to be estimated from system logs. Due to the limited length of a time slot, a VNF instance cannot process arbitrarily many requests; we therefore assume this number is bounded by some constant. We denote all allocation decisions collectively.

III-F System Workflow and Queueing Dynamics

Fig. 3: An instance of queueing model with a lookahead window size of two.

In each time slot, the system workflow proceeds as follows. At the beginning of the time slot, system components (including the traffic classifier, VNF instances, and servers) collect all available system dynamics to make the request admission, service chaining, and resource allocation decisions. According to these decisions, the traffic classifier admits new requests for different network services. VNF instances steer the requests processed in the previous time slot to their next VNFs' instances. Meanwhile, every server allocates its resources to its resident VNF instances. The instances then process the requests from their respective queues. At the end of the time slot, the prediction window of each network service moves one slot ahead. The prediction queue is thus updated as follows

(7)

With the above workflow, we have the subsequent queueing dynamics for different VNF instances.

Instances of Ingress VNFs: For every network service and its respective ingress VNF, the admitted requests join the queues of the ingress VNF's instances on the corresponding servers. Accordingly, the update function for the queue length is

(8)

Instances of Non-Ingress VNFs: The instance of a non-ingress VNF on a server receives processed requests from the selected instance of its preceding VNF, and receives no new requests otherwise. The queue update function is then given by

(9)

where the corresponding term denotes the service rate allocated to the instance on the server in that time slot. The inequality holds because the actual number of untreated requests may be smaller than the allocated service rate. All requests processed by the last instances of service chains are considered finished. Figure 3 shows an example of our queueing model, in which two network services require six types of VNFs whose instances are hosted on three servers; each network service has a prediction window of size two. The figure illustrates how requests are admitted and transferred between successive queues for the first network service (NS 1), given the admission and chaining decisions of that slot.
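To make the workflow concrete, the following Python sketch advances the queues by one slot. It is a simplified illustration under the model above; all names (pred_window, inst_queue, and so on) are hypothetical, and the paper's own notation is not reproduced.

# Simplified one-slot update of the prediction window and instance queues.
# `inst_queue` is a collections.defaultdict(int) keyed by (vnf, server);
# `service_rate` maps (vnf, server) to the allocated service rate;
# `chaining` maps an instance to the (next_vnf, server) instance it feeds,
# or to None for terminal VNFs.  Forwarded requests are added after all
# processing, so they become available from the next slot onward.
from collections import defaultdict

def one_slot(pred_window, admit_total, admit_to, inst_queue,
             service_rate, chaining, newly_revealed):
    # Admission: drain the prediction window front-to-back (fully efficient).
    remaining = admit_total
    for d in range(len(pred_window)):
        taken = min(pred_window[d], remaining)
        pred_window[d] -= taken
        remaining -= taken
    # Admitted requests join the chosen ingress instances' queues (cf. (8)).
    for key, amount in admit_to.items():
        inst_queue[key] += amount
    # Each instance serves up to its allocated rate; processed requests are
    # forwarded to the selected next-VNF instance (cf. (9)).
    arriving_next_slot = defaultdict(int)
    for key, mu in service_rate.items():
        done = min(inst_queue[key], mu)
        inst_queue[key] -= done
        target = chaining.get(key)
        if target is not None:            # terminal instances finish requests
            arriving_next_slot[target] += done
    for key, amount in arriving_next_slot.items():
        inst_queue[key] += amount
    # The prediction window slides one slot ahead (cf. (7)).
    pred_window.pop(0)
    pred_window.append(newly_revealed)
    return pred_window, inst_queue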

III-G Optimization Objectives

Communication Cost: Recall that transferring a request over a link between two servers incurs a communication cost, e.g., the number of hops or the round-trip time. A low communication cost is highly desirable for request responsiveness. In each time slot, given the service chaining decisions, the communication cost between a pair of servers is

(10)

where denotes the communication cost of transferring a request between servers and in time . Then the total communication cost in time is given by

(11)

Energy Cost: Efficient resource utilization for servers is another important objective in NFV systems [53]. Given a resource allocation, we define the corresponding energy cost as the inner product of the allocation and a constant unit-cost vector, whose entries are the unit costs of the respective server resource types. The total energy cost in a time slot is

(12)

Queue Stability: Considering the responsiveness of requests and the scarcity of computational resources such as memory and cache, it is also imperative to ensure that no queue becomes overloaded. We denote the weighted total queue length in a time slot as

(13)

where the weight is a constant that captures the importance of stabilizing instance queues relative to prediction queues. Accordingly, we define queue stability [37] as

(14)
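As a concrete illustration of the per-slot objectives above, the short Python sketch below evaluates them for one slot; the argument names are hypothetical, and the exact weighted form of the paper's notation is not reproduced here.

# Per-slot objectives: total communication cost (11), total energy cost (12),
# and the weighted total queue length (13).  `transfers` lists tuples of
# (source server, destination server, number of forwarded requests);
# `allocations` lists (server, resource vector) pairs; `theta` weights
# instance queues relative to prediction queues, as described above.
def slot_objectives(transfers, comm_cost, allocations, unit_cost,
                    inst_queues, pred_queues, theta):
    communication = sum(n * comm_cost[(src, dst)] for src, dst, n in transfers)
    energy = sum(sum(r * c for r, c in zip(vec, unit_cost[server]))
                 for server, vec in allocations)
    weighted_backlog = (theta * sum(inst_queues.values())
                        + sum(pred_queues.values()))
    return communication, energy, weighted_backlog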

III-H Problem Formulation

Based on the above models, we formulate the following stochastic network optimization problem (P1), which aims to jointly minimize the time-average expectations of the weighted communication cost and energy cost while ensuring queue stability. With this formulation, we explore the potential trade-offs among different system metrics.

(15)

where is a constant that weights the relative importance of energy efficiency to reducing communication cost.

IV Algorithm Design and Performance Analysis

We present POSCARS, an online and predictive algorithm that solves problem P1 through a series of online decisions, followed by its performance analysis and three variants.

IV-A Algorithm Design

Problem P1 is challenging to solve due to time-varying system dynamics, the online nature of request arrivals, and complex interaction between successive VNF instances. Therefore, instead of solving problem P1 directly, we adopt Lyapunov optimization techniques [37] to transform the long-term stochastic optimization problem into a series of sub-problems over time slots, as specified by the following lemma.

Lemma 1

By applying Lyapunov optimization techniques and the concept of opportunistically minimizing an expectation, problem P1 can be transformed to the following optimization problem to be solved in each time slot :

(16)
(17)

where is defined as

(18)

where the control parameter is a positive constant that weights the importance of minimizing system cost relative to stabilizing system queues, and the remaining term is defined as

(19)

The detailed proof of Lemma 1 is relegated to Appendix-A. Here we provide a sketch of how the problem transformation is carried out. The key technique we adopt is the drift-plus-penalty method [37], which aims to stabilize a queueing network while optimizing the time-average of some objective (e.g., the total energy and communication cost in P1). To this end, a quadratic function (a.k.a. the Lyapunov function) is first introduced to characterize the stability of all queues in each time slot. The key idea of the method is then to introduce a drift-plus-penalty term that characterizes the joint change in queue stability and objective value across time slots. In particular, the drift-plus-penalty term is defined as the weighted sum of two parts: one is the difference (a.k.a. drift) between the Lyapunov functions of two consecutive time slots, which measures the short-term change in queue stability; the other is the instant objective value in a time slot. The stability of the queueing network and the optimization of the time-average objective are then jointly achieved by an online control policy that greedily minimizes an upper bound of the drift-plus-penalty term in each time slot. In this way, it can be proven that solving problem P1 is equivalent to solving a series of sub-problems (P2) over time slots.
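For readers unfamiliar with the method, the generic drift-plus-penalty construction from [37] takes the following form; the symbols below are generic placeholders rather than the paper's own notation, with L the Lyapunov function over all backlogs Q_i, Delta the one-slot conditional drift, and V the positive trade-off parameter.

L(\Theta(t)) \triangleq \tfrac{1}{2} \sum_{i} Q_i(t)^2,
\qquad
\Delta(t) \triangleq \mathbb{E}\left[ L(\Theta(t+1)) - L(\Theta(t)) \,\middle|\, \Theta(t) \right],

\text{per-slot objective:} \quad \min \;\; \Delta(t) + V \cdot \mathbb{E}\left[ \mathrm{cost}(t) \,\middle|\, \Theta(t) \right].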

Note that by solving problem P2 over time slots, problem P1 is solved asymptotically optimally as the total number of time slots and the value of the control parameter both approach infinity, as shown by Theorem 1 in Sec. IV-B. Furthermore, problem P2 can be decomposed into three sub-problems for request admission, service chaining, and resource allocation, respectively. We then propose POSCARS, our predictive online service chaining and resource scheduling scheme, and show its pseudocode in Algorithm 1.

Remark 1

Regarding request admission, when all instances are more loaded than the prediction queue, POSCARS admits only the untreated requests of the current time slot, so as not to overload any instance, and spreads them evenly onto the least loaded instances. When the instances all have shorter queue lengths than the prediction queue, POSCARS admits all future requests and assigns them to the least loaded instances.
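The admission rule of Remark 1 can be sketched in Python as follows; the names and the tie-handling are illustrative simplifications, not the paper's exact specification.

# Sketch of the admission rule in Remark 1 for one network service.
# `instance_queues` maps each candidate ingress instance to its backlog,
# `pred_backlog` is the service's current prediction-queue length, and
# `arrivals_now` is the number of untreated requests that actually arrived.
def admit(pred_backlog, arrivals_now, instance_queues):
    least_loaded = min(instance_queues.values())
    if least_loaded >= pred_backlog:
        # All instances are at least as loaded as the prediction queue:
        # admit only the requests of the current time slot.
        admitted = arrivals_now
    else:
        # Instances are lightly loaded: admit all untreated requests in
        # the prediction window, including future ones.
        admitted = pred_backlog
    # Spread the admitted requests evenly over the least-loaded instances.
    order = sorted(instance_queues, key=instance_queues.get)
    return admitted, order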

Remark 2

POSCARS decides the service chaining by jointly considering instances' queue lengths and the communication costs. Recall the definition in (18): the weighted summation reflects the unit price of sending a request from a VNF's instance on one server to the instance of its next VNF on another server. If the target instance is heavily loaded, there is a high price for forwarding the request to that instance; likewise, a large communication cost makes the target instance less likely to be chosen.

1:  Initially in time slot , given backlog sizes , service rates , energy cost , and communication cost . Output: chaining and scheduling decisions.
2:  for every network service
3:   %% Request admission for ingress VNF
4:   The traffic classifier first finds the set of servers that host the least loaded instances of VNF .
5:   if for all then
6:   Admit the untreated requests at current time.
7:   else
8:   Admit all untreated requests.
9:   endif
10:   Spread admitted requests evenly to the least loaded instances.
11:  endfor
12:  %% Service chaining
13:  for every non-terminal VNF :
14:  for the instance of on server :
15:   Forward its processed requests to one of the servers from with minimum .
16:   endfor
17:  endfor
18:  %% Resource scheduling
19:  for every server :
20:   Initialize an empty lookup table and set .
21:   Set and
22:   while :
23:   Choose the minimum among all keys of .
24:   Select its associated and .
25:   Remove entry with key from .
26:   if and :
27:   Allocate resource to according to .
28:   .
29:   Remove all entries related to .
30:   endif
31:   endwhile
32:  endfor
Algorithm 1 POSCARS (Predictive Online Service Chaining And Resource Scheduling) in one time slot
Remark 3

On each server, the resource allocation is decided by jointly considering the resource cost and the queue lengths of its resident instances. Particularly, we regard the corresponding term as the unit net cost vector of resources allocated to a VNF's instance: for each resource type, the unit net cost is the weighted difference between the unit cost of that resource and the queue length of the instance. A high unit resource cost results in a prudent allocation, while a sufficiently long queue length makes the allocation more worthwhile. In either case, POSCARS selects the set of resource allocation decisions that satisfy constraint (6) and minimize the total net cost.
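The chaining rule of Remark 2 and the greedy allocation of Remark 3 (lines 13-31 of Algorithm 1) can be sketched in Python as below. The weight V, the exact form of the net cost, and the container names are illustrative assumptions; the precise weighted terms of the paper's formulation are omitted above, so this is only an approximation of their structure.

# Sketch of the chaining rule (Remark 2): pick the next-VNF instance with the
# lowest "price", i.e., a weighted sum of its backlog and the communication
# cost of reaching its server.  V is the cost/stability trade-off weight.
def choose_next_instance(candidate_servers, queue_len, comm_cost, V):
    return min(candidate_servers,
               key=lambda srv: queue_len[srv] + V * comm_cost[srv])

# Sketch of the greedy resource scheduling on one server (Remark 3): build a
# table of (net cost, instance, option) entries, then repeatedly pick the
# cheapest remaining entry whose option still fits the residual capacity,
# skipping already-served instances and unprofitable (non-negative) options.
def schedule_resources(instances, options, rate, capacity, unit_cost,
                       queue_len, V):
    table = []
    for inst in instances:
        for opt in options[inst]:                      # opt: resource vector
            cost = sum(c * r for c, r in zip(unit_cost, opt))
            net = V * cost - queue_len[inst] * rate[(inst, tuple(opt))]
            table.append((net, inst, opt))
    table.sort(key=lambda entry: entry[0])
    remaining = list(capacity)
    chosen = {}
    for net, inst, opt in table:
        if inst in chosen or net >= 0:
            continue
        if all(r <= rem for r, rem in zip(opt, remaining)):
            chosen[inst] = opt
            remaining = [rem - r for rem, r in zip(remaining, opt)]
    return chosen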

IV-B Performance Analysis

We analyze the per-time-slot computational complexity of POSCARS as follows. For each network service, making the request admission decisions (lines 2-11) takes only lightweight computation. Next, each non-terminal VNF instance selects and forwards requests to its successors (line 15) by scanning its candidate successors. Every server initializes a lookup table (lines 20-21) and then decides the resource allocation greedily, with complexity depending on the maximum number of applicable resource allocation options for any VNF instance. In practice, POSCARS can be run in a distributed manner: the request admission sub-routine can be implemented on each traffic classifier, while the service chaining and resource scheduling sub-routines can be deployed on the hypervisor of each server, respectively.

On the other hand, without predictive scheduling, we show that POSCARS achieves a trade-off between the time-averages of the total queue length and the total cost via the tunable control parameter. In particular, letting the optimal value of problem P1 be given, we have the following theorem.

Theorem 1

Suppose that, given the system resource capacities on each server and the VNF placement, there exists an online scheme which ensures that, for each VNF instance, the mean arrival rate is smaller than its mean service rate. Then, under POSCARS without prediction, there exist constants such that the time-average total cost exceeds the optimum of P1 by at most a gap inversely proportional to the control parameter, while the time-average total queue length grows at most linearly in that parameter.

The proof is relegated to Appendix-B. Theorem 1 demonstrates a trade-off between system cost optimization and queue stability. Particularly, without prediction, POSCARS achieves a near-optimal cost within an optimality gap that shrinks as the control parameter grows, but at the cost of an increase in the time-averaged total queue length. Intuitively, with a large parameter value, VNF instances are more willing to steer requests to their successive instances on nearby servers, while servers allocate resources to instances at lower energy cost. As a result, the total cost can be effectively reduced; however, some servers may become hot spots and the total queue length grows. In contrast, a smaller parameter value leads to more balanced queue loads among servers but more energy consumed to serve requests, resulting in a higher total cost. Moreover, given predicted information about future requests, POSCARS can achieve a better trade-off with a notable delay reduction by pre-serving requests with surplus system resources. We verify these advantages by our simulation results in Section V.
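The standard form of such Lyapunov trade-off bounds (see [37]) is sketched below with generic symbols, where f* denotes the optimal time-average cost of P1 and B, epsilon are system-dependent constants; the exact constants for POSCARS are part of the proof relegated to Appendix-B.

\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[ \mathrm{cost}(t) \right] \;\le\; f^{*} + \frac{B}{V},
\qquad
\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{i} \mathbb{E}\left[ Q_i(t) \right] \;\le\; \frac{B + V f^{*}}{\epsilon}.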

IV-C Practical Issues and Variants of POSCARS

The distributed nature of POSCARS requires each VNF instance to gather relevant system dynamics on its own. However, the probing process may incur considerable sampling overheads and additional latencies. Meanwhile, each instance makes its decision independently based on the information sampled at the beginning of a time slot. Therefore, instances may blindly choose the same lowest-cost instance without knowing each other's choices, and the chosen instance then becomes overloaded due to the non-coordinated decisions. An alternative is to perform sampling before sending each request; nonetheless, this method suffers from the messaging overheads of frequent sampling. A possible compromise is to split the processed requests into batches, then sample and schedule for each batch separately.

To mitigate such issues, we propose the following variants of POSCARS by adopting ideas from recent randomized load balancing techniques, such as the-power-of-d-choices [34], batch-sampling [38], and batch-filling [54].

POSCARS with The-Power-of-d-Choices (P-Po): To reduce sampling overheads, we apply the idea of the-power-of-d-choices to POSCARS. Particularly, every non-terminal instance probes only a few instances of its next VNF, chosen uniformly at random, and then sends all its processed requests to the lowest-cost instance among the samples. In this way, each instance requires only a few samples to decide its target instance. Although the selected instance may not be the globally least-cost one, our later simulation results show that the reduced sampling brings only a mild increase in the total cost.

The above variant significantly reduces the sampling overheads, but the issue of non-coordinated decision making remains. To further mitigate it, we adopt the ideas of batch-sampling [38] and batch-filling [54] and propose another two variants of POSCARS, namely POSCARS with Batch-Sampling (P-BS) and POSCARS with Batch-Filling (P-BF). Basically, these two variants split the processed requests on each instance into batches of a given size, then carry out scheduling upon such request batches. With a batch size of one, scheduling is performed for each request separately; when the batch size exceeds the number of processed requests, scheduling is performed only once per time slot, degenerating to POSCARS. We elaborate the design of P-BS and P-BF as follows.

POSCARS with Batch-Sampling (P-BS): Given an instance with a number of request batches, it probes a proportionally larger number of instances of its next VNF uniformly at random, where the proportion is the probe ratio. The instance then sends the request batches to the least-cost probed instances, with each batch assigned to a distinct target instance.

POSCARS with Batch-Filling (P-BF): Given an instance with request batches, it likewise probes a number of instances of its next VNF uniformly at random. It then forwards the request batches one by one, sending each batch to the currently least-cost instance among the samples; the chosen instance's cost is updated after it receives the batch.
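A Python sketch of the three randomized variants is given below; the cost function, the probe-ratio handling, and all names are illustrative assumptions rather than the exact procedures evaluated in the paper.

import random

# P-Po: probe d next-VNF instances uniformly at random and send all
# processed requests to the cheapest probed one.
def power_of_d(candidates, cost, d):
    probed = random.sample(candidates, min(d, len(candidates)))
    return min(probed, key=cost)

# P-BS: probe (probe_ratio * num_batches) instances at once and assign each
# batch to a distinct least-cost probed instance.
def batch_sampling(candidates, cost, num_batches, probe_ratio):
    k = min(max(int(probe_ratio * num_batches), 1), len(candidates))
    probed = random.sample(candidates, k)
    return sorted(probed, key=cost)[:num_batches]

# P-BF: probe once, then place batches one by one on the currently cheapest
# probed instance, updating its cost estimate after each placement.
def batch_filling(candidates, cost_map, num_batches, probe_ratio, batch_size):
    k = min(max(int(probe_ratio * num_batches), 1), len(candidates))
    probed = random.sample(candidates, k)
    placement = []
    for _ in range(num_batches):
        target = min(probed, key=lambda inst: cost_map[inst])
        placement.append(target)
        cost_map[target] += batch_size   # the chosen instance's backlog grows
    return placement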

V Simulation

We conduct trace-driven simulations to evaluate the performance of POSCARS and its variants. The request arrival measurements are drawn from real-world systems [4], from which we derive the mean arrival rate per time slot and the mean inter-arrival time. In addition, we conduct simulations with Poisson request arrivals at the same mean rate. All results are obtained by averaging measurements collected from repeated and independent simulations.

V-A Simulation Settings

Substrate Network Topology: We construct the substrate network based on two widely adopted topologies, Jellyfish [45] and Fat-Tree [3]. Both topologies have a scale comparable to clusters in data center networks, comprising switches, servers with deployed VNFs, and the remaining servers acting as hosts that generate service requests. Particularly, the Fat-Tree topology is organized into pods, each containing several servers; within each pod, we choose one server uniformly at random as the one with deployed VNFs and use the rest as hosts. Requests can be processed on servers in any pod with the VNFs they demand. Between any two servers, request traffic traverses the shortest path, with a fixed link capacity (in Gbps). For each pair of servers, the communication cost per request is proportional to the number of hops on the shortest path between them, with some variation.

Server Resources: We consider CPU cores as the resource on each server, since CPUs have become the major bottleneck for request processing in NFV systems [32, 1, 8]. Servers are heterogeneous, each with a different number of CPU cores. In every time slot, we calculate the power consumption in units of utilized CPU cores. Regarding the weighting parameter, setting it to a greater value would encourage each server to assign most resources to heavily loaded VNF instances; conversely, a smaller value would lead to more balanced resource allocation among such instances, lessening the impact of imbalanced queue loads on the service chaining decisions. The setting depends on the objectives to fulfill in real systems. In our simulation, we fix its value under the assumption that communication cost reduction and system energy efficiency are equally important.

Service Function Chains: We deploy five network services, each with a service chain whose length varies across services. Each service contains at least one of the most commonly deployed VNFs, e.g., an Intrusion Detection System (IDS), a Firewall (FW), or a Load Balancer (LB). The remaining VNFs of each service are chosen uniformly at random, without replacement, from other commonly used VNFs [29]. The total number of instances per VNF also varies.

Prediction Settings: Network services' traffic often varies in predictability. We therefore set each service's prediction window size by sampling uniformly at random around a common average window size. We evaluate both perfect and imperfect prediction. For perfect prediction, future request arrivals within the window are assumed perfectly known to the system and can be pre-served. In practice, such an assumption may not hold for stateful requests; nonetheless, that case can be seen as an extension of our results with more constraints on request processing. For imperfect prediction, prediction failures generally fall into two categories. One is false-negative detection, i.e., a request is not predicted to arrive and, as a result, receives no pre-service before its arrival. The other is false-positive detection, i.e., a request that does not exist is predicted to arrive; in this case, the system pre-allocates resources to pre-serve such requests. We consider two extreme cases: in one, we fail to predict the arrivals of all future requests; in the other, we correctly predict all actual future arrivals but additionally raise some false alarms. Note that any form of mis-prediction can be seen as a superposition of these two extremes. In addition, we implement five schemes that forecast request arrivals in the next time slot (i.e., with window size one), including: 1) Kalman filter (Kalman) [9]; 2) a distribution estimator (Distr), which generates the next estimate by sampling independently from the distribution of arrivals learned from historical data; 3) Prophet (FB) [46], Facebook's time-series forecasting procedure; 4) moving average (MA); and 5) exponentially weighted moving average (EWMA) [6].

Baseline Schemes: We compare POSCARS with three baseline schemes: Random, JSQ (Join-the-Shortest-Queue), and the state-of-the-art OneHop-SCH (OneHop scheduling) [49]. These schemes differ from POSCARS in their service chaining strategy. In the Random scheme, each instance sends requests to one of its successors chosen uniformly at random. In the JSQ scheme, each instance sends requests to its least-loaded successor. In OneHop-SCH, each instance sends requests to the successor with the least communication cost and idle capacity.

Variants of POSCARS: To compare the performance of POSCARS and its variants, we evaluate them under different settings. For each variant, we vary its probe ratio (the number of probes for P-Po and the probe ratios for P-BS and P-BF) over a moderate range, and fix the batch size for P-BS and P-BF. We omit the extreme probe-ratio settings: the smallest corresponds to the Random scheme and leverages no load information, while an overly large ratio leads to excessively fine-grained control and induces too much sampling overhead.

Request Response Time Metric: To evaluate the impact of predictive scheduling, we define a request’s response time as the number of time slots from its actual arrival to its eventual completion. If a request is pre-served before it arrives, then the system is assumed to respond to the request upon its arrival, and the request will experience a zero response time.

V-B Performance Evaluation under Perfect Prediction

Intuitively, POSCARS promises to shorten requests' response times by exploiting predicted information and pre-allocating idle system resources to pre-serve future requests; the essential benefit of predictive scheduling thus comes from load balancing in the temporal dimension. To verify this intuition, we first consider the case with perfectly predicted arrivals and evaluate POSCARS with and without prediction against the baseline schemes.

Fig. 4: Average response time with various window sizes, given trace-driven and Poisson arrival processes, under different topologies.
Fig. 5: (a) Total cost and (b) queue size under different values of the control parameter.

Average response time vs. window size: Figure 4 shows the performance of the different schemes under the Jellyfish and Fat-Tree topologies. The response times of the baseline schemes remain constant since they do not involve predictive scheduling. Random incurs the highest response time, since it disregards information about workloads and communication costs when dispatching requests. JSQ does much better because requests are always greedily forwarded to the least-loaded successors. OneHop-SCH outperforms the previous two by jointly taking the workloads and communication costs into consideration. Without prediction, POSCARS achieves performance comparable to OneHop-SCH; as the window size increases, we observe a significant reduction in the average response time under both topologies. The marginal reduction diminishes as the window size increases further, and the response time eventually levels off.

Insight: In practice, due to traffic variability, it is often unrealistic to achieve high predictability (a large window size). However, the results show that only a mild amount of future information suffices for POSCARS to shorten request response times effectively and achieve load balancing in the temporal dimension. With more future information, the reduction diminishes since the idle system resources have already been depleted.

Considering the qualitative similarities among curves with different settings, we only present results under Fat-Tree and trace-driven request loads.

Backlog-cost trade-off: Recall from Section IV-B that the control parameter governs the backlog-cost trade-off; Figures 5(a) and 5(b) verify this trade-off. Figure 5(a) compares the time-average total cost of POSCARS under several parameter values against the baselines. Both Random and JSQ induce a high total cost since their decision making disregards the resulting communication cost and the heterogeneity of servers in terms of energy cost. OneHop-SCH further lowers the total cost by jointly optimizing cost and queue lengths based on flow-level statistics. POSCARS approaches a close-to-optimal time-average total cost as the control parameter grows, and notably outperforms OneHop-SCH once the parameter is sufficiently large.

However, recall that the control parameter weighs the importance of minimizing system cost relative to maintaining queue stability. Hence, while large parameter values reduce the system cost, they also lead to increased backlogs; by Little's theorem [30], this increases the response time as well. In Figure 5(b), we see that the total queue length grows almost proportionally with the parameter, eventually exceeding that of all baselines.

Insight: POSCARS achieves a backlog-cost trade-off tunable through the control parameter. By choosing an appropriate parameter value, it outperforms the baseline schemes with both a lower system cost and shorter queue lengths. In practice, the appropriate interval may vary from system to system, but it is usually proportional to the ratio between the magnitudes of the total queue length and the total system cost.

POSCARS and its variants: Upon forwarding requests, POSCARS requires each instance to collect statistics from all its successors. In practice, this may incur non-negligible sampling overheads when the number of instances is large. In Section IV-C, we propose three variants of POSCARS, i.e., P-Po, P-BS, and P-BF. These variants trade off optimality of decision making for reduced sampling overheads and complexity [54], which otherwise grow with the total number of candidate instances. Figure 6 evaluates the total cost and average response time of POSCARS and its variants under fixed control parameter, batch size (for P-BS and P-BF), and probe ratios.

In Figure 6(a), we see that POSCARS achieves the lowest total cost, since each instance's decision making is based on the full dynamics of its succeeding instances. For each variant, the total cost decreases as the probe ratio increases. Similarly, from Figure 6(b), we also observe a corresponding reduction in the response time. Among the three variants, P-BS and P-BF achieve larger reductions in both cost and response time than P-Po, because aggregated sampling is often more conducive to lowering the cost [38].

Insight: By sampling only partial system dynamics for decision making, the variants of POSCARS trade off optimality for reduced sampling overheads and complexity. Owing to aggregated sampling, P-BF and P-BS outperform P-Po with both a lower total cost and a shorter response time.

Fig. 6: Comparison among POSCARS and its variants: (a) time-average total cost and (b) average response time.

V-C Performance Evaluation under Imperfect Prediction

In practice, prediction errors are inevitable due to dataset bias and noise. To explore the fundamental limits of predictive scheduling, we evaluate the impact of imperfect prediction on the system performance.

Total cost and response time vs. the control parameter: Figure 7 compares the time-average total cost and the average response time induced by the different forecasting schemes and by perfect prediction under POSCARS. In Figure 7(a), we observe that all forecasting schemes incur a higher time-average total cost than perfect predictive scheduling. The reason is as follows: the prediction under these forecasting schemes is imperfect, with both false-negative and false-positive detections, and the system pre-allocates extra resources to pre-serve false-positive requests, resulting in a higher total cost. Figure 7(b) shows an overall ascending trend as the control parameter increases: larger parameter values lead to a greater total queue length, and by Little's theorem [30], a greater queue length implies a longer response time. However, we also see that, even under imperfect prediction, predictive scheduling does not necessarily lead to longer response times than under perfect prediction.

To figure out the reason, we consider two extreme cases. One is all-false-negative, i.e., during each time slot, all future request arrivals in the lookahead window are false negatives. Notice that this case is equivalent to the case without predictive scheduling, since no requests are pre-allocated resources. The other is all-false-positive, i.e., all future request arrivals are perfectly predicted, but some extra requests are wrongly predicted to arrive.

Perfect prediction vs. the two extremes: Figure 8(a) compares the average response times under perfect prediction and the two extreme cases, with , , and false-positive requests on average. Overall, the average response time is proportional to the value of . Miss detection (the all-false-negative case) incurs a higher response time than the other two, because it does not pre-serve any requests before they arrive. On the other hand, neither perfect prediction nor false alarm (the all-false-positive case) consistently achieves a lower response time than the other. This is due to two consequences of false alarm. The first is that false-positive requests consume extra system resources and prolong the request queue lengths, thus leading to longer response times. The second is that, according to lines - in Algorithm 1, false-positive requests result in a greater prediction queue length, which forces POSCARS to admit future requests more frequently, thus leading to shorter response times. The same effect can be achieved by tuning the parameter : greater values lead to less frequent admission.

How do these two consequences interplay? The question is answered by Figure 8(b), where the average number of false-positive requests varies from to , with , , and . When the average number of false-positive requests increases from to , the resulting response time falls even below that under perfect prediction. In such cases, the second consequence dominates: mild false alarm leads to more frequent admission, making POSCARS spread requests more evenly among instances. However, as false alarm keeps growing, the reduction diminishes and the response time rises steadily. In such cases, although the admission frequency is intensified, excessive false alarm severely extends the total queue length, offsetting and eventually outweighing the benefit of load balancing.

(a) Time-average total cost
(b) Average response time
Fig. 7: Performance of prediction schemes with and .
(a) Different values of
(b) Different false-alarms
Fig. 8: Average response time under three different prediction cases

Insight: Imperfect prediction does not necessarily degrade system performance, e.g., by lengthening response times. Instead, mild false alarm allows the system to make better use of idle resources, further shortening response times.

VI Related Work

In this section, we first summarize existing works that study the optimization of NFV from different aspects. Then we narrow our focus to those most relevant to this paper and compare their approaches with ours.

VI-A Optimizing NFV/VNF from Different Aspects

A wide range of recent works have studied NFV systems from various aspects. Below we give a brief overview and discuss how they relate to our work.

  • VNF placement: In NFV, the placement of VNF instances often has a significant impact on system performance [27] and thus deserves careful design. A number of existing works have addressed this problem (e.g., [40][13][11][2][60]). In practice, such approaches can be used to decide the VNF placement, upon which our schemes carry out their scheduling procedures.

  • VNF Resource Allocation: Another series of works (e.g., [26][28][64]) focused on the optimization of resource allocation for VNF/NFV, with the aim of minimizing VNF execution overheads and accelerating the processing speed of VNF instances. They concentrated on achieving such improvements with particular hardware designs. Different from such works, we mainly focus on exploiting predicted information to perform effective scheduling on existing NFV systems. Nonetheless, our schemes can be applied to systems built with their solutions.

  • Load Balancing: Existing works (e.g., [48][50][56]) also developed various schemes to balance workloads among chained VNF instances, improving resource utilization and fault tolerance while shortening delays in NFV systems. In practice, such solutions can serve as reference points for system designers to tune the value of parameter for the desired performance metrics.

  • Performance Characterization: Another line of work has been devoted to characterizing various dynamics of NFV systems, such as the performance interference among VNF instances [57][42]. Insights from such works can be combined with our schemes to achieve even better performance.

VI-B Chaining and Resource Scheduling of VNFs in NFV

Regarding the optimization of VNF service chaining and resource scheduling in NFV, existing works generally fall into two categories.

Of the first category are schemes that perform service chaining and resource scheduling in an offline fashion. Typically, they assume full availability of information about all service requests or flows. Based on flow abstraction, Zhang et al. [58] consider the joint optimization of VNF placement and service chaining. They formulate the problem as an ILP and develop an efficient rounding-based approximation algorithm with a performance guarantee. Yoon et al. [55] adopt the BCMP queueing model for VNF service chains and propose heuristics to approximately minimize the expected waiting time of service chains. Wang et al. [49] consider the joint optimization of service chaining and resource allocation and develop a greedy scheme that aims to place instances and schedule traffic with minimum link cost, CAPEX, and OPEX. Later, D’Oro et al. [12] study the service chaining problem from the perspective of congestion games. By formulating the problem as an atomic weighted congestion game, they propose a distributed algorithm that provably converges to the Nash equilibrium. On the other hand, Zhang et al. [61] formulate a request-level optimization problem based on steady-state metrics and propose a heuristic scheme by applying techniques from open Jackson queueing networks. However, there is no empirical evidence that service request arrivals follow a Poisson process in NFV systems. Different from existing works, our model and problem formulation assume no prior knowledge about the underlying request traffic. Moreover, instead of offline or centralized decision making, our solution is capable of performing near-optimal service chaining and scheduling in a computationally efficient and decentralized manner.

Of the second category are the online schemes that process requests upon arrival. Under this setting, Mohammadkhan et al. [35] formulate VNF placement for service chaining as a MILP problem based on flow abstraction and develop a heuristic to solve the problem incrementally. Lukovszki et al. [31] develop an online algorithm that performs request admission and service chaining with a logarithmic competitive ratio. Zhang et al. [63] propose a novel VNF brokerage service model together with online algorithms to predict traffic demands, purchase VMs, and deploy VNFs. Further, Fei et al. [14] develop an effective algorithm that performs online VNF scheduling and flow routing with predicted flow demand, so as to minimize the impact of inaccurate prediction and the cost of over-provisioned resources. Later, Xiao et al. [52] propose an adaptive service chaining deployment scheme based on deep reinforcement learning techniques, which conducts service chaining to serve incoming requests in an online fashion. Such schemes either resort to flow-level system dynamics and predicted information for decision making, or perform finer-grained control at the request level to optimize dedicated objectives. Our model considers such trade-offs and separates the granularity of system state and decision making. Besides, we also explore the fundamental benefits and limits of predictive scheduling, which still remain open questions in NFV systems.

VII Conclusion

In this paper, we studied the problem of dynamic service chaining and resource scheduling and systematically investigated the benefits of predictive scheduling in NFV systems. We developed a novel queue model that accurately characterizes the system dynamics, formulated a stochastic network optimization problem, and proposed POSCARS, an efficient and decentralized algorithm that performs service chaining and scheduling through a series of online and predictive decisions. Theoretical analysis and trace-driven simulations showed the effectiveness and robustness of POSCARS and its variants in achieving near-optimal system cost while effectively shortening the average response time. Our results also show that prediction with mild false positives leads to shorter response times. In addition, note that fair sharing of resources and performance isolation among VNF instances are key to maintaining a high quality of service. It is therefore an interesting direction for future work to design a joint service chaining and scheduling scheme that accounts for multi-resource fairness among VNF instances. Moreover, it would also be intriguing to explore the interplay between resource fairness and other performance metrics.

References

  • [1] B. Addis, D. Belabed, M. Bouet, and S. Secci, “Virtual network functions placement and routing optimization,” in Proceedings of IEEE CloudNet, 2015.
  • [2] S. Agarwal, F. Malandrino, C.-F. Chiasserini, and S. De, “Joint vnf placement and cpu allocation in 5g,” in Proceedings of IEEE INFOCOM, 2018.
  • [3] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data center network architecture,” in Proceedings of ACM SIGCOMM, 2008.
  • [4] T. Benson, A. Akella, and D. A. Maltz, “Network traffic characteristics of data centers in the wild,” in Proceedings of ACM SIGCOMM, 2010.
  • [5] N. Bouten, J. Famaey, R. Mijumbi, B. Naudts, J. Serrat, S. Latré, and F. De Turck, “Towards nfv-based multimedia delivery,” in Proceedings of IFIP/IEEE International Symposium on Integrated Network Management (IM), 2015.
  • [6] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time series analysis: forecasting and control.   John Wiley & Sons, 2015.
  • [7] J. Broughton, “Netflix adds download functionality,” https://technology.ihs.com/586280/netflix-adds-download-support.
  • [8] F. Callegati, W. Cerroni, C. Contoli, and G. Santandrea, “Dynamic chaining of virtual network functions in cloud-based edge networks,” in Proceedings of IEEE NetSoft, 2015.
  • [9] C. K. Chui and G. Chen, Kalman Filtering.   Springer, 2017.
  • [10] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions,” in Proceedings of IEEE INFOCOM, 2015.
  • [11] R. Cziva, C. Anagnostopoulos, and D. P. Pezaros, “Dynamic, latency-optimal vnf placement at the network edge,” in Proceedings of IEEE INFOCOM, 2018.
  • [12] S. D’Oro, L. Galluccio, S. Palazzo, and G. Schembra, “Exploiting congestion games to achieve distributed service chaining in nfv networks,” IEEE JSAC, vol. 35, no. 2, pp. 407–420, 2017.
  • [13] X. Fei, F. Liu, H. Xu, and H. Jin, “Towards load-balanced vnf assignment in geo-distributed nfv infrastructure,” in Proceedings of IEEE/ACM IWQoS, 2017.
  • [14] ——, “Adaptive vnf scaling and flow routing with proactive demand prediction,” in Proceedings of IEEE INFOCOM, 2018.
  • [15] X. Gao, X. Huang, S. Bian, Z. Shao, and Y. Yang, “Pora: predictive offloading and resource allocation in dynamic fog computing systems,” in Proceedings of IEEE ICC, 2019.
  • [16] J. Halpern, C. Pignataro et al., “Service function chaining (sfc) architecture,” in RFC 7665, 2015.
  • [17] B. Han, V. Gopalakrishnan, L. Ji, and S. Lee, “Network function virtualization: challenges and opportunities for innovations,” IEEE Communications Magazine, vol. 53, no. 2, pp. 90–97, 2015.
  • [18] H. Hantouti, N. Benamar, T. Taleb, and A. Laghrissi, “Traffic steering for service function chaining,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 487–507, 2018.
  • [19] J. G. Herrera and J. F. Botero, “Resource allocation in nfv: a comprehensive survey,” IEEE Transactions on Network and Service Management, vol. 13, no. 3, pp. 518–532, 2016.
  • [20] C.-L. Hsieh and N. Weng, “Nf-switch: vnfs-enabled sdn switches for high performance service function chaining,” in Proceedings of IEEE ICNP, 2017.
  • [21] L. Huang, S. Zhang, M. Chen, and X. Liu, “When backpressure meets predictive scheduling,” IEEE/ACM Transactions on Networking (ToN), vol. 24, no. 4, pp. 2237–2250, 2016.
  • [22] X. Huang, S. Bian, Z. Shao, and Y. Yang, “Predictive switch-controller association and control devolution for sdn systems,” in Proceedings of IEEE/ACM IWQoS, 2019.
  • [23] X. Huang, Z. Shao, and Y. Yang, “Dynamic tuple scheduling with prediction for data stream processing systems,” in Proceedings of IEEE GLOBECOM, 2019.
  • [24] J. W. Jiang, T. Lan, S. Ha, M. Chen, and M. Chiang, “Joint vm placement and routing for data center traffic engineering,” in Proceedings of IEEE INFOCOM, 2012.
  • [25] S. Kandula, S. Sengupta, A. Greenberg, P. Patel, and R. Chaiken, “The nature of data center traffic: measurements & analysis,” in Proceedings of ACM SIGCOMM, 2009.
  • [26] G. P. Katsikas, T. Barbette, D. Kostic, R. Steinert, and G. Q. Maguire Jr, “Metron: NFV service chains at the true speed of the underlying hardware,” in Proceedings of USENIX NSDI, 2018.
  • [27] A. Laghrissi and T. Taleb, “A survey on the placement of virtual resources and virtual network functions,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1409–1434, 2018.
  • [28] X. Li, X. Wang, F. Liu, and H. Xu, “Dhl: Enabling flexible software network functions with fpga acceleration,” in Proceedings of IEEE ICDCS, 2018.
  • [29] Y. Li and M. Chen, “Software-defined network function virtualization: a survey,” IEEE Access, vol. 3, pp. 2542–2553, 2015.
  • [30] J. D. Little, “A proof for the queuing formula: L = λW,” Operations Research, vol. 9, no. 3, pp. 383–387, 1961.
  • [31] T. Lukovszki and S. Schmid, “Online admission control and embedding of service chains,” in Proceedings of SIROCCO, 2015.
  • [32] S. Mehraghdam, M. Keller, and H. Karl, “Specifying and placing chains of virtual network functions,” in Proceedings of IEEE CloudNet, 2014.
  • [33] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: state-of-the-art and research challenges,” IEEE Communications Surveys and Tutorials, vol. 18, no. 1, pp. 236–262, 2016.
  • [34] M. Mitzenmacher, “The power of two choices in randomized load balancing,” IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, pp. 1094–1104, 2001.
  • [35] A. Mohammadkhan, S. Ghapani, G. Liu, W. Zhang, K. Ramakrishnan, and T. Wood, “Virtual function placement and traffic steering in flexible and dynamic software defined networks,” in Proceedings of IEEE LANMAN, 2015.
  • [36] S. Nanda, F. Zafari, C. DeCusatis, E. Wedaa, and B. Yang, “Predicting network attack patterns in sdn using machine learning approach,” in Proceedings of IEEE NFV-SDN, 2016.
  • [37] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–211, 2010.
  • [38] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, “Sparrow: distributed, low latency scheduling,” in Proceedings of ACM SOSP, 2013.
  • [39] I. S. Petrov, “Mathematical model for predicting forwarding rule counter values in sdn,” in Proceedings of IEEE EIConRus, 2018.
  • [40] C. Pham, N. H. Tran, S. Ren, W. Saad, and C. S. Hong, “Traffic-aware and energy-efficient vnf placement for service chaining: Joint sampling and matching approach,” IEEE Transactions on Services Computing, vol. 13, no. 1, pp. 172–185, 2017.
  • [41] Y. Sang, B. Ji, G. R. Gupta, X. Du, and L. Ye, “Provably efficient algorithms for joint placement and allocation of virtual network functions,” in Proceedings of IEEE INFOCOM, 2017.
  • [42] M. Savi, M. Tornatore, and G. Verticale, “Impact of processing-resource sharing on the placement of chained virtual network functions,” IEEE Transactions on Cloud Computing, 2019, early access, doi:10.1177/0163443718810910.
  • [43] S. Schneider, S. Dräxler, and H. Karl, “Trade-offs in dynamic resource allocation in network function virtualization,” in Proceedings of IEEE Globecom workshop, 2018.
  • [44] A. Sheoran, P. Sharma, S. Fahmy, and V. Saxena, “Contain-ed: an nfv micro-service system for containing e2e latency,” ACM SIGCOMM Computer Communication Review, vol. 47, no. 5, pp. 54–60, 2017.
  • [45] A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey, “Jellyfish: networking data centers, randomly.” in Proceedings of USENIX NSDI, 2012.
  • [46] S. J. Taylor and B. Letham, “Forecasting at scale,” The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.
  • [47] A. Tootoonchian, A. Panda, C. Lan, M. Walls, K. Argyraki, S. Ratnasamy, and S. Shenker, “Resq: enabling slos in network function virtualization,” in Proceedings of USENIX NSDI, 2018.
  • [48] H. Wang and J. Schmitt, “Load balancing-towards balanced delay guarantees in nfv/sdn,” in Proceedings of IEEE NFV-SDN, 2016.
  • [49] L. Wang, Z. Lu, X. Wen, R. Knopp, and R. Gupta, “Joint optimization of service function chaining and resource allocation in network function virtualization,” IEEE Access, vol. 4, pp. 8084–8094, 2016.
  • [50] T. Wang, H. Xu, and F. Liu, “Multi-resource load balancing for virtual network functions,” in Proceedings of IEEE ICDCS, 2017.
  • [51] S. Woo, J. Sherry, S. Han, S. Moon, S. Ratnasamy, and S. Shenker, “Elastic scaling of stateful network functions,” in Proceedings of USENIX NSDI, 2018.
  • [52] Y. Xiao, Q. Zhang, F. Liu, J. Wang, M. Zhao, Z. Zhang, and J. Zhang, “Nfvdeep: adaptive online service function chain deployment with deep reinforcement learning,” in Proceedings of IEEE/ACM IWQoS, 2019.
  • [53] Z. Xu, F. Liu, T. Wang, and H. Xu, “Demystifying the energy efficiency of network function virtualization,” in Proceedings of IEEE/ACM IWQoS, 2016.
  • [54] L. Ying, R. Srikant, and X. Kang, “The power of slightly more than one sample in randomized load balancing,” in Proceedings of IEEE INFOCOM, 2015.
  • [55] M. S. Yoon and A. E. Kamal, “Nfv resource allocation using mixed queuing network model,” in Proceedings of IEEE GLOBECOM, 2016.
  • [56] C. You and L. Li, “Efficient load balancing for the vnf deployment with placement constraints,” in Proceedings of IEEE ICC, 2019.
  • [57] C. Zeng, F. Liu, S. Chen, W. Jiang, and M. Li, “Demystifying the performance interference of co-located virtual network functions,” in Proceedings of IEEE INFOCOM, 2018.
  • [58] J. Zhang, W. Wu, and J. C. Lui, “On the theory of function placement and chaining for network function virtualization,” in Proceedings of ACM MobiHoc, 2018.
  • [59] K. Zhang, B. He, J. Hu, Z. Wang, B. Hua, J. Meng, and L. Yang, “G-net: effective gpu sharing in nfv systems,” in Proceedings of USENIX NSDI, 2018.
  • [60] Q. Zhang, F. Liu, and C. Zeng, “Adaptive interference-aware vnf placement for service-customized 5g network slices,” in Proceedings of IEEE INFOCOM, 2019.
  • [61] Q. Zhang, Y. Xiao, F. Liu, J. C. Lui, J. Guo, and T. Wang, “Joint optimization of chain placement and request scheduling for network function virtualization,” in Proceedings of IEEE ICDCS, 2017.
  • [62] S. Zhang, L. Huang, M. Chen, and X. Liu, “Proactive serving decreases user delay exponentially,” ACM SIGMETRICS Performance Evaluation Review, vol. 43, no. 2, pp. 39–41, 2015.
  • [63] X. Zhang, C. Wu, Z. Li, and F. C. Lau, “Proactive vnf provisioning with multi-timescale cloud resources: fusing online learning and online optimization,” in Proceedings of IEEE INFOCOM, 2017.
  • [64] Y. Zhang, Z.-L. Zhang, and B. Han, “Hybridsfc: Accelerating service function chains with parallelism,” in Proceedings of IEEE NFV-SDN, 2019.

Appendix A
Proof of Lemma 1

To solve problem (15), we adopt the Lyapunov optimization technique [37]. Let $\Theta(t)$ denote the vector of all queue backlogs maintained in the system at time slot $t$. We define the quadratic Lyapunov function as

$L(\Theta(t)) \triangleq \frac{1}{2} \sum_{q} Q_{q}^{2}(t)$,      (20)

and the Lyapunov drift for two consecutive time slots as

$\Delta(\Theta(t)) \triangleq \mathbb{E}\left[ L(\Theta(t+1)) - L(\Theta(t)) \mid \Theta(t) \right]$,      (21)

which measures the conditional expected change in the queues’ congestion state from one time slot to the next. To avoid overloading any queue in the system, it is desirable to keep this drift as small as possible. However, striving for a short total queue length may incur considerable communication and computation costs. To jointly account for queueing stability and the resulting system cost, we define the drift-plus-penalty function as

$\Delta(\Theta(t)) + V \, \mathbb{E}\left[ f(t) \mid \Theta(t) \right]$,      (22)

where $f(t)$ denotes the total system cost incurred in time slot $t$ and $V$ is a positive constant that determines the balance between queueing stability and minimizing the total system cost.
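For intuition, the sketch below shows how a generic drift-plus-penalty controller picks a per-slot action; it is only an illustration of the technique from [37] under simplifying assumptions (a small finite action set with known per-action cost, arrivals, and services), not POSCARS itself, and all names are hypothetical.

```python
def drift_plus_penalty_action(queues, actions, V):
    """Generic drift-plus-penalty rule (cf. [37]): each slot, pick the action
    minimizing V * cost(action) + sum_q Q_q * (arrivals_q - services_q).
    `actions` maps an action name to (cost, arrivals dict, services dict)."""
    best_action, best_score = None, float("inf")
    for name, (cost, arrivals, services) in actions.items():
        score = V * cost + sum(
            queues[q] * (arrivals.get(q, 0) - services.get(q, 0)) for q in queues
        )
        if score < best_score:
            best_action, best_score = name, score
    return best_action

# Toy usage: two queues and two candidate actions.
queues = {"q1": 12, "q2": 3}
actions = {
    "serve_q1": (2.0, {}, {"q1": 4}),  # cost 2, drains q1 by 4 requests
    "serve_q2": (1.0, {}, {"q2": 4}),  # cost 1, drains q2 by 4 requests
}
print(drift_plus_penalty_action(queues, actions, V=1.0))  # -> serve_q1
```

A small $V$ makes the controller favor draining long queues (stability), while a large $V$ makes it favor low-cost actions, which is the backlog-cost trade-off observed in the evaluation.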

To transform problem (15) into (16), we apply (20) and (22), and we have

(23)
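The expansion leading to (24)–(26) relies on the standard one-slot bound from Lyapunov drift analysis [37]. Stated here for a generic queue with dynamics $Q(t+1) = \max[Q(t) - b(t), 0] + a(t)$ (a sketch with generic symbols, not the paper’s exact dynamics in (7)–(9)):

```latex
% For any queue evolving as Q(t+1) = max[Q(t) - b(t), 0] + a(t), with a(t), b(t) >= 0:
%   Q^2(t+1) <= Q^2(t) + a^2(t) + b^2(t) + 2 Q(t) [a(t) - b(t)].
% Summing such bounds over all queues and taking conditional expectations
% yields an upper bound on the drift-plus-penalty expression in (22).
\begin{equation*}
  Q^{2}(t+1) \;\le\; Q^{2}(t) + a^{2}(t) + b^{2}(t)
  + 2\, Q(t)\, \big[ a(t) - b(t) \big].
\end{equation*}
```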

By (7)–(9), it follows that:

1) for ,

(24)

where we define .

2) for and ,

(25)

3) for and ,

(26)

By (22)–(26), the boundedness of the request arrival numbers and service capacities, and according to (2), we have for , and