Deep Q-Learning for Dynamic Reliability Aware NFV-Based Service Provisioning

12/03/2018 ∙ by Hamed Rahmani Khezri, et al. ∙ University of Tehran ∙ Nanyang Technological University

Network function virtualization (NFV) refers to the technology in which softwarized network functions run virtually on commodity servers. Such functions are called virtual network functions (VNFs). A specific service is composed of a set of VNFs. This is a paradigm shift for service provisioning in telecom networks, and it introduces new design and implementation challenges. One of these challenges is to meet the reliability requirement of the requested services given the reliability of the commodity servers. VNF placement, which is the problem of assigning commodity servers to the VNFs, becomes crucial under such circumstances. To address this issue, in this paper we employ a Deep Q-Network (DQN) to model the NFV placement problem considering the reliability requirement of the services. The output of the introduced model determines which placement is optimal in each state. Numerical evaluations show that the introduced model can significantly improve the performance of the network operator.




I Introduction

Service deployment in traditional enterprise networks depends tightly on dedicated hardware appliances called middleboxes [1], [2]. Quality of service (QoS) monitoring tools, video transcoders, firewalls, intrusion detection systems, proxies, and deep packet inspection devices are examples of such middleboxes. Implementing network functions in hardware limits the expansion of networks and increases CAPEX and OPEX [3]. Because of these shortcomings, a fundamental change in how network functions are implemented is inevitable. Network function virtualization (NFV) promises to remove the limitations of middleboxes for deploying new network services [4], [5]. In the NFV framework, hardware middleboxes are replaced by software modules called virtual network functions (VNFs) that run on commodity servers [6]. To provide a network service, a set of appropriate VNFs is sequenced in a chain called a service function chain (SFC). The VNFs of a service can be deployed by launching a VM instance on any server of the network infrastructure. The procedure of assigning servers to the VNFs of a service is called NFV placement.

An NFV-based network has three main components. The first is the set of services requested by the network users. Each incoming service has a dedicated service level agreement (SLA), which can include the required reliability, end-to-end delay, and other QoS parameters. The second component is the Infrastructure Network Provider (InP), which owns the commodity servers that run the VNFs and the links between the servers that carry the service traffic. The last component is the Network Operator (NO), which is responsible for serving the incoming services according to their requested SLAs. For this purpose, the NO chains the appropriate VNFs for each incoming service and then places them onto the InP’s servers [1].

One of the most important challenges in NFV is the placement of incoming services in the InP. In [7], the NFV placement problem is considered with the purpose of minimizing energy and traffic cost, while also preventing resource fragmentation in the servers. In [8], the NFV placement problem is considered in a way that the cost of using servers and links is minimized and the requested delay of services is met. In [9], the NFV placement problem is considered with a cost function that includes deployment cost, resource cost, traffic cost, delay cost, and resource fragmentation cost. In [10]–[12], game-theoretical models for NFV placement are considered. In [10], [11], dynamic market mechanism design for on-demand service chain provisioning and pricing in the NFV market is studied. Authors in [12] model the selfish and competitive behavior of users in NFV as an atomic weighted congestion game.

In this paper, we consider a reliability-aware NFV placement problem. We would like to minimize the placement cost while maximizing the number of admitted services. A service is in the perfect running state if all constituent VNFs of the service run on the commodity servers without failure. As a result, if even one of the servers that host the VNFs of the service fails, the service is disrupted. The servers in different InPs can have different failure probabilities. In our work, NFV placement is carried out in an online manner: the NO assigns the InP’s servers to the incoming services according to the available resources. For problems with such dynamic characteristics, a learning-based approach like Q-learning can be useful [13]. More recently, the Deep Q-Network (DQN) has been proposed to address some shortcomings of Q-learning [14]. In this paper, we consider the use of DQN for NFV placement to meet the reliability requirement of the incoming services. In the introduced NFV placement problem, the NO learns the optimal policy in different states. The contributions of this paper are summarized as follows.

  • We introduce an optimization problem for jointly minimizing the placement cost and maximizing the number of admitted services regarding the reliability requirement.

  • We introduce a solution based on DQN for reliability-aware NFV placement. For this purpose, we define the corresponding states and rewards of Q-Learning in an NFV framework.

  • Finally, we investigate the convergence of the introduced DQN technique for the NFV placement problem and evaluate the performance of this method in terms of the admission ratio.

Machine learning approaches have recently been used in NFV. In [15], an efficient online learning algorithm for the dynamic placement of VNF service chains is presented. The considered objective is to minimize the operational cost of the service chain provider. In [16], a machine-learning-based method for jointly optimizing the NFV placement and monitoring processes is proposed. In [17], a deep feedforward neural network, or multi-layer perceptron (MLP), is used to proactively identify SLA violations. Authors in [18] formulate the VNF selection and chaining problem as a binary integer programming (BIP) model for end-to-end delay minimization and propose a deep-learning-based strategy for solving it.

The rest of the paper is organized as follows. We present the system model for reliable NFV placement problem in Section II. Then, we present a DQN model for reliable NFV placement problem in Section III. In Section IV, we numerically evaluate the performance of the proposed scenario. Finally, in Section V, we conclude the paper.

II System Model

We consider a scenario in which the NO aims to deliver agile services using NFV. We assume there are multiple InPs with commodity servers that the NO can use for placing the SFCs of services. Each InP has some servers with a limited amount of resources. The main characteristic of each InP is the failure probability of its servers, which differs from that of the other InPs. We assume that the unit server cost of each InP depends on the failure probability of that InP.

II-A Infrastructure Network Providers (InPs)

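Let $\mathcal{I}$ denote the set of InPs and $\mathcal{S}_i$ denote the set of servers of the $i$-th InP. We indicate the number of InPs with $I$ and the number of servers of the $i$-th InP with $S_i$. We assume that the entire network of InPs can be represented by an undirected graph $G = (\mathcal{V}, \mathcal{E})$, in which $\mathcal{V}$ indicates the set of servers and $\mathcal{E}$ indicates the set of links between the servers, as

$$\mathcal{V} = \{ v_i^s \mid i \in \mathcal{I},\ s \in \mathcal{S}_i \}, \tag{1}$$

$$\mathcal{E} = \{ e_{ij}^{sr} \mid v_i^s \text{ and } v_j^r \text{ are connected},\ v_i^s, v_j^r \in \mathcal{V} \}, \tag{2}$$

where $v_i^s$ indicates the $s$-th server of the $i$-th InP and $e_{ij}^{sr}$ indicates the link between the $s$-th server of the $i$-th InP and the $r$-th server of the $j$-th InP. The resource amount of the $s$-th server in the $i$-th InP is denoted by $R_i^s$. The bandwidth of the link between the $s$-th server in the $i$-th InP and the $r$-th server in the $j$-th InP is denoted by $B_{ij}^{sr}$. The unit cost for using the servers of the $i$-th InP is denoted by $c_i$, and the unit cost for using the link between the $s$-th server of the $i$-th InP and the $r$-th server of the $j$-th InP is denoted by $c_{ij}^{sr}$.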

Let $p_i$ indicate the failure probability of the servers of the $i$-th InP. We assume that as the failure probability approaches zero, the unit cost $c_i$ for using the servers of the $i$-th InP increases exponentially. Therefore, we consider an exponential model for the cost of using the servers of different InPs as

$$c_i = \alpha \, e^{\beta (p_{\max} - p_i)}, \tag{3}$$

where $\alpha$ and $\beta$ are design parameters and $p_{\max}$ is the highest acceptable failure probability.
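As a minimal sketch of such an exponential cost model, the snippet below assumes one possible form, cost = scale · exp(rate · (p_max − p)), which is only guaranteed to match the text qualitatively. The function name `unit_server_cost`, the parameter names `scale`, `rate`, and `p_max`, and all numeric values are illustrative assumptions.

```python
import math


def unit_server_cost(p_fail, scale=1.0, rate=50.0, p_max=0.1):
    """Assumed exponential cost: more reliable servers are more expensive.

    The form scale * exp(rate * (p_max - p_fail)) is an illustrative
    assumption; only its qualitative shape follows the text.
    """
    if not 0.0 <= p_fail <= p_max:
        raise ValueError("p_fail must lie in [0, p_max]")
    return scale * math.exp(rate * (p_max - p_fail))


# Hypothetical InP failure probabilities, from least to most reliable.
failure_probs = [0.1, 0.05, 0.01]
costs = [unit_server_cost(p) for p in failure_probs]
```

Under these assumed parameters, the least reliable InP has the base cost `scale`, and the cost grows monotonically as the failure probability decreases.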

II-B Characteristics of Service Requests

We divide time into slots of equal length, and at the beginning of each slot we consider the NFV placement problem for the incoming service requests. We also assume that each service lasts for a random number of slots. The departure probability of an existing service in a slot is $d_m$, where $m$ indicates the type of service. Under this assumption, the departure probability of a service in the $t$-th slot is independent of its departure probability in the $(t-1)$-th slot. As a result, the number of existing services in the network in the $t$-th slot depends only on the number of existing services in the network in the $(t-1)$-th slot.
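To illustrate the memoryless departure assumption, the sketch below simulates service lifetimes when a service leaves in each slot with a fixed probability. Under this assumption the lifetime is geometrically distributed, so its mean is the reciprocal of the departure probability. The function name, the value 0.2, the seed, and the sample size are illustrative assumptions.

```python
import random


def simulate_lifetime(depart_prob, rng):
    """Number of slots a service stays when it departs w.p. depart_prob per slot."""
    slots = 1
    while rng.random() >= depart_prob:  # service survives this slot
        slots += 1
    return slots


rng = random.Random(0)  # fixed seed for reproducibility
lifetimes = [simulate_lifetime(0.2, rng) for _ in range(20000)]
mean_lifetime = sum(lifetimes) / len(lifetimes)
```

With a departure probability of 0.2, the empirical mean lifetime should be close to 1 / 0.2 = 5 slots.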

Let $M$ denote the number of service types and $N_m$ indicate the number of requested services of the $m$-th type in each slot. The number of chained VNFs for the $m$-th service type is $K_m$. We indicate the required bandwidth of this service type with $b_m$ and the required resource of the $k$-th VNF of this service type with $r_m^k$. It is worth noting that we consider only one resource type for a service; however, the extension to multiple resource types is straightforward. The maximum acceptable failure probability for the $m$-th service type is $p_m^{\mathrm{th}}$. Finally, we indicate the binary decision variable of placing the $k$-th VNF of the $n$-th service of type $m$ in the $s$-th server of the $i$-th InP in the $t$-th slot with $x_{m,n,k}^{i,s}(t)$.

II-C Cost Function

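The two main components of the cost function are the server cost and the link cost. Let $C_S(t)$ denote the cost of using the servers in the $t$-th slot. We can write $C_S(t)$ as the summation of the server cost for the placement of each service type, as

$$C_S(t) = \sum_{m=1}^{M} \sum_{n=1}^{N_m} \sum_{k=1}^{K_m} \sum_{i \in \mathcal{I}} \sum_{s \in \mathcal{S}_i} c_i \, r_m^k \, x_{m,n,k}^{i,s}(t), \tag{4}$$

where $c_i$ is the unit server cost of the $i$-th InP and $r_m^k$ is the resource demand of the $k$-th VNF of service type $m$. It is worth noting that the cost of using servers is a linear function of the binary decision variable $x_{m,n,k}^{i,s}(t)$.

The second main component of the cost function is the cost of using the links between servers, which is denoted by $C_L(t)$. We can write $C_L(t)$ as the summation of the link cost for the placement of each service, as

$$C_L(t) = \sum_{m=1}^{M} \sum_{n=1}^{N_m} \sum_{k=1}^{K_m - 1} \sum_{i, j \in \mathcal{I}} \sum_{s \in \mathcal{S}_i} \sum_{r \in \mathcal{S}_j} c_{ij}^{sr} \, b_m \, x_{m,n,k}^{i,s}(t) \, x_{m,n,k+1}^{j,r}(t), \tag{5}$$

where $c_{ij}^{sr}$ is the unit link cost, $b_m$ is the required bandwidth of service type $m$, and the product $x_{m,n,k}^{i,s}(t) \, x_{m,n,k+1}^{j,r}(t)$ indicates the use of the link between the $s$-th server of the $i$-th InP and the $r$-th server of the $j$-th InP for forwarding the traffic between the $k$-th and $(k+1)$-th VNFs of the $n$-th service of type $m$. It is worth noting that if two consecutive VNFs of a service are placed in the same server ($i = j$ and $s = r$), then $c_{ii}^{ss} = 0$ and there is no cost for forwarding traffic between these VNFs. As seen in (5), this cost component is a nonlinear function of the binary decision variable $x_{m,n,k}^{i,s}(t)$. The total cost in the $t$-th slot is $C(t) = C_S(t) + C_L(t)$.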
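The two cost components can be sketched for a single service chain as follows. This is a minimal illustration under stated assumptions: the server cost is taken as unit cost times resource demand, the link cost as unit link cost times bandwidth, and the function name `placement_cost`, the dictionary layouts, and all numbers are hypothetical.

```python
def placement_cost(placement, vnf_demand, bandwidth, server_cost, link_cost):
    """Server cost plus link cost of one service chain.

    placement   -- list of (inp, server) hosts, one per VNF, in chain order
    vnf_demand  -- resource units required by each VNF
    bandwidth   -- bandwidth required between consecutive VNFs
    server_cost -- unit server cost per InP index
    link_cost   -- unit link cost keyed by an unordered pair of hosts
    """
    server_part = sum(
        server_cost[inp] * demand
        for (inp, _server), demand in zip(placement, vnf_demand)
    )
    link_part = 0.0
    for src, dst in zip(placement, placement[1:]):
        if src != dst:  # co-located consecutive VNFs use no link
            link_part += link_cost[frozenset((src, dst))] * bandwidth
    return server_part + link_part


# Hypothetical two-InP example with a single inter-InP link.
server_cost = {0: 2.0, 1: 1.0}
link_cost = {frozenset({(0, 0), (1, 0)}): 0.5}
placement = [(0, 0), (0, 0), (1, 0)]
total = placement_cost(placement, [10, 20, 10], 4, server_cost, link_cost)
print(total)  # 72.0
```

The first two VNFs share a server, so only the hop between the second and third VNF contributes to the link cost.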

II-D Reliability Constraint

We indicate the failure probability of the $n$-th service of type $m$ in the $t$-th slot with $F_{m,n}(t)$. The reliability constraint is $F_{m,n}(t) \le p_m^{\mathrm{th}}$, where $p_m^{\mathrm{th}}$ is the maximum acceptable failure probability of service type $m$. To obtain $F_{m,n}(t)$, we first calculate the probability that this service is in the running state (i.e., has not failed), $A_{m,n}(t)$. A service is in the running state if none of its VNFs has failed. As a result, we should determine the failure probability of a VNF, which is a function of the binary decision variable $x_{m,n,k}^{i,s}(t)$. Let $q_{m,n,k}(t)$ denote the failure probability of the $k$-th VNF of the $n$-th service of type $m$ in the $t$-th slot. This probability is calculated as

$$q_{m,n,k}(t) = \prod_{i \in \mathcal{I}} \prod_{s \in \mathcal{S}_i} p_i^{\, x_{m,n,k}^{i,s}(t)}, \tag{6}$$

where $p_i$ is the failure probability of the $i$-th InP. According to (6), the failure probability of the $k$-th VNF of the $n$-th service of type $m$ in the $t$-th slot is the product of the failure probabilities of the VNF in all InPs. Also, the failure probability of a VNF in each InP is the product of the failure probabilities of all the servers in which the respective VNF is placed. We assume that failure events in different InPs, and also in different servers of an InP, are independent. We calculate the probability of being in the running state for the $n$-th service of type $m$ in the $t$-th slot as $A_{m,n}(t) = \prod_{k=1}^{K_m} \left( 1 - q_{m,n,k}(t) \right)$. Finally, we can calculate the failure probability of the $n$-th service of type $m$ in the $t$-th slot as $F_{m,n}(t) = 1 - A_{m,n}(t)$.
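The reliability calculation described above can be sketched as follows, assuming independent failures as stated in the text. The function names, the data layout (a list of hosting servers per VNF), and the failure probabilities are illustrative assumptions.

```python
def vnf_failure_probability(hosts, inp_failure):
    """A VNF fails only if every server hosting it fails (independent failures)."""
    prob = 1.0
    for inp, _server in hosts:
        prob *= inp_failure[inp]
    return prob


def service_failure_probability(placement, inp_failure):
    """A service runs only if all of its VNFs run; it fails otherwise."""
    running = 1.0
    for hosts in placement:
        running *= 1.0 - vnf_failure_probability(hosts, inp_failure)
    return 1.0 - running


# Hypothetical failure probabilities of two InPs.
inp_failure = {0: 0.01, 1: 0.001}
# Three VNFs: two hosted in InP 0, one in InP 1 (one host each).
placement = [[(0, 0)], [(0, 1)], [(1, 0)]]
failure = service_failure_probability(placement, inp_failure)
meets_requirement = failure <= 0.025
```

Here the running probability is 0.99 · 0.99 · 0.999, so the service failure probability is about 0.0209, which satisfies a requirement of 0.025 but not one of 0.01.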

II-E Minimum Cost NFV Placement

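In this part, we formulate the objective of the NO over time as an optimization problem. We assume that the NO aims to minimize the placement cost subject to the reliability requirement of the incoming services and the resource constraints of the InPs. Thus, the optimization problem can be written as

$$\min_{x} \ \sum_{t} C(t) \tag{7}$$

$$\text{s.t.} \quad \sum_{i \in \mathcal{I}} \sum_{s \in \mathcal{S}_i} x_{m,n,k}^{i,s}(t) = 1, \quad \forall m, n, k, t, \tag{8}$$

$$F_{m,n}(t) \le p_m^{\mathrm{th}}, \quad \forall m, n, t, \tag{9}$$

$$\sum_{m=1}^{M} \sum_{n=1}^{N_m} \sum_{k=1}^{K_m} r_m^k \, x_{m,n,k}^{i,s}(t) \le \bar{R}_i^s(t), \quad \forall i, s, t, \tag{10}$$

$$\sum_{m=1}^{M} \sum_{n=1}^{N_m} \sum_{k=1}^{K_m - 1} b_m \, x_{m,n,k}^{i,s}(t) \, x_{m,n,k+1}^{j,r}(t) \le \bar{B}_{ij}^{sr}(t), \quad \forall i, j, s, r, t, \tag{11}$$

$$x_{m,n,k}^{i,s}(t) \in \{0, 1\}, \quad \forall m, n, k, i, s, t, \tag{12}$$

where $C(t)$ is the total placement cost in the $t$-th slot, $F_{m,n}(t)$ is the failure probability of the $n$-th service of type $m$, $p_m^{\mathrm{th}}$ is the maximum acceptable failure probability of service type $m$, $r_m^k$ and $b_m$ are the resource and bandwidth demands, $\bar{R}_i^s(t)$ indicates the remaining resource of the $s$-th server of the $i$-th InP in the $t$-th slot, and $\bar{B}_{ij}^{sr}(t)$ is the remaining bandwidth of the link between the $s$-th server of the $i$-th InP and the $r$-th server of the $j$-th InP in the $t$-th slot. The constraint in (8) indicates that each VNF is instantiated once. The constraint in (9) guarantees the reliability requirement of each service. The constraint in (10) ensures that the resource capacity of each server is not violated in any slot. The constraint in (11) guarantees that the bandwidth capacity of each link is not violated in any slot.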

The optimization problem in (8)-(12) is intractable for large networks with various services. Learning-based techniques can help solve such problems. The goal of the learning technique is to learn a policy that determines which action to take in each state. In the following, we introduce a DQN-based model for the NFV placement problem subject to the reliability requirement.
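To give a sense of why exhaustive search does not scale, the sketch below enumerates every placement of a single three-VNF service on a toy network and keeps the cheapest one that meets the reliability and capacity constraints. The search space grows as (number of servers) to the power of (number of VNFs) for one service alone. All names and numbers are illustrative assumptions, and the link cost is simplified to a fixed cost per inter-server hop.

```python
from itertools import product

# Hypothetical toy instance: 2 InPs x 2 servers, one service with 3 VNFs.
INP_FAILURE = {0: 0.01, 1: 0.001}
SERVER_COST = {0: 1.0, 1: 3.0}   # the more reliable InP is more expensive
HOSTS = [(i, s) for i in (0, 1) for s in (0, 1)]
CAPACITY = 25                     # resource units per server
DEMAND = [10, 10, 10]             # resource units per VNF
HOP_COST = 1.0                    # assumed cost of one inter-server hop
MAX_FAILURE = 0.025               # reliability requirement of the service


def failure_probability(placement):
    """Service fails unless every VNF's host keeps running."""
    running = 1.0
    for inp, _server in placement:
        running *= 1.0 - INP_FAILURE[inp]
    return 1.0 - running


def fits(placement):
    """Check that no server is loaded beyond its capacity."""
    load = {}
    for host, demand in zip(placement, DEMAND):
        load[host] = load.get(host, 0) + demand
    return all(used <= CAPACITY for used in load.values())


def cost(placement):
    """Server cost plus a fixed cost per hop between different servers."""
    servers = sum(SERVER_COST[inp] * d for (inp, _s), d in zip(placement, DEMAND))
    hops = sum(HOP_COST for a, b in zip(placement, placement[1:]) if a != b)
    return servers + hops


candidates = list(product(HOSTS, repeat=len(DEMAND)))   # 4 ** 3 placements
feasible = [p for p in candidates
            if fits(p) and failure_probability(p) <= MAX_FAILURE]
best = min(feasible, key=cost)
```

In this toy instance, placing all three VNFs in the cheaper InP violates the reliability requirement, so the cheapest feasible placements use the more reliable InP for exactly one VNF.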


Fig. 1: An overview of a Deep Q-Learning Neural Network (DQN). Labeled blocks in the figure: Loss and Gradient, Parameter Updating, Replay Memory.

III DQN Model for NFV Placement

In this section, we introduce a DQN-based model for NFV placement that considers the reliability requirement of the incoming services. First, we review Q-learning and the motivation for combining Q-learning with a deep neural network (DNN). Then, we introduce a DQN model for reliability-aware NFV placement.

III-A DQN Background

In reinforcement learning, an agent explores and exploits an environment based on the reward it receives and on the state, which encapsulates all relevant features and conditions of the environment. The agent follows a policy that balances exploration and exploitation. Rewards are the direct consequence of the actions taken by the agent in each state. Despite its merits, Q-learning is weak in problems where the states cover a wide range of possibilities, because the Q-table becomes large. DQN combines neural networks with Q-learning. It uses the same model, but instead of updating a Q-table, which is hard to search in environments with a large state space, it trains a DNN while it explores and exploits. After each action, the reward gained by the agent is used to run back-propagation and update the weights of the neural network. The input of the neural network is a vector representing the state, and each output neuron corresponds to a possible action; the agent selects among these outputs according to its policy.
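For reference, the standard tabular Q-learning update and the squared-error loss minimized by a DQN can be written as follows. The symbols are assumptions introduced here for illustration: $\eta$ is the learning rate, $\gamma$ the discount factor, $\theta$ the network weights, and $(\sigma, a, u, \sigma')$ a transition consisting of state, action, reward, and next state.

```latex
\begin{align*}
Q(\sigma, a) &\leftarrow Q(\sigma, a)
  + \eta \left[ u + \gamma \max_{a'} Q(\sigma', a') - Q(\sigma, a) \right], \\
\mathcal{L}(\theta) &= \mathbb{E}\left[ \left( u
  + \gamma \max_{a'} Q(\sigma', a'; \theta^{-})
  - Q(\sigma, a; \theta) \right)^{2} \right].
\end{align*}
```

Here $\theta^{-}$ denotes the weights used to compute the target, which are held fixed during the gradient step on $\theta$.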

A general overview of the DQN agent used in this approach is shown in Fig. 1. The states are given as inputs to the DNN, and all possible actions appear at the output of the neural network. The actions chosen according to the policy affect the NFV network. The environment returns the direct consequence of these actions as rewards to the memory. The states, next states, chosen actions, and rewards of all slots are stored in the memory. A mini-batch is randomly sampled from the memory for updating the weights of the neural network.

III-B Modeling NFV Placement with DQN

For the DQN problem, we characterize a four-tuple consisting of the state set, action set, reward set, and memory set. We denote this four-tuple by $(\Sigma, \mathcal{A}, \mathcal{R}, \mathcal{M})$, where $\Sigma$ is the state set, $\mathcal{A}$ is the action set, $\mathcal{R}$ is the reward set, and $\mathcal{M}$ is the memory segment. For NFV placement, we take a new approach to defining the states. In large-scale problems, it is crucial to choose states that convey our demands to the network. Our foremost goal is to satisfy the requested reliability of each service in each slot. Thus, the trained system should discriminate between services with different reliability requirements. Our DQN agent should also be aware of the resources that each InP can provide at the moment of decision for each incoming service placement. It is worth noting that the DQN agent should also discriminate among services because the resources demanded by two services may differ, and it is the network’s duty to choose the most suitable servers to satisfy these resource demands.

The decision of which InPs should allocate resources to an incoming service must take into account the amount of resources needed, the available resources distributed among the InPs, and the required reliability. The combination of all these prerequisites generates a complex and large state space that our DQN agent should be able to comprehend through learning and iteration.

We define the state as a vector of the available resources provided by the InPs, concatenated with the resources demanded by a service and the requested reliability that should be satisfied. As a result, a state $\sigma$ of the DQN agent can be written as

$$\sigma = \left[ \bar{R}_1^1, \ldots, \bar{R}_I^{S_I},\ r_m^1, \ldots, r_m^{K_m},\ p_m^{\mathrm{th}} \right], \tag{13}$$

where $\bar{R}_i^s$ indicates the remaining resources of the $s$-th server of the $i$-th InP, $r_m^k$ denotes the resource demand of the $k$-th VNF of the $m$-th service type, and $p_m^{\mathrm{th}}$ is the reliability requirement of the $m$-th service type. It is worth noting that the resource demand of each service is included in the state to highlight the characteristics of the incoming services for the learning agent.
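A state vector of this kind could be assembled as in the following sketch. The function name `build_state`, the dictionary layout, and in particular the zero-padding of shorter chains (used here so that the network input has a fixed length) are assumptions and are not taken from the paper.

```python
def build_state(remaining, vnf_demand, max_vnfs, max_failure):
    """Concatenate remaining resources, padded VNF demands, and reliability.

    remaining   -- dict mapping (inp, server) to remaining resource units
    vnf_demand  -- resource demand of each VNF of the incoming service
    max_vnfs    -- longest chain length; shorter chains are zero-padded
                   (padding is an assumption to keep the input size fixed)
    max_failure -- maximum acceptable failure probability of the service
    """
    if len(vnf_demand) > max_vnfs:
        raise ValueError("service has more VNFs than max_vnfs")
    resources = [float(remaining[host]) for host in sorted(remaining)]
    demands = [float(d) for d in vnf_demand]
    demands += [0.0] * (max_vnfs - len(vnf_demand))
    return resources + demands + [float(max_failure)]


# Hypothetical network with three servers and a three-VNF service.
remaining = {(0, 0): 100, (0, 1): 80, (1, 0): 55}
state = build_state(remaining, [10, 20, 15], max_vnfs=5, max_failure=0.001)
```

The resulting vector has one entry per server, `max_vnfs` demand entries, and one reliability entry.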
Let $N_{\max}$ indicate the maximum possible number of service requests in each slot. $N_{\max}$ is chosen according to the resource budget of the InPs. In the DQN modeling of NFV placement, we assume that in each slot the NO considers the placement of service requests one by one. More precisely, the placement of the first incoming service is determined, the DQN states are updated, then the placement of the second incoming service is determined, and so on. We define the action as the possible placement of each VNF of an incoming service, which can be written as $a = \{(i_k, s_k)\}_{k=1}^{K_m}$, where $i_k$ indicates the InP index and $s_k$ denotes the server index in the respective InP for the $k$-th VNF. The learning agent uses the DQN outputs to determine the InP and server index for all VNFs of a service. We consider $L$ outputs for the DQN as $\{Q_1, \ldots, Q_L\}$, in which $Q_l$ indicates the Q-value for assigning the server corresponding to the $l$-th output to a VNF of the considered service. The value of $L$ depends on the number of InPs, the number of servers of each InP, the resource budget of each server, and the maximum resource demand of a VNF over all service types. For a service with $K_m$ VNFs, the NO determines the placement using $\{Q_1, \ldots, Q_L\}$. The optimal solution for a service with $K_m$ VNFs is to select the $K_m$ servers with the highest Q-values. However, at the beginning, the agent has no knowledge of the optimal solution. As a result, the learning agent needs to consider all possible actions. In RL, the $\epsilon$-greedy policy gives the agent a chance of exploration.
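The selection rule described above (pick the outputs with the highest Q-values, with occasional random exploration) can be sketched as follows. The function name `select_servers` and the choice to explore by sampling output indices uniformly at random are assumptions made for illustration.

```python
import random


def select_servers(q_values, num_vnfs, epsilon, rng):
    """Return the output indices chosen to host the VNFs of one service.

    With probability epsilon, explore by sampling distinct outputs at
    random; otherwise exploit by taking the outputs with the highest
    Q-values.
    """
    if rng.random() < epsilon:
        return rng.sample(range(len(q_values)), num_vnfs)
    ranked = sorted(range(len(q_values)),
                    key=lambda idx: q_values[idx], reverse=True)
    return ranked[:num_vnfs]


rng = random.Random(1)
q_values = [0.1, 0.9, 0.4, 0.7]          # hypothetical DQN outputs
greedy = select_servers(q_values, 2, epsilon=0.0, rng=rng)
explored = select_servers(q_values, 2, epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the call is purely greedy and returns the two outputs with the highest Q-values; with `epsilon=1.0` it always explores.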

Due to the nature of Q-learning, and particularly DQN, defining suitable reward and cost functions plays a vital role. The most important task for the NO in the placement of the services is to meet the requested reliability. We impose a penalty when the reliability requirement is not satisfied. On the other hand, we assign a reward for the successful placement of a service. However, due to the limited resource budget of the InPs, the agent should learn to provide a reliability as close as possible to the requested reliability. This proximity should be reflected in the structure of the reliability rewards. Because of the nature of the learning system, it is possible that the server selected for hosting a VNF does not have enough resources. We impose a penalty when resource allocation fails because the selected server lacks sufficient resources. Finally, we consider the placement cost of the allocated resources for an incoming service as a penalty term. Now, the placement reward $u_m(a)$, for a service of type $m$, is written as


where $a = \{(i_k, s_k)\}$ indicates the output action of the DQN, in which $i_k$ and $s_k$ denote the allocated InP and server index for the $k$-th VNF of the service, $p_m^{\mathrm{th}}$ is the maximum acceptable failure probability of the $m$-th service type, and $F(a)$ denotes the failure probability of the placement. The value $\zeta_1$ is a penalty for not admitting the service because of the reliability, and $\zeta_2$ is a penalty for violating the resource budget of the selected servers. $C(a)$ is the placement cost, including the server and link costs. Finally, $\zeta_3$ is a reward for the successful placement of the service. According to (16), we assign more reward to a placement whose reliability is close to the requested reliability.
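One possible shape for a reward with these ingredients is sketched below. It is not the reward of (16): the piecewise structure, the proximity term (the ratio of achieved to allowed failure probability), the cost weight, and all default values are assumptions chosen only to reproduce the qualitative behavior described in the text.

```python
def placement_reward(failure_prob, max_failure, resources_ok, cost,
                     reliability_penalty=10.0, resource_penalty=10.0,
                     success_reward=10.0, cost_weight=0.01):
    """Hypothetical reward for placing one service.

    - a fixed penalty if a selected server lacks resources
    - a fixed penalty if the reliability requirement is violated
    - otherwise a success reward scaled by how close the achieved failure
      probability is to the allowed one, minus a weighted placement cost
    """
    if not resources_ok:
        return -resource_penalty
    if failure_prob > max_failure:
        return -reliability_penalty
    proximity = failure_prob / max_failure  # 1.0 means exactly at the limit
    return success_reward * proximity - cost_weight * cost


# Same cost, different distance from the requested reliability.
close = placement_reward(0.009, 0.01, True, 50.0)
far = placement_reward(0.001, 0.01, True, 50.0)
```

Under these assumptions, a placement that just meets the requirement earns more than one that over-provisions reliability at the same cost, which discourages spending scarce high-reliability servers on services that do not need them.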

We define a memory structure for the agent so that, during learning, the agent does not adapt to a particular sequence of services; instead, it works on a random batch of services and improves on previously acquired results using the new knowledge gained through experience [14]. For this purpose, we store the current state, action, reward, and next state as a memory segment of the DQN agent for each service after its allocation. When the number of considered services reaches a threshold, a random batch of memory segments is selected for updating the weights of the neural network.
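A replay memory of this kind can be sketched with a bounded buffer and uniform random sampling. The class name `ReplayMemory`, the capacity, and the batch size are illustrative assumptions; the threshold is modeled here as the batch size itself.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded store of (state, action, reward, next_state) segments."""

    def __init__(self, capacity, batch_size, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest segments are dropped
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def ready(self):
        """True once enough segments are stored to draw a batch."""
        return len(self.buffer) >= self.batch_size

    def sample(self):
        """Draw a random batch, which breaks the order of arrival."""
        return self.rng.sample(list(self.buffer), self.batch_size)


memory = ReplayMemory(capacity=5, batch_size=3, seed=0)
for step in range(10):
    memory.store([step], step % 2, float(step), [step + 1])
batch = memory.sample() if memory.ready() else []
```

Sampling at random from the buffer, rather than training on services in arrival order, is what keeps the agent from adapting to one particular sequence of requests.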

IV Numerical Results

In this section, we evaluate the performance of the proposed scheme in terms of the total placement cost and the admission ratio. For the simulation framework of the DQN, we used Keras and TensorFlow in Python. We consider seven InPs with different reliability levels. We assume each InP has five servers with the same reliability level. For each server, we consider a capacity of 100 units of one resource type. We consider five service types. The requested reliability of each service type is chosen according to the SLA requirement of Google Apps [19]. Also, we assume that the number of VNFs in each service type is between three and five, and that the resource demand of each VNF is between 10 and 20 units. We assume that the departure probability is equal for all service types and is varied over a range of values. For the DQN, we use a fully connected DNN that uses hyperbolic tangent and ReLU activation functions in the middle layers and a linear activation function in the output layer [20]. Each layer is associated with dropout, with its rate set between 0.05 and 0.2, to reduce overfitting [21]. Also, we use the mean squared error (MSE) as the loss function.

Fig. 2: Admission ratio for different service types during the learning.
Fig. 3: Admission ratio for different service types and departure probabilities.

To evaluate the performance of the proposed DQN agent, we consider the admission ratio for different service types. The admission ratio for each service type is defined as the ratio of the number of services successfully admitted with their reliability requirements met to the number of incoming services. The trend of learning and adaptation of the DQN agent in admitting services with different reliability requirements over time is shown in Fig. 2. The x-axis shows the time steps, each of which consists of 10000 slots. The y-axis indicates the admission ratio for different reliability requirements. The agent follows an $\epsilon$-greedy policy, with $\epsilon$ initialized to 0.2 to strengthen exploration in the initial stages of learning. As time goes by, the agent becomes more exploitative and less explorative because $\epsilon$ decays, which results in consistent values. For high reliability requirements, the convergence time of the agent increases and the admission ratio to which each service type converges decreases.
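The two quantities used in this evaluation can be sketched as follows. The admission ratio follows the definition in the text; the exponential decay schedule, its rate, and the floor value are assumptions, since only the initial exploration value of 0.2 is stated.

```python
def admission_ratio(admitted, arrived):
    """Fraction of incoming services admitted with reliability met."""
    return admitted / arrived if arrived else 0.0


def decayed_epsilon(step, start=0.2, decay=0.999, floor=0.01):
    """Assumed exponential decay of the exploration rate, with a lower bound."""
    return max(floor, start * decay ** step)


ratio = admission_ratio(45, 50)
schedule = [decayed_epsilon(step) for step in (0, 1000, 5000)]
```

The schedule starts at 0.2 and shrinks toward the floor, so the agent gradually shifts from exploration to exploitation.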

To evaluate the robustness of the proposed DQN method for dynamic reliability-aware NFV placement, we consider the performance of the DQN agent under different departure probabilities. For this purpose, we use the learned optimal policy for the placement of the incoming services. The resulting admission ratio for different values of the departure probability, for a fixed resource amount of the InPs, is shown in Fig. 3. As seen in Fig. 3, the admission ratio decreases as the departure probability decreases, since services that stay longer hold their resources longer and leave less capacity for new requests.

V Conclusion

In this paper, we considered dynamic reliability-aware NFV placement for an NFV-enabled NO using DQN. For this purpose, we considered a multi-InP scenario in which different levels of reliability are offered to the NO at different costs. We also considered multiple service types for the incoming services, which are characterized by their reliability requirements. In addition, we assumed that an admitted service ends in each slot with a departure probability, which can differ across service types. For the DQN agent, we defined the state set, action set, reward, and memory according to the objective of the NO, which is to maximize the admission ratio while minimizing the placement cost. Using simulations, we showed that the NO can learn how to use the resources of the InPs effectively for various service types in different states, so that the admission ratio is maximized and the placement cost is minimized.

References

  1. J. G. Herrera and J. F. Botero, “Resource allocation in NFV: A comprehensive survey,” IEEE Trans. on Network and Service Management, vol. 13, no. 3, pp. 518–532, 2016.

  2. S. Khebbache, M. Hadji, and D. Zeghlache, “Virtualized network functions chaining and routing algorithms,” Computer Networks, vol. 114, pp. 95–110, 2017.

  3. J. Sherry, S. Ratnasamy, and J. S. At, “A survey of enterprise middlebox deployments,” Technical Report UCB/EECS-2012-24, EECS Department, University of California, Berkeley, 2012.

  4. X. Zhang, C. Wu, Z. Li, and F. C. Lau, “Proactive VNF provisioning with multi-timescale cloud resources: Fusing online learning and online optimization,” in IEEE INFOCOM, Atlanta, GA, May 2017.

  5. G. NFV, “Network functions virtualisation NFV; architectural framework,” NFV ISG, Oct. 2013.

  6. R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2016.

  7. C. Pham, N. H. Tran, S. Ren, W. Saad, and C. S. Hong, “Traffic-aware and energy-efficient VNF placement for service chaining: Joint sampling and matching approach,” IEEE Trans. on Services Computing, 2017.

  8. M. Mechtri, C. Ghribi, and D. Zeghlache, “A scalable algorithm for the placement of service function chains,” IEEE Trans. on Network and Service Management, vol. 13, no. 3, pp. 533–546, 2016.

  9. F. Bari, S. R. Chowdhury, R. Ahmed, R. Boutaba, and O. C. M. B. Duarte, “Orchestrating virtualized network functions,” IEEE Trans. on Network and Service Management, vol. 13, no. 4, pp. 725–739, 2016.

  10. S. Gu, Z. Li, C. Wu, and C. Huang, “An efficient auction mechanism for service chains in the NFV market,” in IEEE INFOCOM, San Francisco, CA, May 2016.

  11. X. Zhang, Z. Huang, C. Wu, Z. Li, and F. C. Lau, “Online stochastic buy-sell mechanism for VNF chains in the NFV market,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 2, pp. 392–406, 2017.

  12. S. D’Oro, L. Galluccio, S. Palazzo, and G. Schembra, “Exploiting congestion games to achieve distributed service chaining in NFV networks,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 2, pp. 407–420, 2017.

  13. C. J. C. H. Watkins, “Learning from delayed rewards,” Ph.D. dissertation, King’s College, Cambridge, UK, May, 1989.

  14. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

  15. Y. Jia, C. Wu, Z. Li, F. Le, and A. Liu, “Online scaling of NFV service chains across geo-distributed datacenters,” IEEE/ACM Trans. on Networking (TON), vol. 26, no. 2, pp. 699–710, 2018.

  16. V. Sciancalepore, F. Z. Yousaf, and X. Costa-Perez, “z-TORCH: An automated NFV orchestration and monitoring solution,” IEEE Trans. on Network and Service Management, 2018.

  17. J. Bendriss, I. G. B. Yahia, R. Riggio, and D. Zeghlache, “A deep learning based SLA management for NFV-based services,” in Conference on ICIN, Paris, France, Feb. 2018.

  18. J. Pei, P. Hong, and D. Li, “Virtual network function selection and chaining based on deep learning in SDN and NFV-enabled networks,” in ICC Workshops, Kansas City, MO, USA, May 2018.

  19. “Google apps service level agreement,” [Online]. Available:

  20. A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML, vol. 30, no. 1, 2013, p. 3.

  21. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.