Multi-Provider NFV Network Service Delegation via Average Reward Reinforcement Learning

12/24/2021
by Bahador Bakhshi, et al.

In multi-provider 5G/6G networks, service delegation enables administrative domains to federate in provisioning NFV network services. Admission control is fundamental to selecting the appropriate deployment domain so as to maximize average profit without prior knowledge of the statistical distributions of service requests. This paper analyzes a general federation contract model for service delegation from several angles. First, assuming known system dynamics, we obtain the theoretically optimal performance bound by formulating the admission control problem as an infinite-horizon Markov decision process (MDP) and solving it through dynamic programming. Second, we apply reinforcement learning to tackle the problem in practice, when the arrival and departure rates are not known. Because Q-Learning maximizes the discounted reward, we prove it is not an efficient solution due to its sensitivity to the discount factor. We then propose an average reward reinforcement learning approach (R-Learning) to find the policy that directly maximizes the average profit. Finally, we evaluate the different solutions through extensive simulations and experimentally using the 5Growth platform. Results confirm that the proposed R-Learning solution always outperforms Q-Learning and the greedy policies. Furthermore, while there is at most 9 the MDP solution in the experimental assessment.
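
To make the average reward idea concrete, the following is a minimal sketch of tabular R-Learning in one common textbook form, applied to a toy admission control problem. It is not the paper's model or implementation: the environment (`toy_step`), the capacity, the profit values, the departure probability, and all hyperparameters are assumptions chosen only for illustration. The point it shows is that the update tracks an estimate `rho` of the average reward per step in place of a discount factor.

```python
import random

CAPACITY = 2                       # hypothetical number of local deployment slots
REJECT, LOCAL, DELEGATE = 0, 1, 2  # hypothetical action set


def toy_step(busy, action, rng):
    """Hypothetical toy dynamics: one request arrives per step."""
    reward = 0.0
    if action == LOCAL and busy < CAPACITY:
        busy += 1
        reward = 1.0   # assumed profit of deploying in the local domain
    elif action == DELEGATE:
        reward = 0.4   # assumed profit after paying the peer domain
    # Each busy slot is released independently with probability 0.2 (assumed).
    busy -= sum(rng.random() < 0.2 for _ in range(busy))
    return busy, reward


def r_learning(step, n_states, n_actions, n_steps=50_000,
               alpha=0.05, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-Learning: learn relative action values and the average reward."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0          # running estimate of the average reward per step
    s = 0
    for _ in range(n_steps):
        greedy = max(range(n_actions), key=q[s].__getitem__)
        # Epsilon-greedy exploration.
        a = rng.randrange(n_actions) if rng.random() < epsilon else greedy
        s_next, r = step(s, a, rng)
        best_here, best_next = max(q[s]), max(q[s_next])
        # Relative value update: the reward is measured against rho,
        # so no discount factor appears.
        q[s][a] += alpha * (r - rho + best_next - q[s][a])
        # The average reward estimate is updated only on greedy steps.
        if a == greedy:
            rho += beta * (r - rho + best_next - best_here)
        s = s_next
    return q, rho


if __name__ == "__main__":
    q, rho = r_learning(toy_step, n_states=CAPACITY + 1, n_actions=3)
    print("estimated average profit per step:", round(rho, 3))
    print("learned action when the local domain is full:",
          max(range(3), key=q[CAPACITY].__getitem__))
```

In this toy setting, deploying locally has no effect once the local domain is full, so delegation is the only profitable action in that state, and the learned policy is expected to pick it there.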


