Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

12/21/2019
by Arghyadip Roy, et al.

Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods, which suffer from the curse of dimensionality and the curse of modeling. To overcome these issues, Reinforcement Learning (RL) methods are adopted in practice. In this paper, we aim to obtain the optimal admission control policy in a system where different classes of customers are present. Using DP techniques, we prove that it is optimal to admit the i-th class of customers only up to a threshold τ(i), which is a non-increasing function of i. In contrast to traditional RL algorithms, which do not take the structural properties of the optimal policy into account while learning, we propose a structure-aware learning algorithm that exploits the threshold structure of the optimal policy. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Because the policy space is reduced, the structure-aware learning algorithm provides remarkable improvements in storage and computational complexity over classical RL algorithms. Simulation results also establish the gain in the convergence rate of the proposed algorithm over other RL algorithms. The techniques presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and communication networking.
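To make the threshold structure concrete, the following is a minimal sketch of what a threshold admission policy could look like in code. It is not the paper's learning algorithm. The state representation (a single occupancy count), the 0-based class indexing, the strict-inequality admission rule, and the numeric threshold values are all assumptions made for illustration.

```python
from typing import Sequence


def is_non_increasing(thresholds: Sequence[int]) -> bool:
    """Check the structural property tau(i) >= tau(i+1) for all classes i."""
    return all(a >= b for a, b in zip(thresholds, thresholds[1:]))


def admit(num_in_system: int, customer_class: int,
          thresholds: Sequence[int]) -> bool:
    """Threshold rule (assumed form): admit a class-i arrival only while
    the current occupancy is strictly below tau(i)."""
    return num_in_system < thresholds[customer_class]


# Hypothetical thresholds for four customer classes (0-indexed).
thresholds = [10, 7, 7, 3]

# The whole policy is described by one integer per class, rather than
# one entry per (state, class) pair as in a tabular policy.
policy_is_structured = is_non_increasing(thresholds)
```

Under these assumptions, a class-1 arrival is admitted when 6 customers are present and rejected when 7 are present. The sketch also hints at where the storage savings mentioned in the abstract may come from: the policy is fully specified by the threshold vector.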
