A Structure-aware Online Learning Algorithm for Markov Decision Processes

11/28/2018
by Arghyadip Roy, et al.

Reinforcement Learning (RL) algorithms are widely used to overcome the curse of dimensionality and the curse of modeling that affect Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems. In this paper, we consider an infinite-horizon average-reward MDP problem and prove that a threshold policy is optimal under certain conditions. Traditional RL techniques do not exploit the threshold nature of the optimal policy while learning. We propose a new RL algorithm that uses the known threshold structure of the optimal policy to reduce the feasible policy space during learning. We establish that the proposed algorithm converges to the optimal policy. Compared with traditional RL algorithms, it significantly improves convergence speed and reduces computational and storage complexity. The proposed technique can be applied to a wide variety of optimization problems, including energy-efficient data transmission and queue management. Simulations demonstrate the improvement in convergence speed of the proposed algorithm over other RL algorithms.
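To illustrate why a known threshold structure shrinks the feasible policy space, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical one-dimensional state space (for example, a queue length from 0 to num_states - 1) and two actions, where a threshold policy takes action 1 below the threshold and action 0 at or above it. All function names are illustrative assumptions.

```python
from typing import List


def threshold_policy(threshold: int, num_states: int) -> List[int]:
    """Return one action per state: 1 below the threshold, 0 at or above it.

    Hypothetical setup: states are 0..num_states-1 (e.g. queue lengths).
    """
    if not 0 <= threshold <= num_states:
        raise ValueError("threshold must lie in [0, num_states]")
    return [1 if state < threshold else 0 for state in range(num_states)]


def is_threshold_policy(policy: List[int]) -> bool:
    """Check that the policy never switches back from action 0 to action 1."""
    return all(not (a == 0 and b == 1) for a, b in zip(policy, policy[1:]))


def count_deterministic_policies(num_states: int, num_actions: int = 2) -> int:
    """Number of deterministic stationary policies without any structure."""
    return num_actions ** num_states


def count_threshold_policies(num_states: int) -> int:
    """Number of threshold policies: one per threshold value 0..num_states."""
    return num_states + 1


if __name__ == "__main__":
    n = 5
    print(threshold_policy(2, n))
    print(count_deterministic_policies(n), "unstructured policies")
    print(count_threshold_policies(n), "threshold policies")
```

Under these assumptions, a learner that searches only over the threshold value chooses among num_states + 1 candidates instead of 2 ** num_states, which is the kind of reduction the abstract refers to.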


research
12/21/2019

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

Markov Decision Process (MDP) problems can be solved using Dynamic Progr...
research
01/07/2022

Mirror Learning: A Unifying Framework of Policy Optimisation

General policy improvement (GPI) and trust-region learning (TRL) are the...
research
01/31/2022

Reinforcement Learning with Heterogeneous Data: Estimation and Inference

Reinforcement Learning (RL) has the promise of providing data-driven sup...
research
03/29/2020

Optimizing Coordinated Vehicle Platooning: An Analytical Approach Based on Stochastic Dynamic Programming

Platooning connected and autonomous vehicles (CAVs) can improve traffic ...
research
08/17/2023

Controlling Federated Learning for Covertness

A learner aims to minimize a function f by repeatedly querying a distrib...
research
06/09/2022

An Optimization Method-Assisted Ensemble Deep Reinforcement Learning Algorithm to Solve Unit Commitment Problems

Unit commitment (UC) is a fundamental problem in the day-ahead electrici...
research
04/03/2018

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, ca...
