Large-Scale Markov Decision Problems via the Linear Programming Dual

01/06/2019
by   Yasin Abbasi-Yadkori, et al.
0

We consider the problem of controlling a fully specified Markov decision process (MDP), also known as the planning problem, when the state space is very large and calculating the optimal policy is intractable. Instead, we pursue the more modest goal of optimizing over some small family of policies. Specifically, we show that the family of policies associated with a low-dimensional approximation of occupancy measures yields a tractable optimization. Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class. To the best of our knowledge, such results did not exist in the literature previously. We bound excess loss in the average cost and discounted cost cases, which are treated separately. Preliminary experiments show the effectiveness of the proposed algorithms in a queueing application.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/27/2014

Linear Programming for Large-Scale Markov Decision Problems

We consider the problem of controlling a Markov decision process (MDP) w...
research
01/02/2021

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Many real-world applications, such as those in medical domains, recommen...
research
02/26/2018

Optimizing over a Restricted Policy Class in Markov Decision Processes

We address the problem of finding an optimal policy in a Markov decision...
research
03/31/2022

Attack Impact Evaluation by Exact Convexification through State Space Augmentation

We address the attack impact evaluation problem for control system secur...
research
10/15/2020

Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes

In the theory of Partially Observed Markov Decision Processes (POMDPs), ...
research
01/10/2013

A Tractable POMDP for a Class of Sequencing Problems

We consider a partially observable Markov decision problem (POMDP) that ...
research
08/03/2021

Energy Management in Data Centers with Server Setup Delay: A Semi-MDP Approximation

The energy management schemes in multi-server data centers with setup ti...

Please sign up or login with your details

Forgot password? Click here to reset