Linear Programming for Large-Scale Markov Decision Problems

02/27/2014
by   Yasin Abbasi-Yadkori, et al.

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large-scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization and one based on constraint sampling. In both cases, we give bounds showing that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most importantly, these results depend on the size of the comparison class, but not on the size of the state space. Preliminary experiments show the effectiveness of the proposed algorithms in a queuing application.
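
For readers unfamiliar with the dual linear programming formulation mentioned above, the following is a sketch of its standard textbook form for a finite average-cost MDP. The notation is an assumption introduced here for illustration and is not taken from the abstract: \mu(x,a) is the stationary distribution over state-action pairs, c(x,a) is the per-step cost, and P(x' \mid x,a) is the transition kernel. The paper's own notation may differ.

```latex
\begin{aligned}
\min_{\mu} \quad & \sum_{x,a} \mu(x,a)\, c(x,a) \\
\text{s.t.} \quad & \sum_{x,a} \mu(x,a) = 1, \qquad \mu(x,a) \ge 0 \quad \text{for all } (x,a), \\
& \sum_{a} \mu(x',a) = \sum_{x,a} \mu(x,a)\, P(x' \mid x,a) \quad \text{for all } x'.
\end{aligned}
```

Under this formulation, a feasible \mu induces a policy by normalizing over actions, \pi_\mu(a \mid x) = \mu(x,a) / \sum_{a'} \mu(x,a') wherever the denominator is positive. The program has one variable per state-action pair and one flow constraint per state, which is why solving it directly is impractical when the state space is large.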

research
01/06/2019

Large-Scale Markov Decision Problems via the Linear Programming Dual

We consider the problem of controlling a fully specified Markov decision...
research
10/17/2017

Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

Consider the problem of approximating the optimal policy of a Markov dec...
research
03/31/2022

Attack Impact Evaluation by Exact Convexification through State Space Augmentation

We address the attack impact evaluation problem for control system secur...
research
01/21/2017

Learning Policies for Markov Decision Processes from Data

We consider the problem of learning a policy for a Markov decision proce...
research
08/03/2021

Energy Management in Data Centers with Server Setup Delay: A Semi-MDP Approximation

The energy management schemes in multi-server data centers with setup ti...
research
04/27/2018

Scalable Bilinear π Learning Using State and Action Features

Approximate linear programming (ALP) represents one of the major algorit...
research
05/10/2019

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

We consider a stochastic inventory control problem under censored demand...
