Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

10/17/2017
by Mengdi Wang, et al.

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions. In contrast to existing reinforcement learning methods based on successive approximation of the nonlinear Bellman equation, we propose a Primal-Dual π Learning method that exploits the linear duality between the value and the policy. The π learning method is model-free and makes primal-dual updates to the policy and value vectors as new data are revealed. For infinite-horizon undiscounted Markov decision processes with finite state space S and finite action space A, the π learning method finds an ϵ-optimal policy using Õ((τ · t^*_mix)^2 |S| |A| / ϵ^2) sample transitions, where t^*_mix is an upper bound on the mixing times across all policies and τ is a parameter characterizing the range of stationary distributions across policies. The π learning method also applies to the computational problem of MDP, where the transition probabilities and rewards are given explicitly as input. When each state transition can be sampled in Õ(1) time, π learning yields a sublinear-time algorithm for solving the average-reward MDP.
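The primal-dual scheme described above can be sketched in a few lines. The following is a minimal illustrative implementation under simplifying assumptions, not the paper's exact algorithm: it maintains a value vector (the primal variable) and a state-action occupancy measure (the dual variable), samples one transition per iteration, and alternates a stochastic gradient step on the value with a multiplicative-weights step on the occupancy measure. The tuned step sizes, exploration mixing, and variance-reduction details that give the stated sample complexity are omitted, and all function and parameter names here are our own.

```python
import numpy as np

def pi_learning(sample_transition, num_states, num_actions,
                num_iters=10000, alpha=0.001, beta=0.01, rng=None):
    """Illustrative primal-dual pi-learning sketch for an average-reward MDP.

    sample_transition(s, a) -> (next_state, reward) draws one transition
    from the (unknown) MDP.  Step sizes alpha/beta are arbitrary choices,
    not the schedules analyzed in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.zeros(num_states)                    # primal: differential value vector
    mu = np.full((num_states, num_actions),     # dual: state-action occupancy measure
                 1.0 / (num_states * num_actions))
    for _ in range(num_iters):
        # sample a state-action pair from the current occupancy measure
        flat = rng.choice(num_states * num_actions, p=mu.ravel())
        s, a = divmod(int(flat), num_actions)
        s_next, r = sample_transition(s, a)
        # dual ascent: multiplicative-weights update driven by the
        # sampled "advantage" r + v(s') - v(s), then renormalize
        mu[s, a] *= np.exp(beta * (r + v[s_next] - v[s]))
        mu /= mu.sum()
        # primal descent: stochastic gradient of the Lagrangian in v
        v[s_next] -= alpha
        v[s] += alpha
    # extract a randomized policy from the occupancy measure
    policy = mu / mu.sum(axis=1, keepdims=True)
    return policy, v
```

On a toy two-state MDP where one action always earns reward 1 and the other earns 0, the extracted policy concentrates on the rewarding action, which is the qualitative behavior the duality argument predicts.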


Related research

12/08/2016 — Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning
We study the online estimation of the optimal policy of a Markov decisio...

01/16/2013 — PEGASUS: A Policy Search Method for Large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policie...

03/15/2019 — On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes
We study the problem of learning policy of an infinite-horizon, discount...

10/01/2022 — Primal-dual regression approach for Markov decision processes with general state and action space
We develop a regression based primal-dual martingale approach for solvin...

04/27/2018 — Scalable Bilinear π Learning Using State and Action Features
Approximate linear programming (ALP) represents one of the major algorit...

12/15/2020 — An exact solution in Markov decision process with multiplicative rewards as a general framework
We develop an exactly solvable framework of Markov decision process with...

02/27/2014 — Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) w...
