Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

12/08/2016
by Yichen Chen, et al.

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods that exploit the inherent minimax duality of the Bellman equations. The SPD methods update only a few coordinates of the value and policy estimates as each new state transition is observed, so they require small storage and have low computational complexity per iteration. With high probability, the SPD methods find an absolute-ϵ-optimal policy using O(|S|^4 |A|^2 σ^2 / ((1-γ)^6 ϵ^2)) iterations/samples for the infinite-horizon discounted-reward MDP and O(|S|^4 |A|^2 H^6 σ^2 / ϵ^2) for the finite-horizon MDP.
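For intuition, the sketch below illustrates the kind of saddle-point iteration the abstract describes: each step samples one state transition, takes an exponentiated-gradient step on the dual (occupancy/policy) iterate using the empirical Bellman residual, and changes only a handful of coordinates of the primal (value) iterate. The simulator interface `sample_transition(s, a) -> (s_next, reward)`, the uniform initial-state distribution, the step-size schedule, and the simplified single-coordinate dual step are illustrative assumptions, not the exact algorithm or constants analyzed in the paper.

```python
import numpy as np

def spd_discounted(sample_transition, n_states, n_actions,
                   gamma=0.9, n_iters=100_000, seed=0):
    """Sketch of a stochastic primal-dual saddle-point method for a
    discounted MDP (illustrative, not the paper's exact algorithm).

    Assumptions: `sample_transition(s, a)` returns (s_next, reward) from a
    simulator, rewards lie in [0, 1], and the initial-state distribution is
    uniform. The primal iterate v estimates the value function; the dual
    iterate mu estimates a state-action occupancy measure whose
    row-normalization is read off as a randomized policy.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)
    mu = np.full((n_states, n_actions), 1.0 / (n_states * n_actions))
    alpha = np.full(n_states, 1.0 / n_states)  # assumed initial-state distribution

    for t in range(1, n_iters + 1):
        eta = 1.0 / np.sqrt(t)  # diminishing step size (illustrative choice)

        # Sample a state-action pair from the current dual iterate,
        # then observe one transition from the simulator.
        flat = rng.choice(n_states * n_actions, p=mu.ravel())
        s, a = divmod(int(flat), n_actions)
        s_next, r = sample_transition(s, a)
        s0 = rng.choice(n_states, p=alpha)

        # Dual ascent: exponentiated-gradient step on the sampled coordinate,
        # driven by the empirical Bellman residual (a simplified surrogate
        # for the paper's update), followed by renormalization.
        delta = r + gamma * v[s_next] - v[s]
        mu[s, a] *= np.exp(eta * delta)
        mu /= mu.sum()

        # Primal descent: only three coordinates of v change per iteration,
        # following the sampled gradient (1-gamma)e_{s0} + gamma e_{s'} - e_s.
        v[s] += eta
        v[s_next] -= eta * gamma
        v[s0] -= eta * (1.0 - gamma)
        np.clip(v, 0.0, 1.0 / (1.0 - gamma), out=v)  # keep v in its natural box

    policy = mu / mu.sum(axis=1, keepdims=True)
    return v, policy
```

Any tabular simulator with rewards in [0, 1] can be plugged in as `sample_transition`; the O(1) coordinates touched per sample are what keep the per-iteration cost and storage small, as claimed in the abstract.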


Related research

10/17/2017 · Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems
Consider the problem of approximating the optimal policy of a Markov dec...

07/13/2018 · On the Complexity of Value Iteration
Value iteration is a fundamental algorithm for solving Markov Decision P...

06/13/2021 · Towards Tight Bounds on the Sample Complexity of Average-reward MDPs
We prove new upper and lower bounds for sample complexity of finding an ...

10/01/2022 · Primal-dual regression approach for Markov decision processes with general state and action space
We develop a regression based primal-dual martingale approach for solvin...

09/21/2022 · First-order Policy Optimization for Robust Markov Decision Process
We consider the problem of solving robust Markov decision process (MDP),...

02/23/2020 · Periodic Q-Learning
The use of target networks is a common practice in deep reinforcement le...

04/03/2018 · Renewal Monte Carlo: Renewal theory based reinforcement learning
In this paper, we present an online reinforcement learning algorithm, ca...
