Primal-dual regression approach for Markov decision processes with general state and action space

10/01/2022
by   Denis Belomestny, et al.
0

We develop a regression based primal-dual martingale approach for solving finite time horizon MDPs with general state and action space. As a result, our method allows for the construction of tight upper and lower biased approximations of the value functions, and, provides tight approximations to the optimal policy. In particular, we prove tight error bounds for the estimated duality gap featuring polynomial dependence on the time horizon, and sublinear dependence on the cardinality/dimension of the possibly infinite state and action space.From a computational point of view the proposed method is efficient since, in contrast to usual duality-based methods for optimal control problems in the literature, the Monte Carlo procedures here involved do not require nested simulations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/17/2017

Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

Consider the problem of approximating the optimal policy of a Markov dec...
research
06/06/2022

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

We study sequential decision making problems aimed at maximizing the exp...
research
12/08/2016

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

We study the online estimation of the optimal policy of a Markov decisio...
research
06/20/2023

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

We study the problem of computing an optimal policy of an infinite-horiz...
research
09/23/2020

A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Constrained Markov Decision Processes (CMDPs) formalize sequential decis...
research
03/15/2019

On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes

We study the problem of learning policy of an infinite-horizon, discount...
research
07/13/2021

Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields

Dual decomposition approaches in nonconvex optimization may suffer from ...

Please sign up or login with your details

Forgot password? Click here to reset