Efficiently Solving MDPs with Stochastic Mirror Descent

08/28/2020
by Yujia Jin, et al.

We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with A_tot total state-action pairs and mixing time bound t_mix, our method computes an ϵ-optimal policy using an expected Õ(t_mix^2 A_tot ϵ^-2) samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a γ-discounted MDP with A_tot total state-action pairs, our method computes an ϵ-optimal policy using an expected Õ((1-γ)^-4 A_tot ϵ^-2) samples, matching the previous state of the art up to a (1-γ)^-1 factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains, and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs.
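
To make the "bilinear saddle-point problems with simplex and box domains" concrete, here is a minimal toy sketch of stochastic mirror descent on min over x in [-1, 1]^n, max over y in the simplex, of y^T A x. It is not the paper's MDP algorithm: the function names, the sampling-based gradient estimators, the step sizes, and the example matrix are all illustrative assumptions. The box variable takes a clipped Euclidean step, the simplex variable takes an entropic (multiplicative-weights) step, and the averaged iterates are returned.

```python
import math
import random


def stochastic_mirror_descent(A, steps=5000, eta_x=0.05, eta_y=0.05, seed=0):
    """Toy solver for min_{x in [-1,1]^n} max_{y in simplex} y^T A x.

    All names and parameter defaults are hypothetical choices for
    illustration, not taken from the paper.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n            # box variable, starts at the center
    y = [1.0 / m] * m        # simplex variable, starts uniform
    x_sum = [0.0] * n
    y_sum = [0.0] * m

    for _ in range(steps):
        # Unbiased estimate of A^T y: sample a row index i with probability y[i].
        i = rng.choices(range(m), weights=y)[0]
        g_x = A[i]
        # Unbiased estimate of A x: sample a column j uniformly, rescale by n.
        j = rng.randrange(n)
        g_y = [n * A[k][j] * x[j] for k in range(m)]

        # Box domain: Euclidean descent step followed by clipping.
        x = [min(1.0, max(-1.0, xj - eta_x * gj)) for xj, gj in zip(x, g_x)]
        # Simplex domain: entropic ascent step, then renormalize.
        w = [yk * math.exp(eta_y * gk) for yk, gk in zip(y, g_y)]
        total = sum(w)
        y = [wk / total for wk in w]

        x_sum = [s + v for s, v in zip(x_sum, x)]
        y_sum = [s + v for s, v in zip(y_sum, y)]

    # Return the averaged iterates.
    return [s / steps for s in x_sum], [s / steps for s in y_sum]


def duality_gap(A, x, y):
    """Gap = max_y' y'^T A x  -  min_x' y^T A x' over the same domains."""
    m, n = len(A), len(A[0])
    Ax = [sum(A[k][j] * x[j] for j in range(n)) for k in range(m)]
    ATy = [sum(A[k][j] * y[k] for k in range(m)) for j in range(n)]
    # Best simplex response picks the largest entry of A x; the best box
    # response sets each coordinate to -sign, giving minus the 1-norm of A^T y.
    return max(Ax) + sum(abs(v) for v in ATy)


if __name__ == "__main__":
    A = [[1.0, -1.0], [-1.0, 1.0]]   # hypothetical 2x2 example
    x_bar, y_bar = stochastic_mirror_descent(A)
    print("x_bar:", x_bar)
    print("y_bar:", y_bar)
    print("duality gap:", duality_gap(A, x_bar, y_bar))
```

Each iteration reads a single row and a single column of A, so the running time is linear in the number of samples drawn, mirroring the property claimed in the abstract. The duality gap is non-negative by construction and serves as a check on the quality of the averaged iterates.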
