Efficiently Solving MDPs with Stochastic Mirror Descent

08/28/2020
by Yujia Jin, et al.

We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with A_tot total state-action pairs and mixing time bound t_mix, our method computes an ϵ-optimal policy with an expected O(t_mix^2 A_tot ϵ^-2) samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a γ-discounted MDP with A_tot total state-action pairs, our method computes an ϵ-optimal policy with an expected O((1-γ)^-4 A_tot ϵ^-2) samples, matching the previous state-of-the-art up to a (1-γ)^-1 factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains, and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs.
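To make the "bilinear saddle-point problems with simplex and box domains" mentioned above more concrete, here is a minimal, generic sketch of stochastic mirror descent on a toy problem min over x in [-1,1]^n, max over y in the simplex, of y^T (A x + b). It is not the authors' algorithm: the function names `smd_bilinear` and `duality_gap`, the sampling-based gradient estimators, the step sizes, and the toy matrix are all assumptions chosen for illustration. The box player takes a projected (clipped) Euclidean step and the simplex player takes an entropic, multiplicative-weights step; the averaged iterates are returned.

```python
import math
import random


def smd_bilinear(A, b, steps=20000, eta_x=0.01, eta_y=0.01, seed=0):
    """Illustrative SMD for min_{x in [-1,1]^n} max_{y in simplex} y^T (A x + b).

    All parameters and estimators here are assumptions for this sketch.
    Returns the averaged iterates (x_bar, y_bar).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n          # box variable, starts at the center
    y = [1.0 / m] * m      # simplex variable, starts uniform
    x_sum = [0.0] * n
    y_sum = [0.0] * m
    for _ in range(steps):
        # Unbiased estimate of grad_x = A^T y: sample row i with probability y[i].
        i = rng.choices(range(m), weights=y)[0]
        g_x = A[i]
        # Unbiased estimate of grad_y = A x + b: sample column j uniformly.
        j = rng.randrange(n)
        g_y = [n * A[k][j] * x[j] + b[k] for k in range(m)]
        # Box side: Euclidean descent step, then projection (clipping) onto [-1, 1]^n.
        x = [min(1.0, max(-1.0, x[k] - eta_x * g_x[k])) for k in range(n)]
        # Simplex side: entropic mirror ascent (multiplicative weights), then normalize.
        w = [y[k] * math.exp(eta_y * g_y[k]) for k in range(m)]
        total = sum(w)
        y = [wk / total for wk in w]
        for k in range(n):
            x_sum[k] += x[k]
        for k in range(m):
            y_sum[k] += y[k]
    return [s / steps for s in x_sum], [s / steps for s in y_sum]


def duality_gap(A, b, x, y):
    """Gap = max_y' f(x, y') - min_x' f(x', y) for f(x, y) = y^T (A x + b)."""
    m, n = len(A), len(A[0])
    # Best response of the simplex player: put all mass on the largest entry of A x + b.
    best_y = max(sum(A[i][k] * x[k] for k in range(n)) + b[i] for i in range(m))
    # Best response of the box player: x'_k = -sign((A^T y)_k), giving b^T y - ||A^T y||_1.
    at_y = [sum(A[i][k] * y[i] for i in range(m)) for k in range(n)]
    best_x = sum(b[i] * y[i] for i in range(m)) - sum(abs(v) for v in at_y)
    return best_y - best_x


# Toy instance (hypothetical data, chosen only to exercise the code).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, -0.5]
x_bar, y_bar = smd_bilinear(A, b)
gap = duality_gap(A, b, x_bar, y_bar)
print("averaged x:", x_bar)
print("averaged y:", y_bar)
print("duality gap:", gap)
```

The duality gap of the averaged iterates is the natural accuracy measure for this kind of method: it is nonnegative by construction and shrinks as the number of sampled steps grows. Each step touches one sampled row and one sampled column, which mirrors the abstract's point that running time is linear in the number of samples.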

research
06/13/2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

We prove new upper and lower bounds for sample complexity of finding an ...
research
01/18/2021

Buying Data Over Time: Approximately Optimal Strategies for Dynamic Data-Driven Decisions

We consider a model where an agent has a repeated decision to make and w...
research
02/27/2021

Parallel Stochastic Mirror Descent for MDPs

We consider the problem of learning the optimal policy for infinite-hori...
research
04/27/2018

Scalable Bilinear π Learning Using State and Action Features

Approximate linear programming (ALP) represents one of the major algorit...
research
10/21/2022

Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

We propose a new stochastic primal-dual optimization algorithm for plann...
research
09/02/2021

Optimal Path Planning of Autonomous Marine Vehicles in Stochastic Dynamic Ocean Flows using a GPU-Accelerated Algorithm

Autonomous marine vehicles play an essential role in many ocean science ...
research
08/29/2019

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

In this paper, we settle the sampling complexity of solving discounted t...
