Faster saddle-point optimization for solving large-scale Markov decision processes

09/22/2019
by Joan Bas-Serrano, et al.
We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization problem by characterizing the conditions necessary for convergence to the optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are independent of the size of the state space. Notably, our characterization points out some potential issues with previous work.
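
For orientation, the linear program and saddle-point problem referred to above can be written in their standard textbook form. This is a sketch under assumed notation, not the paper's own: states x, actions a, transition kernel P, reward r, stationary state-action distribution \mu, and value function v are all symbols introduced here for illustration.

```latex
% Primal LP over stationary state-action distributions \mu (assumed notation)
\begin{aligned}
\max_{\mu \ge 0} \quad & \sum_{x,a} \mu(x,a)\, r(x,a) \\
\text{s.t.} \quad & \sum_{a} \mu(x',a) = \sum_{x,a} P(x' \mid x,a)\, \mu(x,a) \quad \forall x', \\
& \sum_{x,a} \mu(x,a) = 1 .
\end{aligned}

% Lagrangian saddle-point form: v(x') is the multiplier of the flow constraint at x',
% and \Delta is the probability simplex over state-action pairs
\min_{v} \; \max_{\mu \in \Delta} \;
\sum_{x,a} \mu(x,a) \Bigl[ r(x,a) + \sum_{x'} P(x' \mid x,a)\, v(x') - v(x) \Bigr]
```

In this form v has one entry per state, which is where the linear growth in the number of variables comes from. One common way to relax the problem, assumed here for illustration, is to restrict v to a low-dimensional linear family such as v = \Psi\theta for a fixed feature matrix \Psi; the precise relaxation studied in the paper may differ.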

research
08/28/2023

Rate-Optimal Policy Optimization for Linear Markov Decision Processes

We study regret minimization in online episodic linear Markov Decision P...
research
05/27/2022

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

Reward optimization in fully observable Markov decision processes is equ...
research
06/05/2021

Navigating to the Best Policy in Markov Decision Processes

We investigate the classical active pure exploration problem in Markov D...
research
05/11/2022

Stochastic first-order methods for average-reward Markov decision processes

We study the problem of average-reward Markov decision processes (AMDPs)...
research
06/29/2021

Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes

In this work we present a novel approach to hierarchical reinforcement l...
research
01/04/2019

Solving Markov Decision Processes with Reachability Characterization from Mean First Passage Times

A new mechanism for efficiently solving the Markov decision processes (M...
research
01/19/2022

Critic Algorithms using Cooperative Networks

An algorithm is proposed for policy evaluation in Markov Decision Proces...
