Policy Optimization for Stochastic Shortest Path

02/07/2022
by   Liyu Chen, et al.
0

Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees. In this work, we initiate the study of policy optimization for the stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better captures many applications. We consider a wide range of settings, including stochastic and adversarial environments under full information or bandit feedback, and propose a policy optimization algorithm for each setting that makes use of novel correction terms and/or variants of dilated bonuses (Luo et al., 2021). For most settings, our algorithm is shown to achieve a near-optimal regret bound. One key technical contribution of this work is a new approximation scheme to tackle SSP problems that we call stacked discounted approximation and use in all our proposed algorithms. Unlike the finite-horizon approximation that is heavily used in recent SSP algorithms, our new approximation enables us to learn a near-stationary policy with only logarithmic changes during an episode and could lead to an exponential improvement in space complexity.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/24/2021

Minimax Regret for Stochastic Shortest Path

We study the Stochastic Shortest Path (SSP) problem in which an agent ha...
research
05/25/2022

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

We initiate the study of dynamic regret minimization for goal-oriented r...
research
02/10/2021

Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case

We make significant progress toward the stochastic shortest path problem...
research
12/18/2021

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

We introduce two new no-regret algorithms for the stochastic shortest pa...
research
06/15/2021

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

We introduce a generic template for developing regret minimization algor...
research
05/23/2019

Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

We focus on the important problem of emergency evacuation, which clearly...
research
04/10/2022

A Fully Polynomial Time Approximation Scheme for Fixed-Horizon Constrained Stochastic Shortest Path Problem under Local Transitions

The fixed-horizon constrained stochastic shortest path problem (C-SSP) i...

Please sign up or login with your details

Forgot password? Click here to reset