Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

04/22/2021
by   Jean Tarbouriech, et al.
0

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate O(B_⋆√(S A K)), where K is the number of episodes, S is the number of states, A is the number of actions and B_⋆ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of B_⋆, nor of T_⋆ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of T_⋆ is available) where the regret only contains a logarithmic dependence on T_⋆, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/24/2021

Minimax Regret for Stochastic Shortest Path

We study the Stochastic Shortest Path (SSP) problem in which an agent ha...
research
10/10/2022

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...
research
12/07/2019

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Many popular reinforcement learning problems (e.g., navigation in a maze...
research
12/21/2021

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

Policy optimization methods are one of the most widely used classes of R...
research
12/07/2020

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

We study the stochastic shortest path problem with adversarial costs and...
research
06/15/2021

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

We introduce a generic template for developing regret minimization algor...
research
05/22/2022

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

We revisit the incremental autonomous exploration problem proposed by Li...

Please sign up or login with your details

Forgot password? Click here to reset