Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

by   Jean Tarbouriech, et al.

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate O(B_⋆√(S A K)), where K is the number of episodes, S is the number of states, A is the number of actions and B_⋆ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of B_⋆, nor of T_⋆ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of T_⋆ is available) where the regret only contains a logarithmic dependence on T_⋆, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.


page 1

page 2

page 3

page 4


Minimax Regret for Stochastic Shortest Path

We study the Stochastic Shortest Path (SSP) problem in which an agent ha...

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Many popular reinforcement learning problems (e.g., navigation in a maze...

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

Policy optimization methods are one of the most widely used classes of R...

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

We study the stochastic shortest path problem with adversarial costs and...

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

We introduce a generic template for developing regret minimization algor...

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

We revisit the incremental autonomous exploration problem proposed by Li...

Please sign up or login with your details

Forgot password? Click here to reset