Minimax Regret for Stochastic Shortest Path

03/24/2021
by Alon Cohen, et al.

We study the Stochastic Shortest Path (SSP) problem, in which an agent has to reach a goal state with minimum total expected cost. In the learning formulation of the problem, the agent has no prior knowledge of the costs and dynamics of the model. She repeatedly interacts with the model for K episodes and has to learn to approximate the optimal policy as closely as possible. In this work we show that the minimax regret for this setting is Õ(B_⋆√(|S| |A| K)), where B_⋆ is a bound on the expected cost of the optimal policy from any state, S is the state space, and A is the action space. This matches the lower bound of Rosenberg et al. (2020) up to logarithmic factors, and improves their regret bound by a factor of √(|S|). Our algorithm runs in polynomial time per episode and is based on a novel reduction to reinforcement learning in finite-horizon MDPs. To that end, we provide an algorithm for the finite-horizon setting whose leading term in the regret depends only logarithmically on the horizon, yielding the same regret guarantees for SSP.
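To make the quantities in the bound concrete, the following is a minimal sketch and not the paper's algorithm. It runs plain value iteration on a small, made-up SSP instance with known costs and dynamics, takes B_⋆ to be the largest optimal cost-to-go over states, and evaluates the rate B_⋆√(|S| |A| K) with constants and logarithmic factors ignored. The toy model, the function names, and the choice K = 10000 are all assumptions for illustration.

```python
import math


def ssp_value_iteration(P, c, goal, tol=1e-10, max_iter=100_000):
    """Optimal cost-to-go for a known SSP via in-place value iteration.

    P[s][a] maps next states to probabilities; c[s][a] is the cost of
    taking action a in state s. The goal state is absorbing and free.
    """
    V = {s: 0.0 for s in P}
    V[goal] = 0.0
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            # Bellman update: cheapest immediate cost plus expected cost-to-go.
            best = min(
                c[s][a] + sum(p * V[t] for t, p in P[s][a].items())
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V


def regret_rate(b_star, n_states, n_actions, n_episodes):
    """Leading-order rate B * sqrt(|S||A|K), without constants or log factors."""
    return b_star * math.sqrt(n_states * n_actions * n_episodes)


# Hypothetical two-state instance (not from the paper); "g" is the goal.
P = {
    0: {"safe": {1: 1.0}, "risky": {"g": 0.5, 0: 0.5}},
    1: {"go": {"g": 1.0}, "wait": {1: 1.0}},
}
c = {
    0: {"safe": 1.0, "risky": 0.5},
    1: {"go": 2.0, "wait": 1.0},
}

V = ssp_value_iteration(P, c, goal="g")
# Here B_star is taken as the tightest possible value: max_s V*(s).
B_star = max(V[s] for s in P)
rate = regret_rate(B_star, n_states=len(P), n_actions=2, n_episodes=10_000)
print(V, B_star, rate)
```

In this toy instance the "wait" action never reaches the goal, so its cost-to-go grows without bound during the iteration and the minimum settles on "go". This mirrors why SSP needs policies that reach the goal with probability one.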

research
04/22/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

We study the problem of learning in the stochastic shortest path (SSP) s...
research
02/23/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
research
05/25/2022

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

We initiate the study of dynamic regret minimization for goal-oriented r...
research
06/09/2021

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

We consider the problem of online reinforcement learning for the Stochas...
research
02/07/2022

Policy Optimization for Stochastic Shortest Path

Policy optimization is among the most popular and successful reinforceme...
research
10/10/2022

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...
research
06/10/2022

Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality

Goal-oriented Reinforcement Learning, where the agent needs to reach the...