Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

05/04/2021 · by Daniel Vial, et al.

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains Õ(√(B_⋆^3 d^3 K / c_min)) regret, where B_⋆ is a (known) upper bound on the optimal cost-to-go function, d is the feature dimension, K is the number of episodes, and c_min is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach from Jin et al. (2020). To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.
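The abstract describes an optimistic least-squares variant of value iteration. As a rough illustration only (the paper's actual algorithms, bonus coefficient, and clipping details are not reproduced here), one backup step of such a scheme might look like the following sketch: ridge regression of the targets c + V(s') onto the features, plus an elliptical-confidence bonus that is subtracted for optimism, since SSP minimizes cost. The names `optimistic_lsvi_backup`, `beta`, and the concrete defaults are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def optimistic_lsvi_backup(Phi, targets, lam=1.0, beta=1.0, B_star=10.0):
    """One optimistic least-squares value-iteration backup (illustrative sketch).

    Phi     : (n, d) features phi(s_i, a_i) of visited state-action pairs.
    targets : (n,) regression targets c_i + V(s'_i) from the previous iterate.
    Returns ridge-regression weights w and a function mapping a feature
    vector to an optimistic Q-value, clipped to [0, B_star].
    """
    n, d = Phi.shape
    Lambda = lam * np.eye(d) + Phi.T @ Phi        # regularized Gram matrix
    w = np.linalg.solve(Lambda, Phi.T @ targets)  # ridge regression weights
    Lambda_inv = np.linalg.inv(Lambda)

    def q_value(phi):
        # Elliptical confidence width; subtracted because we are minimizing cost,
        # so optimism means under-estimating the cost-to-go.
        bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
        return float(np.clip(phi @ w - bonus, 0.0, B_star))

    return w, q_value
```

In a full algorithm this backup would be iterated (analogously to the finite-horizon backward induction of Jin et al. 2020) and the resulting Q-values used to act greedily in the next episode.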


