Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

05/04/2021 · by Daniel Vial, et al.

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains Õ(√(B_⋆^3 d^3 K / c_min)) regret, where B_⋆ is a (known) upper bound on the optimal cost-to-go function, d is the feature dimension, K is the number of episodes, and c_min is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach from Jin et al. (2020). To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.
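The abstract describes an optimistic least-squares variant of value iteration. As a rough illustration only (the paper's actual algorithms, bonus coefficient, and clipping details are not reproduced here), one backup step of such a scheme might look like the following sketch: ridge regression of the targets c + V(s') onto the features, plus an elliptical-confidence bonus that is subtracted for optimism, since SSP minimizes cost. The names `optimistic_lsvi_backup`, `beta`, and the concrete defaults are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def optimistic_lsvi_backup(Phi, targets, lam=1.0, beta=1.0, B_star=10.0):
    """One optimistic least-squares value-iteration backup (illustrative sketch).

    Phi     : (n, d) features phi(s_i, a_i) of visited state-action pairs.
    targets : (n,) regression targets c_i + V(s'_i) from the previous iterate.
    Returns ridge-regression weights w and a function mapping a feature
    vector to an optimistic Q-value, clipped to [0, B_star].
    """
    n, d = Phi.shape
    Lambda = lam * np.eye(d) + Phi.T @ Phi        # regularized Gram matrix
    w = np.linalg.solve(Lambda, Phi.T @ targets)  # ridge regression weights
    Lambda_inv = np.linalg.inv(Lambda)

    def q_value(phi):
        # Elliptical confidence width; subtracted because we are minimizing cost,
        # so optimism means under-estimating the cost-to-go.
        bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
        return float(np.clip(phi @ w - bonus, 0.0, B_star))

    return w, q_value
```

In a full algorithm this backup would be iterated (analogously to the finite-horizon backward induction of Jin et al. 2020) and the resulting Q-values used to act greedily in the next episode.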


