
Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

by Daniel Vial, et al.

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains $\widetilde{O}\big(\sqrt{B_\star^3 d^3 K / c_{\min}}\big)$ regret, where $B_\star$ is a (known) upper bound on the optimal cost-to-go function, $d$ is the feature dimension, $K$ is the number of episodes, and $c_{\min}$ is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach of Jin et al. (2020). To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.
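The abstract names optimistic least-squares value iteration only at a high level. The following is a generic sketch under our own assumptions, not either of the paper's algorithms. It assumes known costs $c(s,a)$, a finite action set, a caller-supplied feature map $\phi(s,a)$, a fixed batch of observed transitions $(s, a, s')$, a caller-supplied bonus coefficient $\beta$, and a fixed number of value-iteration sweeps in place of a convergence test. Because costs are minimized, optimism here subtracts the elliptical bonus $\beta\sqrt{\phi^\top \Lambda^{-1} \phi}$ (the reward-maximizing LSVI-UCB of Jin et al. adds it), and $Q$ is clipped to $[0, B_\star]$, which is also an assumption. The names `optimistic_lsvi`, `phi`, `cost`, `beta` and `sweeps` are hypothetical.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def optimistic_lsvi(
    data: Sequence[Tuple[Hashable, Hashable, Hashable]],  # (s, a, s_next) transitions
    actions: Sequence[Hashable],
    phi: Callable[[Hashable, Hashable], Vector],   # hypothetical feature map
    cost: Callable[[Hashable, Hashable], float],   # assumed known cost c(s, a)
    goal: Hashable,
    b_star: float,      # upper bound B_star on the optimal cost-to-go
    beta: float,        # bonus coefficient (assumed given)
    lam: float = 1.0,   # ridge regularizer
    sweeps: int = 50,   # fixed number of value-iteration sweeps (assumption)
) -> Callable[[Hashable, Hashable], float]:
    feats = [phi(s, a) for s, a, _ in data]
    d = len(feats[0])
    # Regularized Gram matrix: Lambda = lam * I + sum_i phi_i phi_i^T.
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for f in feats:
        for i in range(d):
            for j in range(d):
                gram[i][j] += f[i] * f[j]

    def q(s, a, w):
        f = phi(s, a)
        bonus = beta * math.sqrt(max(dot(f, solve(gram, f)), 0.0))
        # Costs are minimized, so optimism subtracts the bonus; clip to [0, B_star].
        return min(max(cost(s, a) + dot(w, f) - bonus, 0.0), b_star)

    def v(s, w):
        # The goal state is absorbing and cost-free.
        return 0.0 if s == goal else min(q(s, a, w) for a in actions)

    w = [0.0] * d
    for _ in range(sweeps):
        # Ridge-regress next-state values onto features:
        # w = Lambda^{-1} sum_i phi_i V(s'_i).
        targets = [v(s_next, w) for _, _, s_next in data]
        rhs = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(d)]
        w = solve(gram, rhs)
    return lambda s, a: q(s, a, w)
```

For example, on a two-step chain `0 -> 1 -> goal` with unit costs and one-hot features, `beta=0.0` and a tiny `lam` recover cost-to-go values close to 2 and 1, while `beta > 0` yields smaller (optimistic) estimates. Recomputing $\Lambda^{-1}\phi$ on every call keeps the sketch short; a practical version would factor $\Lambda$ once per update.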

