Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

05/04/2021
by Daniel Vial et al.

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains Õ(√(B_⋆^3 d^3 K / c_min)) regret, where B_⋆ is a (known) upper bound on the optimal cost-to-go function, d is the feature dimension, K is the number of episodes, and c_min is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach of Jin et al. (2020). To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.
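
To make the phrase "optimistic least-squares version of value iteration" concrete, here is a minimal sketch of a single optimistic cost-to-go estimate in the style of Jin et al. (2020), adapted to costs. It is not the paper's algorithm: the function name `optimistic_q`, the ridge parameter `lam`, the bonus scale `beta`, and the clipping to [0, B_⋆] are all assumptions made for illustration.

```python
import numpy as np

def optimistic_q(phi, feats, targets, beta, b_star, lam=1.0):
    """Optimistic (lower-confidence) cost-to-go estimate for one feature vector.

    phi:     (d,) feature vector of the queried state-action pair
    feats:   (n, d) features of previously visited state-action pairs
    targets: (n,) regression targets (observed cost + next-state value estimate)
    beta:    hypothetical scale of the exploration bonus
    b_star:  assumed known upper bound on the optimal cost-to-go
    """
    d = feats.shape[1]
    # Regularized Gram matrix of the observed features.
    gram = lam * np.eye(d) + feats.T @ feats
    # Ridge-regression weights fitted to the targets.
    w = np.linalg.solve(gram, feats.T @ targets)
    # Elliptical bonus: large in directions with little data.
    bonus = beta * np.sqrt(phi @ np.linalg.solve(gram, phi))
    # Costs are minimized, so optimism subtracts the bonus; clip to [0, b_star].
    return float(np.clip(phi @ w - bonus, 0.0, b_star))

# Usage: two observed feature vectors, query along the first coordinate.
feats = np.eye(2)
targets = np.array([1.0, 2.0])
q = optimistic_q(np.array([1.0, 0.0]), feats, targets, beta=0.1, b_star=10.0)
```

Because the estimate is a lower bound on the regression prediction, rarely visited directions in feature space look cheaper than the data alone suggests, which is what drives exploration in this family of methods.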

research
12/18/2021

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

We introduce two new no-regret algorithms for the stochastic shortest pa...
research
10/25/2021

Learning Stochastic Shortest Path with Linear Function Approximation

We study the stochastic shortest path (SSP) problem in reinforcement lea...
research
06/15/2021

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

We introduce a generic template for developing regret minimization algor...
research
10/24/2018

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

We introduce efficient algorithms which achieve nearly optimal regrets f...
research
07/31/2022

Convex duality for stochastic shortest path problems in known and unknown environments

This paper studies Stochastic Shortest Path (SSP) problems in known and ...
research
10/17/2022

A Unified Algorithm for Stochastic Path Problems

We study reinforcement learning in stochastic path (SP) problems. The go...
research
07/11/2012

Discretized Approximations for POMDP with Average Cost

In this paper, we propose a new lower approximation scheme for POMDP wit...
