Learning Stochastic Shortest Path with Linear Function Approximation

10/25/2021
by Yifei Min, et al.

We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems the linear mixture SSP. We propose a novel algorithm for learning the linear mixture SSP that attains a regret of Õ(d B_⋆^{1.5} √(K/c_min)). Here K is the number of episodes, d is the dimension of the feature mapping in the mixture model, B_⋆ is an upper bound on the expected cumulative cost of the optimal policy, and c_min > 0 is a lower bound on the cost function. Our algorithm also applies to the case c_min = 0, where it guarantees a regret of Õ(K^{2/3}). To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Complementing the regret upper bounds, we also prove a lower bound of Ω(d B_⋆ √K), which nearly matches our upper bound.
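To make the setting concrete, the following is a minimal sketch of a linear mixture transition kernel and the resulting optimal cost-to-go in a toy SSP instance. It is not the algorithm proposed in the paper: the mixture weights `theta` are assumed known here, whereas the paper treats them as unknown, and the toy instance, the function names `mixture_kernel` and `ssp_value_iteration`, and the use of plain value iteration are all illustrative assumptions.

```python
import numpy as np

def mixture_kernel(base_models, theta):
    """Combine d base models into P(s'|s,a) = sum_i theta[i] * base_models[i][s, a, s']."""
    return np.tensordot(theta, base_models, axes=1)

def ssp_value_iteration(P, cost, goal, iters=1000):
    """Iterate V(s) = min_a [c(s,a) + sum_s' P(s'|s,a) V(s')] with V(goal) fixed at 0."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = cost + P @ V      # shape (n_states, n_actions)
        V = Q.min(axis=1)
        V[goal] = 0.0         # the goal state is absorbing and cost-free
    return V

# Hypothetical toy instance: 3 states (state 2 is the goal), 2 actions, d = 2 base models.
n_states, n_actions, goal = 3, 2, 2
base_models = np.zeros((2, n_states, n_actions, n_states))
base_models[0, :, :, goal] = 1.0          # model 0: always jump to the goal
for s in range(n_states):
    base_models[1, s, :, s] = 1.0         # model 1: always stay in place

theta = np.array([0.5, 0.5])              # mixture weights (assumed known in this sketch)
P = mixture_kernel(base_models, theta)
cost = np.ones((n_states, n_actions))     # every step costs 1, so c_min = 1

V = ssp_value_iteration(P, cost, goal)
print(V)
```

In this toy instance each non-goal state reaches the goal with probability 0.5 per step at unit cost, so the optimal expected cumulative cost is 2 from either non-goal state, which would be the value of B_⋆ for this instance.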


research
05/04/2021

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

We propose two algorithms for episodic stochastic shortest path problems...
research
10/17/2022

A Unified Algorithm for Stochastic Path Problems

We study reinforcement learning in stochastic path (SP) problems. The go...
research
02/11/2022

Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics

The theory of reinforcement learning currently suffers from a mismatch b...
research
07/11/2012

Discretized Approximations for POMDP with Average Cost

In this paper, we propose a new lower approximation scheme for POMDP wit...
research
02/23/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
research
04/04/2023

Online Joint Assortment-Inventory Optimization under MNL Choices

We study an online joint assortment-inventory optimization problem, in w...
research
10/24/2018

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

We introduce efficient algorithms which achieve nearly optimal regrets f...
