
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key to our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis, without explicit implementation. Using this template, we develop two new algorithms: the first is model-free (the first in the literature, to our knowledge) and minimax optimal under strictly positive costs; the second is model-based and minimax optimal even with zero-cost state-action pairs, matching the best existing result from [Tarbouriech et al., 2021b]. Importantly, both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms. Moreover, both can be made completely parameter-free.
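To make the idea of a finite-horizon counterpart concrete, the following is a minimal sketch on a hypothetical two-state SSP instance. It is not the construction used in the paper: the toy transitions, costs, the zero terminal cost, and all function names are assumptions made for illustration. It only shows that the H-step optimal cost-to-go approaches the SSP optimal cost-to-go as the horizon H grows.

```python
# Hypothetical toy SSP: states 0 and 1, plus an absorbing zero-cost goal "g".
# P[s][a] lists (next_state, probability) pairs; C[s][a] is the immediate cost.
GOAL = "g"
P = {
    0: {"slow": [(1, 1.0)], "risky": [(GOAL, 0.5), (0, 0.5)]},
    1: {"go": [(GOAL, 0.9), (1, 0.1)]},
}
C = {
    0: {"slow": 1.0, "risky": 1.5},
    1: {"go": 1.0},
}


def bellman_backup(V):
    """Apply one optimal Bellman backup; the goal always has value 0."""
    new_V = {GOAL: 0.0}
    for s in P:
        new_V[s] = min(
            C[s][a] + sum(p * V[s2] for s2, p in P[s][a]) for a in P[s]
        )
    return new_V


def finite_horizon_values(H):
    """Optimal H-step cost-to-go, assuming zero terminal cost (an assumption)."""
    V = {s: 0.0 for s in P}
    V[GOAL] = 0.0
    for _ in range(H):
        V = bellman_backup(V)
    return V


def ssp_values(tol=1e-12, max_iters=100_000):
    """Optimal SSP cost-to-go, computed by value iteration to a fixed point."""
    V = {s: 0.0 for s in P}
    V[GOAL] = 0.0
    for _ in range(max_iters):
        new_V = bellman_backup(V)
        if max(abs(new_V[s] - V[s]) for s in V) < tol:
            return new_V
        V = new_V
    return V


if __name__ == "__main__":
    V_star = ssp_values()
    for H in (1, 2, 5, 10, 20):
        gap = abs(V_star[0] - finite_horizon_values(H)[0])
        print(f"H={H:2d}  gap at state 0 = {gap:.3e}")
```

In this toy instance the gap shrinks quickly with H because every policy reaches the goal with high probability at each step. According to the abstract, the paper uses such an approximation only inside the analysis, so its algorithms never need to build the finite-horizon model explicitly.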