Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case

02/10/2021
by Liyu Chen, et al.

We make significant progress on the stochastic shortest path (SSP) problem with adversarial costs and unknown transition. Specifically, we develop algorithms that achieve O(√(S^2ADT_⋆ K)) regret for the full-information setting and O(√(S^3A^2DT_⋆ K)) regret for the bandit feedback setting, where D is the diameter, T_⋆ is the expected hitting time of the optimal policy, S is the number of states, A is the number of actions, and K is the number of episodes. Our work strictly improves on (Rosenberg and Mansour, 2020) in the full-information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition. The current best lower bounds are constructed via a stochastically oblivious adversary; to close the gap between our upper bounds and these lower bounds, we also propose algorithms with near-optimal regret for this special case.
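As a rough illustration of how the two stated regret rates compare, the sketch below evaluates the expressions inside the O(·) with all constant factors dropped. The function names and problem sizes are hypothetical, chosen only for illustration, and the printed numbers are not results from the paper. Dividing the two rates gives √(S^3A^2DT_⋆K) / √(S^2ADT_⋆K) = √(SA), so the bandit upper bound is larger by a factor of √(SA), while both grow only as √K, so the average regret per episode shrinks as K grows.

```python
import math

def full_info_bound(S, A, D, T_star, K):
    """Hypothetical helper: sqrt(S^2 * A * D * T_star * K), constants dropped."""
    return math.sqrt(S**2 * A * D * T_star * K)

def bandit_bound(S, A, D, T_star, K):
    """Hypothetical helper: sqrt(S^3 * A^2 * D * T_star * K), constants dropped."""
    return math.sqrt(S**3 * A**2 * D * T_star * K)

# Hypothetical problem sizes, for illustration only (not taken from the paper).
S, A, D, T_star = 10, 4, 20.0, 15.0

for K in (1_000, 10_000, 100_000):
    f = full_info_bound(S, A, D, T_star, K)
    b = bandit_bound(S, A, D, T_star, K)
    # The ratio b / f equals sqrt(S * A) for every K, D, and T_star.
    # Dividing by K shows the average per-episode regret rate decaying like 1/sqrt(K).
    print(f"K={K:>7}: full-info ~{f:,.0f}, bandit ~{b:,.0f}, "
          f"ratio={b / f:.3f}, full-info per episode ~{f / K:.3f}")
```

Only the scaling is meaningful here: the hidden constants (and any logarithmic factors) are not modeled, so the absolute values should not be read as actual regret.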

research
12/07/2020

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

We study the stochastic shortest path problem with adversarial costs and...
research
06/20/2020

Adversarial Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
research
10/24/2018

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

We introduce efficient algorithms which achieve nearly optimal regrets f...
research
07/31/2022

Convex duality for stochastic shortest path problems in known and unknown environments

This paper studies Stochastic Shortest Path (SSP) problems in known and ...
research
06/08/2021

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

We consider the best-of-both-worlds problem for learning an episodic Mar...
research
02/14/2012

Suboptimality Bounds for Stochastic Shortest Path Problems

We consider how to use the Bellman residual of the dynamic programming o...
research
02/07/2022

Policy Optimization for Stochastic Shortest Path

Policy optimization is among the most popular and successful reinforceme...
