Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

12/07/2020
by Liyu Chen, et al.

We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is O(√(DT^⋆ K)) and O(√(DT^⋆ SA K)) for the full-information setting and the bandit feedback setting, respectively, where D is the diameter, T^⋆ is the expected hitting time of the optimal policy, S is the number of states, A is the number of actions, and K is the number of episodes. Our results significantly improve upon the existing work of Rosenberg and Mansour (2020), which only considers the full-information setting and achieves suboptimal regret. Our work is also the first to consider bandit feedback with adversarial costs. Our algorithms are built on top of the Online Mirror Descent framework with a variety of new techniques that might be of independent interest, including an improved multi-scale expert algorithm, a reduction from general stochastic shortest path to a special loop-free case, a skewed occupancy measure space, and a novel correction term added to the cost estimators. Interestingly, the last two elements reduce the variance of the learner via positive bias and the variance of the optimal policy via negative bias, respectively, and having them simultaneously is critical for obtaining the optimal high-probability bound in the bandit feedback setting.
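Two quantities in the abstract, the expected hitting time T^⋆ and the occupancy measure, are linked by a standard identity for stochastic shortest path with known transition. The occupancy measure q^π(s,a) is the expected number of visits to (s,a) before reaching the goal. Summing it gives the expected hitting time T^π, and its inner product with the cost gives the expected episode cost. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm, and it does not implement their skewed occupancy measure space. It assumes a tabular model in which `P[s][a][s2]` lists transition probabilities between non-goal states (any missing mass goes to the goal), that `pi[s][a]` is a stochastic policy, and that the policy is proper, so the linear system is nonsingular. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # A singular system means some state is never left: the policy is improper.
            raise ValueError("singular system: policy may be improper")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def occupancy_measure(P: List[Matrix], pi: Matrix, s0: int) -> Matrix:
    """Return q[s][a], the expected visits to (s, a) before reaching the goal from s0."""
    S = len(P)
    # State-to-state kernel under pi: P_pi[s][s2] = sum_a pi(a|s) * P(s2 | s, a).
    p_pi = [[sum(pi[s][a] * P[s][a][s2] for a in range(len(P[s])))
             for s2 in range(S)] for s in range(S)]
    # Expected state visits n satisfy n = e_{s0} + P_pi^T n, i.e. (I - P_pi^T) n = e_{s0}.
    lhs = [[(1.0 if s == s2 else 0.0) - p_pi[s2][s] for s2 in range(S)]
           for s in range(S)]
    rhs = [1.0 if s == s0 else 0.0 for s in range(S)]
    n = solve_linear(lhs, rhs)
    # Split each state's visits across actions according to the policy.
    return [[n[s] * pi[s][a] for a in range(len(P[s]))] for s in range(S)]


def hitting_time(q: Matrix) -> float:
    """Expected number of steps to reach the goal: T^pi = sum_{s,a} q(s, a)."""
    return sum(sum(row) for row in q)


def expected_cost(q: Matrix, c: Matrix) -> float:
    """Expected total cost of one episode: <q, c>."""
    return sum(q[s][a] * c[s][a] for s in range(len(q)) for a in range(len(q[s])))


if __name__ == "__main__":
    # State 0: action 0 stays w.p. 0.5 (else goal); action 1 moves to state 1.
    # State 1: its single action always reaches the goal.
    P = [[[0.5, 0.0], [0.0, 1.0]], [[0.0, 0.0]]]
    pi = [[0.5, 0.5], [1.0]]
    q = occupancy_measure(P, pi, s0=0)
    print(q, hitting_time(q), expected_cost(q, [[1.0, 2.0], [3.0]]))
```

Solving the linear system directly is the simplest exact approach for small tabular problems. In the adversarial setting, an online learner would choose a different q in each episode from a constrained set of such measures, and the cost of episode k would be ⟨q_k, c_k⟩.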


