Online Learning for Stochastic Shortest Path Model via Posterior Sampling

06/09/2021
by   Mehdi Jafarnia-Jahromi, et al.

We consider online reinforcement learning for the Stochastic Shortest Path (SSP) problem, modeled as an unknown MDP with an absorbing state. We propose PSRL-SSP, a simple posterior sampling-based reinforcement learning algorithm for the SSP problem. The algorithm operates in epochs. At the beginning of each epoch, a sample is drawn from the posterior distribution over the unknown model dynamics, and the policy that is optimal for the drawn sample is followed throughout that epoch. An epoch ends when either the number of visits to the goal state in the current epoch exceeds that of the previous epoch, or the number of visits to some state-action pair has doubled. We establish a Bayesian regret bound of O(B_⋆ S√(AK)), where B_⋆ is an upper bound on the expected cost of the optimal policy, S is the size of the state space, A is the size of the action space, and K is the number of episodes. The algorithm requires only knowledge of the prior distribution and has no hyperparameters to tune. It is the first such posterior sampling algorithm, and in numerical experiments it outperforms previously proposed optimism-based algorithms.
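The two epoch-termination conditions described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `epoch_should_end`, its arguments, and the dictionary-based visit counts are hypothetical. It also assumes that "doubled" means the current visit count of a state-action pair strictly exceeds twice its count at the start of the epoch, which the abstract does not spell out.

```python
from typing import Dict, Tuple

# Hypothetical representation: a (state, action) pair indexed by integers.
StateAction = Tuple[int, int]


def epoch_should_end(
    goal_visits_this_epoch: int,
    goal_visits_prev_epoch: int,
    counts_now: Dict[StateAction, int],
    counts_at_epoch_start: Dict[StateAction, int],
) -> bool:
    """Return True if either stopping condition from the abstract holds.

    Assumption: "doubled" is read as a strict inequality,
    counts_now[sa] > 2 * counts_at_epoch_start[sa].
    """
    # Condition 1: more goal visits in this epoch than in the previous one.
    if goal_visits_this_epoch > goal_visits_prev_epoch:
        return True

    # Condition 2: some state-action pair has more than doubled its count
    # relative to the start of the epoch (pairs unseen at the start count as 0).
    for sa, n_now in counts_now.items():
        n_start = counts_at_epoch_start.get(sa, 0)
        if n_now > 2 * n_start:
            return True

    return False
```

In a full learner, a check of this kind would run after every step; when it returns True, a new model would be drawn from the posterior and the policy that is optimal for that sample would be followed until the next epoch boundary.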


research
03/24/2021

Minimax Regret for Stochastic Shortest Path

We study the Stochastic Shortest Path (SSP) problem in which an agent ha...
research
02/23/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
research
07/13/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

We propose the k-Shortest-Path (k-SP) constraint: a novel constraint on ...
research
03/22/2022

An Online Learning Approach to Shortest Path and Backpressure Routing in Wireless Networks

We consider the adaptive routing problem in multihop wireless networks. ...
research
06/10/2022

Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality

Goal-oriented Reinforcement Learning, where the agent needs to reach the...
research
05/25/2022

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

We initiate the study of dynamic regret minimization for goal-oriented r...
research
10/19/2019

Opinion shaping in social networks using reinforcement learning

In this paper, we study how to shape opinions in social networks when th...
