No-Regret Exploration in Goal-Oriented Reinforcement Learning

12/07/2019
by Jean Tarbouriech, et al.

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the so-called episodic setting or stochastic shortest path (SSP) problem, where an agent has to reach a predefined goal state (e.g., the top of the hill) while maximizing the cumulative reward or minimizing the cumulative cost. Despite the popularity of this setting, most of the literature studying the exploration-exploitation dilemma has either focused on different problems (i.e., fixed-horizon and infinite-horizon) or made the restrictive loop-free assumption (which implies that no state can be visited twice during any episode). In this paper, we study the general SSP setting and introduce the algorithm UC-SSP, whose regret scales as $O(c_{\max}^{3/2} c_{\min}^{-1/2} D S \sqrt{A D K})$ after $K$ episodes for any unknown SSP with $S$ non-terminal states, $A$ actions, an SSP-diameter of $D$, and positive costs in $[c_{\min}, c_{\max}]$. UC-SSP is thus the first learning algorithm with vanishing regret in the theoretically challenging setting of episodic RL.
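To make the scaling of the bound concrete, here is a minimal Python sketch that evaluates only the leading term stated in the abstract, with the constants hidden by the O-notation dropped. The function names `regret_bound_leading_term` and `per_episode_bound` are hypothetical helpers introduced for illustration; they are not part of UC-SSP or of any library, and the numbers they return are not actual regret values.

```python
import math


def regret_bound_leading_term(K, S, A, D, c_min, c_max):
    """Leading term c_max^{3/2} * c_min^{-1/2} * D * S * sqrt(A * D * K).

    Illustrative only: the constant factors hidden by the O-notation in
    the abstract are dropped, so this shows scaling, not a regret value.
    """
    if not (0 < c_min <= c_max):
        raise ValueError("costs must satisfy 0 < c_min <= c_max")
    if min(K, S, A) < 1 or D <= 0:
        raise ValueError("K, S, A must be at least 1 and D must be positive")
    return c_max ** 1.5 * c_min ** -0.5 * D * S * math.sqrt(A * D * K)


def per_episode_bound(K, **params):
    """Bound divided by the number of episodes K (hypothetical helper)."""
    return regret_bound_leading_term(K, **params) / K


if __name__ == "__main__":
    # Arbitrary example parameters, chosen only to show the trend in K.
    params = dict(S=10, A=4, D=5.0, c_min=0.1, c_max=1.0)
    for K in (10**2, 10**4, 10**6):
        total = regret_bound_leading_term(K, **params)
        average = per_episode_bound(K, **params)
        print(f"K={K:>8}  bound={total:12.1f}  bound/K={average:10.4f}")
```

Because the term grows as $\sqrt{K}$, dividing it by $K$ gives a quantity that shrinks as $1/\sqrt{K}$, which is one way to read the abstract's "vanishing regret" claim: the bound on the average per-episode regret goes to zero as the number of episodes grows.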

04/22/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

We study the problem of learning in the stochastic shortest path (SSP) s...
02/23/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
06/07/2017

Efficient Reinforcement Learning via Initial Pure Exploration

In several realistic situations, an interactive learning agent can pract...
06/20/2020

Adversarial Stochastic Shortest Path

Stochastic shortest path (SSP) is a well-known problem in planning and c...
01/13/2023

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret perfo...
07/13/2020

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

A common assumption in reinforcement learning (RL) is to have access to ...
02/07/2022

On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource ...