Delegative Reinforcement Learning: learning to avoid traps with a little help

07/19/2019
by Vanessa Kosoy, et al.

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning, supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since its parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.
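To make the setting concrete, here is a minimal Python sketch of posterior sampling over a finite set of hypothesis MDPs with a delegation check. It is an illustration under stated assumptions, not the algorithm from the paper: the delegation rule (hand the action to the advisor when some hypothesis with posterior mass at least `eta` says the sampled action loses more than `eps` in value) and the names `value_iteration`, `choose_action`, `update_posterior`, `eta` and `eps` are all hypothetical. Each hypothesis is assumed to be a pair `(P, R)` with transitions `P[s, a, s']` and rewards `R[s, a]`.

```python
import numpy as np

def value_iteration(P, R, gamma, iters=500):
    """Approximate Q-values for an MDP with transitions P[s, a, s'] and rewards R[s, a]."""
    Q = np.zeros(R.shape)
    for _ in range(iters):
        # Bellman backup: immediate reward plus discounted value of the next state.
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def choose_action(hypotheses, posterior, state, gamma, rng, eta=0.05, eps=0.1):
    """Return (action, delegate) for the current state.

    The action is optimal for one hypothesis sampled from the posterior.
    ASSUMPTION: delegate when a hypothesis with posterior mass >= eta says
    this action loses more than eps in value; the paper's rule may differ.
    """
    Qs = [value_iteration(P, R, gamma) for P, R in hypotheses]
    k = rng.choice(len(hypotheses), p=posterior)
    action = int(np.argmax(Qs[k][state]))
    for p, Q in zip(posterior, Qs):
        loss = Q[state].max() - Q[state, action]
        if p >= eta and loss > eps:
            return action, True
    return action, False

def update_posterior(hypotheses, posterior, s, a, s_next):
    """Bayesian update of the hypothesis weights after observing one transition."""
    likelihood = np.array([P[s, a, s_next] for P, _ in hypotheses])
    weights = posterior * likelihood
    return weights / weights.sum()
```

In this sketch the discount `gamma` and the thresholds are fixed in advance, which mirrors the abstract's remark that the parameters depend on the target time discount. When `delegate` is true, the caller would execute the advisor's action instead and still update the posterior on the observed transition.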


research
07/01/2016

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...
research
09/28/2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...
research
11/17/2018

The Impatient May Use Limited Optimism to Minimize Regret

Discounted-sum games provide a formal model for the study of reinforceme...
research
10/25/2022

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

We study the regret guarantee for risk-sensitive reinforcement learning ...
research
08/06/2018

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

We give a simple optimistic algorithm for which it is easy to derive reg...
research
11/15/2021

Delayed Feedback in Episodic Reinforcement Learning

There are many provably efficient algorithms for episodic reinforcement ...
research
05/07/2017

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
