Posterior Sampling for Continuing Environments

11/29/2022
by Wanqiao Xu, et al.

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach maintains a statistically plausible model of the environment and follows a policy that maximizes expected γ-discounted return in that model. At each time step, with probability 1 − γ, the model is replaced by a sample from the posterior distribution over environments. For a suitable schedule of γ, we establish an Õ(τ S √(A T)) bound on the Bayesian regret, where S is the number of environment states, A is the number of actions, T is the number of time steps, and τ denotes the reward averaging time, which bounds the duration required to accurately estimate the average reward of any policy.
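The resampling rule above can be illustrated with a small tabular sketch. It is not the authors' implementation and rests on several assumptions that do not come from the paper: a finite state and action space, a Dirichlet(1, ..., 1) prior over each next-state distribution, an N(0, 1) prior over each mean reward with unit-variance reward noise, a fixed γ instead of the paper's schedule, and plain value iteration as the planner. All function names and the two-state `toy_env_step` environment are hypothetical. Because resampling is an independent event with probability 1 − γ at every step, the time between resamplings is geometric with mean 1/(1 − γ), so γ = 0.9 keeps each sampled model for about ten steps on average.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_model(counts, reward_stats, rng):
    """Sample (P, R) from an ASSUMED conjugate posterior over tabular environments.

    Assumed priors (not from the paper): Dirichlet(1, ..., 1) over each next-state
    distribution, and N(0, 1) over each mean reward with unit-variance reward noise.
    """
    num_states, num_actions = len(counts), len(counts[0])
    P = [[sample_dirichlet([1.0 + c for c in counts[s][a]], rng)
          for a in range(num_actions)] for s in range(num_states)]
    R = []
    for s in range(num_states):
        row = []
        for a in range(num_actions):
            n, reward_sum = reward_stats[s][a]
            # Gaussian posterior: mean = sum / (1 + n), variance = 1 / (1 + n).
            row.append(rng.gauss(reward_sum / (1 + n), (1.0 / (1 + n)) ** 0.5))
        R.append(row)
    return P, R


def discounted_greedy_policy(P, R, gamma, iters=200):
    """Value iteration: a policy maximizing gamma-discounted return in the sampled model."""
    num_states, num_actions = len(P), len(P[0])
    V = [0.0] * num_states
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(num_actions)] for s in range(num_states)]
        V = [max(q_s) for q_s in Q]
    return [max(range(num_actions), key=lambda a: Q[s][a]) for s in range(num_states)]


def run_continuing_psrl(env_step, num_states, num_actions, gamma, T, s0=0, seed=0):
    """Run the continuing agent for T steps; return (average reward, number of resamples)."""
    rng = random.Random(seed)
    counts = [[[0] * num_states for _ in range(num_actions)] for _ in range(num_states)]
    reward_stats = [[[0, 0.0] for _ in range(num_actions)] for _ in range(num_states)]
    policy = discounted_greedy_policy(*sample_model(counts, reward_stats, rng), gamma)
    s, total_reward, resamples = s0, 0.0, 0
    for _ in range(T):
        # With probability 1 - gamma, replace the model with a fresh posterior sample.
        if rng.random() < 1.0 - gamma:
            policy = discounted_greedy_policy(*sample_model(counts, reward_stats, rng), gamma)
            resamples += 1
        a = policy[s]
        r, s_next = env_step(s, a, rng)
        # Update the posterior's sufficient statistics with the observed transition.
        counts[s][a][s_next] += 1
        reward_stats[s][a][0] += 1
        reward_stats[s][a][1] += r
        total_reward += r
        s = s_next
    return total_reward / T, resamples


def toy_env_step(s, a, rng):
    """Hypothetical two-state environment: action 1 in state 1 pays about 1 per step."""
    s_next = (1 if rng.random() < 0.9 else 0) if a == 1 else 0
    reward = (1.0 if (s == 1 and a == 1) else 0.0) + rng.gauss(0.0, 0.1)
    return reward, s_next
```

For example, `run_continuing_psrl(toy_env_step, 2, 2, gamma=0.9, T=3000, seed=1)` returns the agent's average reward over 3000 steps together with the number of times it resampled the model. The agent never resets to an episode start; it only swaps in a new sampled model at random times, which is what lets the same loop run in a continuing setting.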

research
06/04/2013

(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poo...
research
07/01/2016

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...
research
09/28/2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...
research
04/20/2022

Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics

We study the problem of reinforcement learning for a task encoded by a r...
research
02/10/2021

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

We design a simple reinforcement learning agent that, with a specificati...
research
06/02/2022

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

In this work, we propose a novel Kernelized Stein Discrepancy-based Post...
research
04/02/2018

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

In many environments only a tiny subset of all states yield high reward....