Thompson Sampling is Asymptotically Optimal in General Environments

02/25/2016
by   Jan Leike, et al.

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable class of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) under a recoverability assumption, regret is sublinear.
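The core loop of Thompson sampling is: sample an environment from the posterior, act optimally for it, observe the outcome, and update the posterior by Bayes' rule. The sketch below illustrates this on the simplest possible instance, a finite class of two-armed Bernoulli bandit hypotheses; all names and the environment class are illustrative, and this is of course a far narrower setting than the non-Markov, partially observable environments treated in the paper.

```python
import random

# Hypothetical demo class: each hypothesis fixes the success probability
# of both arms of a Bernoulli bandit. The paper's setting is far more
# general; this only illustrates the posterior-sampling loop.
CLASS = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]
TRUE_ENV = CLASS[0]  # the (unknown) true environment

def thompson_sampling(steps=2000, seed=0):
    rng = random.Random(seed)
    posterior = [1.0 / len(CLASS)] * len(CLASS)  # uniform prior
    total_reward = 0
    for _ in range(steps):
        # 1. Sample an environment from the current posterior.
        env = rng.choices(CLASS, weights=posterior)[0]
        # 2. Act optimally for the sampled environment.
        arm = 0 if env[0] >= env[1] else 1
        # 3. Observe the percept from the true environment.
        reward = 1 if rng.random() < TRUE_ENV[arm] else 0
        total_reward += reward
        # 4. Bayes update: reweight each hypothesis by the likelihood
        #    it assigns to the observed reward on the chosen arm.
        likes = [h[arm] if reward else 1 - h[arm] for h in CLASS]
        z = sum(w * l for w, l in zip(posterior, likes))
        posterior = [w * l / z for w, l in zip(posterior, likes)]
    return posterior, total_reward / steps

posterior, avg_reward = thompson_sampling()
```

As the posterior concentrates on the true environment, the sampled policy coincides with the optimal one more and more often, which is the mechanism behind the asymptotic value convergence stated in the abstract.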


