DeepAI

# Thompson Sampling is Asymptotically Optimal in General Environments

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear.
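The paper treats Thompson sampling in full generality (non-Markov, partially observable environments); the core idea is easiest to see in the classic Beta-Bernoulli bandit setting. The sketch below is a simplified illustration of that idea, not the paper's algorithm: each round, sample a success probability for every arm from its Beta posterior, act greedily with respect to the samples, and update the pulled arm's posterior. All names and parameters here are illustrative.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a K-armed bandit.

    Each round: draw one sample per arm from its Beta posterior,
    pull the arm whose sample is largest, observe a 0/1 reward,
    and update that arm's posterior counts.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) uniform prior per arm
    beta = [1.0] * k
    total_reward = 0
    for _ in range(n_rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return alpha, beta, total_reward

alpha, beta, total = thompson_sampling([0.2, 0.5, 0.8])
# Pulls concentrate on the best arm as its posterior sharpens.
means = [a / (a + b) for a, b in zip(alpha, beta)]
print(means, total)
```

Acting on a posterior sample, rather than the posterior mean, is what drives exploration: arms with uncertain posteriors occasionally produce large samples and get pulled, while confidently bad arms are pulled ever more rarely. The paper's general-environment variant replaces the per-arm posterior with a posterior over environments and resamples only at the start of each (growing) episode.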

11/28/2016
