Model-Based Bayesian Exploration

01/23/2013
by Richard Dearden, et al.

Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.
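The approach described above can be illustrated with a small sketch. The toy MDP, the sample sizes, and all variable names below are hypothetical, chosen only to show the mechanics: Dirichlet posteriors over the unknown transition parameters, a sampled distribution over Q-values, and a myopic value-of-perfect-information bonus used to pick actions. This is a simplified illustration under those assumptions, not the paper's exact algorithm or test domain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: 2 states, 2 actions. The agent knows the rewards
# but is uncertain about the transition probabilities true_P[s, a, s'].
N_S, N_A, GAMMA = 2, 2, 0.9
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.5]])                      # reward for (s, a)

# Dirichlet(1, ..., 1) prior on each row of the transition model;
# counts[s, a, s'] are the posterior parameters after observed transitions.
counts = np.ones((N_S, N_A, N_S))

def q_values(P):
    """Solve a sampled MDP by value iteration; return Q[s, a]."""
    Q = np.zeros((N_S, N_A))
    for _ in range(100):
        V = Q.max(axis=1)
        Q = R + GAMMA * (P @ V)                 # (N_S, N_A, N_S) @ (N_S,)
    return Q

def sample_q(n=30):
    """Draw n Q-functions from the posterior over models."""
    qs = []
    for _ in range(n):
        P = np.array([[rng.dirichlet(counts[s, a]) for a in range(N_A)]
                      for s in range(N_S)])
        qs.append(q_values(P))
    return np.stack(qs)                         # shape (n, N_S, N_A)

def vpi(q_samples, s):
    """Myopic value of perfect information for each action in state s."""
    mu = q_samples[:, s, :].mean(axis=0)
    a1 = int(mu.argmax())                       # current best action
    mu2 = np.partition(mu, -2)[-2]              # second-best mean
    bonus = np.zeros(N_A)
    for a in range(N_A):
        q = q_samples[:, s, a]
        if a == a1:
            # gain only if the best action turns out worse than believed
            bonus[a] = np.maximum(mu2 - q, 0.0).mean()
        else:
            # gain only if this action turns out better than the best
            bonus[a] = np.maximum(q - mu[a1], 0.0).mean()
    return mu, bonus

# Act greedily on expected Q plus the VPI bonus, updating the posterior.
s = 0
for step in range(100):
    qs = sample_q()
    mu, bonus = vpi(qs, s)
    a = int((mu + bonus).argmax())
    s_next = int(rng.choice(N_S, p=true_P[s, a]))
    counts[s, a, s_next] += 1.0                 # Bayesian model update
    s = s_next
```

The key design point the abstract describes is visible here: exploration is not driven by a fixed bonus or random dithering, but by how much the agent's posterior over Q-values suggests it could gain from learning an action's true value; as the Dirichlet counts grow, the sampled Q-distributions concentrate and the VPI bonus shrinks toward pure exploitation.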


