Knowledge is reward: Learning optimal exploration by predictive reward cashing

09/17/2021
by Luca Ambrogioni, et al.

There is a strong link between the general concept of intelligence and the ability to collect and use information. The theory of Bayes-adaptive exploration offers an attractive optimality framework for training machines to perform complex information-gathering tasks. However, the computational complexity of the resulting optimal control problem has limited the diffusion of the theory into mainstream deep AI research. In this paper, we exploit the inherent mathematical structure of Bayes-adaptive problems to simplify them dramatically, making the reward structure denser while simultaneously decoupling the learning of exploitation and exploration policies. The key to this simplification is the novel concept of cross-value (i.e., the value of being in one environment while acting optimally according to another), which we use to quantify the value of currently available information. This yields a new, denser reward structure that "cashes in" all future rewards that can be predicted from the current information state. In a set of experiments, we show that the approach makes it possible to learn challenging information-gathering tasks without shaping or heuristic bonuses, in situations where standard RL algorithms fail.
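To make the "reward cashing" idea concrete, here is a minimal sketch in a toy setting. All specifics below are illustrative assumptions, not the paper's construction: a two-armed Bernoulli bandit with Beta posteriors, where the value of currently available information is approximated by the return from committing to the posterior-greedy arm, and rewards are reshaped by the change in that value, in the spirit of potential-based shaping. This shows how rewards that become predictable from new information can be received immediately.

```python
def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) posterior over an arm's success rate."""
    return alpha / (alpha + beta)

def knowledge_value(posterior, horizon):
    """A crude proxy for the value of current information: the expected
    return of committing to the posterior-greedy arm for the remaining
    horizon. (Illustrative stand-in for the paper's cross-value.)"""
    return horizon * max(posterior_mean(a, b) for a, b in posterior)

def cashed_reward(reward, post_before, post_after, horizon):
    """Immediate reward plus the change in knowledge value, so rewards
    predictable from the newly acquired information are paid out now."""
    return (reward
            + knowledge_value(post_after, horizon - 1)
            - knowledge_value(post_before, horizon))

# Uniform Beta(1, 1) beliefs over both arms, 10 steps remaining.
post = [(1, 1), (1, 1)]
# Pulling arm 0 yields a success: update its Beta posterior.
post_after = [(2, 1), (1, 1)]
r = cashed_reward(1.0, post, post_after, horizon=10)
```

In this toy case the informative observation raises the greedy arm's posterior mean from 1/2 to 2/3, so the reshaped reward is larger than the raw reward of 1.0: the agent is paid immediately for the future exploitation value its observation revealed.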


