Exploration Potential

09/16/2016
by Jan Leike, et al.

We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem's reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation.
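The abstract does not reproduce the formal definition of exploration potential, so the sketch below uses an assumed proxy in a Bernoulli multi-armed bandit: it samples arm means from the agent's Beta posterior and estimates how much per-step reward the greedy-on-posterior policy would forgo, then tracks how epsilon-greedy and UCB1 drive that quantity down as they trade off exploration against exploitation. The function exploration_potential_proxy, the policies, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def exploration_potential_proxy(alpha, beta, n_samples=2000):
    """Monte-Carlo proxy for remaining exploration value (assumption, not the
    paper's definition): sample environments (arm means) from the Beta
    posterior and measure the posterior-expected per-step reward that the
    greedy-on-posterior-mean policy would forgo."""
    means = rng.beta(alpha, beta, size=(n_samples, len(alpha)))  # sampled environments
    greedy_arm = int(np.argmax(alpha / (alpha + beta)))          # Bayes-greedy choice
    return float(np.mean(means.max(axis=1) - means[:, greedy_arm]))

def run(policy, true_means, steps=2000):
    """Run a policy on a Bernoulli bandit and record the proxy every 100 steps."""
    k = len(true_means)
    alpha, beta = np.ones(k), np.ones(k)   # Beta(1,1) priors per arm
    trace = []
    for t in range(1, steps + 1):
        a = policy(alpha, beta, t)
        r = rng.random() < true_means[a]   # Bernoulli reward
        alpha[a] += r
        beta[a] += 1 - r
        if t % 100 == 0:
            trace.append((t, exploration_potential_proxy(alpha, beta)))
    return trace

def eps_greedy(alpha, beta, t, eps=0.1):
    # Explore uniformly with probability eps, otherwise exploit the posterior mean.
    if rng.random() < eps:
        return int(rng.integers(len(alpha)))
    return int(np.argmax(alpha / (alpha + beta)))

def ucb(alpha, beta, t):
    # Simplified UCB1 on posterior means with a log(t)/n confidence bonus.
    n = alpha + beta - 2                   # pulls per arm, excluding prior pseudo-counts
    mean = alpha / (alpha + beta)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(n, 1))
    return int(np.argmax(mean + bonus))

true_means = [0.3, 0.5, 0.7]
for name, pol in [("eps-greedy", eps_greedy), ("UCB1", ucb)]:
    trace = run(pol, true_means)
    print(name, "final proxy:", round(trace[-1][1], 4))
```

In this toy setting the proxy shrinks as a policy gathers enough information to identify the best arm, which mirrors the abstract's point that a useful exploration measure should reflect the reward structure rather than raw information gain.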

Related research

08/23/2018
Diversity-Driven Selection of Exploration Strategies in Multi-Armed Bandits
We consider a scenario where an agent has multiple available strategies ...

10/08/2017
Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy ...

11/13/2020
Active Reinforcement Learning: Observing Rewards at a Cost
Active reinforcement learning (ARL) is a variant on reinforcement learni...

09/17/2021
Knowledge is reward: Learning optimal exploration by predictive reward cashing
There is a strong link between the general concept of intelligence and t...

12/15/2022
Ungeneralizable Contextual Logistic Bandit in Credit Scoring
The application of reinforcement learning in credit scoring has created ...

07/13/2019
Parameterized Exploration
We introduce Parameterized Exploration (PE), a simple family of methods ...

05/26/2020
To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation
Delayed rewards problem in contextual bandits has been of interest in va...
