Parameterized Exploration

07/13/2019
by Jesse Clifton, et al.

We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. Unlike common exploration heuristics, our method accounts for both the time horizon of the decision problem and the agent's current knowledge of the problem's dynamics. We show that, applied to several common exploration techniques, our method outperforms un-tuned counterparts in Bernoulli and Gaussian multi-armed bandits, contextual bandits, and a Markov decision process based on a mobile health (mHealth) study. We also examine how the accuracy of the estimated dynamics model affects the performance of PE.
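As a concrete illustration of the idea above, the following sketch tunes an epsilon-greedy decay schedule by simulating the remaining horizon under the agent's current estimate of the dynamics (here, plug-in estimates of Bernoulli arm means) and picking the schedule parameter with the highest mean simulated return. This is a minimal sketch of the model-based tuning loop, not the paper's implementation: the schedule form eps_t = min(1, c / (t + 1)), the candidate grid, and all function names are illustrative assumptions.

```python
import numpy as np


def simulate_epsilon_greedy(arm_means, horizon, epsilon_fn, rng):
    """Run epsilon-greedy on a simulated Bernoulli bandit; return total reward."""
    k = len(arm_means)
    counts = np.zeros(k)
    values = np.zeros(k)
    total = 0.0
    for t in range(horizon):
        # Explore with probability epsilon_fn(t), otherwise act greedily.
        if rng.random() < epsilon_fn(t):
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(values))
        reward = float(rng.random() < arm_means[arm])  # simulated Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total


def tune_epsilon_decay(estimated_means, remaining_horizon, candidate_rates,
                       n_sims=100, seed=0):
    """Choose the decay rate c in eps_t = min(1, c / (t + 1)) that maximizes
    mean simulated reward under the estimated (plug-in) dynamics model."""
    rng = np.random.default_rng(seed)
    best_rate, best_value = None, -np.inf
    for c in candidate_rates:
        eps_fn = lambda t, c=c: min(1.0, c / (t + 1))
        value = np.mean([
            simulate_epsilon_greedy(estimated_means, remaining_horizon,
                                    eps_fn, rng)
            for _ in range(n_sims)
        ])
        if value > best_value:
            best_rate, best_value = c, value
    return best_rate


# Example: re-tune the schedule mid-run from current arm-mean estimates.
estimated_means = [0.35, 0.50, 0.42]  # hypothetical estimates from data so far
print(tune_epsilon_decay(estimated_means, remaining_horizon=500,
                         candidate_rates=[0.5, 1.0, 2.0, 5.0]))
```

Because candidates are scored by simulated cumulative reward over the remaining horizon, the selected schedule naturally depends on both the horizon and the current dynamics estimates, which is the property the abstract highlights.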
