Meta-Learning for Contextual Bandit Exploration

01/23/2019
by Amr Sharaf et al.

We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the action taken, thereby generating an exploration/exploitation trade-off. MELEE addresses this trade-off by learning a good exploration strategy for offline tasks based on synthetic data, on which it can simulate the contextual bandit setting. Based on these simulations, MELEE uses an imitation learning strategy to learn a good exploration policy that can then be applied to true contextual bandit tasks at test time. We compare MELEE to seven strong baseline contextual bandit algorithms on a set of three hundred real-world datasets, on which it outperforms alternatives in most settings, especially when differences in rewards are large. Finally, we demonstrate the importance of having a rich feature representation for learning how to explore.

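To make the training procedure concrete, the sketch below simulates the kind of loop the abstract describes: fully labeled synthetic tasks stand in for offline data, a per-action value model plays the exploitation role, and the exploration policy is trained by imitation against a full-information oracle. The task generator, the `meta_features` construction, and the explore/exploit labeling rule are illustrative assumptions for this sketch, not the authors' exact MELEE implementation.

```python
"""A minimal MELEE-style sketch: learn an exploration policy from simulated
contextual bandit runs on synthetic, fully labeled tasks (illustrative only)."""
import numpy as np
from sklearn.linear_model import SGDRegressor, LogisticRegression

rng = np.random.default_rng(0)

def make_synthetic_task(n=200, d=5, k=3):
    """Synthetic task with full labels: a reward for every action in every context."""
    X = rng.normal(size=(n, d))
    W = rng.normal(size=(d, k))
    rewards = (X @ W > 0).astype(float)   # reward of each action per context
    return X, rewards

def meta_features(value_estimates, t):
    """Features the exploration policy sees: per-action value estimates plus a time signal."""
    return np.concatenate([value_estimates, [1.0 / (t + 1)]])

explore_clf = LogisticRegression()        # exploration policy: explore (1) vs exploit (0)
meta_X, meta_y = [], []

for task in range(20):                    # simulated training tasks
    X, R = make_synthetic_task()
    k = R.shape[1]
    value_models = [SGDRegressor() for _ in range(k)]  # exploitation policy, one model per action
    seen = [False] * k
    for t, x in enumerate(X):
        est = np.array([m.predict([x])[0] if seen[a] else 0.0
                        for a, m in enumerate(value_models)])
        greedy = int(np.argmax(est))
        best = int(np.argmax(R[t]))       # oracle action, available only in simulation
        # Imitation target: explore whenever the greedy action disagrees with the oracle.
        meta_X.append(meta_features(est, t))
        meta_y.append(int(greedy != best))
        # Roll out: act, observe only the chosen action's reward, update that action's model.
        a = int(rng.integers(k)) if greedy != best else greedy
        value_models[a].partial_fit([x], [R[t, a]])
        seen[a] = True

explore_clf.fit(np.array(meta_X), np.array(meta_y))
print("trained exploration policy on", len(meta_y), "simulated decisions")
```

At test time the learned `explore_clf` would be queried with the same meta-features to decide whether to exploit the current greedy action or explore, without ever seeing the full-information labels.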