1 Introduction
Exploration versus exploitation (E/E) dilemmas arise in many subfields of science, and in related fields such as artificial intelligence, finance, medicine and engineering. In its simplest version, the multi-armed bandit problem formalizes this dilemma as follows
[1]: a gambler has $T$ coins, and at each step he may choose one among $K$ slot machines (or arms) in which to insert one of these coins; he then earns some money (his reward) depending on the response of the machine he selected. Each arm's response is characterized by an unknown probability distribution that is constant over time. The goal of the gambler is to collect the largest cumulated reward once he has exhausted his coins (i.e., after $T$ plays). A rational (and risk-neutral) gambler knowing the reward distributions of the arms would play at every stage an arm with maximal expected reward, so as to maximize his expected cumulative reward (irrespective of the number of arms $K$, his number of coins $T$, and the variances of the reward distributions). When the reward distributions are unknown, it is less trivial to decide how to play optimally, since two contradictory goals compete:
exploration consists in trying an arm to acquire knowledge about its expected reward, while exploitation consists in using the current knowledge to decide which arm to play. How to balance the effort towards these two goals is the essence of the E/E dilemma, which is especially difficult when the number of playing opportunities is finite. Most theoretical works about the multi-armed bandit problem have focused on the design of generic E/E strategies that are provably optimal in asymptotic conditions (large $T$), while assuming only very unrestrictive conditions on the reward distributions (e.g., bounded support). Among these, some strategies work by computing at every play a quantity called an "upper confidence index" for each arm, which depends on the rewards collected so far by this arm, and by selecting for the next play (or round of plays) the arm with the highest index. Such E/E strategies are called index-based policies and were initially introduced by [2], where the indices were difficult to compute. Indices that are easier to compute were proposed later on [3, 4, 5].
Index-based policies typically involve hyper-parameters whose values impact their relative performances. Usually, when reporting simulation results, authors manually tuned these values on problems that share similarities with their test problems (e.g., the same type of distributions for generating the rewards) by running trial-and-error simulations [4, 6]. By doing so, they actually used prior information on the problems to select the hyper-parameters.
Starting from these observations, we elaborated an approach for learning, in a reproducible way, good policies for playing multi-armed bandit problems over finite horizons. This approach explicitly models and then exploits the prior information on the target set of multi-armed bandit problems. We assume that this prior knowledge is represented as a distribution over multi-armed bandit problems, from which we can draw any number of training problems. Given this distribution, meta-learning consists in searching, in a chosen set of candidate E/E strategies, one that yields optimal expected performance. This approach makes it possible to automatically tune the hyper-parameters of existing index-based policies. More importantly, it opens the door to searching, within much broader classes of E/E strategies, one that is optimal for a given set of problems compliant with the prior information. We propose two such hypothesis spaces composed of index-based policies: in the first one, the index function is a linear function of history features whose meta-learned parameters are real numbers, while in the second one it is a function generated by a grammar of symbolic formulas.
We empirically show, in the case of Bernoulli arms, that when the number of arms and the playing horizon are fully specified a priori, learning yields policies that significantly outperform a wide range of previously proposed generic policies (UCB1, UCB1-Tuned, UCB2, UCB-V, KL-UCB and Greedy), even after careful tuning. We also evaluate the robustness of the learned policies with respect to erroneous prior assumptions, by testing the E/E strategies learned for Bernoulli arms on bandits whose rewards follow a truncated Gaussian distribution.
The ideas presented in this paper take their roots in two previously published papers. The idea of learning multi-armed bandit policies using global optimization and numerically parameterized index-based policies was first proposed in [7]. Searching for good multi-armed bandit policies in a formula space was first proposed in [8]. Compared to this previous work, we adopt here a unifying perspective, namely the learning of E/E strategies from prior knowledge. We also introduce an improved optimization procedure for formula search, based on the identification of equivalence classes and on a pure-exploration multi-armed bandit formalization.
This paper is structured as follows. We first formally define the multi-armed bandit problem and introduce index-based policies in Section 2. Section 3 formally states the E/E strategy learning problem. Sections 4 and 5 present the numerical and symbolic instantiations of our learning approach, respectively. Section 6 reports on experimental results. Finally, we conclude and present future research directions in Section 7.
2 Multi-armed bandit problem and policies
We now formally describe the (discrete) multi-armed bandit problem and the class of index-based policies.
2.1 The multi-armed bandit problem
We denote by $a_1, \ldots, a_K$ the $K$ ($K \ge 2$) arms of the bandit problem, by $\nu_k$ the reward distribution of arm $a_k$, and by $\mu_k$ its expected value; $a_t$ is the arm played at round $t$, and $r_t \sim \nu_{a_t}$ is the obtained reward.
$H_t = [a_1, r_1, a_2, r_2, \ldots, a_t, r_t]$ is a vector that gathers the history over the first $t$ plays, and we denote by $\mathcal{H}$ the set of all possible histories of any length $t$. An E/E strategy (or policy) $\pi : \mathcal{H} \to \{a_1, \ldots, a_K\}$ is an algorithm that processes at play $t$ the vector $H_{t-1}$ to select the arm $a_t$: $a_t = \pi(H_{t-1})$. The regret of the policy $\pi$ after $T$ plays is defined by $R_T^\pi = T \mu^* - \sum_{t=1}^{T} r_t$, where $\mu^* = \max_k \mu_k$ refers to the expected reward of the optimal arm. The expected value of the regret represents the expected loss due to the fact that the policy does not always play the best machine. It can be written as:

$$\mathbb{E}[R_T^\pi] = \sum_{k=1}^{K} \mathbb{E}[T_k(T)] \, (\mu^* - \mu_k) \qquad (1)$$

where $T_k(T)$ denotes the number of times the policy has drawn arm $a_k$ during the first $T$ rounds.
The multi-armed bandit problem aims at finding a policy $\pi^*$ that, for a given $K$, minimizes the expected regret (or, in other words, maximizes the expected reward), ideally for any $T$ and any reward distributions $\{\nu_k\}$.
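As an illustration (our own sketch, not part of the original formalization), the pseudo-regret decomposition of Eq. (1) can be checked with a small simulation; the policy and Bernoulli means below are arbitrary choices:

```python
import random

def play_bandit(policy, mus, T, rng):
    """Run one episode of `policy` on a Bernoulli bandit with means `mus`
    and return how many times each arm was drawn."""
    K = len(mus)
    history = []          # H_t as a list of (arm, reward) pairs
    counts = [0] * K      # T_k(T) for each arm
    for t in range(T):
        k = policy(history, K, t)
        r = 1.0 if rng.random() < mus[k] else 0.0
        history.append((k, r))
        counts[k] += 1
    return counts

def pseudo_regret(counts, mus):
    """Eq. (1) with the observed counts standing in for E[T_k(T)]."""
    mu_star = max(mus)
    return sum(n * (mu_star - mu) for n, mu in zip(counts, mus))

# A trivial (and poor) E/E strategy: round-robin over the arms.
round_robin = lambda history, K, t: t % K
```

With two arms of means 0.9 and 0.5, round-robin pulls the suboptimal arm half of the time, so its pseudo-regret after 100 plays is $50 \times 0.4 = 20$.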
2.2 Index-based bandit policies
Index-based bandit policies rely on a ranking index that computes for each arm $a_k$ a numerical value based on the sub-history of responses $H_{t-1}^k$ of that arm gathered up to time $t$. These policies are sketched in Algorithm 1 and work as follows. During the first $K$ plays, they play each machine once for initialization. In all subsequent plays, they compute for every machine $a_k$ the score $\text{index}(H_{t-1}^k, t)$, which depends on the observed sub-history of arm $a_k$ and possibly on $t$. At each step $t$, the arm with the largest score is selected (ties are broken at random).
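A minimal sketch of this scheme (our own illustration of Algorithm 1, with deterministic tie-breaking for simplicity):

```python
def index_based_policy(index_fn, history, K, t):
    """Sketch of Algorithm 1: play each arm once, then maximize the index.

    `history` is a list of (arm, reward) pairs; `index_fn(rewards_k, t)`
    scores an arm from the rewards it has produced so far.
    """
    if t < K:
        return t  # initialization: play each of the K arms once
    scores = []
    for k in range(K):
        rewards_k = [r for (a, r) in history if a == k]
        scores.append(index_fn(rewards_k, t))
    # Ties are broken at random in the paper; here, the lowest arm index wins.
    return scores.index(max(scores))
```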
Here are some examples of popular index functions:
$$\text{index}^{\text{UCB1}}(H_{t-1}^k, t) = \bar{r}_k + \sqrt{\frac{C \ln t}{t_k}} \qquad (2)$$

$$\text{index}^{\text{UCB1-Tuned}}(H_{t-1}^k, t) = \bar{r}_k + \sqrt{\frac{\ln t}{t_k} \min\left(\frac{1}{4},\; \bar{\sigma}_k^2 + \sqrt{\frac{2 \ln t}{t_k}}\right)} \qquad (3)$$

$$\text{index}^{\text{UCB1-Normal}}(H_{t-1}^k, t) = \bar{r}_k + \sqrt{16\, \bar{\sigma}_k^2\, \frac{\ln (t-1)}{t_k}} \qquad (4)$$

$$\text{index}^{\text{UCB-V}}(H_{t-1}^k, t) = \bar{r}_k + \sqrt{\frac{2\, \bar{\sigma}_k^2\, \zeta \ln t}{t_k}} + c\, \frac{3\, \zeta \ln t}{t_k} \qquad (5)$$
where $\bar{r}_k$ and $\bar{\sigma}_k$ are the mean and standard deviation of the rewards so far obtained from arm $a_k$, and $t_k$ is the number of times it has been played. Policies UCB1, UCB1-Tuned and UCB1-Normal have been proposed by [4] (note that UCB1-Normal does not strictly fit inside Algorithm 1, as it uses an additional condition to play arms that have not been played for a long time). UCB1 has one parameter $C$ whose typical value is 2. Policy UCB-V has been proposed by [5] and has two parameters $\zeta$ and $c$. We refer the reader to [4, 5] for detailed explanations of these parameters. Note that these index functions are the sum of an exploitation term giving preference to arms with a high mean reward ($\bar{r}_k$) and an exploration term that aims at playing arms to gather more information on their underlying reward distribution (typically an upper confidence term).
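For concreteness, here is a hedged sketch of the UCB1 and UCB1-Tuned indices of Eqs. (2) and (3), following [4]; the variance estimate and constants below are our reading of that reference:

```python
import math

def ucb1_index(rewards_k, t, C=2.0):
    """UCB1 (Eq. 2): empirical mean plus an upper-confidence bonus."""
    tk = len(rewards_k)
    mean = sum(rewards_k) / tk
    return mean + math.sqrt(C * math.log(t) / tk)

def ucb1_tuned_index(rewards_k, t):
    """UCB1-Tuned (Eq. 3): the bonus uses an empirical variance bound."""
    tk = len(rewards_k)
    mean = sum(rewards_k) / tk
    var = sum(r * r for r in rewards_k) / tk - mean * mean
    v = var + math.sqrt(2.0 * math.log(t) / tk)
    return mean + math.sqrt(math.log(t) / tk * min(0.25, v))
```

In both cases, the exploration bonus shrinks as the arm accumulates pulls ($t_k$ grows) and grows slowly with the global time step $t$.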
3 Learning exploration/exploitation strategies
Instead of relying on a fixed E/E strategy to solve a given class of problems, we propose a systematic approach to exploit prior knowledge by learning E/E strategies in a problemdriven way. We now state our learning approach in abstract terms.
Prior knowledge is represented as a distribution $\mathcal{D}_P$ over bandit problems $P$. From this distribution, we can sample as many training problems as desired. In order to learn E/E strategies exploiting this knowledge, we rely on a parametric family of candidate strategies $\Pi_\Theta = \{\pi_\theta \mid \theta \in \Theta\}$, whose members are policies $\pi_\theta$ that are fully defined given the parameters $\theta$. Given $\Pi_\Theta$, the learning problem aims at solving:
$$\theta^* = \underset{\theta \in \Theta}{\arg\min}\; \mathbb{E}_{P \sim \mathcal{D}_P}\big[\mathbb{E}[R_T^{\pi_\theta}(P)]\big] \qquad (6)$$
where $\mathbb{E}[R_T^{\pi_\theta}(P)]$ is the expected cumulative regret of $\pi_\theta$ on problem $P$ and where $T$ is the (a priori given) playing horizon. Solving this minimization problem is non-trivial since it involves an expectation over an infinite number of problems. Furthermore, given a problem $P$, computing $\mathbb{E}[R_T^{\pi_\theta}(P)]$ relies on the expected values $\mathbb{E}[T_k(T)]$, which we cannot compute exactly in the general case. Therefore, we propose to approximate the expected cumulative regret by the empirical mean regret over a finite set of training problems $P^{(1)}, \ldots, P^{(N)}$ drawn from $\mathcal{D}_P$:
$$\Delta(\theta) = \frac{1}{N} \sum_{i=1}^{N} R_T^{\pi_\theta}(P^{(i)}) \qquad (7)$$
where the $R_T^{\pi_\theta}(P^{(i)})$ values are estimated by performing a single trajectory of $\pi_\theta$ on problem $P^{(i)}$. Note that the number of training problems $N$ will typically be large in order to make the variance of this estimate reasonably small. In order to instantiate this approach, two components have to be provided: the hypothesis space $\Pi_\Theta$ and the optimization algorithm to solve Eq. (7). The next two sections describe different instantiations of these components.
4 Numeric parameterization
We now instantiate our meta-learning approach by considering E/E strategies that have numerical parameters.
4.1 Policy search space
To define the parametric family of candidate policies $\Pi_\Theta$, we use index functions expressed as linear combinations of history features. These index functions rely on a history feature function $\phi(H_{t-1}, k) \in \mathbb{R}^d$ that describes the history with respect to a given arm $a_k$ as a vector of scalar features. Given the function $\phi$, index functions are defined by $\text{index}(H_{t-1}^k, t) = \langle \theta, \phi(H_{t-1}, k) \rangle$,
where $\theta \in \mathbb{R}^d$ are parameters and $\langle \cdot, \cdot \rangle$ is the classical dot product. The set of candidate policies $\Pi_\Theta$ is composed of all index-based policies obtained with such index functions, given parameters $\theta \in \mathbb{R}^d$.
History features may describe any aspect of the history, including empirical reward moments, the current time step, arm play counts, or combinations of these variables. The set of such features should not be too large, to avoid parameter estimation difficulties, but it should be large enough to provide support for a rich set of E/E strategies. We here propose one possibility for defining the history feature function that can be applied to any multi-armed bandit problem and that is shown to perform well in Section 6.
To compute $\phi(H_{t-1}, k)$, we first compute the following four variables: $v_1 = \sqrt{\ln t}$, $v_2 = 1/\sqrt{t_k}$, $v_3 = \bar{r}_k$ and $v_4 = \bar{\sigma}_k$, i.e. the square root of the logarithm of the current time step, the inverse square root of the number of times arm $a_k$ has been played, and the empirical mean and standard deviation of the rewards obtained so far by arm $a_k$.
Then, these variables are multiplied in different ways to produce features. The number of these combinations is controlled by a degree parameter $P$ whose default value is $1$. Given $P$, there is one feature per possible combination of exponents $(p_1, p_2, p_3, p_4) \in \{0, \ldots, P\}^4$, which is defined as follows: $v_1^{p_1} v_2^{p_2} v_3^{p_3} v_4^{p_4}$.
In other terms, there is one feature per possible monomial of degree at most $P$ in each of the variables $v_1, \ldots, v_4$. In the following, we denote by Power1 (resp., Power2) the policy learned using the function $\phi$ with parameter $P = 1$ (resp., $P = 2$). The index function that underlies these policies can be written as follows:

$$\text{index}(H_{t-1}^k, t) = \sum_{p_1, \ldots, p_4 = 0}^{P} \theta_{p_1 p_2 p_3 p_4} \left(\sqrt{\ln t}\right)^{p_1} \left(\frac{1}{\sqrt{t_k}}\right)^{p_2} \bar{r}_k^{\,p_3}\, \bar{\sigma}_k^{\,p_4} \qquad (8)$$

where the $\theta_{p_1 p_2 p_3 p_4}$ are the learned parameters. The Power1 policy has $2^4 = 16$ such parameters and the Power2 policy has $3^4 = 81$ parameters.
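Under this reading of Eq. (8) (the variable definitions are our reconstruction, but they match the parameter counts of 16 and 81 reported in the experiments), the Power-$P$ index could be sketched as:

```python
import math
from itertools import product

def power_index(rewards_k, t, theta, P):
    """Power-P index (Eq. 8): a linear combination of every product
    v1^p1 * v2^p2 * v3^p3 * v4^p4 with exponents in {0, ..., P}."""
    tk = len(rewards_k)
    mean = sum(rewards_k) / tk
    var = max(0.0, sum(r * r for r in rewards_k) / tk - mean * mean)
    v = (math.sqrt(math.log(t)),   # v1 = sqrt(ln t)
         1.0 / math.sqrt(tk),      # v2 = 1 / sqrt(t_k)
         mean,                     # v3 = empirical mean reward
         math.sqrt(var))           # v4 = empirical std deviation
    features = [v[0] ** p1 * v[1] ** p2 * v[2] ** p3 * v[3] ** p4
                for (p1, p2, p3, p4) in product(range(P + 1), repeat=4)]
    return sum(w * f for w, f in zip(theta, features))
```

For $P = 1$ there are $2^4 = 16$ features and for $P = 2$ there are $3^4 = 81$, one weight per feature.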
4.2 Optimization algorithm
We now discuss the optimization of Eq. (7) in the case of our numerical parameterization. Note that the objective function we want to optimize, in addition to being stochastic, has a complex relation with the parameters $\theta$: a slight change in the parameter vector may lead to significantly different bandit episodes and expected regret values. Local optimization approaches may thus not be appropriate here. Instead, we suggest the use of derivative-free global optimization algorithms.
In this work, we use a powerful, yet simple, class of global optimization algorithms related to the cross-entropy method and also known as Estimation of Distribution Algorithms (EDAs) [9]. EDAs rely on a probabilistic model to describe promising regions of the search space and to sample good candidate solutions. This is performed by repeating iterations that first sample a population of candidates using the current probabilistic model and then fit a new probabilistic model given the best candidates.
Any kind of probabilistic model may be used inside an EDA. The simplest form of EDA uses one marginal distribution per variable to optimize and is known as the univariate marginal distribution algorithm [10]. We have adopted this approach by using one Gaussian distribution $\mathcal{N}(\mu_j, \sigma_j^2)$ for each parameter $\theta_j$. Although this approach is simple, it proved to be quite effective experimentally at solving Eq. (7). The full details of our EDA-based policy learning procedure are given in Algorithm 2. The initial distributions are standard Gaussians $\mathcal{N}(0, 1)$. The policy that is returned corresponds to the parameters that led to the lowest observed value of $\Delta(\theta)$.
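A compact sketch of such a univariate-Gaussian EDA follows; the population size, iteration count, and variance floor are illustrative choices of ours, not the paper's settings:

```python
import math
import random

def eda_minimize(objective, d, iterations=20, pop=50, elite=10, rng=None):
    """Univariate marginal EDA: sample candidates from one Gaussian per
    parameter, keep the elite, refit the Gaussians, and repeat."""
    rng = rng or random.Random(0)
    mu, sigma = [0.0] * d, [1.0] * d      # initial standard Gaussians N(0, 1)
    best_theta, best_value = None, float("inf")
    for _ in range(iterations):
        candidates = [[rng.gauss(mu[j], sigma[j]) for j in range(d)]
                      for _ in range(pop)]
        candidates.sort(key=objective)
        value = objective(candidates[0])
        if value < best_value:
            best_theta, best_value = list(candidates[0]), value
        top = candidates[:elite]
        for j in range(d):                 # refit the marginal Gaussians
            vals = [c[j] for c in top]
            mu[j] = sum(vals) / elite
            var = sum((x - mu[j]) ** 2 for x in vals) / elite
            sigma[j] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
    return best_theta, best_value
```

On a smooth deterministic objective this converges quickly; on the stochastic objective of Eq. (7), each `objective` call would run bandit episodes, and larger populations are needed.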
5 Symbolic parameterization
The index functions from the literature depend on the current time step $t$ and on three statistics extracted from the sub-history $H_{t-1}^k$: $\bar{r}_k$, $\bar{\sigma}_k$ and $t_k$. We now propose a second parameterization of our learning approach, in which we consider all index functions that can be constructed using small formulas built upon these four variables.
5.1 Policy search space
We consider index functions that are given in the form of small, closed-form formulas. Closed-form formulas have several advantages: they can be easily computed, they can be formally analyzed, and they are easily interpretable.
Let us first make explicit the set of formulas $\mathbb{F}$ that we consider in this paper. A formula $F \in \mathbb{F}$ is:

either a binary expression $F = B(F', F'')$, where $B$ belongs to a set of binary operators $\mathcal{B}$ and $F'$ and $F''$ are also formulas from $\mathbb{F}$,

or a unary expression $F = U(F')$, where $U$ belongs to a set of unary operators $\mathcal{U}$ and $F' \in \mathbb{F}$,

or an atomic variable $F = v$, where $v$ belongs to a set of variables $\mathcal{V}$,

or a constant $F = c$, where $c$ belongs to a set of constants $\mathcal{C}$.
In the following, we consider sets of operators and constants that provide a good compromise between high expressiveness and low cardinality of $\mathbb{F}$. The set of binary operators $\mathcal{B}$ considered in this paper includes the four elementary mathematical operations and the minimum and maximum operators: $\mathcal{B} = \{+, -, \times, \div, \min, \max\}$. The set of unary operators $\mathcal{U}$ contains the square root, the logarithm, the absolute value, the opposite and the inverse: $\mathcal{U} = \{\sqrt{\cdot}, \ln(\cdot), |\cdot|, -(\cdot), 1/(\cdot)\}$. The set of variables is $\mathcal{V} = \{\bar{r}_k, \bar{\sigma}_k, t_k, t\}$. The set of constants $\mathcal{C}$ has been chosen to maximize the number of different numbers representable by small formulas.
Figure 1 summarizes our grammar of formulas and gives two examples of index functions. The length of a formula is the number of symbols occurring in it. Let $L$ be a given maximal length. $\mathbb{F}_L$ is the subset of formulas whose length is no more than $L$, and $\Pi_{\mathbb{F}_L}$ is the set of index-based policies whose index functions are defined by formulas $F \in \mathbb{F}_L$.
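One natural way to implement such a grammar (our illustration, not the paper's code) is to represent formulas as expression trees and evaluate them recursively; invalid evaluations raise exceptions and can then be discarded:

```python
import math

# A formula is a nested tuple; leaves are variable names or numeric constants.
BINARY = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
          '*': lambda a, b: a * b, '/': lambda a, b: a / b,
          'min': min, 'max': max}
UNARY = {'sqrt': math.sqrt, 'ln': math.log, 'abs': abs,
         'neg': lambda a: -a, 'inv': lambda a: 1.0 / a}

def evaluate(formula, variables):
    """Recursively evaluate a formula tree on a dict of variable values.
    Raises ValueError/ZeroDivisionError on invalid formulas."""
    if isinstance(formula, (int, float)):
        return float(formula)
    if isinstance(formula, str):
        return variables[formula]
    op, *args = formula
    if op in BINARY:
        return BINARY[op](evaluate(args[0], variables),
                          evaluate(args[1], variables))
    return UNARY[op](evaluate(args[0], variables))

# Example: a UCB1-like index  r_mean + sqrt(2 * ln(t) / t_k)
ucb1_like = ('+', 'r_mean', ('sqrt', ('/', ('*', 2, ('ln', 't')), 't_k')))
```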
5.2 Optimization algorithm
We now discuss the optimization of Eq. (7) in the case of our symbolic parameterization. First, notice that several different formulas can lead to the same policy. For example, any increasing function of $\bar{r}_k$ defines the greedy policy, which always selects the arm that is believed to be the best; examples of such functions in our formula search space include $\bar{r}_k$ itself, $2 \times \bar{r}_k$ and $\sqrt{\bar{r}_k}$.
Since it is useless to evaluate equivalent policies multiple times, we propose the following two-step approach. First, the set $\mathbb{F}_L$ is partitioned into equivalence classes, two formulas being equivalent if and only if they lead to the same policy. Then, Eq. (7) is solved over the set of equivalence classes (which is typically one or two orders of magnitude smaller than the initial set $\mathbb{F}_L$).
Partitioning $\mathbb{F}_L$. This task is far from trivial: given a formula, equivalent formulas can be obtained through commutativity, associativity, operator-specific rules, and any increasing transformation. Performing this step exactly would involve advanced static analysis of the formulas, which we believe would be very difficult to implement. Instead, we propose a simple approximate solution, which consists in discriminating formulas by comparing how they rank (in terms of the values returned by the formula) a set of random samples of the variables $\bar{r}_k$, $\bar{\sigma}_k$, $t_k$ and $t$. More formally, the procedure is the following:

we first build $\mathbb{F}_L$, the space of all formulas $F$ such that $\text{length}(F) \le L$;

for $i = 1, \ldots, d$, we uniformly draw (within their respective domains) a random realization of the variables $\bar{r}_k$, $\bar{\sigma}_k$, $t_k$ and $t$, which we concatenate into a vector $\Theta_i$;

we cluster all formulas from $\mathbb{F}_L$ according to the following rule: two formulas $F$ and $F'$ belong to the same cluster if and only if they rank all the points $\Theta_1, \ldots, \Theta_d$ in the same order, i.e., for all pairs $(i, j)$, $F(\Theta_i) \le F(\Theta_j) \Leftrightarrow F'(\Theta_i) \le F'(\Theta_j)$. Formulas leading to invalid index functions (caused for instance by division by zero or the logarithm of a negative value) are discarded;

among each cluster, we select one formula of minimal length;

we gather all the selected minimal-length formulas into an approximate reduced set of formulas $\tilde{\mathbb{F}}_L$.
In the following, we denote by $M$ the cardinality of the approximate set of formulas $\tilde{\mathbb{F}}_L = \{F_1, \ldots, F_M\}$.
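The clustering step can be sketched as follows (our illustration; the sample points and the formula representation are placeholders):

```python
from collections import defaultdict

def rank_signature(formula_fn, points):
    """Clustering key: all pairwise <= comparisons of the formula's values on
    a fixed set of random points. Returns None for invalid formulas."""
    try:
        values = [formula_fn(p) for p in points]
    except (ValueError, ZeroDivisionError, OverflowError):
        return None  # e.g. division by zero or log of a negative value
    n = len(values)
    return tuple(values[i] <= values[j] for i in range(n) for j in range(n))

def reduce_formulas(formulas, points):
    """Keep one shortest representative per (approximate) equivalence class.
    `formulas` is a list of (length, callable) pairs."""
    clusters = defaultdict(list)
    for length, fn in formulas:
        sig = rank_signature(fn, points)
        if sig is not None:
            clusters[sig].append((length, fn))
    return [min(group, key=lambda lf: lf[0])[1] for group in clusters.values()]
```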
Optimization algorithm. A naive approach to finding the best formula would be to estimate $\Delta(F)$ for each formula $F \in \tilde{\mathbb{F}}_L$ and simply return the best one. While extremely simple to implement, such an approach could reveal itself to be time-inefficient for sets $\tilde{\mathbb{F}}_L$ of large cardinality.
Preliminary experiments have shown us that $\tilde{\mathbb{F}}_L$ contains a majority of formulas leading to relatively badly performing index-based policies. It turns out that relatively few regret samples are sufficient to reject these badly performing formulas with high confidence. In order to exploit this observation, a natural idea is to formalize the search for the best formula as another multi-armed bandit problem. To each formula $F \in \tilde{\mathbb{F}}_L$, we associate an arm. Pulling the arm $F$ consists in selecting a training problem and in running one episode with the index-based policy whose index formula is $F$. This leads to a reward associated to arm $F$ whose value is the opposite of the regret observed during the episode. The purpose of multi-armed bandit algorithms is here to process the sequence of observed rewards to select in a smart way the next formula to be tried, so that when the budget of pulls has been exhausted, one (or several) high-quality formula(s) can be identified.
In this formalization of Eq. (7) as a multi-armed bandit problem, only the quality of the finally suggested arm matters. How to select arms so as to identify the best one within a finite budget is known as the pure-exploration multi-armed bandit problem [11]. It has been shown that index-based policies based on upper confidence bounds are good policies for solving pure-exploration bandit problems. Our optimization procedure thus works as follows: we run a bandit algorithm such as UCB1-Tuned during a given number of steps and then return the policy that corresponds to the formula with the highest empirical mean reward. The training problem used at each pull of an arm depends on the number of times that arm has been played so far, so that each formula cycles through the training problems $P^{(1)}, \ldots, P^{(N)}$.
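This pure-exploration loop can be sketched as follows (our illustration; `candidate_regrets[m]` stands for running one episode of formula $F_m$ on the next training problem in its cycle):

```python
import math

def best_formula_search(candidate_regrets, budget):
    """UCB1-style pure-exploration sketch over formula arms: pull arms by an
    upper-confidence score on negated regrets, then return the arm with the
    lowest empirical mean regret."""
    M = len(candidate_regrets)
    sums = [0.0] * M   # accumulated regrets per formula
    counts = [0] * M   # pulls per formula
    for t in range(budget):
        if t < M:
            m = t                       # initialization: pull each arm once
        else:
            m = max(range(M),
                    key=lambda j: -sums[j] / counts[j]
                                  + math.sqrt(2.0 * math.log(t) / counts[j]))
        sums[m] += candidate_regrets[m](counts[m])  # one (noisy) regret sample
        counts[m] += 1
    return min(range(M), key=lambda j: sums[j] / counts[j])
```

Badly performing formulas quickly accumulate low upper-confidence scores and are pulled rarely, which is exactly the effect exploited here.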
In our experiments, we estimate that our multi-armed bandit approach is one hundred to one thousand times faster than the naive Monte Carlo optimization procedure, which clearly demonstrates the benefits of this approach. Note that this idea could also be relevant to our numerical case. The main difference is that the corresponding multi-armed bandit problem would rely on a continuous arm space. Although some algorithms have already been proposed to solve such multi-armed bandit problems [12], how to scale these techniques to problems with hundreds or thousands of parameters is still an open research question. Progress in this field could directly benefit our numerical learning approach.
6 Numerical experiments
We now illustrate the two instances of our learning approach by comparing learned policies against a number of previously proposed generic policies, in a setting where prior knowledge is available about the target problems. We show that, in both cases, learning yields exploration/exploitation strategies that significantly outperform all tested generic policies.
6.1 Experimental protocol
We compare learned policies against generic policies, distinguishing between untuned and tuned generic policies. The former are either parameter-free policies or policies used with the default parameters suggested in the literature, while the latter are generic policies whose hyper-parameters were tuned using Algorithm 2.
Training and testing. To illustrate our approach, we consider the scenario where the number of arms $K$, the playing horizon $T$ and the kind of reward distributions $\nu_k$ are known a priori, and where the parameters of these distributions are the missing information. Since we are learning policies, care should be taken with generalization issues. As usual in supervised machine learning, we use a training set which is distinct from the testing set. The training set is composed of $N$ bandit problems sampled from a given distribution $\mathcal{D}_P$ over bandit problems, whereas the testing set contains another 10000 problems drawn from this distribution. To study the robustness of our policies w.r.t. wrong prior information, we also report their performance on a set of problems drawn from another distribution $\mathcal{D}_P'$ with a different kind of reward distributions. When computing $\Delta(\theta)$, we estimate the regret on each of these problems by averaging results over multiple runs; one evaluation thus involves simulating a large number of bandit episodes during both training and testing.

Problem distributions. The distribution $\mathcal{D}_P$ is composed of two-armed bandit problems with Bernoulli distributions whose expectations are uniformly drawn from $[0, 1]$. Hence, in order to sample a bandit problem from $\mathcal{D}_P$, we draw the expectations $p_1$ and $p_2$ uniformly from $[0, 1]$ and return the bandit problem with two Bernoulli arms that have expectations $p_1$ and $p_2$, respectively. In the second distribution $\mathcal{D}_P'$, the reward distributions are replaced by Gaussian distributions truncated to the interval $[0, 1]$. In order to sample one problem from $\mathcal{D}_P'$, we select a mean and a standard deviation for each arm uniformly in $[0, 1]$. Rewards are then sampled using a rejection sampling approach: samples are drawn from the corresponding Gaussian distribution until obtaining a value that belongs to the interval $[0, 1]$.

Generic policies. We consider the following generic policies: the Greedy policy as described in [4], the policies introduced by [4] (UCB1, UCB1-Tuned, UCB1-Normal and UCB2), the policy KL-UCB introduced in [13] and the policy UCB-V proposed by [5]. Except for Greedy, all these policies belong to the family of index-based policies discussed previously. UCB1-Tuned and UCB1-Normal are parameter-free policies designed for bandit problems with Bernoulli distributions and with Gaussian distributions, respectively. All the other policies have hyper-parameters that can be tuned to improve their quality: Greedy has two parameters, UCB2 has one parameter, KL-UCB has one parameter and UCB-V has two parameters. We refer the reader to [4, 5, 13] for detailed explanations of these parameters.
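Sampling from these two problem distributions can be sketched as follows (illustrative helper names; the truncation interval is $[0, 1]$ as described above):

```python
import random

def sample_bernoulli_problem(rng):
    """Draw a problem from D_P: two Bernoulli arms, means uniform in [0, 1]."""
    return [rng.random(), rng.random()]

def sample_truncated_gaussian_problem(rng):
    """Draw a problem from D_P': a (mean, std) pair per arm, uniform in [0, 1]."""
    return [(rng.random(), rng.random()) for _ in range(2)]

def truncated_gaussian_reward(mean, std, rng):
    """Rejection sampling: redraw from N(mean, std^2) until the value lies in [0, 1]."""
    while True:
        x = rng.gauss(mean, std)
        if 0.0 <= x <= 1.0:
            return x
```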
Learning numerical policies. We learn policies using the two parameterizations Power1 and Power2 described in Section 4.1. Note that tuning generic policies is a particular case of learning with numerical parameters, and that both learned policies and tuned generic policies make use of the same prior knowledge. To make the comparison between these two kinds of policies fair, we always use the same training procedure, namely Algorithm 2 with a fixed number of iterations and with population and elite sizes that grow linearly with the number $d$ of parameters to optimize; a linear dependency between the population size and $d$ is a classical choice when using EDAs [14]. In most cases the optimization converges within a few, or a few tens of, iterations, and our simulations have shown that these settings leave the optimization enough time to properly converge. For the baseline policies for which default values are advocated, we use these values as the initial expectations of the EDA Gaussians; otherwise, the initial Gaussians are centered on zero. Nothing is done to enforce the constraints on the parameters (e.g., the constraints on the two parameters of Greedy): in practice, the EDA automatically identifies interesting regions of the search space that respect these constraints.
Learning symbolic policies. We apply our symbolic learning approach with a bounded maximal formula length $L$, which leads to a set $\mathbb{F}_L$ containing several million formulas. We have applied the approximate partitioning approach described in Section 5.2 to these formulas, using random samples of the variables to discriminate among strategies. This has resulted in the rejection of a large number of invalid formulas and in a much smaller set of distinct candidate E/E strategies (i.e., distinct formula equivalence classes). To identify the best of those distinct strategies, we apply the UCB1-Tuned algorithm for a large number of steps. In our experiments, we report the two best policies found, which we denote Formula1 and Formula2.
6.2 Performance comparison
Table 1: Expected regret of untuned generic, tuned generic and learned policies on Bernoulli and truncated Gaussian test problems.

Policy             Training  |        Bernoulli         |         Gaussian
                   horizon   | T=10    T=100   T=1000   | T=10    T=100   T=1000
Untuned generic policies
UCB1                  -      | 1.07    5.57    20.1     | 1.37    10.6    66.7
UCB1-Tuned            -      | 0.75    2.28    5.43     | 1.09    6.62    37.0
UCB1-Normal           -      | 1.71    13.1    31.7     | 1.65    13.4    58.8
UCB2                  -      | 0.97    3.13    7.26     | 1.28    7.90    40.1
UCB-V                 -      | 1.45    8.59    25.5     | 1.55    12.3    63.4
KL-UCB                -      | 0.76    2.47    6.61     | 1.14    7.66    43.8
KL-UCB                -      | 0.82    3.29    9.81     | 1.21    8.90    53.0
Greedy                -      | 1.07    3.21    11.5     | 1.20    6.24    41.4
Tuned generic policies
UCB1                T=10     | 0.74    2.05    4.85     | 1.05    6.05    32.1
                    T=100    | 0.74    2.05    4.84     | 1.05    6.06    32.3
                    T=1000   | 0.74    2.08    4.91     | 1.05    6.17    33.0
UCB2                T=10     | 0.97    3.15    7.39     | 1.28    7.91    40.5
                    T=100    | 0.97    3.12    7.26     | 1.33    8.14    40.4
                    T=1000   | 0.97    3.13    7.25     | 1.28    7.89    40.0
UCB-V               T=10     | 0.75    2.36    5.15     | 1.01    5.75    26.8
                    T=100    | 0.75    2.28    7.07     | 1.01    5.30    27.4
                    T=1000   | 0.77    2.43    5.14     | 1.13    5.99    27.5
KL-UCB              T=10     | 0.73    2.14    5.28     | 1.12    7.00    38.9
                    T=100    | 0.73    2.10    5.12     | 1.09    6.48    36.1
                    T=1000   | 0.73    2.10    5.12     | 1.08    6.34    35.4
Greedy              T=10     | 0.79    3.86    32.5     | 1.01    7.31    67.6
                    T=100    | 0.95    3.19    14.8     | 1.12    6.38    46.6
                    T=1000   | 1.23    3.48    9.93     | 1.32    6.28    37.7
Learned numerical policies
Power1              T=10     | 0.72    2.29    14.0     | 0.97    5.94    49.7
(16 parameters)     T=100    | 0.77    1.84    5.64     | 1.04    5.13    27.7
                    T=1000   | 0.88    2.09    4.04     | 1.17    5.95    28.2
Power2              T=10     | 0.72    2.37    15.7     | 0.97    6.16    55.5
(81 parameters)     T=100    | 0.76    1.82    5.81     | 1.05    5.03    29.6
                    T=1000   | 0.83    2.07    3.95     | 1.12    5.61    27.3
Learned symbolic policies
Formula1            T=10     | 0.72    2.37    14.7     | 0.96    5.14    30.4
                    T=100    | 0.76    1.85    8.46     | 1.12    5.07    29.8
                    T=1000   | 0.80    2.31    4.16     | 1.23    6.49    26.4
Formula2            T=10     | 0.72    2.88    22.8     | 1.02    7.15    66.2
                    T=100    | 0.78    1.92    6.83     | 1.17    5.22    29.1
                    T=1000   | 1.10    2.62    4.29     | 1.38    6.29    26.1
Table 2: Percentage of wins against UCB1-Tuned (policies trained with the test horizon).

Generic policies                        Learned policies
Policy   T=10    T=100   T=1000        Policy     T=10    T=100   T=1000
UCB1     48.1 %  78.1 %  83.1 %        Power1     54.6 %  82.3 %  91.3 %
UCB2     12.7 %  6.8 %   6.8 %         Power2     54.2 %  84.6 %  90.3 %
UCB-V    38.3 %  57.2 %  49.6 %        Formula1   61.7 %  76.8 %  88.1 %
KL-UCB   50.5 %  65.0 %  67.0 %        Formula2   61.0 %  80.0 %  73.1 %
Greedy   37.5 %  14.1 %  10.7 %
Table 1 reports the results we obtain for untuned generic policies, tuned generic policies and learned policies on distributions $\mathcal{D}_P$ (Bernoulli) and $\mathcal{D}_P'$ (truncated Gaussian) with horizons $T \in \{10, 100, 1000\}$. For both tuned and learned policies, we consider three different training horizons, to show the effect of a mismatch between the training and the testing horizon.
Generic policies. As already pointed out in [4], UCB1-Tuned is particularly well fitted to bandit problems with Bernoulli distributions. It also proves effective on bandit problems with Gaussian distributions, making it nearly always outperform the other untuned policies. By tuning UCB1, we outperform the UCB1-Tuned policy (e.g., on Bernoulli problems with $T = 1000$, see Table 1). This also sometimes happens with UCB-V. However, even with our careful tuning procedure, UCB2 and Greedy never outperform UCB1-Tuned.
Learned policies. We observe that when the training horizon is the same as the testing horizon $T$, the learned policies (Power1, Power2, Formula1 and Formula2) systematically outperform all generic policies. The overall best results are obtained with the Power2 policies. Note that, due to their numerical nature and their large number of parameters, these policies are extremely hard to interpret and to understand. The results related to symbolic policies show that there exist very simple policies that perform nearly as well as these black-box policies. This clearly shows the benefit of our two hypothesis spaces: numerical policies reach very high performance, while symbolic policies provide interpretable strategies whose behavior can be more easily analyzed. This interpretability/performance trade-off is common in machine learning and was identified several decades ago in the field of supervised learning. It is worth mentioning that, among the formula equivalence classes, a surprisingly large number of strategies outperforming the generic policies were found: depending on the horizon, we obtain about 50 to 80 different such symbolic policies.

Robustness w.r.t. the horizon $T$. As expected, the learned policies give their best performance when the training and testing horizons are equal. Policies learned with a large training horizon prove to work well also on smaller horizons. However, when the testing horizon is larger than the training horizon, the quality of the policy may quickly degrade (e.g., when evaluating Power1 trained with a short horizon on the horizon $T = 1000$, see Table 1).
Robustness w.r.t. the kind of distribution. Although truncated Gaussian distributions are significantly different from Bernoulli distributions, the learned policies most of the time generalize well to this new setting and still outperform all the other generic policies.
A word on the learned symbolic policies. It is worth noticing that the best index-based policies (Formula1) found for the two largest horizons work in a similar way to the UCB-type policies reported earlier in the literature: they also associate to an arm an index which is the sum of $\bar{r}_k$ and of a positive (optimistic) term that decreases with $t_k$. However, for the shortest time horizon, the policy found is totally different from UCB-type policies. With such a policy, only the arms whose empirical mean reward is higher than a given threshold (0.5) have positive index scores and are candidates for selection, i.e., making the scores negative has the effect of killing bad arms. If the $\bar{r}_k$ of an arm is above the threshold, then the index associated with this arm increases with the number of times it is played, rather than decreasing as is the case for UCB policies. If all empirical means are below the threshold, then, for equal reward means, arms that have been played less are preferred. This finding is remarkable, since it suggests that the optimistic paradigm upon which UCB policies are based may in fact not be adapted at all to a context where the horizon is small.
Percentage of wins against UCB1-Tuned. Table 2 gives, for each policy trained with the same horizon as the test horizon, its percentage of wins against UCB1-Tuned. To compute this percentage of wins, we evaluate the expected regret on each of the 10000 testing problems and count the number of problems for which the tested policy outperforms UCB1-Tuned. We observe that by minimizing the expected regret, our learned policies also reach a high percentage of wins: 84.6 % (Power2, $T = 100$) and 91.3 % (Power1, $T = 1000$). Note that, in our approach, it is easy to change the objective function: if the real applicative aim were to maximize the percentage of wins against UCB1-Tuned, this criterion could have been used directly in the policy optimization stage to reach even better scores.
6.3 Computational time
We used a C++-based implementation to perform our experiments. In the numerical case, with cores at , performing the whole learning of Power1 took one hour for and ten hours for . In the symbolic case, using a single core at , performing the whole learning took 22 minutes for and a bit less than three hours for . The fact that symbolic learning is much faster can be explained by two reasons. First, we tuned the EDA algorithm very carefully to be sure to find a high-quality solution; we observe that by using only a fraction of this learning time, we already obtain close-to-optimal strategies. The second factor is that our symbolic learning algorithm saves a lot of CPU time by being able to rapidly reject bad strategies, thanks to the multi-armed bandit formulation upon which it relies.
7 Conclusions
The approach proposed in this paper for exploiting prior knowledge to learn exploration/exploitation policies has been tested on two-armed bandit problems with Bernoulli reward distributions and a known time horizon. The learned policies were found to significantly outperform policies previously published in the literature, such as UCB1, UCB2, UCB-V, KL-UCB and Greedy. The robustness of the learned policies with respect to wrong prior information was also highlighted, by evaluating them on two-armed bandits with truncated Gaussian reward distributions.
There are, in our opinion, several research directions that could be investigated to further improve the policy-learning algorithm proposed in this paper. For example, we found that problems similar to the overfitting met in supervised learning can occur when considering a too-large set of candidate policies. This naturally calls for studying whether our learning approach could be combined with regularization techniques. Along the same lines, more sophisticated optimizers could also be considered for identifying, in the set of candidate policies, the one predicted to perform best.
The UCB1, UCB2, UCB-V, KL-UCB and Greedy policies used for comparison were shown (under certain conditions) to have interesting bounds on their expected regret in asymptotic conditions (very large ), while we did not aim at providing such bounds for our learned policies. It would certainly be relevant to investigate whether similar bounds could be derived for our learned policies or, alternatively, to see how the approach could be adapted so as to target policies offering such theoretical performance guarantees in asymptotic conditions. For example, better bounds on the expected regret could perhaps be obtained by identifying, in a set of candidate policies, the one that gives the smallest maximal value of the expected regret over this set, rather than the one that gives the best average performance.
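The two selection criteria contrasted above (best average regret versus smallest worst-case regret) can be sketched side by side. This is an illustrative sketch, not the paper's implementation; `regret` is assumed to map each candidate policy to its per-problem regret estimates over the problem set.

```python
def select_min_average(candidates, regret):
    """Criterion used in the paper: pick the candidate policy
    with the lowest *average* regret over the problem set."""
    return min(candidates,
               key=lambda p: sum(regret[p]) / len(regret[p]))

def select_minimax(candidates, regret):
    """Alternative suggested above: pick the policy whose
    *worst-case* regret over the problem set is smallest,
    which may be more amenable to regret bounds."""
    return min(candidates, key=lambda p: max(regret[p]))
```

The two criteria can disagree: a policy with the best average may still exhibit a large regret on a few adversarial problems, which the minimax criterion penalizes.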
Finally, while our paper has provided simulation results in the context of the simplest multi-armed bandit setting, our exploration/exploitation policy meta-learning scheme can in principle also be applied to any other exploration-exploitation problem. In this line of research, the extension of this investigation to (finite) Markov Decision Processes studied in [15] already suggests that our approach to meta-learning E/E strategies can be successful in much more complex settings.
References
[1] Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58 (1952) 527–536
[2] Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6 (1985) 4–22
[3] Agrawal, R.: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27 (1995) 1054–1078
[4] Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 (2002) 235–256
[5] Audibert, J., Munos, R., Szepesvári, C.: Tuning bandit algorithms in stochastic environments. Algorithmic Learning Theory (ALT) (2007) 150–165
[6] Audibert, J., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science (2008)
[7] Maes, F., Wehenkel, L., Ernst, D.: Learning to play K-armed bandit problems. In: Proc. of the 4th International Conference on Agents and Artificial Intelligence. (2012)
[8] Maes, F., Wehenkel, L., Ernst, D.: Automatic discovery of ranking formulas for playing with multi-armed bandits. In: Proc. of the 9th European Workshop on Reinforcement Learning. (2011)
[9] Gonzalez, C., Lozano, J., Larrañaga, P.: In: Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers (2002)
[10] Pelikan, M., Mühlenbein, H.: Marginal distributions in evolutionary algorithms. In: Proceedings of the 4th International Conference on Genetic Algorithms. (1998)
[11] Bubeck, S., Munos, R., Stoltz, G.: Pure exploration in multi-armed bandits problems. In: Algorithmic Learning Theory. (2009) 23–37
[12] Bubeck, S., Munos, R., Stoltz, G., Szepesvári, C.: X-armed bandits. Journal of Machine Learning Research 12 (2011) 1655–1695
[13] Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond. CoRR abs/1102.2490 (2011)
[14] Rubinstein, R., Kroese, D.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer, New York (2004)
[15] Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning exploration/exploitation strategies for single trajectory reinforcement learning. In: Proc. of the 10th European Workshop on Reinforcement Learning. (2012)