Metalearning Linear Bandits by Prior Update

07/12/2021
by Amit Peleg, et al.

Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior, while in practice such information is often lacking and must be estimated through learning. This problem is exacerbated in decision-making setups with partial information, where a misspecified prior may lead to poor exploration and inferior performance. We prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior. Next, we address the task of learning the prior through metalearning, where a learner updates its estimate of the prior across multiple task instances in order to improve performance on future tasks. The estimated prior is then updated within each task based on incoming observations, while actions are selected to maximize expected reward. We apply this scheme in a linear bandit setting and provide algorithms and regret bounds demonstrating its effectiveness compared to an algorithm that knows the correct prior. Our results hold for a broad class of algorithms, including, for example, Thompson Sampling and Information Directed Sampling.
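To make the within-task part of this scheme concrete, the sketch below shows linear Thompson Sampling with a Gaussian prior N(mu0, Sigma0) over the unknown parameter vector, where the prior estimate may be misspecified. This is a minimal illustration of the algorithm class the abstract refers to, not the paper's exact method; all names (`linear_ts`, `noise_sd`, the demo action set) are illustrative. The paper's meta-level step, not shown here, would refine (mu0, Sigma0) across tasks.

```python
import numpy as np

def linear_ts(actions, theta_star, mu0, Sigma0, noise_sd=0.1, horizon=200, seed=0):
    """Linear Thompson Sampling with a (possibly misspecified) Gaussian prior.

    A simplified sketch: the environment reward is a @ theta_star + Gaussian
    noise, and the Gaussian prior/posterior over theta is updated in closed
    form. Returns the cumulative regret over `horizon` rounds.
    """
    rng = np.random.default_rng(seed)
    prec = np.linalg.inv(Sigma0)      # posterior precision matrix
    b = prec @ mu0                    # precision-weighted posterior mean
    best = max(a @ theta_star for a in actions)
    regret = 0.0
    for _ in range(horizon):
        Sigma = np.linalg.inv(prec)
        mu = Sigma @ b
        theta = rng.multivariate_normal(mu, Sigma)    # sample from posterior
        a = max(actions, key=lambda x: x @ theta)     # act greedily on sample
        r = a @ theta_star + noise_sd * rng.normal()  # noisy linear reward
        # Conjugate Gaussian posterior update from the new observation.
        prec += np.outer(a, a) / noise_sd**2
        b += a * r / noise_sd**2
        regret += best - a @ theta_star
    return regret

# Demo: two-armed linear bandit with a standard-normal prior estimate.
acts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
theta_star = np.array([1.0, 0.2])
regret_true = linear_ts(acts, theta_star, mu0=np.zeros(2), Sigma0=np.eye(2))
```

Because the posterior concentrates on theta_star, most of the regret is incurred in early rounds, which is why closeness of the prior estimate to the true prior matters for early exploration.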


