Bayesian decision-making under misspecified priors with applications to meta-learning

by Max Simchowitz, et al.

Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches for tackling explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge, but it can also lead to poor performance when the prior is misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecification. We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most 𝒪̃(H^2 ϵ) from that of TS with a well-specified prior, where ϵ is the total-variation distance between the priors and H is the learning horizon. Our bound does not require the prior to have any parametric form. For priors with bounded support, our bound is independent of the cardinality or structure of the action space, and we show that it is tight up to universal constants in the worst case. Building on our sensitivity analysis, we establish generic PAC guarantees for algorithms in the recently studied Bayesian meta-learning setting and derive corollaries for various families of priors. Our results generalize along two axes: (1) they apply to a broader family of Bayesian decision-making algorithms, including a Monte-Carlo implementation of the knowledge gradient algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general Bayesian decision-making setting, which encompasses contextual bandits as a special case. Through numerical simulations, we illustrate how prior misspecification and the deployment of one-step look-ahead (as in KG) can impact the convergence of meta-learning in multi-armed and contextual bandits with structured and correlated priors.
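The comparison at the heart of the sensitivity result can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's experimental setup: it assumes a Beta-Bernoulli multi-armed bandit, draws each problem instance from a "true" prior, and runs Thompson sampling once with that prior and once with a misspecified one. The function names (`thompson_sampling`, `average_reward`) and all parameter values are hypothetical.

```python
import numpy as np


def thompson_sampling(true_means, alpha0, beta0, horizon, rng):
    """Run Beta-Bernoulli Thompson sampling and return the total reward.

    alpha0 / beta0 are the Beta prior parameters the algorithm BELIEVES in;
    they need not match the distribution that generated true_means.
    """
    alpha = np.array(alpha0, dtype=float)
    beta = np.array(beta0, dtype=float)
    total = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from the current posterior.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        # Observe a Bernoulli reward from the chosen arm.
        reward = float(rng.random() < true_means[arm])
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total += reward
    return total


def average_reward(prior_alpha, prior_beta, true_alpha, true_beta,
                   horizon, n_runs, seed):
    """Average total reward of TS over instances drawn from the true prior."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(n_runs):
        # The environment is always drawn from the TRUE prior.
        true_means = rng.beta(true_alpha, true_beta)
        totals.append(
            thompson_sampling(true_means, prior_alpha, prior_beta, horizon, rng)
        )
    return float(np.mean(totals))


# Assumed example: three arms whose means really follow Beta(2, 5),
# while the misspecified agent believes they follow Beta(5, 2).
true_a, true_b = [2.0, 2.0, 2.0], [5.0, 5.0, 5.0]
wrong_a, wrong_b = [5.0, 5.0, 5.0], [2.0, 2.0, 2.0]

well_specified = average_reward(true_a, true_b, true_a, true_b,
                                horizon=50, n_runs=200, seed=0)
misspecified = average_reward(wrong_a, wrong_b, true_a, true_b,
                              horizon=50, n_runs=200, seed=0)
print("well-specified prior:", well_specified)
print("misspecified prior:  ", misspecified)
```

The gap between the two printed averages is an empirical estimate of the quantity the abstract bounds by 𝒪̃(H^2 ϵ). Here H corresponds to `horizon`, and ϵ would be the total-variation distance between the two product Beta priors, which this sketch does not compute.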


