Learning Across Bandits in High Dimension via Robust Statistics

12/28/2021
by Kan Xu, et al.

Decision-makers often face the "many bandits" problem, where one must simultaneously learn across related but heterogeneous contextual bandit instances. For instance, a large retailer may wish to dynamically learn product demand across many stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to dynamically learn patient risk across many providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. We study the setting where the unknown parameter in each bandit instance can be decomposed into a global parameter plus a sparse instance-specific term. We propose a novel two-stage estimator that exploits this structure in a sample-efficient way, using a combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). We embed this estimator within a bandit algorithm and prove that it improves asymptotic regret bounds in the context dimension d; this improvement is exponential for data-poor instances. We further demonstrate how our results depend on the underlying network structure of the bandit instances. Finally, we illustrate the value of our approach on synthetic and real datasets.
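The two-stage structure described above can be sketched in code. The following is a minimal, hypothetical illustration only, not the paper's actual algorithm: it assumes linear-reward instances, uses a coordinatewise median of per-instance least-squares estimates as the robust cross-instance aggregate, and then fits a LASSO (via a simple ISTA proximal-gradient loop) on each instance's residuals to recover the sparse instance-specific term. All function names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # ISTA for: min_b 0.5/n * ||y - X b||^2 + lam * ||b||_1
    n, d = X.shape
    step = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / step, lam / step)
    return b

def two_stage_estimate(datasets, lam=0.01):
    """Hypothetical sketch of a two-stage estimator.

    Stage 1 (robust aggregation): coordinatewise median of per-instance
    least-squares estimates approximates the shared global parameter,
    since each coordinate deviates in only a sparse minority of instances.
    Stage 2 (debiasing): per-instance LASSO on the residuals recovers the
    sparse instance-specific correction.
    """
    per_instance = []
    for X, y in datasets:
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        per_instance.append(beta_hat)
    beta_global = np.median(np.stack(per_instance), axis=0)

    corrections = []
    for X, y in datasets:
        resid = y - X @ beta_global
        corrections.append(lasso_ista(X, resid, lam))
    return beta_global, corrections
```

A bandit algorithm would then re-run this estimator as new observations arrive and act (e.g., greedily or via UCB) on `beta_global + correction_i` for instance i; those details are omitted here.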


