Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design

07/04/2020
by Yufei Ruan, et al.

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on linear contextual bandits, a central problem in online active learning. We consider two popular limited-adaptivity models from the literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in d-dimensional linear contextual bandits, the learner needs O(d log d log T) policy switches to achieve the minimax-optimal regret, and this is optimal up to poly(log d, loglog T) factors; for stochastic context vectors, even in the more restricted batch learning model, only O(loglog T) batches are needed to achieve the optimal regret. Together with known results in the literature, our results present a complete picture of the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of optimal experiment design, and provide a statistically and computationally efficient learning algorithm for the problem, which may be of independent interest.
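
To give a feel for why O(loglog T) batches can be enough, here is a minimal sketch of one batch grid commonly used in the batched-bandit literature, with batch end points t_k = T^(1 - 2^(-k)). This is an assumption for illustration: the abstract does not state which schedule the paper uses, and the function name `doubling_exponent_grid` is hypothetical.

```python
import math
from typing import List


def doubling_exponent_grid(T: int) -> List[int]:
    """Batch end points t_k = floor(T ** (1 - 2 ** -k)), with the last end point set to T.

    Illustrative grid only (an assumption, not taken from the abstract).
    The number of batches is about log2(log2(T)) + 1.
    """
    if T < 2:
        return [T]
    # Number of batches: roughly loglog T, plus one final batch that ends at T.
    M = max(1, math.ceil(math.log2(math.log2(T)))) + 1
    ends: List[int] = []
    for k in range(1, M):
        t = int(T ** (1 - 2.0 ** -k))
        # Skip duplicates that can appear after rounding down for small T.
        if not ends or t > ends[-1]:
            ends.append(t)
    # The final batch always runs to the horizon T.
    if not ends or ends[-1] < T:
        ends.append(T)
    return ends


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        grid = doubling_exponent_grid(horizon)
        print(horizon, len(grid), grid)
```

Under this grid the exponent gap to 1 halves with every batch, so the number of batches grows only doubly logarithmically in the horizon T.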

research
08/27/2020

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

We study the problem of dynamic batch learning in high-dimensional spars...
research
10/15/2021

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

We study the optimal batch-regret tradeoff for batch linear contextual b...
research
11/03/2021

The Impact of Batch Learning in Stochastic Bandits

We consider a special case of bandit problems, namely batched bandits. M...
research
06/09/2021

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

We consider the following variant of contextual linear bandits motivated...
research
10/08/2020

Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination

In this work we revisit two classic high-dimensional online learning pro...
research
06/10/2020

Distributional Robust Batch Contextual Bandits

Policy learning using historical observational data is an important prob...
research
05/21/2022

Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets

We present a family {π̂}_p≥ 1 of pessimistic learning rules for offline ...