Sequential Batch Learning in Finite-Action Linear Contextual Bandits

04/14/2020
by Yanjun Han, et al.

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe outcomes for the individuals within a batch at the batch's end. Compared with both standard online contextual bandit learning and offline policy learning in contextual bandits, this sequential batch learning problem provides a finer-grained formulation of many personalized sequential decision-making problems in practical applications, including medical treatment in clinical trials, product recommendation in e-commerce, and adaptive experiment design in crowdsourcing. We study two settings of the problem: one where the contexts are arbitrarily generated and one where the contexts are drawn iid from some distribution. In each setting, we establish a regret lower bound and provide an algorithm whose regret upper bound nearly matches the lower bound. As an important insight revealed by these results, in the former setting we show that the number of batches required to achieve the fully online performance is polynomial in the time horizon, while in the latter setting a pure-exploitation algorithm with a judicious batch partition scheme achieves the fully online performance even when the number of batches is less than logarithmic in the time horizon. Together, our results provide a near-complete characterization of sequential decision making in linear contextual bandits when batch constraints are present.
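To make the batch constraint concrete, the loop below is a minimal illustrative sketch (not the paper's algorithm): the horizon is split into a fixed number of batches on a uniform grid, a least-squares estimate of the reward parameter is frozen within each batch, actions are chosen by pure exploitation, and rewards are revealed only when the batch ends. The function name, the uniform grid, and the greedy policy are all assumptions for illustration; the paper's judicious partition scheme differs.

```python
import numpy as np

def sequential_batch_learning(T=1000, M=5, d=3, K=4, seed=0):
    """Illustrative sketch: M batches, pure exploitation within each batch,
    rewards observed only at batch ends (hypothetical, not the paper's method)."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)           # unknown reward parameter
    theta_star /= np.linalg.norm(theta_star)

    grid = np.linspace(0, T, M + 1).astype(int)  # uniform batch partition (an assumption)

    A = np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)          # running sum of chosen context * reward
    theta_hat = np.zeros(d)  # estimate, frozen within each batch
    regret = 0.0

    for m in range(M):
        batch_X, batch_r = [], []
        for t in range(grid[m], grid[m + 1]):
            contexts = rng.normal(size=(K, d))        # K arms with iid contexts
            a = int(np.argmax(contexts @ theta_hat))  # act greedily on frozen estimate
            r = contexts[a] @ theta_star + 0.1 * rng.normal()
            regret += np.max(contexts @ theta_star) - contexts[a] @ theta_star
            batch_X.append(contexts[a])
            batch_r.append(r)
        # Outcomes for the whole batch are revealed only here.
        X = np.array(batch_X)
        A += X.T @ X
        b += X.T @ np.array(batch_r)
        theta_hat = np.linalg.solve(A, b)

    return regret

print(sequential_batch_learning())
```

Only the estimate update happens at batch boundaries, so increasing M interpolates between a single offline fit (M=1) and fully online learning (M=T).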


Related research

08/27/2020
Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
We study the problem of dynamic batch learning in high-dimensional spars...

02/25/2021
Batched Neural Bandits
In many sequential decision-making problems, the individuals are split i...

05/21/2021
Parallelizing Contextual Linear Bandits
Standard approaches to decision-making under uncertainty focus on sequen...

06/02/2021
Parallelizing Thompson Sampling
How can we make use of information parallelism in online decision making...

07/13/2021
Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Understanding an agent's priorities by observing their behavior is criti...

07/01/2021
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow
In membership/subscriber acquisition and retention, we sometimes need to...

03/22/2020
Optimal No-regret Learning in Repeated First-price Auctions
We study online learning in repeated first-price auctions with censored ...
