The Impact of Batch Learning in Stochastic Linear Bandits

02/14/2022
by Danil Provodin, et al.

We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a practically relevant batch-centric scenario of batch learning. That is to say, we provide a policy-agnostic regret analysis and establish upper and lower bounds for the regret of a candidate policy. Our main theoretical results show that the impact of batch learning can be measured in proportion to the regret of online behavior. We study two settings of the problem: instance-independent and instance-dependent. While the upper bound is the same for both settings, the worst-case lower bound is more comprehensive in the former case and more accurate in the latter. We also provide a more robust result for the 2-armed bandit problem as an important insight. Finally, we demonstrate the consistency of the theoretical results through empirical experiments and reflect on the choice of optimal batch size.

research
11/03/2021

The Impact of Batch Learning in Stochastic Bandits

We consider a special case of bandit problems, namely batched bandits. M...
research
08/27/2020

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

We study the problem of dynamic batch learning in high-dimensional spars...
research
06/25/2019

Restless dependent bandits with fading memory

We study the stochastic multi-armed bandit problem in the case when the ...
research
07/09/2018

Dynamic Pricing with Finitely Many Unknown Valuations

Motivated by posted price auctions where buyers are grouped in an unknow...
research
04/10/2023

Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk

We study the trade-off between expectation and tail risk for regret dist...
research
08/11/2022

Regret Analysis for Hierarchical Experts Bandit Problem

We study an extension of standard bandit problem in which there are R la...
research
09/16/2021

Policy Choice and Best Arm Identification: Comments on "Adaptive Treatment Assignment in Experiments for Policy Choice"

Adaptive experimental design for efficient decision-making is an importa...
