Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

09/30/2022
by   Siddhartha Banerjee, et al.

While standard bandit algorithms sometimes incur high regret, their performance can be greatly improved by "warm starting" with historical data. Unfortunately, how best to incorporate historical data is unclear: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computational and storage issues, particularly in continuous action spaces. We address these two challenges by proposing Artificial Replay, a meta-algorithm for incorporating historical data into any base bandit algorithm. Artificial Replay uses only a subset of the historical data, as needed, to reduce computation and storage. We show that for a broad class of base algorithms satisfying independence of irrelevant data (IIData), a novel property that we introduce, our method achieves regret equal to that of a full warm-start approach while potentially using only a fraction of the historical data. We complement these theoretical results with a case study of K-armed and continuous combinatorial bandit algorithms, including a green security domain using real poaching data, to show the practical benefits of Artificial Replay in achieving optimal regret alongside low computational and storage costs.
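The abstract's description of the meta-algorithm can be sketched in code. The sketch below is our own illustrative reading, not the paper's implementation: before executing the arm the base algorithm proposes, the wrapper "replays" an unused historical sample for that arm into the base algorithm (if one exists) and asks again, so historical data is consumed lazily, only as needed. All class and method names here are our assumptions.

```python
import math

class UCB1:
    """Simple UCB1 base algorithm (illustrative stand-in for any base bandit)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0  # total observations incorporated (real or replayed)

    def select_arm(self):
        # Play each arm once before applying the UCB index.
        for a, c in enumerate(self.counts):
            if c == 0:
                return a
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
                          + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1

class ArtificialReplay:
    """Hypothetical sketch of the meta-algorithm from the abstract: feed the
    base algorithm historical samples only for arms it actually proposes."""
    def __init__(self, base, history):
        self.base = base
        # history: dict mapping arm -> list of unused historical rewards
        self.history = {a: list(r) for a, r in history.items()}

    def select_arm(self):
        while True:
            arm = self.base.select_arm()
            if self.history.get(arm):
                # Replay one historical sample instead of taking a real pull.
                self.base.update(arm, self.history[arm].pop())
            else:
                return arm  # no unused history for this arm: pull it for real

    def update(self, arm, reward):
        self.base.update(arm, reward)
```

In this sketch, historical samples for arms the base algorithm never proposes are simply never touched, which mirrors the abstract's claim that only a subset of the historical data may be used.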

