Design of Experiments for Stochastic Contextual Linear Bandits

07/21/2021
by Andrea Zanette, et al.

In the stochastic linear contextual bandit setting, there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be significant engineering overhead to deploying these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Exploring with a single non-reactive policy is beneficial in such cases. Assuming a batch of contexts is available, we design a single stochastic policy to collect a good dataset from which a near-optimal policy can be extracted. We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.


Related research

- 07/10/2023 - Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
- 05/30/2019 - Rarely-switching linear bandits: optimization of causal effects for the real world
- 02/27/2022 - Federated Online Sparse Decision Making
- 12/12/2019 - Sublinear Optimal Policy Value Estimation in Contextual Bandits
- 10/27/2021 - Federated Linear Contextual Bandits
- 02/25/2021 - Batched Neural Bandits
- 10/20/2022 - Vertical Federated Linear Contextual Bandits
