Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

12/30/2022
by Xi Chen, et al.

With the rapid development of big data, it has become easier than ever to learn an optimal decision rule by updating it recursively while making decisions online. We study online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for online and adaptive data-collection environments that updates decision rules via weighted stochastic gradient descent (SGD). We allow different weighting schemes for the stochastic gradient and establish the asymptotic normality of the resulting parameter estimator. Our proposed estimator significantly improves asymptotic efficiency over the previous averaged SGD approach based on inverse probability weights. We also conduct an optimality analysis of the weights in a linear regression setting. Finally, we provide a Bahadur representation of the proposed estimator and show that its remainder term converges more slowly than in classical SGD because of the adaptive data collection.
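To make the setup concrete, below is a minimal sketch of the kind of procedure the abstract describes: an epsilon-greedy linear contextual bandit whose parameters are updated by weighted SGD, with the averaged iterate kept as the estimator. It uses plain inverse-propensity weights (the baseline scheme the paper improves upon) and illustrative choices of exploration rate, step-size schedule, and reward model; these are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' exact algorithm): epsilon-greedy linear
# contextual bandit with inverse-probability-weighted SGD updates and a
# Polyak-averaged iterate as the final estimator. All constants (K, d,
# epsilon, step sizes, true_theta) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

K, d, T = 3, 5, 20_000                 # arms, context dimension, horizon
epsilon = 0.1                          # exploration rate of the policy
true_theta = rng.normal(size=(K, d))   # simulated per-arm reward parameters

theta = np.zeros((K, d))               # current SGD iterate
theta_bar = np.zeros((K, d))           # running (Polyak) average of iterates

for t in range(1, T + 1):
    x = rng.normal(size=d)             # observed context

    # Epsilon-greedy action selection based on the current estimate.
    greedy = int(np.argmax(theta @ x))
    a = int(rng.integers(K)) if rng.random() < epsilon else greedy

    # Propensity of the chosen arm, used as the (inverse) weight.
    pi = (1 - epsilon) + epsilon / K if a == greedy else epsilon / K

    # Observe a noisy linear reward for the chosen arm.
    r = true_theta[a] @ x + rng.normal()

    # Inverse-probability-weighted SGD step on the squared-error loss.
    eta = 0.5 * t ** -0.51                    # step size eta_t = c * t^(-alpha)
    grad = (theta[a] @ x - r) * x             # gradient of 0.5*(x'theta_a - r)^2
    theta[a] -= eta * grad / pi

    # Polyak averaging of the iterates gives the estimator used for inference.
    theta_bar += (theta - theta_bar) / t

print("averaged estimate for arm 0:", np.round(theta_bar[0], 3))
print("true parameter for arm 0:   ", np.round(true_theta[0], 3))
```

Given the asymptotic normality established in the paper, confidence intervals would be formed from the averaged iterate together with a plug-in estimate of its limiting covariance; that inference step is omitted from the sketch.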


Related research

- Statistical Inference for Online Decision Making via Stochastic Gradient Descent (10/14/2020)
- Online Statistical Inference for Matrix Contextual Bandit (12/21/2022)
- Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting (10/14/2020)
- Adaptive Linear Estimating Equations (07/14/2023)
- Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches (03/21/2023)
- An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling (06/07/2020)
- INFERNO: Inference-Aware Neural Optimisation (06/12/2018)
