Online Statistical Inference for Matrix Contextual Bandit

12/21/2022
by Qiyu Han, et al.

Contextual bandits have been widely used for sequential decision-making based on current contextual information and historical feedback data. In modern applications, the context can be rich and is often naturally represented as a matrix. Moreover, while existing bandit algorithms have mainly focused on reward maximization, less attention has been paid to statistical inference. To fill these gaps, in this work we consider a matrix contextual bandit framework where the true model parameter is a low-rank matrix, and propose a fully online procedure that simultaneously makes sequential decisions and conducts statistical inference. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are not fully online and are biased, while existing inference approaches for bandit algorithms fail to account for the low-rankness and are also biased. To address these issues, we introduce a new online doubly-debiasing inference procedure that handles both sources of bias simultaneously. In theory, we establish the asymptotic normality of the proposed online doubly-debiased estimator and prove the validity of the constructed confidence interval. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its non-asymptotic convergence result, which is also of independent interest.
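To make the setting concrete, here is a minimal sketch of online estimation of a low-rank matrix parameter from matrix-valued contexts. It is not the paper's estimator: the reward model (trace inner product plus Gaussian noise), the i.i.d. Gaussian contexts, the function name `simulate_low_rank_sgd`, and all parameter values are assumptions made for illustration. The sketch runs plain SGD and truncates to the target rank once at the end; it has no arm selection, so the data are not adaptively collected, and it performs no debiasing.

```python
import numpy as np


def simulate_low_rank_sgd(d1=5, d2=5, rank=1, steps=3000, lr=0.01,
                          noise=0.05, seed=0):
    """Illustrative only: SGD on a matrix linear model, then rank truncation."""
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: a rank-`rank` matrix with orthonormal factors.
    u, _ = np.linalg.qr(rng.standard_normal((d1, rank)))
    v, _ = np.linalg.qr(rng.standard_normal((d2, rank)))
    m_true = u @ v.T

    m_hat = np.zeros((d1, d2))
    for _ in range(steps):
        # Assumed i.i.d. matrix context (a bandit would collect data adaptively).
        x = rng.standard_normal((d1, d2))
        # Assumed reward model: trace inner product <x, m_true> plus noise.
        y = np.sum(x * m_true) + noise * rng.standard_normal()
        # One stochastic gradient step on the squared loss 0.5 * (<x, m> - y)^2.
        residual = np.sum(x * m_hat) - y
        m_hat -= lr * residual * x

    # Keep only the top `rank` singular components of the final iterate.
    uu, s, vt = np.linalg.svd(m_hat, full_matrices=False)
    m_low = (uu[:, :rank] * s[:rank]) @ vt[:rank]

    rel_err = np.linalg.norm(m_low - m_true) / np.linalg.norm(m_true)
    return m_low, rel_err


if __name__ == "__main__":
    _, err = simulate_low_rank_sgd()
    print(f"relative Frobenius error: {err:.3f}")
```

Because each update uses a single observation, the procedure is fully online in the sense that no past data are stored. Under adaptive data collection, an estimator of this kind would generally be biased, which is the problem the doubly-debiasing procedure described above is designed to address.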


research
12/30/2022

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

With the fast development of big data, it has been easier than before to...
research
10/14/2020

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Online decision-making problem requires us to make a sequence of decisio...
research
10/14/2020

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Online decision making aims to learn the optimal decision rule by making...
research
03/21/2023

Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches

Standard bandit algorithms that assume continual reallocation of measure...
research
04/29/2021

Statistical Inference with M-Estimators on Bandit Data

Bandit algorithms are increasingly used in real world sequential decisio...
research
05/31/2023

Low-rank extended Kalman filtering for online learning of neural networks from streaming data

We propose an efficient online approximate Bayesian inference algorithm ...
research
02/14/2022

Statistical Inference After Adaptive Sampling in Non-Markovian Environments

There is a great desire to use adaptive sampling methods, such as reinfo...
