Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

10/14/2020
by Haoyu Chen, et al.

Online decision-making problems require us to make a sequence of decisions based on incrementally arriving information. Common solutions learn a reward model for the different actions given the contextual information and then maximize the long-term reward. It is therefore meaningful to know whether the posited model is reasonable and how it performs asymptotically. We study this problem in the contextual bandit framework with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator based on inverse propensity score weighting and establish its asymptotic normality as well. Building on the properties of these parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
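
To make the setup concrete, the following is a minimal sketch of an ε-greedy contextual bandit with a per-arm linear reward model fitted by online ordinary least squares, together with an in-sample inverse propensity weighted (IPW) value estimate. It is an illustration under stated assumptions, not the authors' implementation: the scalar context drawn uniformly from [-1, 1], the Gaussian noise, the fixed ε, the uniform exploration over arms, and all names (`ArmOLS`, `run_bandit`, `solve_2x2`) are hypothetical choices made for this example.

```python
import random


def solve_2x2(a11, a12, a22, b1, b2):
    """Solve [[a11, a12], [a12, a22]] @ beta = [b1, b2] for beta."""
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return (0.0, 0.0)  # too little data to identify both coefficients
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


class ArmOLS:
    """Running OLS fit of reward = b0 + b1 * x for one arm (assumed model)."""

    def __init__(self):
        self.n = 0.0
        self.sx = 0.0
        self.sxx = 0.0
        self.sy = 0.0
        self.sxy = 0.0

    def update(self, x, y):
        # Accumulate the sufficient statistics of the normal equations.
        self.n += 1.0
        self.sx += x
        self.sxx += x * x
        self.sy += y
        self.sxy += x * y

    def beta(self):
        return solve_2x2(self.n, self.sx, self.sxx, self.sy, self.sxy)

    def predict(self, x):
        b0, b1 = self.beta()
        return b0 + b1 * x


def run_bandit(true_betas, T=5000, eps=0.1, noise_sd=0.5, seed=0):
    """Simulate eps-greedy; return per-arm OLS estimates and the IPW value."""
    rng = random.Random(seed)
    k = len(true_betas)
    arms = [ArmOLS() for _ in range(k)]
    ipw_sum = 0.0
    for _ in range(T):
        x = rng.uniform(-1.0, 1.0)  # hypothetical context distribution
        greedy = max(range(k), key=lambda a: arms[a].predict(x))
        # Explore uniformly with probability eps, otherwise exploit.
        action = rng.randrange(k) if rng.random() < eps else greedy
        b0, b1 = true_betas[action]
        reward = b0 + b1 * x + rng.gauss(0.0, noise_sd)
        if action == greedy:
            # Probability that eps-greedy selects the greedy arm.
            propensity = (1.0 - eps) + eps / k
            ipw_sum += reward / propensity
        arms[action].update(x, reward)
    return [arm.beta() for arm in arms], ipw_sum / T


if __name__ == "__main__":
    estimates, value = run_bandit([(0.0, 1.0), (0.0, -1.0)])
    print("per-arm (intercept, slope) estimates:", estimates)
    print("IPW value estimate:", value)
```

In this toy problem the two arms have opposite slopes, so the best arm depends on the sign of the context and the optimal value is E|x| = 0.5. The IPW term reweights only the rounds in which the greedy action was actually taken, which is what makes the in-sample value estimate comparable across exploration rates.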

research
12/30/2022

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

With the fast development of big data, it has been easier than before to...
research
12/21/2022

Online Statistical Inference for Matrix Contextual Bandit

Contextual bandit has been widely used for sequential decision-making ba...
research
10/14/2020

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Online decision making aims to learn the optimal decision rule by making...
research
04/29/2021

Statistical Inference with M-Estimators on Bandit Data

Bandit algorithms are increasingly used in real world sequential decisio...
research
09/13/2017

Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks

We consider the problem of sequentially making decisions that are reward...
research
06/02/2021

Parallelizing Thompson Sampling

How can we make use of information parallelism in online decision making...
