Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting
The online decision-making problem requires us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is meaningful to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the setup of the contextual bandit framework with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose the online weighted least squares estimator using inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
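To make the setup concrete, here is a minimal simulation sketch of the ingredients the abstract describes: a linear contextual bandit, an ε-greedy policy, an online (per-arm) ordinary least squares estimator, and an in-sample inverse propensity weighted value estimate. All names, dimensions, and parameter values below are illustrative assumptions, not the paper's actual experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 3, 2, 5000                  # context dimension, arms, horizon (hypothetical)
eps = 0.1                              # exploration rate of the epsilon-greedy policy
beta_true = rng.normal(size=(K, d))   # hypothetical true linear reward parameters

# Per-arm sufficient statistics for online least squares:
# A[k] accumulates x x^T and b[k] accumulates x * reward for arm k.
A = np.stack([np.eye(d) * 1e-6 for _ in range(K)])  # tiny ridge for invertibility
b = np.zeros((K, d))

ipw_sum = 0.0  # running inverse-propensity-weighted value accumulator

for t in range(T):
    x = rng.normal(size=d)
    # Current OLS estimates from accumulated statistics.
    beta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
    greedy = int(np.argmax(beta_hat @ x))
    # Epsilon-greedy: explore uniformly with prob eps, else exploit.
    a = int(rng.integers(K)) if rng.random() < eps else greedy
    # Selection probability (propensity) of the chosen arm under this policy.
    prop = (1 - eps) + eps / K if a == greedy else eps / K
    r = beta_true[a] @ x + rng.normal()  # linear reward with Gaussian noise
    A[a] += np.outer(x, x)
    b[a] += x * r
    # In-sample IPW contribution for the value of the greedy (target) action:
    # observed rewards are reweighted by their inverse selection probability.
    if a == greedy:
        ipw_sum += r / prop

ipw_value = ipw_sum / T
beta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
print("max estimation error:", np.max(np.abs(beta_hat - beta_true)))
print("IPW value estimate:", ipw_value)
```

Because each arm is pulled with probability at least ε/K at every round, the per-arm sufficient statistics keep growing and the OLS estimates converge; the martingale structure of the adaptively collected data is what the paper's central limit theorem handles formally.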