Statistical Inference for Online Decision Making via Stochastic Gradient Descent

10/14/2020
by Haoyu Chen, et al.

Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. Big data has made this easier than before, but it also brings new challenges. Because the decision rule should be updated once per step, an offline update that uses all the historical data is inefficient in both computation and storage. To address this, we propose a completely online algorithm that makes decisions and updates the decision rule online via stochastic gradient descent. It is not only efficient but also supports all kinds of parametric reward models. Focusing on statistical inference for online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator we use to estimate the optimal value. Online plug-in estimators for the variances of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are evaluated through simulations and a real-data application to news article recommendation.
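To make the idea of deciding and updating in a single online pass concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not give the update rule, so the sketch assumes two actions, a linear reward model, epsilon-greedy exploration, an inverse-probability-weighted squared-loss gradient, and a polynomially decaying step size. The names `BETA_TRUE`, `run_online_sgd`, `eps`, `lr0`, and `decay` are all hypothetical, and the variance estimators mentioned in the abstract are not implemented.

```python
import numpy as np

# Hypothetical true reward parameters, one row per action.
# These exist only to simulate rewards; they are not from the paper.
BETA_TRUE = np.array([[0.5, -1.0, 0.3],
                      [-0.2, 0.8, 0.6]])


def run_online_sgd(n_steps=20000, eps=0.2, lr0=0.1, decay=0.6,
                   noise_sd=0.5, seed=0):
    """One pass of decide-then-update with O(1) memory per step.

    Assumptions (not from the source): linear rewards, two actions,
    epsilon-greedy exploration, step size lr0 * t**(-decay).
    Returns the final parameter estimate and an online IPW value estimate.
    """
    rng = np.random.default_rng(seed)
    n_actions, dim = BETA_TRUE.shape
    beta = np.zeros((n_actions, dim))   # current parameter estimate
    value_sum = 0.0                     # running sum for the IPW value estimator

    for t in range(1, n_steps + 1):
        # Observe a context: intercept plus standard normal features.
        x = np.append(1.0, rng.normal(size=dim - 1))

        # Decide: follow the estimated rule w.p. 1 - eps, else take the other action.
        greedy = int(np.argmax(beta @ x))
        explore = rng.random() < eps
        action = 1 - greedy if explore else greedy
        prob = eps if explore else 1.0 - eps   # probability of the chosen action

        # Simulated reward from the hypothetical true model.
        reward = BETA_TRUE[action] @ x + noise_sd * rng.normal()

        # Online IPW value estimate of the current estimated rule:
        # only steps that followed the rule contribute, reweighted by 1 / (1 - eps).
        if not explore:
            value_sum += reward / (1.0 - eps)

        # SGD step on the IPW-weighted squared loss for the chosen action only.
        residual = reward - beta[action] @ x
        beta[action] += lr0 * t ** (-decay) * residual * x / prob

    return beta, value_sum / n_steps


if __name__ == "__main__":
    beta_hat, value_hat = run_online_sgd()
    print("estimated parameters:\n", beta_hat)
    print("online IPW value estimate:", value_hat)
```

Each step touches only the current observation, so no history is stored. This is the computational and storage advantage over an offline refit that the abstract describes.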

research
12/30/2022

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

With the fast development of big data, it has been easier than before to...
research
10/14/2020

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Online decision-making problem requires us to make a sequence of decisio...
research
01/20/2022

Statistical Learning for Individualized Asset Allocation

We establish a high-dimensional statistical learning framework for indiv...
research
05/12/2019

Note on Thompson sampling for large decision problems

There is increasing interest in using streaming data to inform decision ...
research
12/21/2022

Online Statistical Inference for Matrix Contextual Bandit

Contextual bandit has been widely used for sequential decision-making ba...
research
04/21/2021

GEAR: On Optimal Decision Making with Auxiliary Data

Personalized optimal decision making, finding the optimal decision rule ...
research
09/13/2017

Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks

We consider the problem of sequentially making decisions that are reward...
