Online Learning and Decision-Making under Generalized Linear Model with High-Dimensional Data

12/07/2018
by Xue Wang, et al.

We propose a minimax concave penalized multi-armed bandit algorithm under a generalized linear model (G-MCP-Bandit) for a decision-maker facing high-dimensional data in an online learning and decision-making process. We demonstrate that the G-MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tight bound in the covariate dimension d, O(log d). In addition, we develop a linear approximation method, the 2-step weighted Lasso procedure, to identify the MCP estimator for the G-MCP-Bandit algorithm under non-iid samples. Under this procedure, the MCP estimator matches the oracle estimator with high probability and converges to the true parameters at the optimal convergence rate. Finally, through experiments on synthetic data and two real datasets (a warfarin dosing dataset and a Tencent search advertising dataset), we show that the G-MCP-Bandit algorithm outperforms other benchmark algorithms, especially when the data are highly sparse or the decision set is large.
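
To make the linear-approximation idea in the abstract more concrete, the following is a minimal sketch of a generic two-step weighted Lasso for the MCP penalty. It is not the authors' implementation. It assumes a plain linear model (identity link) instead of a general GLM, iid synthetic data instead of bandit samples, and a simple proximal-gradient solver. All function names, the concavity parameter a = 3, and the penalty level lam = 0.2 are hypothetical choices made for illustration. Step 1 fits an ordinary Lasso; step 2 refits with per-coordinate weights equal to the MCP derivative at the step-1 estimate, so large coefficients are no longer shrunk.

```python
import numpy as np

def mcp_derivative(t, lam, a):
    """Derivative of the MCP penalty evaluated at |t|: max(lam - |t|/a, 0)."""
    return np.maximum(lam - np.abs(t) / a, 0.0)

def weighted_lasso(X, y, weights, n_iter=2000):
    """Minimize (1/2n)||y - X b||^2 + sum_j weights[j] * |b_j| via proximal gradient."""
    n, d = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Coordinate-wise soft-thresholding with per-coordinate thresholds.
        b = np.sign(z) * np.maximum(np.abs(z) - step * weights, 0.0)
    return b

def two_step_mcp(X, y, lam, a=3.0):
    """Two-step weighted Lasso: a linear approximation of the MCP-penalized fit."""
    d = X.shape[1]
    b_init = weighted_lasso(X, y, np.full(d, lam))  # step 1: ordinary Lasso
    w = mcp_derivative(b_init, lam, a)              # linearize MCP at b_init
    return weighted_lasso(X, y, w)                  # step 2: weighted Lasso

# Synthetic sparse regression problem (assumed setup, not from the paper).
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = two_step_mcp(X, y, lam=0.2)
print(np.round(beta_hat[:5], 3))
```

In this sketch, coordinates whose step-1 estimate exceeds a * lam in magnitude receive zero weight in step 2, which is what removes the Lasso shrinkage bias on strong signals while still penalizing the remaining coordinates.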

research
09/17/2020

Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection

We consider a contextual online learning (multi-armed bandit) problem wi...
research
09/22/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

This paper considers two fundamental sequential decision-making problems...
research
09/23/2021

Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit

In this paper, we consider the multi-armed bandit problem with high-dime...
research
02/07/2022

Bayesian Linear Bandits for Large-Scale Recommender Systems

Potentially, taking advantage of available side information boosts the p...
research
07/01/2019

Exploiting Relevance for Online Decision-Making in High-Dimensions

Many sequential decision-making tasks require choosing at each decision ...
research
12/28/2021

Learning Across Bandits in High Dimension via Robust Statistics

Decision-makers often face the "many bandits" problem, where one must si...
research
09/17/2022

Advertising Media and Target Audience Optimization via High-dimensional Bandits

We present a data-driven algorithm that advertisers can use to automate ...
