GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation

01/20/2023
by Mubarrat Chowdhury, et al.

In sequential decision-making scenarios such as mobile health, recommendation systems, and revenue management, contextual multi-armed bandit algorithms have garnered attention for their performance. However, most existing algorithms are built on the assumption of a strictly parametric reward model, usually linear. In this work we propose a new algorithm with a semiparametric reward model that attains a state-of-the-art upper bound on regret among existing semiparametric algorithms. Our work expands the scope of a representative algorithm with the same regret order and a similar reward model: it builds on the same action-filtering procedures but provides an explicit action-selection distribution for scenarios involving more than two arms at a given time step, while requiring fewer computations. We derive the stated upper bound on regret and present simulation results affirming our method's superiority over prevalent semiparametric bandit algorithms in cases involving more than two arms.
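To make the abstract's setting concrete, the sketch below illustrates the general idea behind semiparametric contextual bandits with an orthogonalized estimator: the reward is a linear term plus a shared, time-varying nuisance term, and centering the chosen context by its mean under an explicit action-selection distribution cancels the nuisance in expectation. This is a minimal illustration, not the paper's GBOSE algorithm; the epsilon-greedy exploration, dimensions, and nuisance function are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000                      # context dim, arms, horizon (assumed)
theta_true = rng.normal(size=d)
theta_true /= np.linalg.norm(theta_true)

lam = 1.0
B = lam * np.eye(d)                       # ridge Gram matrix of centered contexts
y = np.zeros(d)                           # accumulated centered-context * reward

regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d))           # contexts for the K arms at time t
    theta_hat = np.linalg.solve(B, y)     # current linear-parameter estimate
    scores = X @ theta_hat

    # Epsilon-greedy gives an explicit action-selection distribution pi.
    eps = 0.1
    pi = np.full(K, eps / K)
    pi[np.argmax(scores)] += 1.0 - eps
    a = rng.choice(K, p=pi)

    # Semiparametric reward: shared nuisance term nu_t plus a linear term.
    nu = np.sin(t / 50.0)                 # assumed nuisance; unknown to the learner
    r = nu + X[a] @ theta_true + 0.1 * rng.normal()

    # Orthogonalization: center the chosen context by its mean under pi.
    # Since E[x_a - pi@X | contexts] = 0, the nuisance cancels in expectation.
    xc = X[a] - pi @ X
    B += np.outer(xc, xc)
    y += xc * r

    regret += X @ theta_true @ np.eye(K)[np.argmax(X @ theta_true)] - X[a] @ theta_true
```

Because the centered context is conditionally mean-zero given the contexts, regressing rewards on centered contexts yields a consistent estimate of the linear parameter even though the nuisance term is never modeled.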


