Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model

01/31/2019
by Gi-Soo Kim, et al.

Contextual multi-armed bandit (MAB) algorithms have shown promise for maximizing cumulative rewards in sequential decision tasks such as news article recommendation, web page ad placement, and mobile health. However, most proposed contextual MAB algorithms assume a linear relationship between the reward and the context of the action. This paper proposes a new contextual MAB algorithm for a relaxed, semiparametric reward model that accommodates nonstationarity. The proposed method is less restrictive, easier to implement, and faster than two alternative algorithms that consider the same model, while achieving a tight regret upper bound. We prove that the high-probability upper bound of the regret incurred by the proposed algorithm has the same order as that of the Thompson sampling algorithm for linear reward models. The proposed and existing algorithms are evaluated via simulation and also applied to Yahoo! news article recommendation log data.
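To make the setting concrete, the following is a minimal simulation sketch of the kind of semiparametric reward model the abstract describes: the reward of the chosen action is a linear function of its context plus an arbitrary time-varying intercept shared by all actions. All names and parameter values here are illustrative assumptions, and the learner is plain linear Thompson sampling rather than the paper's algorithm (which additionally corrects for the nonstationary intercept).

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000            # context dimension, arms per round, horizon
theta = rng.normal(size=d)       # unknown linear reward parameter (assumed)
theta /= np.linalg.norm(theta)

# Ridge-regression state for a linear-Thompson-sampling-style learner
B = np.eye(d)                    # regularized Gram matrix
y = np.zeros(d)                  # running sum of reward-weighted contexts
v = 0.5                          # posterior-scaling parameter (assumed)

regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d))  # one context vector per candidate action
    nu = np.sin(0.01 * t)        # arbitrary nonstationary intercept, shared
                                 # by all actions (the semiparametric term)
    theta_hat = np.linalg.solve(B, y)
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * np.linalg.inv(B))
    a = int(np.argmax(X @ theta_tilde))          # sampled-parameter action choice
    reward = nu + X[a] @ theta + 0.1 * rng.normal()
    # nu is common to all actions, so it cancels in the per-round regret
    regret += np.max(X @ theta) - X[a] @ theta
    B += np.outer(X[a], X[a])
    y += reward * X[a]

print(f"average per-round regret: {regret / T:.3f}")
```

Because the intercept nu enters every observed reward, the naive ridge estimate above is biased whenever nu correlates with the chosen contexts; removing that bias (for instance by centering the chosen context before the update) is the kind of correction the paper's algorithm provides while keeping the Thompson-sampling regret order.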


Related research

07/26/2019 · Doubly-Robust Lasso Bandit
Contextual multi-armed bandit algorithms are widely used in sequential d...

01/20/2023 · GBOSE: Generalized Bandit Orthogonalized Semiparametric Estimation
In sequential decision-making scenarios, i.e., mobile health recommendati...

05/04/2019 · Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
Linear contextual bandit is a class of sequential decision making proble...

02/10/2022 · Remote Contextual Bandits
We consider a remote contextual multi-armed bandit (CMAB) problem, in wh...

10/15/2018 · Regret vs. Bandwidth Trade-off for Recommendation Systems
We consider recommendation systems that need to operate under wireless b...

05/04/2018 · Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback
The web link selection problem is to select a small subset of web links ...

06/18/2020 · Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect
We study the effect of persistence of engagement on learning in a stocha...
