Contextual Bandits Evolving Over Finite Time

11/14/2019
by   Harsh Deshpande, et al.

Contextual bandits exhibit the same exploration-exploitation trade-off as standard multi-armed bandits. When positive externalities that decay with time are added, the problem becomes much harder: wrong decisions made early on are difficult to recover from. We examine existing policies in this setting and highlight their biases toward the inherent reward matrix. We propose a rejection-based policy that achieves low regret regardless of the structure of the reward probability matrix.
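To make the exploration-exploitation trade-off concrete, here is a minimal epsilon-greedy contextual bandit simulation. This is an illustrative sketch only, not the paper's rejection-based policy; the two contexts and the reward probability matrix are hypothetical values chosen for the example.

```python
import random

def run_bandit(n_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit on a hypothetical 2x2 reward matrix."""
    rng = random.Random(seed)
    contexts = [0, 1]          # two hypothetical user types
    n_arms = 2
    # hypothetical reward probability matrix: p[context][arm]
    p = [[0.7, 0.3],
         [0.2, 0.8]]
    counts = [[0] * n_arms for _ in contexts]
    values = [[0.0] * n_arms for _ in contexts]
    total_reward = 0
    for _ in range(n_rounds):
        c = rng.choice(contexts)
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)      # explore: random arm
        else:
            # exploit: arm with highest empirical mean for this context
            a = max(range(n_arms), key=lambda i: values[c][i])
        r = 1 if rng.random() < p[c][a] else 0
        counts[c][a] += 1
        # incremental update of the empirical mean reward estimate
        values[c][a] += (r - values[c][a]) / counts[c][a]
        total_reward += r
    return total_reward / n_rounds

print(run_bandit())
```

With the matrix above, the best arm per context pays off with probability 0.7 or 0.8, so a policy that learns the context-arm mapping should approach that range; a fixed-arm policy that ignores context cannot.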

