BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

07/07/2023
by Nicklas Werge, et al.

We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. Combining Bayesian and frequentist principles improves adaptability and performance in dynamic settings. BOF-UCB uses sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, then takes a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees on BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and on classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
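
To make the two-step structure described above concrete (a sequential Bayesian update of the regression parameter, followed by an optimistic arm choice), the following is a minimal sketch of a generic discounted Bayesian linear UCB learner. It is not the BOF-UCB algorithm itself: the class name `DiscountedBayesLinUCB`, the Gaussian prior, the known noise variance, the discount factor `gamma` used to forget old data, and the fixed exploration weight `beta` are all assumptions made for illustration, and the paper's actual confidence bound may differ.

```python
import numpy as np


class DiscountedBayesLinUCB:
    """Illustrative sketch only; NOT the BOF-UCB algorithm from the paper."""

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0, gamma=0.99, beta=2.0):
        self.prior_precision = prior_precision  # assumed prior: theta ~ N(0, I / prior_precision)
        self.noise_var = noise_var              # assumed known reward-noise variance
        self.gamma = gamma                      # assumed discount factor for forgetting old data
        self.beta = beta                        # assumed fixed exploration weight
        self.A = np.zeros((dim, dim))           # discounted sum of x x^T / noise_var
        self.b = np.zeros(dim)                  # discounted sum of reward * x / noise_var

    def posterior(self):
        """Return the Gaussian posterior mean and covariance of theta."""
        precision = self.A + self.prior_precision * np.eye(len(self.b))
        cov = np.linalg.inv(precision)
        mean = cov @ self.b
        return mean, cov

    def select(self, contexts):
        """Pick the arm maximizing posterior mean reward plus a confidence width."""
        mean, cov = self.posterior()
        contexts = np.asarray(contexts, dtype=float)
        # widths[i] = sqrt(x_i^T cov x_i) for each context row x_i
        widths = np.sqrt(np.einsum("ij,jk,ik->i", contexts, cov, contexts))
        scores = contexts @ mean + self.beta * widths
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Discount past statistics, then add the new observation."""
        x = np.asarray(x, dtype=float)
        self.A = self.gamma * self.A + np.outer(x, x) / self.noise_var
        self.b = self.gamma * self.b + reward * x / self.noise_var
```

In a bandit loop one would call `select` on the current round's context vectors, observe the reward of the chosen arm, and pass that context and reward to `update`. Discounting the sufficient statistics is only one common way to handle non-stationarity; it is used here because it keeps the posterior update in closed form.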
