Weighted Gaussian Process Bandits for Non-stationary Environments

07/06/2021
by Yuntian Deng, et al.

We consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to vary over time within a reproducing kernel Hilbert space (RKHS). We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is coping with infinite-dimensional feature maps. To address it, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. We further establish a novel concentration inequality for weighted Gaussian process regression with general weights, and we provide universal and weight-dependent upper bounds on the weighted maximum information gain. These results are potentially of independent interest for applications such as news ranking and adaptive pricing, where weights can capture the importance or quality of data. Finally, we conduct experiments showing that the proposed algorithm compares favorably with existing methods in many cases.
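To make the idea of a UCB rule built on weighted GP regression concrete, the following is a minimal sketch and not the paper's WGP-UCB. It assumes an RBF kernel, exponentially discounted weights `gamma ** (t - s)`, a fixed exploration constant `beta`, and the standard equivalence between weighted least squares and a GP whose noise variance for observation `s` is `lam / w_s`. The names `rbf_kernel`, `weighted_gp_posterior`, `ucb_choice`, and all parameter values are hypothetical choices made for illustration; the abstract does not specify the weights, the confidence width, or the variance estimate the authors use.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-D points (assumed kernel)."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def weighted_gp_posterior(x_obs, y_obs, weights, x_query, lam=1.0):
    """Posterior mean and standard deviation of a weighted GP regression.

    Minimising sum_s w_s (y_s - f(x_s))^2 + lam * ||f||^2 over the RKHS gives
    mean(x) = k(x)^T (K + lam * W^{-1})^{-1} y, i.e. a GP in which observation
    s has noise variance lam / w_s. Small weights mean little influence.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    K = rbf_kernel(x_obs, x_obs)
    k_star = rbf_kernel(x_obs, x_query)          # shape (n_obs, n_query)
    A = K + lam * np.diag(1.0 / weights)
    mean = k_star.T @ np.linalg.solve(A, y_obs)
    # Prior variance k(x, x) equals 1 for the RBF kernel used here.
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def ucb_choice(x_obs, y_obs, candidates, gamma=0.9, beta=2.0, lam=1.0):
    """Pick the candidate maximising mean + beta * std under discounted weights."""
    t = len(x_obs)
    # Most recent observation gets weight 1, older ones decay geometrically.
    weights = gamma ** np.arange(t - 1, -1, -1)
    mean, std = weighted_gp_posterior(x_obs, y_obs, weights, candidates, lam)
    return candidates[int(np.argmax(mean + beta * std))]
```

In a bandit loop, one would call `ucb_choice` with the history so far and a grid such as `np.linspace(0.0, 1.0, 101)`, observe the noisy reward at the returned point, append it to the history, and repeat. Setting `gamma=1.0` gives uniform weights, which reduces the posterior to ordinary GP regression with noise variance `lam`.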


research
02/11/2021

No-Regret Algorithms for Time-Varying Bayesian Optimization

In this paper, we consider the time-varying Bayesian optimization proble...
research
08/20/2021

Optimal Order Simple Regret for Gaussian Process Bandits

Consider the sequential optimization of a continuous, possibly non-conve...
research
07/02/2022

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

To fully utilize the abundant spectrum resources in millimeter wave (mmW...
research
05/30/2021

Periodic-GP: Learning Periodic World with Gaussian Process Bandits

We consider the sequential decision optimization on the periodic environ...
research
09/19/2019

Weighted Linear Bandits for Non-Stationary Environments

We consider a stochastic linear bandit model in which the available acti...
research
07/14/2023

On the Sublinear Regret of GP-UCB

In the kernelized bandit problem, a learner aims to sequentially compute...
research
12/05/2017

Gaussian Process bandits with adaptive discretization

In this paper, the problem of maximizing a black-box function f:X→R is s...
