On the Sublinear Regret of GP-UCB

07/14/2023
by Justin Whitehouse et al.

In the kernelized bandit problem, a learner aims to compute the optimum of a function lying in a reproducing kernel Hilbert space (RKHS), given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which acts based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, one that fails to be sublinear for many commonly used kernels such as the Matérn kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can the bounds be improved using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Matérn kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution: regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel k. Applying this idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to give a tighter analysis of the GP-UCB algorithm.
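The abstract pins down the full algorithm: maintain a kernel ridge (posterior mean) estimate of the unknown function, attach a confidence width derived from the posterior variance, and query the maximizer of the resulting upper confidence bound. Below is a minimal Python sketch of that loop on a finite candidate set with a Matérn-5/2 kernel. The constants lam and beta here are illustrative placeholders, not the smoothness-proportional regularization schedule or confidence width the paper actually analyzes.

```python
import numpy as np

def matern52(x, y, lengthscale=0.2):
    """Matern-5/2 kernel; broadcasts over array inputs."""
    r = np.abs(x - y) / lengthscale
    return (1.0 + np.sqrt(5.0) * r + (5.0 / 3.0) * r**2) * np.exp(-np.sqrt(5.0) * r)

def gp_ucb(f_noisy, candidates, T, lam=1.0, beta=2.0):
    """One run of GP-UCB over a finite 1-D candidate set.

    lam is the kernel ridge regularizer; the paper's key idea is to choose it
    in proportion to the kernel's smoothness (the fixed value here is only a
    placeholder). beta sets the confidence width.
    """
    X, Y = [], []
    for t in range(T):
        if not X:
            x_next = candidates[len(candidates) // 2]  # arbitrary first query
        else:
            Xa, Ya = np.array(X), np.array(Y)
            K = matern52(Xa[:, None], Xa[None, :])
            K_reg_inv = np.linalg.inv(K + lam * np.eye(len(X)))
            k_star = matern52(candidates[:, None], Xa[None, :])  # (m, t)
            mu = k_star @ K_reg_inv @ Ya                         # ridge estimate
            var = matern52(candidates, candidates) - np.einsum(
                "ij,jk,ik->i", k_star, K_reg_inv, k_star)        # posterior variance
            ucb = mu + beta * np.sqrt(np.clip(var, 0.0, None))
            x_next = candidates[int(np.argmax(ucb))]
        X.append(float(x_next))
        Y.append(float(f_noisy(x_next)))
    return np.array(X), np.array(Y)

# Example: optimize a noisy sine on [0, 1].
rng = np.random.default_rng(0)
f = lambda x: np.sin(6.0 * x)
X, Y = gp_ucb(lambda x: f(x) + 0.1 * rng.standard_normal(),
              np.linspace(0.0, 1.0, 200), T=30)
print("best observed point:", X[np.argmax(Y)])
```

The regret analysis in the paper concerns exactly this exploration-exploitation trade-off: a larger lam shrinks the estimator (more bias, less variance), and the paper's contribution is showing that tying lam to the kernel's smoothness makes the resulting confidence bounds, and hence the cumulative regret, provably tighter.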


Related research

01/28/2020 · Bandit optimisation of functions in the Matérn kernel RKHS
We consider the problem of optimising functions in the Reproducing kerne...

09/15/2020 · On Information Gain and Regret Bounds in Gaussian Process Bandits
Consider the sequential optimization of an expensive to evaluate and pos...

10/28/2021 · Open Problem: Tight Online Confidence Intervals for RKHS Elements
Confidence intervals are a crucial building block in the analysis of var...

06/09/2020 · Scalable Thompson Sampling using Sparse Gaussian Process Models
Thompson Sampling (TS) with Gaussian Process (GP) models is a powerful t...

07/06/2021 · Weighted Gaussian Process Bandits for Non-stationary Environments
In this paper, we consider the Gaussian process (GP) bandit optimization...

02/01/2023 · Delayed Feedback in Kernel Bandits
Black box optimisation of an unknown function from expensive and noisy e...

03/04/2020 · Corruption-Tolerant Gaussian Process Bandit Optimization
We consider the problem of optimizing an unknown (typically non-convex) ...
