On Information Gain and Regret Bounds in Gaussian Process Bandits
Consider the sequential optimization of an expensive to evaluate and possibly non-convex objective function f from noisy feedback, that can be considered as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian (when f is a sample from a Gaussian process (GP)) and a frequentist (when f lives in a reproducing kernel Hilbert space) setting. The regret bounds often rely on the maximal information gain γ_T between T observations and the underlying GP (surrogate) model. We provide general bounds on γ_T based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels, improves the existing bounds on γ_T, and consequently the regret bounds relying on γ_T under numerous settings. For the Matérn family of kernels, where the lower bounds on γ_T, and regret under the frequentist setting, are known, our results close a huge polynomial in T gap between the upper and lower bounds (up to logarithmic in T factors).
READ FULL TEXT