The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates

03/01/2018
by   Henry WJ Reeve, et al.
0

In this paper we propose and explore the k-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates. We focus on a setting where the covariates are supported on a metric space of low intrinsic dimension, such as a manifold embedded within a high dimensional ambient feature space. The algorithm is conceptually simple and straightforward to implement. The k-Nearest Neighbour UCB algorithm does not require prior knowledge of the either the intrinsic dimension of the marginal distribution or the time horizon. We prove a regret bound for the k-Nearest Neighbour UCB algorithm which is minimax optimal up to logarithmic factors. In particular, the algorithm automatically takes advantage of both low intrinsic dimensionality of the marginal distribution over the covariates and low noise in the data, expressed as a margin condition. In addition, focusing on the case of bounded rewards, we give corresponding regret bounds for the k-Nearest Neighbour KL-UCB algorithm, which is an analogue of the KL-UCB algorithm adapted to the setting of multi-armed bandits with covariates. Finally, we present empirical results which demonstrate the ability of both the k-Nearest Neighbour UCB and k-Nearest Neighbour KL-UCB to take advantage of situations where the data is supported on an unknown sub-manifold of a high-dimensional feature space.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/19/2018

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

An online reinforcement learning algorithm is anytime if it does not nee...
research
03/01/2018

Minimax rates for cost-sensitive learning on manifolds with approximate nearest neighbours

We study the approximate nearest neighbour method for cost-sensitive cla...
research
04/28/2023

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards

We study K-armed bandit problems where the reward distributions of the a...
research
03/03/2020

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

We consider the problem of finitely parameterized multi-armed bandits wh...
research
10/08/2017

Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits

In this paper, we propose an information-theoretic exploration strategy ...
research
05/26/2022

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment

Multi-armed bandit (MAB) is a classic model for understanding the explor...
research
12/01/2015

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Existing methods for retrieving k-nearest neighbours suffer from the cur...

Please sign up or login with your details

Forgot password? Click here to reset