Nonparametric Stochastic Contextual Bandits

01/05/2018
by   Melody Y. Guan, et al.
0

We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of O(T^1+D/2+D), where D is the context dimension, for a modified UCB algorithm that is simple to implement (kNN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification.

READ FULL TEXT
research
05/05/2014

Generalized Risk-Aversion in Stochastic Multi-Armed Bandits

We consider the problem of minimizing the regret in stochastic multi-arm...
research
10/31/2019

Recovering Bandits

We study the recovering bandits problem, a variant of the stochastic mul...
research
01/23/2023

Congested Bandits: Optimal Routing via Short-term Resets

For traffic routing platforms, the choice of which route to recommend to...
research
10/23/2020

Approximation Methods for Kernelized Bandits

The RKHS bandit problem (also called kernelized multi-armed bandit probl...
research
10/11/2018

Regularized Contextual Bandits

We consider the stochastic contextual bandit problem with additional reg...
research
01/05/2020

A Hoeffding Inequality for Finite State Markov Chains and its Applications to Markovian Bandits

This paper develops a Hoeffding inequality for the partial sums ∑_k=1^n ...
research
04/26/2022

Rate-Constrained Remote Contextual Bandits

We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) p...

Please sign up or login with your details

Forgot password? Click here to reset