A Dimension-free Algorithm for Contextual Continuum-armed Bandits

07/15/2019
by Wenhao Li, et al.

In contextual continuum-armed bandits, the contexts x and the arms y are both continuous and drawn from high-dimensional spaces. The payoff function to be learned, f(x,y), is not assumed to have a particular parametric form. The literature has shown that for Lipschitz-continuous functions the optimal regret is Õ(T^{(d_x+d_y+1)/(d_x+d_y+2)}), where d_x and d_y are the dimensions of the contexts and the arms respectively, so the problem suffers from the curse of dimensionality. We develop an algorithm that achieves regret Õ(T^{(d_x+1)/(d_x+2)}) when f is globally concave in y; global concavity is a common assumption in many applications. The algorithm is based on stochastic approximation and estimates gradient information in an online fashion. Our results yield the valuable insight that the curse of dimensionality in the arms can be overcome under mild structural assumptions on the payoff function.
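
The algorithmic idea in the abstract, projected stochastic approximation driven by online gradient estimates of a payoff that is concave in the arm, can be illustrated with a short sketch. The sketch below is a hypothetical Kiefer-Wolfowitz-style two-point scheme, not the paper's exact procedure: the function names, the step and perturbation schedules, and the use of a single arm iterate (rather than a context-dependent policy) are all illustrative assumptions.

import numpy as np

def kiefer_wolfowitz_bandit(payoff, d_x, d_y, T, rng=None):
    # Hypothetical sketch: projected stochastic approximation with a
    # Kiefer-Wolfowitz-style two-point gradient estimate of f(x, .).
    # payoff(x, y) returns a noisy observation of f(x, y); the step and
    # perturbation schedules below are illustrative, not the paper's.
    rng = rng or np.random.default_rng(0)
    y = np.full(d_y, 0.5)                  # current arm, kept in [0, 1]^d_y
    rewards = []
    for t in range(1, T + 1):
        x = rng.uniform(size=d_x)          # context revealed by environment
        a_t = 1.0 / t                      # step size
        c_t = t ** -0.25                   # finite-difference perturbation
        u = rng.standard_normal(d_y)
        u /= np.linalg.norm(u)             # random direction on the sphere
        # Two noisy payoff queries give a directional-derivative estimate in y.
        r_plus = payoff(x, np.clip(y + c_t * u, 0.0, 1.0))
        r_minus = payoff(x, np.clip(y - c_t * u, 0.0, 1.0))
        grad_est = (r_plus - r_minus) / (2.0 * c_t) * u
        # Projected gradient ascent: concavity of f in y drives convergence.
        y = np.clip(y + a_t * grad_est, 0.0, 1.0)
        rewards.append(0.5 * (r_plus + r_minus))
    return y, rewards

# Toy payoff: quadratic, concave in y, with additive Gaussian noise.
def noisy_f(x, y, rng=np.random.default_rng(1)):
    return -np.sum((y - x[:len(y)]) ** 2) + 0.1 * rng.standard_normal()

y_final, rs = kiefer_wolfowitz_bandit(noisy_f, d_x=3, d_y=2, T=2000)
print("final arm:", y_final)

Note that achieving the Õ(T^{(d_x+1)/(d_x+2)}) regret bound also requires handling the dependence of the optimal arm on the context x, which the paper addresses but this single-iterate sketch omits.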

Related research

03/30/2023
Contextual Combinatorial Bandits with Probabilistically Triggered Arms
We study contextual combinatorial bandits with probabilistically trigger...

05/18/2015
Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In th...

02/11/2013
Adaptive-treed bandits
We describe a novel algorithm for noisy global optimisation and continuu...

09/17/2020
Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection
We consider a contextual online learning (multi-armed bandit) problem wi...

09/15/2022
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
We propose a novel contextual bandit algorithm for generalized linear re...

05/02/2023
Stochastic Contextual Bandits with Graph-based Contexts
We naturally generalize the on-line graph prediction problem to a versio...

02/02/2022
Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts
Contextual bandits are widely-used in the study of learning-based contro...
