Stochastic continuum armed bandit problem of few linear parameters in high dimensions

12/01/2013
by Hemant Tyagi, et al.

We consider a stochastic continuum armed bandit problem where the arms are indexed by the ℓ_2 ball B_d(1+ν) of radius 1+ν in R^d. The reward functions r : B_d(1+ν) → R are assumed to depend intrinsically on k ≪ d unknown linear parameters, so that r(x) = g(Ax), where A is a full-rank k × d matrix. Assuming the mean reward function to be smooth, we use results from the low-rank matrix recovery literature to derive an efficient randomized algorithm that achieves a regret bound of O(C(k,d) n^{(1+k)/(2+k)} (log n)^{1/(2+k)}) with high probability. Here C(k,d) is at most polynomial in d and k, and n is the number of rounds (the sampling budget), which is assumed to be known beforehand.
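To make the reward model r(x) = g(Ax) concrete, here is a minimal sketch of a noisy reward oracle with that structure. It does not implement the paper's algorithm. The names `make_reward`, `mean_reward` and `pull`, the Gaussian noise, the specific smooth function g, and the choice of A are all assumptions made for illustration and do not come from the abstract.

```python
import math
import random


def make_reward(A, noise_std=0.1, rng=None):
    """Build a noisy oracle for a reward of the form r(x) = g(Ax).

    A is a k x d matrix given as a list of k rows. The function g below
    is a hypothetical smooth choice, used only for illustration.
    """
    rng = rng or random.Random(0)

    def g(y):
        # Hypothetical smooth function of the k-dimensional projection y = Ax.
        return math.exp(-sum(v * v for v in y))

    def mean_reward(x):
        # Project the d-dimensional arm x onto the k rows of A, then apply g.
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        return g(y)

    def pull(x):
        # One noisy observation of the mean reward (Gaussian noise is assumed).
        return mean_reward(x) + rng.gauss(0.0, noise_std)

    return mean_reward, pull


d, k = 50, 2
# Illustrative full-rank k x d matrix: row i selects coordinate i.
A = [[1.0 if j == i else 0.0 for j in range(d)] for i in range(k)]
mean_reward, pull = make_reward(A)

x = [0.0] * d
x[0] = 0.5
x_perturbed = list(x)
x_perturbed[10] = 0.7  # changes x only along a direction in the null space of A

# Both arms lie in the unit ball and share the same mean reward,
# because the reward depends on x only through Ax.
same_mean = mean_reward(x) == mean_reward(x_perturbed)
```

In this sketch the mean reward varies only along the k directions spanned by the rows of A, so moving an arm along any of the remaining d - k directions leaves it unchanged. That is the low-dimensional structure the abstract says the algorithm exploits.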


research
11/29/2018

Regret Bounds for Stochastic Combinatorial Multi-Armed Bandits with Linear Space Complexity

Many real-world problems face the dilemma of choosing best K out of N op...
research
06/17/2016

Structured Stochastic Linear Bandits

The stochastic linear bandit problem proceeds in rounds where at each ro...
research
09/08/2022

Online Low Rank Matrix Completion

We study the problem of online low-rank matrix completion with 𝖬 users, ...
research
06/04/2020

Low-Rank Generalized Linear Bandit Problems

In a low-rank linear bandit problem, the reward of an action (represente...
research
10/01/2022

Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

Multi-armed bandit (MAB) algorithms are efficient approaches to reduce t...
research
06/10/2015

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

This work addresses the problem of regret minimization in non-stochastic...
research
12/13/2017

Stochastic Low-Rank Bandits

Many problems in computer vision and recommender systems involve low-ran...
