Stochastic continuum armed bandit problem of few linear parameters in high dimensions
We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_d(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r : B_d(1+\nu) \to \mathbb{R}$ are assumed to depend intrinsically on $k \ll d$ unknown linear parameters, so that $r(x) = g(Ax)$, where $A$ is a full-rank $k \times d$ matrix. Assuming the mean reward function to be smooth, we make use of results from the low-rank matrix recovery literature and derive an efficient randomized algorithm which achieves a regret bound of $O(C(k,d)\, n^{\frac{1+k}{2+k}} (\log n)^{\frac{1}{2+k}})$ with high probability. Here $C(k,d)$ is at most polynomial in $d$ and $k$, and $n$ is the number of rounds (the sampling budget), which is assumed to be known beforehand.
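The reward model r(x) = g(Ax) can be illustrated with a minimal sketch. This is not the paper's algorithm; it only simulates the problem setting. The link function `g`, the Gaussian noise model, the choice of a row-orthonormal `A`, and all function names are assumptions made for illustration. `regret_rate` evaluates the n-dependent part of the stated bound, leaving out the factor C(k,d).

```python
import numpy as np

def make_reward(k, d, noise_std=0.1, seed=0):
    """Build a toy reward r(x) = g(Ax) with a k x d full-rank matrix A."""
    rng = np.random.default_rng(seed)
    # Assumption: take A with orthonormal rows, obtained from a reduced QR.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # q has shape (d, k)
    A = q.T                                           # A has shape (k, d)

    def g(z):
        # Hypothetical smooth function of the k linear parameters.
        return float(np.exp(-np.sum((z - 0.2) ** 2)))

    def mean_reward(x):
        return g(A @ x)

    def pull(x):
        # Assumed noise model: additive zero-mean Gaussian noise.
        return mean_reward(x) + noise_std * rng.standard_normal()

    return A, mean_reward, pull

def sample_ball(d, radius, rng):
    """Draw a point uniformly from the l2 ball of the given radius in R^d."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    return radius * rng.uniform() ** (1.0 / d) * v

def regret_rate(n, k):
    """n-dependent part of the bound: n^((1+k)/(2+k)) * (log n)^(1/(2+k))."""
    return n ** ((1 + k) / (2 + k)) * np.log(n) ** (1 / (2 + k))

if __name__ == "__main__":
    k, d, nu = 2, 50, 0.1
    A, mean_reward, pull = make_reward(k, d)
    x = sample_ball(d, 1 + nu, np.random.default_rng(1))
    print("noisy reward at x:", pull(x))
    print("rate for n = 10**6:", regret_rate(10**6, k))
```

Because the reward depends on x only through Ax, moving x along any direction in the null space of A leaves the mean reward unchanged. That is the low-dimensional structure the abstract refers to. For fixed k the rate grows sublinearly in n, with an exponent (1+k)/(2+k) that does not depend on d.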