Contextual Multi-armed Bandits under Feature Uncertainty
We study contextual multi-armed bandit problems under linear realizability on rewards and uncertainty (or noise) on features. For the case of identical noise on features across actions, we propose an algorithm, coined NLinRel, having O(T^7/8((dT)+K√(d))) regret bound for T rounds, K actions, and d-dimensional feature vectors. Next, for the case of non-identical noise, we observe that popular linear hypotheses including NLinRel are impossible to achieve such sub-linear regret. Instead, under assumption of Gaussian feature vectors, we prove that a greedy algorithm has O(T^2/3√( d)) regret bound with respect to the optimal linear hypothesis. Utilizing our theoretical understanding on the Gaussian case, we also design a practical variant of NLinRel, coined Universal-NLinRel, for arbitrary feature distributions. It first runs NLinRel for finding the `true' coefficient vector using feature uncertainties and then adjust it to minimize its regret using the statistical feature information. We justify the performance of Universal-NLinRel on both synthetic and real-world datasets.
READ FULL TEXT