Improved Regret Bounds for Online Kernel Selection under Bandit Feedback
In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O\big((\|f\|^2_{\mathcal{H}_i}+1)K^{1/3}T^{2/3}\big)$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this result. For smooth loss functions, we propose an algorithm with an $O\big(U^{2/3}K^{-1/3}\big(\sum^{K}_{i=1}L_T(f^{\ast}_i)\big)^{2/3}\big)$ expected bound, where $L_T(f^{\ast}_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\|f\|_{\mathcal{H}_i}\le U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller whenever most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O\big(U\sqrt{KT}\ln^{2/3}T\big)$ expected bound, which asymptotically improves on the previous bound. We then apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O\big(\sqrt{T\ln K}+\|f\|^2_{\mathcal{H}_i}\max\{\sqrt{T},T/\sqrt{\mathcal{R}}\}\big)$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
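To see why the data-dependent bound recovers the previous worst-case rate, here is a short sanity check, assuming (as is standard for such bounds but not stated explicitly above) that each per-round loss lies in $[0,1]$, so that $L_T(f^{\ast}_i)\le T$ for every $i$:
$$
U^{2/3}K^{-1/3}\Big(\sum_{i=1}^{K}L_T(f^{\ast}_i)\Big)^{2/3}
\le U^{2/3}K^{-1/3}(KT)^{2/3}
= U^{2/3}K^{1/3}T^{2/3},
$$
which matches the $K^{1/3}T^{2/3}$ dependence of the earlier worst-case bound, while being strictly smaller whenever the cumulative losses $L_T(f^{\ast}_i)$ grow sublinearly in $T$.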