Improved Regret Bounds for Online Kernel Selection under Bandit Feedback

by Junfan Li, et al.

In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\|f\|^2_{\mathcal{H}_i}+1)K^{1/3}T^{2/3})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on it. For smooth loss functions, we propose an algorithm with an $O(U^{2/3}K^{-1/3}(\sum_{i=1}^{K}L_T(f^\ast_i))^{2/3})$ expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\|f\|_{\mathcal{H}_i}\leq U\}$. This data-dependent bound retains the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{2/3}T)$ expected bound, asymptotically improving the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln K}+\|f\|^2_{\mathcal{H}_i}\max\{\sqrt{T},T/\sqrt{\mathcal{R}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
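
The claim that the data-dependent bound retains the worst-case rate can be checked with a short calculation. The sketch below assumes the per-round loss is bounded by a constant (taken as 1 here), so that $L_T(f^\ast_i)\leq T$ for every kernel; this boundedness is an assumption of the illustration, not something stated in the abstract.

$$
U^{2/3}K^{-1/3}\Big(\sum_{i=1}^{K}L_T(f^\ast_i)\Big)^{2/3}
\;\leq\; U^{2/3}K^{-1/3}(KT)^{2/3}
\;=\; U^{2/3}K^{1/3}T^{2/3}.
$$

Under that assumption the bound never exceeds the earlier $K^{1/3}T^{2/3}$ dependence, and it is strictly smaller whenever $\sum_{i=1}^{K}L_T(f^\ast_i)=o(KT)$. For the Lipschitz case, the ratio of the new rate to the previous one (ignoring the norm factors) is

$$
\frac{\sqrt{KT}\,\ln^{2/3}T}{K^{1/3}T^{2/3}} \;=\; K^{1/6}\,T^{-1/6}\,\ln^{2/3}T \;\longrightarrow\; 0 \quad (T\to\infty,\ K \text{ fixed}),
$$

which is the sense in which the improvement is asymptotic.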

