Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

05/31/2020
by Tor Lattimore, et al.

We prove that the information-theoretic upper bound on the minimax regret for adversarial bandit convex optimisation is at most O(d^3 √(n) log(n)), improving on the O(d^9.5 √(n) log(n)^7.5) bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
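For context, below is a minimal sketch of the quantity being bounded, assuming the standard definition of adversarial bandit regret: the learner plays x_t in a convex set K ⊆ R^d, the adversary chooses convex losses f_t, and only the scalar f_t(x_t) is observed. This definition is the usual one and is not spelled out in the abstract itself.

```latex
% Assumed (standard) definition of expected adversarial regret over n rounds:
% the learner competes with the best fixed point in hindsight.
R_n \;=\; \max_{x \in K} \, \mathbb{E}\!\left[ \sum_{t=1}^{n} \bigl( f_t(x_t) - f_t(x) \bigr) \right]

% Bound stated in the abstract:        R_n = O\bigl( d^{3}   \sqrt{n} \, \log(n)       \bigr)
% Prior bound (Bubeck et al., 2017):   R_n = O\bigl( d^{9.5} \sqrt{n} \, \log(n)^{7.5} \bigr)
```

Both bounds apply to the same setting; the improvement is in the dependence on the dimension d and on the power of the logarithm.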


Related research

- Minimax Regret for Bandit Convex Optimisation of Ridge Functions (06/01/2021): We analyse adversarial bandit convex optimisation with an adversary that...
- Bandit Convex Optimisation Revisited: FTRL Achieves Õ(t^1/2) Regret (02/01/2023): We show that a kernel estimator using multiple function evaluations can ...
- A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit (02/20/2019): We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
- A Second-Order Method for Stochastic Bandit Convex Optimisation (02/10/2023): We introduce a simple and efficient algorithm for unconstrained zeroth-o...
- Corruption-Robust Contextual Search through Density Updates (06/15/2022): We study the problem of contextual search in the adversarial noise model...
- Derivative-Free Order-Robust Optimisation (10/09/2019): In this paper, we formalise order-robust optimisation as an instance of ...
- Gaussian Imagination in Bandit Learning (01/06/2022): Assuming distributions are Gaussian often facilitates computations that ...
