Near Optimal Adversarial Attack on UCB Bandits

08/21/2020
by   Shiliang Zuo, et al.
0

We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm T - o(T) times with a cumulative cost that scales as √(log T), where T is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to loglog T factors, showing our attack to be near optimal.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro