Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

02/15/2021
by Anshuka Rangi, et al.

This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that any bandit algorithm with regret O(log T) can be forced to suffer a regret Ω(T) with an expected amount of contamination O(log T). This amount of contamination is also necessary, as we prove that there exists an O(log T) regret bandit algorithm, specifically the classical UCB, that requires Ω(log T) contamination to be forced into Ω(T) regret. To combat such poisoning attacks, our second main contribution is a novel algorithm, Secure-UCB, which uses limited verification to access a small number of uncontaminated rewards. We show that with O(log T) expected verifications, Secure-UCB restores the order-optimal O(log T) regret irrespective of the amount of contamination used by the attacker. Finally, we prove that for any bandit algorithm, O(log T) verifications are necessary to recover the order-optimal regret. We conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.
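To make the setting concrete, the following is a minimal toy sketch of a UCB-style learner that occasionally verifies a pulled arm's reward, i.e., obtains the uncontaminated value instead of the attacker-modified one. The uniform verification probability, the simulation interface (each arm returns a `(contaminated, clean)` pair), and all names here are illustrative assumptions; the paper's Secure-UCB uses a more refined verification schedule to achieve O(log T) expected verifications.

```python
import math
import random

def secure_ucb_sketch(arms, T, verify_prob=0.05, seed=0):
    """Toy UCB variant with random reward verification (NOT the paper's exact rule).

    arms: list of callables, each returning (contaminated_reward, clean_reward) in [0, 1].
    With probability verify_prob the learner 'verifies' the pull and records the
    clean reward; otherwise it records the possibly poisoned one.
    Returns the sequence of pulled arm indices.
    """
    rng = random.Random(seed)
    K = len(arms)
    counts = [0] * K      # pulls per arm
    sums = [0.0] * K      # sum of recorded rewards per arm
    history = []
    for t in range(T):
        if t < K:
            a = t  # initialization: pull each arm once
        else:
            # classical UCB index on the (partially verified) empirical means
            a = max(range(K),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t + 1) / counts[i]))
        contaminated, clean = arms[a]()
        # limited verification: sometimes replace the observed reward by the true one
        r = clean if rng.random() < verify_prob else contaminated
        counts[a] += 1
        sums[a] += r
        history.append(a)
    return history
```

In this toy model an attacker that always zeroes out the best arm's reward (the `contaminated` component) can mislead plain UCB, while any positive verification rate mixes true rewards back into the empirical means; the paper's contribution is showing that an adaptive schedule needs only O(log T) verifications in expectation.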


