Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

02/17/2020
by   Ziwei Guan, et al.

The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player. Previous studies focused on scenarios where the attack value either is bounded at each round or has a vanishing probability of occurrence. These models do not capture powerful adversaries that can catastrophically perturb the revealed reward. This paper investigates an attack model in which an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks. Furthermore, the attack value does not need to follow any statistical distribution. We propose a novel sample-median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based ϵ-greedy algorithm (called med-ϵ-greedy). Both algorithms are provably robust to the aforementioned attack model. More specifically, we show that both algorithms achieve O(log T) pseudo-regret (i.e., the optimal regret without attacks). We also provide a high-probability guarantee of O(log T) regret with respect to random rewards and random occurrence of attacks. These bounds hold under arbitrary and unbounded reward perturbation as long as the attack probability does not exceed a certain constant threshold. We provide multiple synthetic simulations of the proposed algorithms to verify these claims and to showcase the inability of existing techniques to achieve sublinear regret. We also provide experimental results of the algorithm operating in a cognitive radio setting using multiple software-defined radios.
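To illustrate the core idea behind the median-based estimators, here is a minimal sketch of a median-based ϵ-greedy bandit. It is not the paper's exact med-ϵ-greedy algorithm (which, among other details, uses a specific exploration schedule and thresholds derived in the analysis); the function name, the fixed ϵ, and the callable-arm interface are assumptions made for illustration. The key point it demonstrates is that replacing the sample mean with the sample median keeps an arm's estimate stable even when a bounded fraction of rewards is corrupted by arbitrarily large attack values.

```python
import random
import statistics

def med_eps_greedy(arms, T, eps=0.1, seed=0):
    """Hypothetical sketch of a median-based epsilon-greedy bandit.

    `arms` is a list of zero-argument callables, each returning one
    (possibly attacked) reward sample. The sample median replaces the
    usual sample mean: as long as fewer than half of an arm's observed
    rewards are corrupted, unbounded perturbations cannot move the
    median estimate far from the arm's true central value.
    """
    rng = random.Random(seed)
    K = len(arms)
    history = [[] for _ in range(K)]

    # Pull every arm once to initialize its reward history.
    for k in range(K):
        history[k].append(arms[k]())

    for t in range(K, T):
        if rng.random() < eps:
            k = rng.randrange(K)  # explore a uniformly random arm
        else:
            # Exploit: pick the arm with the largest sample median.
            k = max(range(K), key=lambda i: statistics.median(history[i]))
        history[k].append(arms[k]())

    # Return the final median estimate of each arm.
    return [statistics.median(h) for h in history]
```

Under the abstract's attack model, an adversary corrupting rewards with probability below a constant threshold (and in particular below 1/2 per arm here) leaves each arm's median estimate, and hence the arm ranking, essentially intact, whereas a mean-based estimate can be dragged arbitrarily far by a single unbounded attack.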


research
11/13/2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

We propose a multi-armed bandit algorithm that explores based on randomi...
research
02/19/2020

Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense

Due to the broad range of applications of stochastic multi-armed bandit ...
research
10/02/2020

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solvi...
research
02/15/2021

Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

This paper studies bandit algorithms under data poisoning attacks in a b...
research
08/29/2022

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

To understand the security threats to reinforcement learning (RL) algori...
research
02/19/2020

Residual Bootstrap Exploration for Bandit Algorithms

In this paper, we propose a novel perturbation-based exploration method ...
research
06/12/2019

Cyber attacks with bounded sensor reading edits for partially-observed discrete event systems

The problem of cyber attacks with bounded sensor reading edits for parti...
