Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

01/30/2022
by Jeongyeol Kwon, et al.

Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction α < 1/2 of tasks (users) are arbitrary and adversarial. The remaining good users share the same instance of contextual bandits with S contexts and A actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for the good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that O(1/ϵ^2) per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of Ω̃(min(S,A)·α^2/ϵ^2) per-user interactions required to learn an ϵ-optimal policy for the good users. We then show that an Õ(min(S,A)·α/ϵ^2) upper bound can be achieved by employing efficient robust mean estimators for both univariate and high-dimensional random variables. We also show that this bound can be improved depending on the distribution of contexts.

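The defense behind the upper bound rests on robust mean estimation. As a minimal sketch of the univariate case (not the paper's exact estimator; the trimmed mean, the function name trimmed_mean, and the synthetic data below are illustrative assumptions), trimming the extreme α-fraction of pooled reward reports on each side keeps the estimate close to the good users' mean even under a coordinated attack:

```python
import numpy as np

def trimmed_mean(samples: np.ndarray, alpha: float) -> float:
    """Univariate robust mean: discard the largest and smallest
    alpha-fraction of samples, then average what remains.
    Assumes alpha < 1/2 so the trimmed set is nonempty."""
    n = len(samples)
    k = int(np.ceil(alpha * n))  # points to trim on each side
    return float(np.sort(samples)[k : n - k].mean())

# Hypothetical usage: reward reports for one (context, action) pair
# pooled across users, with up to an alpha-fraction adversarial.
rng = np.random.default_rng(0)
good = rng.normal(loc=0.5, scale=0.1, size=90)   # honest users' rewards
bad = np.full(10, 10.0)                           # coordinated outlier reports
print(trimmed_mean(np.concatenate([good, bad]), alpha=0.1))  # close to 0.5
```

For the high-dimensional statistics the abstract mentions, coordinate-wise trimming is generally not enough, since its error grows with the dimension; this is why dedicated high-dimensional robust mean estimators are employed instead.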

Related research

10/05/2022
Reward-Mixing MDPs with a Few Latent Contexts are Learnable
We consider episodic reinforcement learning in reward-mixing Markov deci...

02/19/2023
Estimating Optimal Policy Value in General Linear Contextual Bandits
In many bandit problems, the maximal reward achievable by a policy is of...

10/29/2018
Heteroscedastic Bandits with Reneging
Although shown to be useful in many areas as models for solving sequenti...

06/06/2020
Contextual Bandits with Side-Observations
We investigate contextual bandits in the presence of side-observations a...

10/21/2022
Anonymous Bandits for Multi-User Systems
In this work, we present and study a new framework for online learning i...

01/08/2013
Linear Bandits in High Dimension and Recommendation Systems
A large number of online services provide automated recommendations to h...

03/06/2015
Sequential Relevance Maximization with Binary Feedback
Motivated by online settings where users can provide explicit feedback a...
