Defense Against Reward Poisoning Attacks in Reinforcement Learning

02/10/2021
by Kiarash Banihashem et al.

We study defense strategies against reward poisoning attacks in reinforcement learning. As a threat model, we consider attacks that minimally alter rewards to make the attacker's target policy uniquely optimal under the poisoned rewards, with the optimality gap specified by an attack parameter. Our goal is to design agents that are robust against such attacks: they compute their policies under the poisoned rewards, but their worst-case utility is measured with respect to the true, unpoisoned rewards. We propose an optimization framework for deriving optimal defense policies, both when the attack parameter is known and when it is unknown. Moreover, we show that defense policies obtained as solutions to the proposed optimization problems have provable performance guarantees. In particular, we provide the following bounds with respect to the true, unpoisoned rewards: a) lower bounds on the expected return of the defense policies, and b) upper bounds on how suboptimal these defense policies are compared to the attacker's target policy. We conclude the paper by illustrating the intuitions behind our formal results and showing that the derived bounds are non-trivial.
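The threat model and the worst-case defense idea can be illustrated with a toy one-state ("bandit") example. Everything below is an illustrative assumption for this sketch, not the paper's attack or optimization framework: the poisoning rule is a simple heuristic rather than a minimal alteration, and the defender's 2*eps discount is a hypothetical pessimism rule.

```python
import numpy as np

def poison(r, target, eps):
    """Alter rewards so `target` becomes optimal by margin eps.

    Illustrative heuristic: split the needed change between the target
    arm and each competing arm. The paper's attacker instead computes
    the *minimal* alteration achieving the optimality gap."""
    r_hat = np.asarray(r, dtype=float).copy()
    for a in range(len(r_hat)):
        if a == target:
            continue
        gap = r_hat[a] - (r_hat[target] - eps)
        if gap > 0:
            r_hat[a] -= gap / 2       # push the competitor down
            r_hat[target] += gap / 2  # pull the target up
    return r_hat

def worst_case_defense(r_hat, eps):
    """Pick an arm by its pessimistic value given the observed rewards.

    Heuristic assumption for this sketch: the apparently optimal arm may
    have been inflated by the attack, so discount it by 2 * eps; the
    other arms can only have been pushed down, so their observed values
    are trusted as lower bounds on the true rewards."""
    vals = r_hat.copy()
    vals[int(np.argmax(r_hat))] -= 2 * eps
    return int(np.argmax(vals))

# Demo: the true best arm is 0; the attacker makes arm 1 look optimal.
r = np.array([1.0, 0.9, 0.95])
r_hat = poison(r, target=1, eps=0.2)
naive = int(np.argmax(r_hat))            # blindly follows poisoned rewards
robust = worst_case_defense(r_hat, eps=0.2)
print(naive, robust)                     # prints "1 2"
```

Here the naive agent follows the attacker's target arm (true reward 0.9), while the pessimistic agent picks arm 2 (true reward 0.95), earning a higher return under the true rewards. The paper's framework generalizes this kind of worst-case reasoning to full MDPs and derives the provable bounds stated above.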


Related research

11/21/2020  Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks
We study a security threat to reinforcement learning where an attacker p...

11/18/2022  Provable Defense against Backdoor Policies in Reinforcement Learning
We propose a provable defense mechanism against backdoor policies in rei...

06/07/2021  Reconciling Rewards with Predictive State Representations
Predictive state representations (PSRs) are models of controlled non-Mar...

09/08/2022  Reward Delay Attacks on Deep Reinforcement Learning
Most reinforcement learning algorithms implicitly assume strong synchron...

02/26/2019  Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies
We consider a setting of hierarchical reinforcement learning, in which ...

02/25/2020  Double-Spend Counterattacks: Threat of Retaliation in Proof-of-Work Systems
Proof-of-Work mining is intended to provide blockchains with robustness ...

12/01/2016  When to Reset Your Keys: Optimal Timing of Security Updates via Learning
Cybersecurity is increasingly threatened by advanced and persistent atta...
