Provable Defense against Backdoor Policies in Reinforcement Learning

11/18/2022
by Shubham Kumar Bharti, et al.

We propose a provable defense mechanism against backdoor policies in reinforcement learning under the subspace trigger assumption. A backdoor policy is a security threat in which an adversary publishes a seemingly well-behaved policy that in fact contains hidden triggers. During deployment, the adversary can modify observed states in a particular way to trigger unexpected actions and harm the agent. We assume the agent does not have the resources to retrain a good policy. Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states onto a 'safe subspace' estimated from a small number of interactions with a clean (non-triggered) environment. Our sanitized policy achieves ε-approximate optimality in the presence of triggers, provided the number of clean interactions is O(D/((1-γ)^4 ε^2)), where γ is the discount factor and D is the dimension of the state space. Empirically, we show that our sanitization defense performs well on two Atari game environments.
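The projection-based sanitization described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names, the SVD-based subspace estimate from clean states, and the `energy` threshold for choosing the subspace dimension are all assumptions made here for concreteness.

```python
import numpy as np

def estimate_safe_subspace(clean_states, energy=0.99):
    """Estimate a 'safe subspace' from states observed in a clean environment.

    Stacks the clean observations into an (n x D) matrix and keeps the top
    singular directions of the empirical second-moment matrix, enough to
    capture the given fraction of spectral energy.
    """
    X = np.asarray(clean_states, dtype=float)       # (n, D)
    U, S, _ = np.linalg.svd(X.T @ X)                # (D, D), S descending
    keep = int(np.searchsorted(np.cumsum(S) / S.sum(), energy)) + 1
    return U[:, :keep]                              # orthonormal basis (D, keep)

def sanitize(policy, U):
    """Wrap a (possibly backdoored) policy so that every observed state is
    projected onto the safe subspace before the policy sees it. Triggers
    lying outside the subspace are removed by the projection."""
    P = U @ U.T                                     # orthogonal projector
    def safe_policy(state):
        return policy(P @ np.asarray(state, dtype=float))
    return safe_policy
```

A trigger added along a direction orthogonal to the clean-state subspace is zeroed out by the projection, so the wrapped policy behaves as it would on the clean observation.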

Related research

02/10/2021 · Defense Against Reward Poisoning Attacks in Reinforcement Learning
We study defense strategies against reward poisoning attacks in reinforc...

10/30/2022 · Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games
Recent research on vulnerabilities of deep reinforcement learning (RL) h...

07/20/2020 · Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense
The field of cybersecurity has mostly been a cat-and-mouse game with the...

05/27/2023 · Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in Multi-Agent RL
Most existing works consider direct perturbations of victim's state/acti...

05/28/2013 · Reinforcement Learning for the Soccer Dribbling Task
We propose a reinforcement learning solution to the soccer dribbling tas...

05/29/2018 · Virtuously Safe Reinforcement Learning
We show that when a third party, the adversary, steps into the two-party...

02/02/2023 · Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition
Many environments contain numerous available niches of variable value, e...
