Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning

04/01/2023
by Hao Chen, et al.

A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thereby inserting a backdoor into the trained agent. Such attacks compromise the reliability of the RL system and can lead to catastrophic results in critical fields. Yet relatively little research has investigated effective defenses against backdoor attacks in RL. This paper proposes the Recovery Triggered States (RTS) method, a novel approach that protects victim agents from backdoor attacks. RTS builds a surrogate network to approximate the dynamics model. Developers can then recover the environment from a triggered state to a clean state, which prevents attackers from activating a backdoor hidden in the agent by presenting the trigger. When training the surrogate to predict states, we incorporate agent action information to reduce the discrepancy between the actions the agent takes on predicted states and the actions it takes on real states. RTS is the first approach to defend against backdoor attacks in a single-agent setting. Our results show that using RTS, the cumulative reward only decreased by 1.41
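The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the one-dimensional environment, the `TRIGGER` value, the `LinearSurrogate` class, the `backdoored_policy` function, and the exact form of the loss are all assumptions made for illustration, since the abstract does not specify the network architecture or the training objective. The sketch shows only the two ingredients the abstract names: a surrogate of the dynamics model whose loss also penalizes action mismatch, and acting on a state recovered by the surrogate instead of on a triggered observation.

```python
from typing import Callable, Sequence, Tuple

TRIGGER = 99.0  # hypothetical observation value that activates the backdoor


def true_dynamics(state: float, action: int) -> float:
    """Toy environment (assumed): the state moves 0.1 in the action's direction."""
    return state + 0.1 * action


def backdoored_policy(obs: float) -> int:
    """Hypothetical victim agent: steers towards 0 unless it sees the trigger."""
    if obs == TRIGGER:
        return 1  # attacker-chosen action
    return -1 if obs > 0 else 1


class LinearSurrogate:
    """Stand-in for the surrogate network: next_state ~ w_state*s + w_action*a."""

    def __init__(self, w_state: float, w_action: float) -> None:
        self.w_state = w_state
        self.w_action = w_action

    def predict(self, state: float, action: int) -> float:
        return self.w_state * state + self.w_action * action


def surrogate_loss(
    model: LinearSurrogate,
    policy: Callable[[float], int],
    transitions: Sequence[Tuple[float, int, float]],
    action_weight: float = 1.0,
) -> float:
    """Assumed objective: squared state error plus a penalty whenever the agent
    would act differently on the predicted state than on the real next state."""
    total = 0.0
    for state, action, next_state in transitions:
        predicted = model.predict(state, action)
        state_error = (predicted - next_state) ** 2
        action_mismatch = float(policy(predicted) != policy(next_state))
        total += state_error + action_weight * action_mismatch
    return total / len(transitions)


def act_on_recovered_state(
    model: LinearSurrogate,
    policy: Callable[[float], int],
    prev_state: float,
    prev_action: int,
) -> Tuple[float, int]:
    """Ignore the (possibly triggered) observation and act on the surrogate's
    prediction from the previous clean state and action."""
    recovered = model.predict(prev_state, prev_action)
    return recovered, policy(recovered)


# A surrogate that happens to match the toy dynamics exactly.
model = LinearSurrogate(w_state=1.0, w_action=0.1)
transitions = [
    (s, a, true_dynamics(s, a)) for s in (-0.5, 0.2, 0.7) for a in (-1, 1)
]
loss = surrogate_loss(model, backdoored_policy, transitions)

# The attacker replaces the real observation (0.4) with the trigger.
triggered_action = backdoored_policy(TRIGGER)
recovered, clean_action = act_on_recovered_state(
    model, backdoored_policy, prev_state=0.5, prev_action=-1
)
```

In this toy setting the agent takes the attacker-chosen action on the triggered observation but the ordinary action on the recovered state. A real surrogate would be a learned network and would accumulate prediction error over time, which is presumably why the abstract emphasizes keeping the agent's actions on predicted states close to its actions on real states.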

research
09/05/2019

Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Robustness of Deep Reinforcement Learning (DRL) algorithms towards adver...
research
07/15/2023

Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning

Due to the broad range of applications of multi-agent reinforcement lear...
research
09/06/2019

Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Recent research on reinforcement learning has shown that trained agents ...
research
02/19/2022

Learning a Shield from Catastrophic Action Effects: Never Repeat the Same Mistake

Agents that operate in an unknown environment are bound to make mistakes...
research
01/14/2021

How to Attack and Defend 5G Radio Access Network Slicing with Reinforcement Learning

Reinforcement learning (RL) for network slicing is considered in the 5G ...
research
10/20/2022

MoCoDA: Model-based Counterfactual Data Augmentation

The number of states in a dynamic process is exponential in the number o...
research
12/30/2022

RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT

Cybercriminals are moving towards zero-day attacks affecting resource-co...
