Backdoor Detection in Reinforcement Learning

02/08/2022
by Junfeng Guo, et al.

As real-world applications of reinforcement learning (RL) become more common, the safety and robustness of RL systems require more attention. Recent work shows that, in a multi-agent RL environment, a backdoor can be injected into a victim agent (a.k.a. trojan agent), causing catastrophic failure as soon as the agent observes the backdoor trigger action. We formulate the problem of RL Backdoor Detection to address this safety vulnerability. Extensive empirical studies reveal a trigger smoothness property: normal actions that are similar to the backdoor trigger actions can also degrade the performance of the trojan agent. Inspired by this observation, we propose TrojanSeeker, a reinforcement learning solution that finds approximate trigger actions for trojan agents, and we further propose an efficient mitigation approach based on machine unlearning. Experiments show that our approach correctly distinguishes and mitigates all the trojan agents across various types of agents and environments.
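
The trigger smoothness idea can be illustrated with a toy sketch. This is not the TrojanSeeker algorithm from the paper: a coarse grid search stands in for the learned RL search, the opponent action is reduced to a single number in [0, 1], and every name here (`make_clean_agent`, `make_trojan_agent`, `seek_trigger`, `is_trojan`, the Gaussian-shaped performance dip, the 0.5 threshold) is a hypothetical assumption made for illustration. It only shows why smoothness helps: if actions near the trigger also lower the victim's return, a search that never hits the exact trigger can still expose the backdoor.

```python
import math


def make_clean_agent():
    """Hypothetical clean victim: its return ignores the opponent action."""
    def episode_return(opponent_action: float) -> float:
        return 1.0
    return episode_return


def make_trojan_agent(trigger: float, width: float = 0.1):
    """Hypothetical trojan victim with an assumed smooth dip around the trigger.

    The return falls to 0 at the exact trigger and recovers towards 1 as the
    opponent action moves away from it (toy model of trigger smoothness).
    """
    def episode_return(opponent_action: float) -> float:
        gap = opponent_action - trigger
        return 1.0 - math.exp(-(gap ** 2) / (2 * width ** 2))
    return episode_return


def seek_trigger(episode_return, n_candidates: int = 21):
    """Grid search (a stand-in for an RL search) for the most damaging action."""
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    worst_action = min(candidates, key=episode_return)
    return worst_action, episode_return(worst_action)


def is_trojan(episode_return, baseline: float = 1.0, drop_ratio: float = 0.5) -> bool:
    """Flag the agent if some candidate action cuts its return below the threshold."""
    _, worst_return = seek_trigger(episode_return)
    return worst_return < drop_ratio * baseline


if __name__ == "__main__":
    trojan = make_trojan_agent(trigger=0.63)  # 0.63 is not on the search grid
    approx_trigger, _ = seek_trigger(trojan)
    print("approximate trigger:", approx_trigger)
    print("trojan flagged:", is_trojan(trojan))
    print("clean flagged:", is_trojan(make_clean_agent()))
```

In this toy setting the grid never contains the exact trigger, yet the nearest grid point still drives the trojan's return far below the threshold, while the clean agent is never flagged.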

research
09/11/2019

On Memory Mechanism in Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) extends (single-agent) reinfor...
research
06/04/2021

Be Considerate: Objectives, Side Effects, and Deciding How to Act

Recent work in AI safety has highlighted that in sequential decision mak...
research
05/28/2021

Objective Robustness in Deep Reinforcement Learning

We study objective robustness failures, a type of out-of-distribution ro...
research
06/26/2019

Towards Empathic Deep Q-Learning

As reinforcement learning (RL) scales to solve increasingly complex task...
research
01/12/2022

The Concept of Criticality in AI Safety

When AI agents don't align their actions with human values they may caus...
research
03/09/2023

Recent Advances of Deep Robotic Affordance Learning: A Reinforcement Learning Perspective

As a popular concept proposed in the field of psychology, affordance has...
research
07/25/2023

Safety Margins for Reinforcement Learning

Any autonomous controller will be unsafe in some situations. The ability...
