Policy Poisoning in Batch Reinforcement Learning and Control

10/13/2019
by Yuzhe Ma, et al.

We study a security threat to batch reinforcement learning and control in which the attacker aims to poison the learned policy. The victim is a reinforcement learner / controller that first estimates the dynamics and rewards from a batch data set, and then solves for the optimal policy with respect to those estimates. The attacker can modify the data set slightly before learning happens, and wants to force the learner into adopting a target policy chosen by the attacker. We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: the tabular certainty-equivalence learner in reinforcement learning and the linear quadratic regulator in control. We show that both instantiations result in convex optimization problems for which global optimality is guaranteed, and provide analysis of attack feasibility and attack cost. Experiments demonstrate the effectiveness of policy poisoning attacks.
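To make the attack idea concrete, here is a minimal, hypothetical sketch (not the paper's actual formulation) of reward poisoning against a degenerate certainty-equivalence learner: a bandit learner that estimates each arm's mean reward from batch data and picks the arm with the highest empirical mean. The attacker minimally perturbs the non-target arms' rewards (a uniform per-arm shift, which is the l2-optimal way to move a single arm's empirical mean) so that a target arm becomes optimal by a margin. The function name, data, and margin are illustrative assumptions; the full problem in the paper also covers estimated dynamics and is solved as a convex program.

```python
def poison_rewards(rewards, target_arm, margin=0.1):
    """Perturb per-arm reward lists so the target arm's empirical mean
    exceeds every other arm's by `margin`. Only non-target arms are
    modified, which keeps the problem separable across arms; shifting
    every sample of an arm by the same amount is the cheapest (in l2)
    way to move that arm's mean by a given amount."""
    target_mean = sum(rewards[target_arm]) / len(rewards[target_arm])
    poisoned = {}
    for arm, rs in rewards.items():
        if arm == target_arm:
            poisoned[arm] = list(rs)  # target arm left untouched
            continue
        mean = sum(rs) / len(rs)
        # Lower this arm only if its mean violates the margin constraint.
        shift = max(0.0, mean - (target_mean - margin))
        poisoned[arm] = [r - shift for r in rs]
    return poisoned

# Batch data: the learner would pick arm 2 (mean 1.0) before poisoning.
rewards = {0: [1.0, 0.8], 1: [0.5, 0.6], 2: [0.9, 1.1]}
poisoned = poison_rewards(rewards, target_arm=1, margin=0.1)
# After poisoning, the learner's argmax over empirical means is arm 1.
```

The same structure, minimize perturbation cost subject to linear "target policy is optimal with margin" constraints, underlies the convex programs in the paper, where the constraints instead involve estimated value functions or LQR costs.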


Related research:

- 03/11/2022: Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation. "In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG..."
- 11/21/2020: Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks. "We study a security threat to reinforcement learning where an attacker p..."
- 03/28/2020: Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning. "We study a security threat to reinforcement learning where an attacker p..."
- 04/25/2023: Model Extraction Attacks Against Reinforcement Learning Based Controllers. "We introduce the problem of model-extraction attacks in cyber-physical s..."
- 01/03/2022: Execute Order 66: Targeted Data Poisoning for Reinforcement Learning. "Data poisoning for reinforcement learning has historically focused on ge..."
- 06/04/2022: Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning. "We expose the danger of reward poisoning in offline multi-agent reinforc..."
- 04/08/2023: Evolving Reinforcement Learning Environment to Minimize Learner's Achievable Reward: An Application on Hardening Active Directory Systems. "We study a Stackelberg game between one attacker and one defender in a c..."
