DeepAI AI Chat
Log In Sign Up

CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing

by   Fan Wu, et al.

We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations. We propose two particular types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards. Specifically, we develop a local smoothing algorithm which uses a policy derived from Q-functions smoothed with Gaussian noise over each encountered state to guarantee the robustness of actions taken along this trajectory. Next, we develop a global smoothing algorithm for certifying the robustness of a finite-horizon cumulative reward under adversarial state perturbations. Finally, we propose a local smoothing approach which makes use of adaptive search in order to obtain tight certification bounds for reward. We use the proposed RL robustness certification framework to evaluate six methods that have previously been shown to yield empirically robust RL, including adversarial training and several forms of regularization, on two representative Atari games. We show that RegPGD, RegCVX, and RadialRL achieve high certified robustness among these. Furthermore, we demonstrate that our certifications are often tight by evaluating these algorithms against adversarial attacks.


Policy Smoothing for Provably Robust Reinforcement Learning

The study of provable adversarial robustness for deep neural network (DN...

Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

Function approximation has enabled remarkable advances in applying reinf...

Robust Deep Reinforcement Learning through Adversarial Loss

Deep neural networks, including reinforcement learning agents, have been...

Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

Deep Reinforcement Learning (RL) agents are susceptible to adversarial n...

Action Robust Reinforcement Learning and Applications in Continuous Control

A policy is said to be robust if it maximizes the reward while consideri...

Tight Second-Order Certificates for Randomized Smoothing

Randomized smoothing is a popular way of providing robustness guarantees...

Measuring Interventional Robustness in Reinforcement Learning

Recent work in reinforcement learning has focused on several characteris...