Policy Optimization with Robustness Certificates

01/26/2023
by Chenxi Yang, et al.

We present a policy optimization framework in which the learned policy comes with a machine-checkable certificate of adversarial robustness. Our approach, called CAROL, learns a model of the environment. In each learning iteration, it uses the current version of this model and an external abstract interpreter to construct a differentiable signal for provable robustness. This signal guides policy learning, and the abstract interpretation used to construct it directly yields the robustness certificate returned at convergence. We give a theoretical analysis that bounds the worst-case cumulative reward of CAROL. We also evaluate CAROL experimentally on four MuJoCo environments. On these tasks, which involve continuous state and action spaces, CAROL learns certified policies whose performance is comparable to that of the (non-certified) policies learned using state-of-the-art robust RL methods.
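To make the role of abstract interpretation more concrete, here is a minimal sketch of how an interval (box) abstract domain can bound the output of a small policy network under bounded state perturbations. It is not CAROL's implementation: the interval domain, the two-layer ReLU policy, the weights `W1`, `b1`, `W2`, `b2`, and every function name are hypothetical choices made for illustration, and the abstract does not say which abstract domain or network architecture CAROL uses.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Soundly propagate the box [lo, hi] through x -> W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            # A non-negative weight maps lower bound to lower bound;
            # a negative weight swaps the roles of the two bounds.
            l += w * a if w >= 0 else w * c
            h += w * c if w >= 0 else w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def relu_bounds(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def policy(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector, s: Vector) -> Vector:
    """Concrete forward pass of the hypothetical two-layer ReLU policy."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, s)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def certified_action_bounds(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
                            s: Vector, eps: float) -> Tuple[Vector, Vector]:
    """Bound the action for every state within an L-infinity ball of radius eps around s."""
    lo = [x - eps for x in s]
    hi = [x + eps for x in s]
    lo, hi = affine_bounds(W1, b1, lo, hi)
    lo, hi = relu_bounds(lo, hi)
    return affine_bounds(W2, b2, lo, hi)


# Hypothetical weights for a policy with 2 state dimensions and 1 action dimension.
W1 = [[1.0, -2.0], [0.5, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -1.0]]
b2 = [0.5]

action_lo, action_hi = certified_action_bounds(W1, b1, W2, b2, [1.0, 0.5], 0.1)
print("certified action interval:", action_lo, action_hi)
```

Every perturbed state inside the ball yields an action inside the printed interval, which is the kind of guarantee a certificate can record. Because the bounds are built from additions, multiplications, and `max`, the same computation written in an automatic-differentiation framework would be differentiable almost everywhere, which is one way a bound of this kind could serve as a training signal.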

