Neurosymbolic Reinforcement Learning with Formally Verified Exploration

09/26/2020
by Greg Anderson, et al.

We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
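The lift, update, project loop described in the abstract can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D dynamics `x' = x + DT * u`, the safe set `|x| <= X_MAX`, the linear symbolic class `u = -k * x`, the two-parameter `tanh` stand-in for the neural component, the finite-difference gradient, and the least-squares-plus-clamp projection are all hypothetical choices made for illustration. The only idea taken from the abstract is the structure: a verified symbolic policy shields a neural proposal, gradient updates act on the shielded policy, and the result is projected back into the symbolic class that can be checked cheaply.

```python
import math
import random

DT = 0.1       # assumed time step of the toy dynamics x' = x + DT * u
X_MAX = 1.0    # assumed safe set: |x| <= X_MAX

def step(x, u):
    return x + DT * u

def symbolic_is_safe(k):
    """Closed-form check for u = -k*x: x' = (1 - DT*k) * x stays in the
    safe set whenever |1 - DT*k| <= 1, i.e. 0 <= k <= 2/DT."""
    return 0.0 <= k <= 2.0 / DT

def neural(theta, x):
    """Stand-in for a neural component (hypothetical two-parameter model)."""
    a, b = theta
    return a * math.tanh(b * x)

def lifted_policy(k, theta, x):
    """Lift: take the neural proposal only if its next state is safe,
    otherwise fall back to the verified symbolic action."""
    u = neural(theta, x)
    if abs(step(x, u)) <= X_MAX:
        return u
    return -k * x

def rollout(k, theta, x0, horizon=20):
    """Run the shielded policy; return (total reward, visited states)."""
    x, total, states = x0, 0.0, [x0]
    for _ in range(horizon):
        u = lifted_policy(k, theta, x)
        x = step(x, u)
        total -= x * x + 0.01 * u * u   # assumed quadratic cost
        states.append(x)
    return total, states

def update(k, theta, starts, lr=0.05, eps=1e-3):
    """Approximate gradient ascent on theta via central finite differences,
    evaluated through the shield so every rollout stays safe."""
    def objective(t):
        return sum(rollout(k, t, x0)[0] for x0 in starts) / len(starts)
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up) - objective(down)) / (2 * eps))
    return [t + lr * g for t, g in zip(theta, grad)]

def project(k, theta, samples):
    """Project: least-squares fit of u = -k*x to the lifted policy,
    then clamp k into the range that passes symbolic_is_safe."""
    den = sum(x * x for x in samples)
    if den == 0.0:
        return k
    num = sum(-x * lifted_policy(k, theta, x) for x in samples)
    return min(max(num / den, 0.0), 2.0 / DT)

def train(iterations=5, seed=0):
    rng = random.Random(seed)
    k, theta = 1.0, [0.0, 1.0]          # start from a verified symbolic policy
    visited = []
    for _ in range(iterations):
        starts = [rng.uniform(-X_MAX, X_MAX) for _ in range(8)]
        for _ in range(10):
            theta = update(k, theta, starts)
            for x0 in starts:
                visited.extend(rollout(k, theta, x0)[1])
        k = project(k, theta, starts)
        assert symbolic_is_safe(k)       # only the symbolic policy is checked
    return k, theta, visited

if __name__ == "__main__":
    k, theta, visited = train()
    print("final gain:", k, "max |x| visited:", max(abs(x) for x in visited))
```

In this sketch the neural parameters are never verified directly. Safety during exploration follows from the shield plus the closed-form check on `k`, which mirrors the abstract's point that verification is confined to the restricted symbolic class. Because the shield makes the policy discontinuous, the finite-difference gradient is only approximate.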

