A Multiplicative Value Function for Safe and Efficient Reinforcement Learning

03/07/2023
by Nick Bührer, et al.

Safe Reinforcement Learning (RL) is an emerging field of sequential decision-making in which the objective is to maximize reward while obeying safety constraints. Handling constraints is essential for deploying RL agents in real-world environments, where constraint violations can harm both the agent and the environment. To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic. The safety critic predicts the probability of constraint violation and discounts the reward critic, which estimates only constraint-free returns. Splitting responsibilities in this way simplifies the learning task and increases sample efficiency. We integrate our approach into two popular RL algorithms, Proximal Policy Optimization and Soft Actor-Critic, and evaluate it in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations. Finally, we demonstrate zero-shot sim-to-real transfer, in which a differential drive robot has to navigate through a cluttered room. Our code can be found at https://github.com/nikeke19/Safe-Mult-RL.
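
As a rough illustration of the multiplicative idea described above, the sketch below combines a constraint-free return estimate with a predicted violation probability. The abstract only states that the safety critic discounts the reward critic; the specific form (1 - p) * V used here, the function name multiplicative_value, and the numbers are assumptions for illustration and are not taken from the authors' code. The sketch also assumes non-negative return estimates, since scaling a negative value toward zero would make a risky option look better.

```python
def multiplicative_value(reward_value, violation_prob):
    """Scale a constraint-free return estimate by the probability of staying safe.

    Hypothetical helper: the form (1 - p) * V is an assumed reading of
    "the safety critic discounts the reward critic".
    """
    if not 0.0 <= violation_prob <= 1.0:
        raise ValueError("violation_prob must lie in [0, 1]")
    return (1.0 - violation_prob) * reward_value


# Made-up critic outputs: (constraint-free return, violation probability).
candidates = {
    "short_risky_path": (10.0, 0.6),
    "long_safe_path": (7.0, 0.05),
}

# Score each option with the combined value and pick the highest.
scores = {
    name: multiplicative_value(value, prob)
    for name, (value, prob) in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy comparison the risky option has the higher constraint-free return, yet the safer option scores higher once both estimates are combined, which is the behavior a safety-discounted value is meant to encourage.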


