AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning

01/24/2023
by   Tairan He, et al.
0

Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs. Results show that the converged policies with intrinsic costs in all environments achieve zero constraint violation and comparable performance with baselines.

READ FULL TEXT

page 5

page 6

page 7

research
05/20/2018

A Lyapunov-based Approach to Safe Reinforcement Learning

In many real-world reinforcement learning (RL) problems, besides optimiz...
research
09/29/2022

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

It is quite challenging to ensure the safety of reinforcement learning (...
research
10/14/2022

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL...
research
01/27/2023

Solving Constrained Reinforcement Learning through Augmented State and Reward Penalties

Constrained Reinforcement Learning has been employed to enforce safety c...
research
02/28/2017

Analysis of Agent Expertise in Ms. Pac-Man using Value-of-Information-based Policies

Conventional reinforcement learning methods for Markov decision processe...
research
03/03/2022

Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning

Electric vehicle (EV) charging stations represent a substantial load wit...
research
12/17/2021

Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)

We consider the challenge of policy simplification and verification in t...

Please sign up or login with your details

Forgot password? Click here to reset