Balancing Constraints and Rewards with Meta-Gradient D4PG

10/13/2020
by Dan A. Calian, et al.

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. The constraint thresholds are often set incorrectly, either because of the complex nature of the system or because the thresholds cannot be verified offline (e.g., no simulator or reasonable offline evaluation procedure exists). This results in situations where a task cannot be solved without violating the constraints. In many real-world cases, however, constraint violations are undesirable but not catastrophic, which motivates soft-constrained RL approaches. We present two soft-constrained RL approaches that use meta-gradients to find a good trade-off between maximizing expected return and minimizing constraint violations. We demonstrate the effectiveness of these approaches by showing that they consistently outperform the baselines across four different MuJoCo domains.
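
To make the notion of a soft constraint concrete, the sketch below shows a generic Lagrangian relaxation: the constraint violation is folded into the objective as a penalty, and a non-negative multiplier is raised while the constraint is violated and lowered once it is satisfied. This is a textbook illustration under stated assumptions, not the method from the paper; the function names, the scalar cost estimates, and the fixed learning rate are all hypothetical, and the meta-gradient component described in the abstract is not shown.

```python
# Generic Lagrangian soft-constraint sketch (illustrative assumption,
# not the paper's algorithm; all names here are hypothetical).


def lagrangian_objective(expected_return, expected_cost, threshold, multiplier):
    """Return minus a penalty proportional to the constraint violation."""
    return expected_return - multiplier * (expected_cost - threshold)


def update_multiplier(multiplier, expected_cost, threshold, learning_rate):
    """One dual-ascent step on the multiplier, clipped to stay non-negative."""
    return max(0.0, multiplier + learning_rate * (expected_cost - threshold))


# Toy usage: the multiplier grows while cost exceeds the threshold
# and shrinks once the cost drops below it.
multiplier = 0.0
history = []
for cost in [1.5, 1.2, 0.8]:
    multiplier = update_multiplier(multiplier, cost, threshold=1.0, learning_rate=0.5)
    history.append(multiplier)
```

In this toy run the multiplier rises over the first two steps, where the cost is above the threshold of 1.0, and falls on the third, where the cost is below it. How quickly it moves is governed by the fixed `learning_rate`, a hand-set quantity in this sketch.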


research
04/19/2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem,...
research
02/14/2023

Constrained Decision Transformer for Offline Safe Reinforcement Learning

Safe reinforcement learning (RL) trains a constraint satisfaction policy...
research
10/02/2020

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

We study the offline meta-reinforcement learning (OMRL) problem, a parad...
research
10/19/2022

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

A promising paradigm for offline reinforcement learning (RL) is to const...
research
01/28/2023

SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning

Offline safe RL is of great practical relevance for deploying agents in ...
research
02/06/2023

State-wise Safe Reinforcement Learning: A Survey

Despite the tremendous success of Reinforcement Learning (RL) algorithms...
research
12/02/2021

Residual Pathway Priors for Soft Equivariance Constraints

There is often a trade-off between building deep learning systems that a...
