Constrained Decision Transformer for Offline Safe Reinforcement Learning

02/14/2023
by Zuxin Liu, et al.

Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the ϵ-reducible concept to characterize problem difficulty. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust these trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while retaining zero-shot adaptation to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
