SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning

01/28/2023
by   Qin Zhang, et al.
9

Offline safe RL is of great practical relevance for deploying agents in real-world applications. However, acquiring constraint-satisfying policies from the fixed dataset is non-trivial for conventional approaches. Even worse, the learned constraints are stationary and may become invalid when the online safety requirement changes. In this paper, we present a novel offline safe RL approach referred to as SaFormer, which tackles the above issues via conditional sequence modeling. In contrast to existing sequence models, we propose cost-related tokens to restrict the action space and a posterior safety verification to enforce the constraint explicitly. Specifically, SaFormer performs a two-stage auto-regression conditioned by the maximum remaining cost to generate feasible candidates. It then filters out unsafe attempts and executes the optimal action with the highest expected return. Extensive experiments demonstrate the efficacy of SaFormer featuring (1) competitive returns with tightened constraint satisfaction; (2) adaptability to the in-range cost values of the offline data without retraining; (3) generalizability for constraints beyond the current dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/19/2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

We consider the offline constrained reinforcement learning (RL) problem,...
research
07/19/2021

Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning

We study the problem of safe offline reinforcement learning (RL), the go...
research
06/01/2023

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

Aiming at promoting the safe real-world deployment of Reinforcement Lear...
research
10/13/2020

Balancing Constraints and Rewards with Meta-Gradient D4PG

Deploying Reinforcement Learning (RL) agents to solve real-world applica...
research
02/10/2022

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

Though many reinforcement learning (RL) problems involve learning polici...
research
02/15/2023

Deep Offline Reinforcement Learning for Real-World Treatment Optimization Applications

There is increasing interest in data-driven approaches for dynamically c...

Please sign up or login with your details

Forgot password? Click here to reset