Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability

11/28/2022
by Whiyoung Jung, et al.
Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes the expected cumulative return while satisfying a given constraint. Most previous constrained RL works use the expected cumulative sum cost as the constraint. However, optimizing under this constraint cannot guarantee a target probability for the outage event, in which the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), that constrains the quantile of the distribution of the cumulative sum cost, which is a necessary and sufficient condition for satisfying the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile, and it provides theoretical results for approximating the gradient of the quantile. Based on these theoretical results and the Lagrange multiplier technique, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). To implement QCPO, we use distributional RL with the Large Deviation Principle (LDP) to estimate the quantiles and tail probability of the cumulative sum cost. The implemented algorithm satisfies the outage probability constraint after the training period.
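
The difference between an expected-cost constraint and an outage constraint can be illustrated with a small numerical sketch. The example below is not taken from the paper: the episode costs, the threshold `threshold` (d), the target outage probability `epsilon`, and the helper names are all hypothetical. It only shows, on a fixed sample, that a mean-cost constraint can hold while the outage constraint fails, and that checking the (1 - epsilon)-quantile against the threshold gives the same verdict as checking the outage probability directly.

```python
import math

def empirical_quantile(costs, u):
    """Lower empirical quantile: smallest sample x whose empirical CDF is >= u."""
    ordered = sorted(costs)
    k = max(1, math.ceil(u * len(ordered)))  # 1-based rank of the quantile
    return ordered[k - 1]

def outage_probability(costs, threshold):
    """Fraction of episodes whose cumulative cost exceeds the threshold."""
    return sum(c > threshold for c in costs) / len(costs)

# Hypothetical cumulative costs: 60 cheap episodes and 40 expensive ones.
costs = [0.0] * 60 + [20.0] * 40
threshold = 10.0  # d: cost level that defines an outage (assumed value)
epsilon = 0.1     # target outage probability (assumed value)

mean_cost = sum(costs) / len(costs)
p_out = outage_probability(costs, threshold)
q = empirical_quantile(costs, 1.0 - epsilon)

mean_ok = mean_cost <= threshold   # expected-cost constraint
outage_ok = p_out <= epsilon       # outage probability constraint
quantile_ok = q <= threshold       # quantile constraint

print(f"mean cost = {mean_cost}, satisfied: {mean_ok}")
print(f"outage probability = {p_out}, satisfied: {outage_ok}")
print(f"(1 - epsilon)-quantile = {q}, satisfied: {quantile_ok}")
```

In this sample the mean cost stays below the threshold even though 40% of the episodes exceed it, so the expected-cost constraint alone says nothing about the outage rate. The quantile check and the outage check agree here because the lower empirical quantile is at most d exactly when at least a fraction 1 - epsilon of the samples are at most d.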

research
01/27/2022

Quantile-Based Policy Optimization for Reinforcement Learning

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
02/03/2023

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-pe...
research
05/12/2023

Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
10/22/2018

Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

The classic objective in a reinforcement learning (RL) problem is to fin...
research
07/04/2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

In recent years, reinforcement learning (RL) systems with general goals ...
research
07/06/2009

The Soft Cumulative Constraint

This research report presents an extension of Cumulative of Choco constr...
research
05/25/2021

A Generalised Inverse Reinforcement Learning Framework

The global objective of inverse Reinforcement Learning (IRL) is to esti...
