Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms

05/12/2023
by Jinyang Jiang, et al.

Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is to optimize the quantile of the cumulative reward. We parameterize the policy controlling actions by neural networks, and propose a novel policy gradient algorithm called Quantile-Based Policy Optimization (QPO) and its variant Quantile-Based Proximal Policy Optimization (QPPO) for solving deep RL problems with quantile objectives. QPO uses two coupled iterations running at different timescales for simultaneously updating quantiles and policy parameters, whereas QPPO is an off-policy version of QPO that allows multiple updates of parameters during one simulation episode, leading to improved algorithm efficiency. Our numerical results indicate that the proposed algorithms outperform the existing baseline algorithms under the quantile criterion.
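The two coupled iterations can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the abstract does not give the update rules, so the recursions below are assumed to take a standard stochastic-approximation form (a quantile tracker on the faster timescale, a score-function policy update on the slower one). The one-step environment, the Gaussian policy, the step-size constants, and the name `qpo_toy` are all hypothetical choices made for illustration.

```python
import numpy as np


def qpo_toy(alpha=0.25, n_iters=20000, seed=0):
    """Two-timescale sketch on a one-step toy problem (all settings assumed).

    Policy: a ~ N(theta, 1). Reward: r = -(a - 2)^2.
    Goal: maximize the alpha-quantile of r with respect to theta.
    """
    rng = np.random.default_rng(seed)
    theta, q = 0.0, 0.0  # policy parameter and running quantile estimate
    for k in range(1, n_iters + 1):
        # Step sizes (assumed): gamma / beta -> 0, so q moves on the
        # faster timescale and theta on the slower one.
        beta = 2.0 / (k + 50) ** 0.6
        gamma = 1.0 / (k + 50) ** 0.8

        a = theta + rng.standard_normal()  # sample an action
        r = -(a - 2.0) ** 2                # observe the reward
        below = 1.0 if r <= q else 0.0     # indicator 1{r <= q}
        score = a - theta                  # d/dtheta log N(a; theta, 1)

        # Fast iteration: track the alpha-quantile of the reward.
        q += beta * (alpha - below)
        # Slow iteration: ascend an assumed quantile-gradient direction,
        # -1{r <= q} * score, which shifts probability away from
        # outcomes at or below the current quantile estimate.
        theta += gamma * (-below * score)
    return theta, q


if __name__ == "__main__":
    theta, q = qpo_toy()
    print(f"theta = {theta:.3f}, quantile estimate = {q:.3f}")
```

In this toy setting the reward distribution is symmetric around `a = 2`, so the policy mean should drift toward 2 while `q` settles near the 0.25-quantile of `-(z ** 2)` for standard normal `z`. A deep RL version would replace `theta` with neural network weights and the score with the gradient of the trajectory log-likelihood.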

research
01/27/2022

Quantile-Based Policy Optimization for Reinforcement Learning

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
06/27/2019

Quantile Regression Deep Reinforcement Learning

Policy gradient based reinforcement learning algorithms coupled with neu...
research
01/09/2020

Population-Guided Parallel Policy Search for Reinforcement Learning

In this paper, a new population-guided parallel learning scheme is propo...
research
11/28/2022

Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability

Constrained reinforcement learning (RL) is an area of RL whose objective...
research
10/21/2019

IPO: Interior-point Policy Optimization under Constraints

In this paper, we study reinforcement learning (RL) algorithms to solve ...
research
08/09/2018

Policy Optimization as Wasserstein Gradient Flows

Policy optimization is a core component of reinforcement learning (RL), ...
research
05/10/2019

Autonomous Management of Energy-Harvesting IoT Nodes Using Deep Reinforcement Learning

Reinforcement learning (RL) is capable of managing wireless, energy-harv...
