Quantile-Based Policy Optimization for Reinforcement Learning

01/27/2022
by Jinyang Jiang, et al.

Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is instead to optimize a quantile of the cumulative reward. We parameterize the policy that controls actions with neural networks and propose a novel policy gradient algorithm, Quantile-Based Policy Optimization (QPO), together with its variant, Quantile-Based Proximal Policy Optimization (QPPO), to solve deep RL problems with quantile objectives. QPO uses two coupled iterations running at different time scales to estimate the quantile and the policy parameters simultaneously, and is shown to converge to the globally optimal policy under certain conditions. Our numerical results demonstrate that the proposed algorithms outperform existing baseline algorithms under the quantile criterion.
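The two coupled iterations mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rules, so everything below is an assumption made for illustration: the quantile estimate q follows a standard stochastic-approximation recursion on the faster time scale, and the policy parameter theta follows a likelihood-ratio gradient step weighted by the indicator that the reward fell at or below q, on the slower time scale. The two-armed bandit, the step-size exponents, and the function name run_qpo_sketch are all hypothetical and are not taken from the paper.

```python
import math
import random


def run_qpo_sketch(alpha=0.1, num_iters=20000, seed=0):
    """Toy two-timescale quantile policy gradient on a two-armed bandit.

    Illustrative only: the update rules and all constants are assumptions,
    not the authors' implementation.
    """
    rng = random.Random(seed)
    # Hypothetical arms as (mean, std) of a Gaussian reward.
    # Arm 1 has the higher mean, arm 0 has the higher 0.1-quantile.
    arms = [(1.0, 0.1), (1.2, 2.0)]
    theta = 0.0  # logit of choosing arm 1
    q = 0.0      # running estimate of the alpha-quantile of the reward

    for k in range(1, num_iters + 1):
        beta = 1.0 / k ** 0.6   # faster time scale (quantile estimate)
        gamma = 2.0 / k ** 0.8  # slower time scale (policy parameter)

        # Sample an action from the current policy and observe a reward.
        p1 = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p1 else 0
        mean, std = arms[action]
        reward = rng.gauss(mean, std)

        # Indicator that the reward landed at or below the current estimate.
        in_tail = 1.0 if reward <= q else 0.0
        # d/dtheta of log pi(action) for the sigmoid policy.
        grad_log_pi = (1.0 - p1) if action == 1 else -p1

        # Quantile tracking: move q so that P(reward <= q) approaches alpha.
        q += beta * (alpha - in_tail)
        # Policy step: reduce the probability of actions that produced
        # rewards in the lower tail, which pushes the quantile upward.
        theta += gamma * (-in_tail) * grad_log_pi

    return theta, q


if __name__ == "__main__":
    theta, q = run_qpo_sketch()
    p_arm1 = 1.0 / (1.0 + math.exp(-theta))
    print(f"P(arm 1) = {p_arm1:.3f}, quantile estimate = {q:.3f}")
```

In this toy setup an expectation-maximizing learner would favor arm 1 because of its higher mean, whereas the quantile-driven update shifts probability toward arm 0, whose lower tail is much lighter. Because the ratio gamma / beta shrinks with k, the quantile estimate sees a nearly fixed policy, which is the usual reason for separating the two time scales.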

research
05/12/2023

Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
11/28/2022

Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability

Constrained reinforcement learning (RL) is an area of RL whose objective...
research
11/03/2016

Quantile Reinforcement Learning

In reinforcement learning, the standard criterion to evaluate policies i...
research
06/27/2019

Quantile Regression Deep Reinforcement Learning

Policy gradient based reinforcement learning algorithms coupled with neu...
research
03/22/2023

Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

A novel Policy Gradient (PG) algorithm, called Matryoshka Policy Gradien...
research
06/02/2023

Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task

This paper presents a comparison between two well-known deep Reinforceme...
research
07/17/2014

Optimization Under Uncertainty Using the Generalized Inverse Distribution Function

A framework for robust optimization under uncertainty based on the use o...
