A Risk-Sensitive Approach to Policy Optimization

08/19/2022
by Jared Markowitz, et al.

Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows outcomes to be weighed based on relative quality, can be used for both continuous and discrete action spaces, and may naturally be applied in both constrained and unconstrained settings. We show how to compute an asymptotically consistent estimate of the policy gradient for a broad class of risk-sensitive objectives via sampling, subsequently incorporating variance reduction and regularization measures to facilitate effective on-policy learning. We then demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies. We test the approach using different risk profiles in six OpenAI Safety Gym environments, comparing against state-of-the-art on-policy methods. Without cost constraints, we find that pessimistic risk profiles can be used to reduce cost while improving total reward accumulation. With cost constraints, they are seen to provide higher positive rewards than risk-neutral approaches at the prescribed allowable cost.
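The core idea of weighting full-episode returns by a CDF-based risk profile can be illustrated with a minimal sketch. The code below is not the paper's exact estimator: the function names, the power-law "pessimistic" profile, and the mean-one normalization are all illustrative assumptions. It shows how episodes with low empirical-CDF rank (poor returns) can receive larger weights in a REINFORCE-style surrogate, emphasizing scenarios where the agent performs poorly.

```python
import numpy as np

def pessimistic_weights(returns, power=2.0):
    """Illustrative risk profile over the empirical CDF of episode returns.

    Episodes with lower returns (low CDF rank) receive larger weights.
    `power` controls how strongly the profile emphasizes poor outcomes;
    power=0 recovers the risk-neutral (uniform) case.
    """
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    # Empirical CDF rank of each episode's return, mapped into (0, 1].
    ranks = np.argsort(np.argsort(returns)) + 1
    cdf = ranks / n
    # A "pessimistic" profile: weight decreases with CDF rank.
    w = (1.0 - cdf + 1.0 / n) ** power
    return w / w.sum() * n  # normalize to mean 1

def risk_weighted_advantages(returns, power=2.0):
    """Scale each episode's baseline-subtracted return by its risk weight.

    In a REINFORCE-style update, each episode's summed log-probabilities
    would be multiplied by this quantity instead of the plain advantage.
    """
    returns = np.asarray(returns, dtype=float)
    w = pessimistic_weights(returns, power)
    return w * (returns - returns.mean())
```

With `power > 0`, the worst-performing episode dominates the gradient signal, which mirrors the paper's observation that pessimistic profiles keep the learner focused on its deficiencies.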


