Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning

10/03/2020
by Masahiro Kato, et al.

In real-world decision-making problems, risk management is critical. Among various risk management approaches, the mean-variance criterion is one of the most widely used in practice. In this paper, we propose expected quadratic utility maximization (EQUM) as a new framework for policy-gradient-style reinforcement learning (RL) algorithms with mean-variance control. The quadratic utility function is a common objective of risk management in finance and economics. The proposed EQUM framework has several interpretations, such as reward-constrained variance minimization and regularization, as well as agent utility maximization. In addition, the EQUM framework is computationally simpler than existing mean-variance RL methods, which require double sampling. In experiments, we demonstrate the effectiveness of the proposed framework on RL benchmarks and financial data.
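
The link between a quadratic utility and mean-variance control can be illustrated with a minimal sketch. It assumes a generic quadratic utility u(R) = alpha*R - (beta/2)*R^2 over sampled returns R; the parameter names `alpha` and `beta` and both function names are hypothetical and are not taken from the paper, whose exact parametrization may differ. Because E[R^2] = Var(R) + (E[R])^2, the expected utility equals alpha*E[R] - (beta/2)*(Var(R) + (E[R])^2), so maximizing it rewards a high mean and penalizes variance.

```python
from statistics import fmean, pvariance


def expected_quadratic_utility(returns, alpha, beta):
    """Sample average of u(R) = alpha*R - 0.5*beta*R**2 (assumed generic form)."""
    utilities = [alpha * r - 0.5 * beta * r * r for r in returns]
    return fmean(utilities)


def mean_variance_form(returns, alpha, beta):
    """Same quantity written with the mean and the population variance."""
    mean = fmean(returns)
    var = pvariance(returns, mu=mean)
    # E[R^2] = Var(R) + (E[R])^2, so the two forms agree on any sample.
    return alpha * mean - 0.5 * beta * (var + mean ** 2)


# Hypothetical sampled episode returns, for illustration only.
sample_returns = [1.0, 2.0, 4.0]
eq_utility = expected_quadratic_utility(sample_returns, alpha=1.0, beta=0.5)
mv_utility = mean_variance_form(sample_returns, alpha=1.0, beta=0.5)
```

The first function is a plain average of a per-sample quantity, so it can be estimated from single samples of the return. This may be one way to read the abstract's remark that EQUM avoids the double sampling that other mean-variance methods need.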
