Is Risk-Sensitive Reinforcement Learning Properly Resolved?

07/02/2023
by Ruiwen Zhou, et al.

Because learning deployable policies requires managing risk, risk-sensitive reinforcement learning (RSRL) has been recognized as an important research direction. RSRL is usually achieved by optimizing risk-sensitive objectives, characterized by various risk measures, within the framework of distributional reinforcement learning. However, it remains unclear whether the distributional Bellman operator properly optimizes the RSRL objective in the sense of risk measures. In this paper, we prove that existing RSRL methods do not achieve unbiased optimization and cannot guarantee optimality, or even improvement, with respect to risk measures over accumulated return distributions. To remedy this issue, we propose a novel algorithm, Trajectory Q-Learning (TQL), for RSRL problems with provable convergence to the optimal policy. Based on this new learning architecture, we introduce a general and practical implementation that supports different risk measures and learns disparate risk-sensitive policies. In the experiments, we verify the learnability of our algorithm and show that our method achieves better performance on risk-sensitive objectives.
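
To make "a risk measure over the accumulated return distribution" concrete, the following is a minimal sketch of one common risk measure, lower-tail CVaR, estimated from sampled returns. It is not the paper's TQL algorithm; the function names `cvar` and `mean`, the level `alpha`, and the two sets of return samples are all hypothetical and chosen only for illustration.

```python
import math


def cvar(returns, alpha):
    """Empirical lower-tail CVaR: the mean of the worst ceil(alpha * n) returns."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # ascending, so the worst returns come first
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def mean(returns):
    """Risk-neutral objective: the expected (average) return."""
    return sum(returns) / len(returns)


# Hypothetical accumulated-return samples for two policies (illustrative only).
safe_returns = [1.0] * 8             # always returns 1
risky_returns = [-5.0] + [2.0] * 7   # usually 2, occasionally -5

for name, samples in [("safe", safe_returns), ("risky", risky_returns)]:
    print(name, "mean:", mean(samples), "CVaR(0.25):", cvar(samples, 0.25))
```

With these samples, the risky policy has the higher mean return while the safe policy has the higher CVaR at level 0.25, so a risk-neutral objective and a risk-sensitive objective rank the two policies differently.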

research
10/11/2022

Regret Bounds for Risk-Sensitive Reinforcement Learning

In safety-critical applications of reinforcement learning such as health...
research
07/04/2023

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

We consider the problem of learning models for risk-sensitive reinforcem...
research
11/05/2019

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

While maximizing expected return is the goal in most reinforcement learn...
research
02/05/2021

Addressing Inherent Uncertainty: Risk-Sensitive Behavior Generation for Automated Driving using Distributional Reinforcement Learning

For highly automated driving above SAE level 3, behavior generation algo...
research
02/27/2023

Distributional Method for Risk Averse Reinforcement Learning

We introduce a distributional method for learning the optimal policy in ...
research
05/17/2019

Stochastically Dominant Distributional Reinforcement Learning

We describe a new approach for mitigating risk in the Reinforcement Lear...
research
06/30/2023

Risk-sensitive Actor-free Policy via Convex Optimization

Traditional reinforcement learning methods optimize agents without consi...
