Worst Cases Policy Gradients

11/09/2019
by Yichuan Charlie Tang, et al.

Recent advances in deep reinforcement learning have demonstrated the capability of learning complex control policies in many types of environments. When learning policies for safety-critical applications, it is essential to be sensitive to risk and to avoid catastrophic events. Towards this goal, we propose an actor-critic framework that models the uncertainty of the future and simultaneously learns a policy based on that uncertainty model. Specifically, given a distribution of the future return for any state and action, we optimize policies for varying levels of conditional Value-at-Risk (CVaR). The learned policy can map the same state to different actions depending on the propensity for risk. We demonstrate the effectiveness of our approach in the domain of driving simulations, where we learn maneuvers in two scenarios. Our learned controller can dynamically select actions along a continuous axis, with safe and conservative behaviors at one end and riskier behaviors at the other. Finally, when tested with very different simulation parameters, our risk-averse policies generalize significantly better than those of other reinforcement learning approaches.
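The abstract's central idea, choosing actions by the conditional Value-at-Risk of a return distribution at a chosen risk level, can be illustrated with a small sketch. This is not the paper's actor-critic training procedure. It assumes, as a simplification, that each action's return distribution is Gaussian with a known mean and standard deviation; the function names, the two candidate actions, and their numbers are hypothetical. Here `alpha` is the fraction of worst-case outcomes averaged over: small `alpha` is risk-averse, and `alpha = 1` recovers the ordinary expected return.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def empirical_cvar(returns, alpha):
    """Lower-tail CVaR from samples: mean of the worst ceil(alpha * n) returns."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def gaussian_cvar(mean, std, alpha):
    """Closed-form lower-tail CVaR of N(mean, std^2) at level alpha (Gaussian assumption)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if alpha == 1.0:
        return mean  # the "tail" is the whole distribution, so CVaR equals the mean
    z = _STD_NORMAL.inv_cdf(alpha)  # standardized Value-at-Risk
    return mean - std * _STD_NORMAL.pdf(z) / alpha


def select_action(candidates, alpha):
    """Pick the action whose return distribution has the highest CVaR at level alpha.

    `candidates` maps a hypothetical action name to the (mean, std) of its return.
    """
    return max(candidates, key=lambda a: gaussian_cvar(*candidates[a], alpha))


if __name__ == "__main__":
    # Hypothetical return distributions for two actions in the same state.
    candidates = {"safe": (1.0, 0.1), "risky": (1.5, 1.0)}
    for alpha in (0.05, 0.5, 0.9, 1.0):
        print(alpha, select_action(candidates, alpha))
```

With these made-up numbers, the same state maps to `safe` for `alpha` of 0.05 and 0.5, and to `risky` for 0.9 and 1.0. This mirrors, in miniature, the abstract's claim that the chosen action depends on the propensity for risk. In the paper, the return distribution is learned by a critic and the policy itself is trained, whereas here both are fixed by hand.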
