RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation

06/14/2021
by Noah Patton, et al.

Planning provides a framework for optimizing sequential decisions in complex environments. Recent advances in efficient planning in deterministic or stochastic high-dimensional domains with continuous action spaces leverage backpropagation through a model of the environment to directly optimize actions. However, existing methods typically do not take risk into account when optimizing in stochastic domains, even though risk can be incorporated efficiently in MDPs by optimizing the entropic utility of returns. We bridge this gap by introducing Risk-Aware Planning using PyTorch (RAPTOR), a novel framework for risk-sensitive planning through end-to-end optimization of the entropic utility objective. A key technical difficulty of our approach is that direct optimization of the entropic utility by backpropagation is impossible due to the presence of environment stochasticity. The novelty of RAPTOR lies in the reparameterization of the state distribution, which makes it possible to apply stochastic backpropagation through sufficient statistics of the entropic utility computed from forward-sampled trajectories. Directly optimizing this empirical objective end-to-end yields the risk-averse straight-line plan, which commits to a sequence of actions in advance and can be sub-optimal in highly stochastic domains. We address this shortcoming by optimizing for risk-aware Deep Reactive Policies (RaDRP) in our framework. We evaluate and compare these two forms of RAPTOR on three highly stochastic domains, namely nonlinear navigation, HVAC control, and linear reservoir control, demonstrating the ability to manage risk in complex MDPs.
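The core objective can be illustrated with a small, stdlib-only Python sketch. It assumes the common convention that the entropic utility is U_β(R) = (1/β) log E[exp(βR)], with β < 0 meaning risk-averse, and estimates it from sampled returns using a log-sum-exp shift. Everything else is hypothetical: the one-step domain R(a, ε) = a(1 + ε) - a²/2 with ε ~ N(0, 1), and the helper names entropic_utility, toy_return, and optimize_action. The noise samples are drawn once and held fixed, so each sampled return is a differentiable function of the action (the reparameterization idea), and the gradient is chained by hand through dU/dR_i = softmax(βR)_i. This is not RAPTOR's PyTorch implementation, which would obtain such gradients by autograd over full multi-step trajectories.

```python
import math
import random


def entropic_utility(returns, beta):
    """Empirical entropic utility (1/beta) * log(mean(exp(beta * R))).

    Returns (utility, weights) where weights[i] = dU/dR_i = softmax(beta * R)_i.
    beta < 0 is risk-averse; beta == 0 falls back to the risk-neutral mean.
    """
    n = len(returns)
    if beta == 0.0:
        return sum(returns) / n, [1.0 / n] * n
    scaled = [beta * r for r in returns]
    shift = max(scaled)  # log-sum-exp shift keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    utility = (shift + math.log(total / n)) / beta
    weights = [e / total for e in exps]
    return utility, weights


def toy_return(a, eps):
    # Hypothetical one-step domain: the noise scale grows with the action a.
    return a * (1.0 + eps) - 0.5 * a * a


def toy_return_grad(a, eps):
    # dR/da with the noise sample eps held fixed (reparameterization).
    return (1.0 + eps) - a


def optimize_action(beta, n_samples=10000, steps=100, lr=0.1, seed=0):
    """Gradient ascent on the empirical entropic utility over a single action."""
    rng = random.Random(seed)
    # Draw the noise once; returns then depend on `a` only through toy_return.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    a = 0.0
    for _ in range(steps):
        returns = [toy_return(a, e) for e in noise]
        _, weights = entropic_utility(returns, beta)
        # Chain rule: dU/da = sum_i (dU/dR_i) * (dR_i/da).
        grad = sum(w * toy_return_grad(a, e) for w, e in zip(weights, noise))
        a += lr * grad
    return a


if __name__ == "__main__":
    for beta in (0.0, -1.0):
        print(f"beta={beta}: action={optimize_action(beta):.3f}")
```

For this toy domain the exact utility is a - (1 - β)a²/2, so the optimal action is 1/(1 - β): about 1.0 when risk-neutral (β = 0) and about 0.5 for β = -1, where the risk-averse planner accepts a lower mean return to shrink the action-scaled noise.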

research
03/23/2022

Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

Recent advances in deep learning have enabled optimization of deep react...
research
04/22/2022

Risk Awareness in HTN Planning

Actual real-world domains are characterised by uncertain situations in w...
research
06/10/2020

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Sample-based planning is a powerful family of algorithms for generating ...
research
05/27/2021

Exploitation vs Caution: Risk-sensitive Policies for Offline Learning

Offline model learning for planning is a branch of machine learning that...
research
04/05/2019

Scalable Nonlinear Planning with Deep Neural Network Learned Transition Models

In many real-world planning problems with factored, mixed discrete and c...
research
04/19/2020

Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization

Recent works in high-dimensional model-predictive control and model-base...
research
12/12/2022

Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data

Recent advances in safety-critical risk-aware control are predicated on ...