Bi-directional Value Learning for Risk-aware Planning Under Uncertainty

02/15/2019
by Sung-Kyun Kim, et al.

Decision-making under uncertainty is a crucial ability for autonomous systems. In its most general form, this problem can be formulated as a Partially Observable Markov Decision Process (POMDP). The solution policy of a POMDP can be implicitly encoded as a value function. In partially observable settings, the value function is typically learned via forward simulation of the system evolution. Focusing on accurate and long-range risk assessment, we propose a novel method in which the value function is learned in two complementary phases via a bi-directional search in belief space. A backward value learning process provides a long-range, risk-aware base policy; a forward value learning process ensures local optimality and updates the policy via forward simulations. We consider a class of scalable, continuous-space rover navigation problems (RNP) to assess the safety, scalability, and optimality of the proposed algorithm. The results demonstrate the algorithm's ability to assess long-range risk and safety while handling continuous problems with long planning horizons.
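The abstract describes the two learning phases only at a high level. The sketch below illustrates the general bi-directional idea on a toy, fully observable grid stand-in for the rover navigation problem; it is not the paper's algorithm, which learns the value function over a continuous belief space. All names, costs, and parameters (grid size, HAZARD_COST, SLIP, etc.) are illustrative assumptions. The backward pass performs value iteration over the whole grid to obtain a long-range, risk-aware base value function; the forward pass refines it with RTDP-style rollouts from the start state.

```python
# Illustrative sketch only: a toy, fully observable grid stand-in for the
# rover navigation problem. The paper's method works in continuous belief
# space; all names, costs, and parameters here are assumptions.
import numpy as np

ROWS, COLS = 6, 6
GOAL = (5, 5)
HAZARDS = {(2, 2), (3, 4)}                   # high-penalty (risky) cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
STEP_COST, HAZARD_COST, GAMMA = 1.0, 50.0, 0.95
SLIP = 0.1                                   # chance the commanded move is replaced by a random one

def step(s, a, rng=None):
    """Next state under action a; with probability SLIP a random action is executed."""
    if rng is not None and rng.random() < SLIP:
        a = ACTIONS[rng.integers(len(ACTIONS))]
    r, c = s[0] + a[0], s[1] + a[1]
    return (min(max(r, 0), ROWS - 1), min(max(c, 0), COLS - 1))

def cost(s):
    return 0.0 if s == GOAL else STEP_COST + (HAZARD_COST if s in HAZARDS else 0.0)

def backup(V, s):
    """Expected-cost backup over all actions under the slip model."""
    q = []
    for a in ACTIONS:
        nominal = V[step(s, a)]
        random_move = np.mean([V[step(s, b)] for b in ACTIONS])
        q.append(cost(s) + GAMMA * ((1 - SLIP) * nominal + SLIP * random_move))
    return q

def backward_value_learning(iters=100):
    """Backward phase: full sweeps give a long-range, risk-aware base value function."""
    V = np.zeros((ROWS, COLS))
    for _ in range(iters):
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) != GOAL:
                    V[r, c] = min(backup(V, (r, c)))
    return V

def forward_value_learning(V, rollouts=200, horizon=50, seed=0):
    """Forward phase: RTDP-style rollouts from the start refine V along visited states."""
    rng = np.random.default_rng(seed)
    for _ in range(rollouts):
        s = (0, 0)
        for _ in range(horizon):
            if s == GOAL:
                break
            q = backup(V, s)
            V[s] = min(q)                    # local update at the visited state
            s = step(s, ACTIONS[int(np.argmin(q))], rng)
    return V

V = forward_value_learning(backward_value_learning())
print(np.round(V, 1))
```

In this toy setting the backward sweeps propagate hazard penalties far from the hazards themselves, while the forward rollouts concentrate further updates on states the rover actually visits, mirroring the division of labor between the two phases described in the abstract.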

