Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics

07/09/2019
by Nicholas Collins, et al.

We propose a neural network architecture, called TransNet, that combines planning and model learning to solve Partially Observable Markov Decision Processes (POMDPs) with non-uniform system dynamics. The past decade has seen substantial advances in solving POMDPs. However, constructing a suitable POMDP model remains difficult. Recently, neural network architectures have been proposed to alleviate the difficulty of acquiring such models. Although the results are promising, existing architectures restrict the type of system dynamics that can be learned: the dynamics must be the same in all parts of the state space. TransNet relaxes this restriction. Key to this relaxation is a novel neural network module that partitions the state space into classes and then learns the system dynamics of each class. TransNet uses this module together with the overall architecture of QMDP-Net [1] to solve POMDPs with more expressive dynamics models while remaining data-efficient. Evaluation on typical robot navigation benchmarks with initially unknown system and environment models indicates that TransNet substantially outperforms the state-of-the-art method QMDP-Net in both the quality of the generated policies and learning efficiency.
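
To make the idea of class-dependent dynamics concrete, the following is a minimal sketch of one belief-prediction step in which each state's class selects its own local transition kernel. It is not the TransNet module itself: the 1-D corridor, the two class names, the kernel values, and the function `predict_belief` are all illustrative assumptions, and the class labels are fixed by hand here rather than learned.

```python
# Hypothetical sketch: the corridor, class names, and kernel values below are
# illustrative assumptions, not taken from the TransNet paper.

def predict_belief(belief, cell_class, kernels):
    """One belief-prediction step with class-dependent local dynamics.

    belief     : list of probabilities over corridor cells
    cell_class : class label of each cell (same length as belief)
    kernels    : dict mapping class label -> {offset: probability}
    """
    n = len(belief)
    new_belief = [0.0] * n
    for s, mass in enumerate(belief):
        # The class of the current cell decides which kernel applies.
        for offset, p in kernels[cell_class[s]].items():
            # Clamp at the walls so probability mass is conserved.
            s_next = min(max(s + offset, 0), n - 1)
            new_belief[s_next] += mass * p
    return new_belief


# Assumed kernels for a single "move right" action: the move mostly succeeds
# on "normal" cells and is much less reliable on "slippery" cells.
KERNELS = {
    "normal": {1: 0.9, 0: 0.1},
    "slippery": {1: 0.5, 0: 0.3, 2: 0.2},
}

cell_class = ["normal", "normal", "slippery", "normal", "normal"]
belief = [1.0, 0.0, 0.0, 0.0, 0.0]

b1 = predict_belief(belief, cell_class, KERNELS)
b2 = predict_belief(b1, cell_class, KERNELS)
print(b1)
print(b2)
```

A model with uniform dynamics corresponds to using a single kernel for every cell. Letting the kernel depend on the cell's class is what allows different regions of the state space to behave differently in this sketch.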

research
12/12/2012

Reinforcement Learning with Partially Known World Dynamics

Reinforcement learning would enjoy better success on real-world problems...
research
01/23/2013

A Possibilistic Model for Qualitative Sequential Decision Problems under Uncertainty in Partially Observable Environments

In this article we propose a qualitative (ordinal) counterpart for the P...
research
05/27/2022

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

Reward optimization in fully observable Markov decision processes is equ...
research
03/15/2012

RAPID: A Reachable Anytime Planner for Imprecisely-sensed Domains

Despite the intractability of generic optimal partially observable Marko...
research
01/18/2014

Proximity-Based Non-uniform Abstractions for Approximate Planning

In a deterministic world, a planning agent can be certain of the consequ...
research
05/29/2020

Non-Linearity Measure for POMDP-based Motion Planning

Motion planning under uncertainty is essential for reliable robot operat...
research
02/26/2020

Learning Navigation Costs from Demonstration in Partially Observable Environments

This paper focuses on inverse reinforcement learning (IRL) to enable saf...
