Dual Sequential Monte Carlo: Tunneling Filtering and Planning in Continuous POMDPs

by Yunbo Wang, et al.

We present the DualSMC network, which solves continuous POMDPs by learning belief representations and then leveraging them for planning. It builds on the observation that filtering (i.e., state estimation) and planning can be viewed as two related sequential Monte Carlo processes, one operating in the belief space and the other in the space of future planning trajectories. In particular, we first introduce a novel particle filter network that makes better use of the adversarial relationship between the proposer model and the observation model. We then introduce a new planning algorithm over the belief representations, which learns uncertainty-dependent policies. The two parts can be trained jointly. We demonstrate the effectiveness of our approach on three continuous control and planning tasks: floor positioning, 3D light-dark navigation, and a modified Reacher task.
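To make the filtering half of this view concrete, the sketch below shows one step of a plain bootstrap particle filter, the classical sequential Monte Carlo state estimator. It is not the DualSMC filter: there is no learned proposer or learned observation model, and the 1D dynamics, the Gaussian noise levels, and the names `particle_filter_step` and `run_demo` are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, action, observation, rng,
                         process_std=0.1, obs_std=0.5):
    """One predict-weight-resample step of a bootstrap particle filter.

    Assumed toy model (not from the paper): x' = x + action + noise,
    and the observation is the state plus Gaussian noise.
    """
    # Predict: push every particle through the assumed transition model.
    predicted = [x + action + rng.gauss(0.0, process_std) for x in particles]

    # Weight: Gaussian log-likelihood of the observation under each particle.
    log_w = [-0.5 * ((observation - x) / obs_std) ** 2 for x in predicted]
    max_log_w = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - max_log_w) for lw in log_w]

    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def run_demo(seed=0, num_particles=500, num_steps=20):
    """Track a hypothetical 1D agent that moves by 0.1 per step."""
    rng = random.Random(seed)
    true_state = 0.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for _ in range(num_steps):
        action = 0.1
        true_state += action + rng.gauss(0.0, 0.1)
        observation = true_state + rng.gauss(0.0, 0.5)
        particles = particle_filter_step(particles, action, observation, rng)
    estimate = sum(particles) / len(particles)
    return estimate, true_state
```

Calling `run_demo()` returns the particle-mean estimate together with the simulated true state. The particle set plays the role of the belief; in the approach described above, a belief of this kind is what the planner consumes.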



