Closing the Learning-Planning Loop with Predictive State Representations

12/12/2009
by Byron Boots, et al.
A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning.

research
04/05/2019

Combining Offline Models and Online Monte-Carlo Tree Search for Planning from Scratch

Planning in stochastic and partially observable environments is a centra...
research
01/12/2019

Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

We present a novel predictive model architecture based on the principles...
research
11/15/2018

Neural Predictive Belief Representations

Unsupervised representation learning has succeeded with excellent result...
research
12/30/2022

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

We study the task of learning state representations from potentially hig...
research
09/30/2019

Learning Compact Models for Planning with Exogenous Processes

We address the problem of approximate model minimization for MDPs in whi...
research
06/11/2019

Online Learning and Planning in Partially Observable Domains without Prior Knowledge

How an agent can act optimally in stochastic, partially observable domai...
research
09/20/2020

Latent Representation Prediction Networks

Deeply-learned planning methods are often based on learning representati...
