Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

11/12/2019
by Tianyu Li, et al.

Learning and planning in partially observable domains is among the most difficult problems in reinforcement learning. Traditional methods treat the two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics, then plan accordingly. This approach, however, disconnects the two problems and can lead to algorithms that are sample inefficient and time consuming. In this paper, we propose a novel algorithm that combines learning and planning. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample- and time-efficient than classical methods.
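The abstract builds on the spectral learning algorithm for predictive state representations (PSRs). As a rough illustration of that ingredient only (not the paper's combined learning-and-planning procedure or its unnormalized Q functions), the sketch below implements the standard SVD-based recipe for recovering weighted-automaton operators from empirical Hankel matrices; the function names, the rank parameter, and the input conventions are illustrative assumptions, not the authors' code.

import numpy as np

def spectral_psr(H, H_sigma, h_p, h_s, rank):
    """Standard spectral recipe for a PSR / weighted automaton (illustrative sketch).

    H       : (|P| x |S|) empirical Hankel matrix, H[u, v] ~ Pr(u v)
    H_sigma : dict mapping each observation symbol sigma to the shifted
              Hankel matrix with entries ~ Pr(u sigma v)
    h_p     : (|P|,) empty-suffix column, h_p[u] ~ Pr(u)
    h_s     : (|S|,) empty-prefix row,    h_s[v] ~ Pr(v)
    rank    : assumed number of latent states (model hyperparameter)
    """
    # Rank-k truncated SVD of the Hankel matrix: H ~ U diag(s) V^T
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vt[:rank].T
    # Pseudo-inverse of (H V) simplifies to diag(s)^{-1} U^T under the SVD
    pinv = np.diag(1.0 / s) @ U.T
    b0 = h_s @ V                                        # initial weight vector
    binf = pinv @ h_p                                   # termination weight vector
    A = {o: pinv @ Ho @ V for o, Ho in H_sigma.items()}  # one operator per symbol
    return b0, A, binf

def sequence_prob(b0, A, binf, seq):
    """Estimated probability of an observation sequence under the learned operators."""
    state = b0
    for o in seq:
        state = state @ A[o]
    return float(state @ binf)

In this convention the learned model scores a sequence x1...xT as b0^T A_{x1} ... A_{xT} binf; the paper's contribution, per the abstract, is to fold planning into this learning step rather than running it as a separate second stage.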

