A common assumption in reinforcement learning (RL) is that the agent has full knowledge of the dynamics of the environment, including the state space, transition probabilities, and a reward model. However, in many real-world applications, this assumption is not valid. Instead, the environment is often partially observable, meaning that the true state of the system is not completely visible to the agent. This partial observability makes it considerably harder both to learn the dynamics of the environment and to plan so as to maximize returns.
Partially observable Markov decision processes (POMDPs) [29, 8] provide a formal framework for single-agent planning in a partially observable environment. In contrast with MDPs, agents in POMDPs do not have direct access to the state space. Instead of observing the states, agents only receive observations and need to operate on so-called belief states, which describe the distribution over the state space given some past trajectory. POMDPs therefore model the dynamics of an RL environment in a latent-variable fashion and explicitly reason about uncertainty in both action effects and state observability. Planning under a POMDP has long been considered a difficult problem. To perform exact planning under a POMDP, one common approach is to optimize the value function over all possible belief states; value iteration for POMDPs is one particular example of this approach. However, due to the curse of dimensionality and the curse of history, this method is computationally intractable for most realistic POMDP planning problems.
As an alternative to exact planning, the family of predictive state representations (PSRs) has attracted much interest. In fact, PSRs are no weaker than POMDPs in terms of representation power, and there are many efficient algorithms to estimate PSRs and their variants, relying on likelihood-based algorithms [28, 27] or spectral learning techniques [6, 13]. However, planning with PSRs is not straightforward. Typically, a two-stage process is applied to discover the optimal policy: first, a PSR model is learned in an unsupervised fashion; then, a planning method is used to discover the optimal policy based on the learned dynamics. Several planning algorithms can be used for the second stage of this process. For example, in [6, 14], a reward function is estimated with the learned PSRs and then combined with point-based value iteration (PBVI) to obtain an approximation of the optimal policy; alternatively, the fitted-Q method can be used to iteratively regress Bellman updates on the learned state representations, thus approximating the action value function.
¹Equal contribution. ²McGill University. ³Quebec AI Institute (Mila). ⁴Université de Montréal.
However, despite numerous successes, this two-stage process still suffers from significant drawbacks. To begin with, the PSR parameters are learned independently of the reward information, resulting in a less efficient representation for planning. Secondly, planning with PSRs often involves multiple stages of regression, and these extra approximation steps can be detrimental to obtaining the optimal policy. Finally, the planning methods used with PSRs are often iterative and can be very time-consuming.
In this work, we propose an alternative to the traditional paradigm of planning in partially observable environments. Inspired by PSRs, our method leverages the spectral learning algorithm for subspace identification, treating the environment as a latent-variable model. However, instead of explicitly learning the dynamics of the environment, we learn a function that is proportional to the action value function, which we call the unnormalized Q function (UQF). In doing so, we incorporate the reward information into the dynamics in a supervised-learning fashion, which unifies the two stages of the classical learning-planning paradigm for POMDPs. In some sense, our approach effectively learns a goal-oriented representation of the environment. Therefore, in terms of planning, our method is more sample-efficient than the two-stage learning paradigm (for example, PSRs). Our algorithm relies on the spectral learning algorithm for weighted finite automata (WFAs), which extend PSRs in that they can model not only probability distributions but arbitrary real-valued functions. Our method inherits the benefits of spectral learning: it provides a consistent estimation of the UQF and is computationally more efficient than EM-based methods. Furthermore, whereas planning with PSRs usually requires multiple steps and often relies on iterative planning methods, which can be time-consuming, our algorithm directly learns a policy in one step, offering a more time-efficient method. In addition, we adopt matrix compressed sensing techniques to extend this approach to complex domains; this technique has also been used in PSR-based methods to overcome similar scalability problems.
We conduct experiments on a partially observable grid world and the S-PocMan environment, comparing our approach with classical PSR-based methods. In both domains, our approach is significantly more data-efficient than PSR-based methods, with considerably smaller running time.
In this section, we will introduce some basic RL concepts, including partially observable Markov decision processes (POMDPs), predictive state representations (PSRs) and their variants as well as the notion of WFAs. We will also introduce the spectral learning algorithms for WFAs.
2.1 Partially Observable Markov Decision Processes (POMDPs)
Markov decision processes have been widely applied in the field of reinforcement learning. A Markov decision process (MDP) of size $k$ is characterized by a 6-tuple $\langle \mathcal{S}, \mathcal{A}, \mathbf{T}, \mathbf{r}, \gamma, \boldsymbol{\mu} \rangle$, where $\mathcal{S}$ is the set of $k$ states; $\mathcal{A}$ is the set of actions; $\mathbf{T}$ is the transition probability tensor, with $\mathbf{T}^{a}_{s,s'} = P(s' \mid s, a)$; $\mathbf{r} \in \mathbb{R}^{k}$ is the reward vector over states; $\gamma \in [0,1)$ is the discount factor; and $\boldsymbol{\mu}$ is the initial state distribution. The goal of an RL task is often to learn a policy that governs the actions of the agent so as to maximize the accumulated discounted rewards (return). A policy in an MDP environment is defined as a map $\pi : \mathcal{S} \times \mathcal{A} \to [0,1]$, with $\pi(a \mid s)$ the probability of choosing action $a$ in state $s$; it operates at the state level. At each time step, an action is selected probabilistically with respect to $\pi$ given the current state. The agent then moves to the next state according to the transition probabilities indexed by the chosen action and collects the reward associated with the state it reaches.
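The interaction loop described above can be sketched in a few lines of code. This is a minimal illustration, not from the paper: the transition tensor, reward vector, and uniform policy below are made-up stand-ins for a 3-state, 2-action MDP.

```python
import numpy as np

# Illustrative sketch of one interaction step in a small MDP <S, A, T, r, gamma, mu>.
# All numbers here are made up for the example.
rng = np.random.default_rng(0)

n_states, n_actions = 3, 2
# T[a][s, s'] = P(s' | s, a): one row-stochastic matrix per action.
T = np.array([[[0.9, 0.1, 0.0],
               [0.0, 0.8, 0.2],
               [0.0, 0.0, 1.0]],
              [[0.5, 0.5, 0.0],
               [0.1, 0.1, 0.8],
               [0.0, 0.2, 0.8]]])
R = np.array([0.0, 0.0, 1.0])        # reward vector over states
mu = np.array([1.0, 0.0, 0.0])       # initial state distribution
policy = np.full((n_states, n_actions), 0.5)  # pi(a | s), uniform here

def step(s):
    """Sample a ~ pi(.|s), then s' ~ T[a][s, :], and collect R[s']."""
    a = rng.choice(n_actions, p=policy[s])
    s_next = rng.choice(n_states, p=T[a][s])
    return a, s_next, R[s_next]

s = rng.choice(n_states, p=mu)
a, s, r = step(s)
```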
However, in practice, it is rarely the case that we can observe the exact state of the agent. For example, in the game of poker, a player only knows the cards in their hand, and this information alone does not determine the exact state of the game. Partially observable Markov decision processes (POMDPs) were introduced to model this type of problem. Under the POMDP setting, the true state space of the model is hidden from the agent through partial observability: an observation is obtained probabilistically based on the agent’s current state and the observation emission probability. A partially observable Markov decision process (POMDP) is characterized by an 8-tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{O}, \mathbf{T}, \mathbf{O}, \mathbf{r}, \gamma, \boldsymbol{\mu} \rangle$, where $\mathcal{O}$ is a set of observations, $\mathbf{O}$ is the observation emission probability with $\mathbf{O}^{a}_{s',o} = P(o \mid s', a)$, and the remaining parameters follow the definitions in MDPs.
As the agent cannot directly observe which state it is in, one classic problem in POMDPs is to compute the belief state given the past trajectory $h = a_1 o_1 \cdots a_t o_t$. Formally, given $h$, we want to compute $\mathbf{b}_h(s) = P(s_t = s \mid h)$. This can be solved with a forward method similar to the one used for HMMs. Let $\mathbf{T}^{ao}$ denote the matrix with entries $\mathbf{T}^{ao}_{s',s} = \mathbf{T}^{a}_{s,s'} \mathbf{O}^{a}_{s',o}$. It can be shown that $\mathbf{b}_\lambda = \boldsymbol{\mu}$ and $\mathbf{b}_{hao} \propto \mathbf{T}^{ao} \mathbf{b}_h$ (renormalized to sum to one), where $\lambda$ denotes the empty string.
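The forward belief update can be sketched as follows. This is a hypothetical toy POMDP (2 states, 2 actions, 2 observations) whose transition and emission tables are invented for illustration; the update renormalizes after applying the observable operator.

```python
import numpy as np

# Hypothetical toy POMDP, for illustration only.
# T[a][s, s'] = P(s' | s, a);  Obs[a][s', o] = P(o | s', a).
T = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.9, 0.1], [0.4, 0.6]]])
Obs = np.array([[[0.8, 0.2], [0.3, 0.7]],
                [[0.6, 0.4], [0.1, 0.9]]])
mu = np.array([0.5, 0.5])  # initial belief b_lambda

def T_ao(a, o):
    """Observable operator with entries (T_ao)[s', s] = P(s', o | s, a)."""
    return (T[a] * Obs[a][:, o]).T

def belief_update(b, a, o):
    """b_{hao} is proportional to T_ao @ b_h, renormalized to sum to one."""
    b_new = T_ao(a, o) @ b
    return b_new / b_new.sum()

b = mu
for a, o in [(0, 1), (1, 0)]:   # a trajectory h = a1 o1 a2 o2
    b = belief_update(b, a, o)
```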
Similarly to the MDP setting, the state-level policy for POMDPs is defined by $\pi : \mathcal{S} \times \mathcal{A} \to [0,1]$. However, due to partial observability, the agent’s true state cannot be directly observed. Nonetheless, any state-level policy implicitly induces a probabilistic policy over past trajectories, defined by $\pi(a \mid h) = \sum_{s} \pi(a \mid s)\, \mathbf{b}_h(s)$ for each trajectory $h$. Similarly, every state-level policy induces a probability distribution over trajectories. With a slight abuse of notation, we denote the probability of a trajectory $h$ under the policy $\pi$ by $P_\pi(h)$. Here, we assume $\pi$ is induced by a state-level policy and define $P_\pi(h) = \mathbf{1}^\top \boldsymbol{\beta}_h$, where $\boldsymbol{\beta}_h$ is the unnormalized forward vector obtained by weighting each transition by the action probabilities and $\mathbf{1}$ is an all-one vector. To keep the notation clear, we will use a distinct symbol for deterministic policies in later sections.
2.2 Predictive state representations
One common approach for modelling the dynamics of a POMDP is the so-called predictive state representation (PSR). A PSR is a model of a dynamical system in which the current state is represented as a set of predictions about the future behavior of the system [20, 27]. This is done by maintaining a set of action-observation sequences, called tests; the representation of the current state is given by the conditional probabilities of these tests given the past trajectory, which is referred to as the history. Although there are multiple methods to select a set of tests [28, 16], it has been shown that with a large action-observation set, finding these tests can be exponentially difficult.
Transformed PSRs (TPSRs) offer an alternative. TPSRs implicitly estimate a linear transformation of the PSR via subspace methods. This approach drastically reduces the complexity of estimating a PSR model and has shown many benefits in different RL domains [6, 27].
Although this approach obtains a small transformed space of the original PSRs, it still faces scalability problems. Typically, one can estimate a TPSR by performing truncated SVD on the estimated system-dynamics matrix, which is indexed by histories and tests. The scalability issue arises in complex domains, which require a large number of histories and tests to form the system-dynamics matrix. As the time complexity of SVD is cubic in the number of histories and tests, the computation time explodes in these types of environments.
Compressed predictive state representations (CPSRs) were introduced to circumvent this issue. The main idea of this approach is to project the high-dimensional system-dynamics matrix onto a much smaller subspace spanned by randomly generated bases that satisfy the Johnson-Lindenstrauss (JL) lemma. The projection matrices corresponding to these bases are referred to as JL matrices. Intuitively, JL matrices define a low-dimensional embedding which approximately preserves the Euclidean distance between the projected points. More formally, given a matrix $\mathbf{H} \in \mathbb{R}^{m \times n}$ and JL random projection matrices $\boldsymbol{\Phi}_1 \in \mathbb{R}^{d_1 \times m}$ and $\boldsymbol{\Phi}_2 \in \mathbb{R}^{d_2 \times n}$, the compressed matrix is computed by:

$$\mathbf{H}_c = \boldsymbol{\Phi}_1 \mathbf{H} \boldsymbol{\Phi}_2^\top,$$

where $\mathbf{H}_c \in \mathbb{R}^{d_1 \times d_2}$ is the compressed matrix. The choice of random projection matrix is rather empirical and often depends on the task. Gaussian matrices and Rademacher matrices are common choices that satisfy the JL lemma. Although hashed random projections do not satisfy the JL lemma, they have also been shown to preserve certain kernel functions and perform extremely well in practice [33, 25].
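The two-sided compression can be sketched as follows. The matrix sizes, projection dimensions, and the Gaussian $1/\sqrt{d}$ scaling below are illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of JL-style compression of a large "system-dynamics" matrix H.
# Dimensions and scalings are illustrative assumptions.
rng = np.random.default_rng(0)
n_hist, n_tests, d_h, d_t = 500, 400, 50, 40

H = rng.random((n_hist, n_tests))                                  # big matrix
Phi_h = rng.normal(0.0, 1.0 / np.sqrt(d_h), size=(d_h, n_hist))    # JL matrix (rows)
Phi_t = rng.normal(0.0, 1.0 / np.sqrt(d_t), size=(d_t, n_tests))   # JL matrix (cols)

H_c = Phi_h @ H @ Phi_t.T   # compressed matrix, d_h x d_t
```

All downstream computations (e.g. truncated SVD) then operate on the small `H_c` instead of `H`.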
2.3 Weighted finite automata (WFAs)
In fact, TPSRs (and CPSRs) are a subclass of a wider family called weighted finite automata (WFAs). More precisely, TPSRs belong to the stochastic weighted finite automata (SWFAs) of the formal language community, also known as observable operator models (OOMs) in control theory; further connections between SWFAs, OOMs and TPSRs have been established in the literature. WFAs extend TPSRs in the sense that, instead of only computing the probabilities of trajectories, WFAs can compute functions with arbitrary scalar outputs over the given trajectories. Formally, a weighted finite automaton (WFA) with $n$ states is a tuple $A = \langle \boldsymbol{\alpha}, \{\mathbf{A}^\sigma\}_{\sigma \in \Sigma}, \boldsymbol{\omega} \rangle$, where $\boldsymbol{\alpha}, \boldsymbol{\omega} \in \mathbb{R}^{n}$ are the initial and terminal weight vectors and $\mathbf{A}^\sigma \in \mathbb{R}^{n \times n}$ is the transition matrix associated with symbol $\sigma$ from a finite alphabet $\Sigma$. Given a trajectory $x = x_1 x_2 \cdots x_t \in \Sigma^*$, a WFA computes a function $f_A$ defined by:

$$f_A(x) = \boldsymbol{\alpha}^\top \mathbf{A}^{x_1} \mathbf{A}^{x_2} \cdots \mathbf{A}^{x_t} \boldsymbol{\omega}.$$

We will denote $\mathbf{A}^{x} = \mathbf{A}^{x_1} \cdots \mathbf{A}^{x_t}$ in the following sections for simplicity. For a function $f : \Sigma^* \to \mathbb{R}$, the rank of $f$ is defined as the minimal number of states of a WFA computing $f$; if $f$ cannot be computed by a WFA, we let $\mathrm{rank}(f) = \infty$. In the context of TPSRs, we often let $\Sigma = \mathcal{A} \times \mathcal{O}$, and $f$ computes the probability of the trajectory.
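The WFA computation is a product of matrices sandwiched between two vectors, which takes only a few lines to implement. The 2-state, two-symbol automaton below is a made-up example for illustration.

```python
import numpy as np

# Minimal WFA <alpha, {A_sigma}, omega>: f(x1...xt) = alpha^T A_x1 ... A_xt omega.
# The automaton below is a made-up example.
alpha = np.array([1.0, 0.0])
omega = np.array([0.0, 1.0])
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0], [0.2, 0.8]])}

def wfa_value(word):
    """Compute f(word) by threading the state vector through the operators."""
    v = alpha.copy()
    for sym in word:
        v = v @ A[sym]
    return float(v @ omega)
```

For the empty string the value is simply $\boldsymbol{\alpha}^\top \boldsymbol{\omega}$.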
2.3.1 Hankel matrix
The learning algorithm for WFAs relies on the spectral decomposition of the so-called Hankel matrix. The Hankel matrix $\mathbf{H}_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ associated with a function $f : \Sigma^* \to \mathbb{R}$ is a bi-infinite matrix with entries $\mathbf{H}_f(u, v) = f(uv)$ for all words $u, v \in \Sigma^*$. The spectral learning algorithm for WFAs relies on the following fundamental relation between the rank of $f$ and the rank of the Hankel matrix [7, 11]:
For any $f : \Sigma^* \to \mathbb{R}$, $\mathrm{rank}(f) = \mathrm{rank}(\mathbf{H}_f)$.
In practice, one deals with finite sub-blocks of the Hankel matrix. Given a basis $B = (\mathcal{P}, \mathcal{S})$, where $\mathcal{P} \subset \Sigma^*$ is a set of prefixes and $\mathcal{S} \subset \Sigma^*$ is a set of suffixes, denote the corresponding sub-block of the Hankel matrix by $\mathbf{H}_B \in \mathbb{R}^{\mathcal{P} \times \mathcal{S}}$, with $\mathbf{H}_B(u, v) = f(uv)$. For an arbitrary basis $B$, define its p-closure by $B' = (\mathcal{P}', \mathcal{S})$, where $\mathcal{P}' = \mathcal{P} \cup \mathcal{P}\Sigma$. It turns out that a Hankel matrix over a p-closed basis can be partitioned into $|\Sigma| + 1$ blocks of the same size:

$$\mathbf{H}_{B'}^\top = [\mathbf{H}_\lambda^\top, \mathbf{H}_{\sigma_1}^\top, \cdots, \mathbf{H}_{\sigma_{|\Sigma|}}^\top],$$

where $\lambda$ denotes the empty string and, for each $\sigma \in \Sigma \cup \{\lambda\}$, the matrix $\mathbf{H}_\sigma$ is defined by $\mathbf{H}_\sigma(u, v) = f(u \sigma v)$. We say that a basis is complete for the function $f$ if the sub-block $\mathbf{H}_B$ has full rank, i.e. $\mathrm{rank}(\mathbf{H}_B) = \mathrm{rank}(\mathbf{H}_f)$; we then call $\mathbf{H}_B$ a complete sub-block of $\mathbf{H}_f$ and $\mathbf{H}_{B'}$ a prefix-closure of $\mathbf{H}_B$. It turns out that one can recover the WFA that realizes $f$ via the prefix-closure of a complete sub-block of $\mathbf{H}_f$ using the spectral learning algorithm for WFAs.
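Building the finite sub-blocks $\mathbf{H}_B$ and $\mathbf{H}_\sigma$ amounts to evaluating $f$ on concatenated strings. In this sketch, the function $f$ (counting occurrences of the pattern "ab"), the alphabet, and the prefix/suffix sets are all invented for illustration.

```python
import numpy as np

# Build finite Hankel sub-blocks H_B(u, v) = f(uv) and H_sigma(u, v) = f(u sigma v)
# for an illustrative function f over {a, b}: here f counts occurrences of "ab".
def f(w):
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "ab")

prefixes = ["", "a", "b", "ab"]
suffixes = ["", "a", "b", "ba"]
alphabet = ["a", "b"]

H = np.array([[f(u + v) for v in suffixes] for u in prefixes])
H_sigma = {s: np.array([[f(u + s + v) for v in suffixes] for u in prefixes])
           for s in alphabet}
```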
2.3.2 Spectral learning of WFAs
It can be shown that the rank of any Hankel sub-block is upper bounded by the rank of $f$. Moreover, given a rank factorization of the Hankel matrix $\mathbf{H}_\lambda = \mathbf{P}\mathbf{S}$, it is also true that $\mathbf{H}_\sigma = \mathbf{P} \mathbf{A}^\sigma \mathbf{S}$ for each $\sigma \in \Sigma$. The spectral learning algorithm relies on the non-trivial observation that this construction can be reversed: given any rank factorization $\mathbf{H}_\lambda = \mathbf{P}\mathbf{S}$, the WFA defined by

$$\boldsymbol{\alpha}^\top = \mathbf{P}_{\lambda,:}, \quad \boldsymbol{\omega} = \mathbf{S}_{:,\lambda}, \quad \mathbf{A}^\sigma = \mathbf{P}^{+} \mathbf{H}_\sigma \mathbf{S}^{+}$$

is a minimal WFA computing $f$ [2, Lemma 4.1], where the $\mathbf{H}_\sigma$ for $\sigma \in \Sigma$ denote the finite matrices defined above for a prefix-closed complete basis. In practice, we compute empirical estimates $\hat{\mathbf{H}}_\sigma$ of the Hankel matrices from a dataset $D$ of sampled trajectories, by averaging the observed outputs of $f$ over the corresponding entries.
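The reversed construction can be verified numerically: build the Hankel sub-blocks of a known low-rank function, factorize via truncated SVD, and check that the recovered WFA reproduces the function. The 2-state target WFA and the basis below are made-up choices for which the basis happens to be complete.

```python
import numpy as np
from numpy.linalg import pinv, svd

# Sketch: recover a WFA from Hankel sub-blocks via a rank factorization H = P S.
# The 2-state target WFA below is made up for the demonstration.
alpha = np.array([1.0, 0.0])
omega = np.array([0.5, 0.5])
A = {"a": np.array([[0.5, 0.2], [0.1, 0.3]]),
     "b": np.array([[0.1, 0.3], [0.4, 0.2]])}

def f(w):
    v = alpha.copy()
    for c in w:
        v = v @ A[c]
    return float(v @ omega)

basis = ["", "a", "b"]                      # prefixes = suffixes here
H = np.array([[f(u + v) for v in basis] for u in basis])
H_sig = {c: np.array([[f(u + c + v) for v in basis] for u in basis]) for c in "ab"}

# Truncated SVD at the true rank n = 2 yields the factorization H = P S.
U, s, Vt = svd(H)
n = 2
P, S = U[:, :n] * s[:n], Vt[:n]

# Reversed construction: A^sigma = P^+ H_sigma S^+; alpha and omega come from
# the empty-string row of P and column of S.
A_hat = {c: pinv(P) @ H_sig[c] @ pinv(S) for c in "ab"}
alpha_hat, omega_hat = P[basis.index("")], S[:, basis.index("")]

def f_hat(w):
    v = alpha_hat.copy()
    for c in w:
        v = v @ A_hat[c]
    return float(v @ omega_hat)
```

The recovered automaton agrees with the target (up to a change of basis on the state space), so `f_hat` matches `f` on all words.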
In fact, the above algorithm can readily be used for learning TPSRs. In PSR terminology, the prefixes are the histories, the suffixes are the tests, and the alphabet is the set of all possible action-observation pairs, i.e. $\Sigma = \mathcal{A} \times \mathcal{O}$. By simply replacing the Hankel matrix with the system-dynamics matrix, one exactly recovers the TPSR learning algorithm [27, 6].
3 Planning with Unnormalized Q Function
In this section, we introduce our POMDP planning method. The main idea of our algorithm is to directly compute the optimal policy based on an estimate of the unnormalized Q function, which is proportional to the action value function. The value of this function, given a past trajectory, can be computed via a WFA, and it is then straightforward to use the classical spectral learning algorithm to recover this WFA. Unlike traditional PSR methods, our approach takes advantage of the reward information by integrating the reward into the learned representations; classical PSR-based methods construct the representations solely from the environment dynamics, completely ignoring the reward information. Consequently, our method offers a more sample-efficient representation of the environment for planning under POMDPs. In addition, our algorithm only needs to construct a WFA, with no additional iterative planning method involved; therefore, compared to traditional methods for planning with PSRs, our algorithm is more time-efficient. Finally, with the help of compressed sensing techniques, we are able to scale our algorithm to complex domains.
3.1 Unnormalized Q function
The estimation of the action value function is of great importance for planning under POMDPs. Typically, given a probabilistic sampling policy $\pi$, the action value function (Q function) of a given trajectory $h = a_1 o_1 \cdots a_{t-1} o_{t-1}$ and action $a_t$ is defined by:

$$Q^\pi(h, a_t) = \mathbb{E}_\pi\!\left[\sum_{i \geq 0} \gamma^{i} r_{t+i}\right],$$

where the expectation is over future trajectories and $r_{t+i}$ is the immediate reward collected at time step $t + i$.
Given a POMDP, denote by $\rho(h)$ the expected immediate reward collected after a trajectory $h$, which is defined as:

$$\rho(h) = \mathbb{E}[r_t \mid h] = \mathbf{r}^\top \mathbf{b}_h.$$
The action value function can then be expanded to:

$$Q^\pi(h, a) = \frac{1}{P_\pi(ha)} \sum_{k \geq 0} \gamma^{k} \sum_{h'} P_\pi(h a h')\, \rho(h a h'),$$

where $h'$ ranges over the length-$k$ continuations of the trajectory. We refer to the numerator,

$$\tilde{Q}(ha) = \sum_{k \geq 0} \gamma^{k} \sum_{h'} P_\pi(h a h')\, \rho(h a h'),$$

as the unnormalized Q function (UQF). It is immediate that, for the same trajectory $h$, $\tilde{Q}(ha) = P_\pi(ha)\, Q^\pi(h, a)$. Therefore, we have $Q^\pi(h, a) = \tilde{Q}(ha)/P_\pi(ha)$, and we can then plan according to the UQF instead of $Q^\pi$.
3.2 A spectral learning algorithm for UQF
In this section, we present our spectral learning algorithm for the UQF. First, we show that the value of the UQF given a past trajectory can be computed via a WFA. Let us denote $\tilde{\rho}(h) = P_\pi(h)\, \rho(h)$; we have:

$$\tilde{Q}(ha) = \sum_{k \geq 0} \gamma^{k} \sum_{h'} \tilde{\rho}(h a h').$$

Assuming the probabilistic sampling policy $\pi$ is given, we then only need to compute the value of the function $\tilde{\rho}$. As a special case, if $\pi$ is a random policy that uniformly selects the actions, we can replace the action probabilities by the constant $1/|\mathcal{A}|$ without affecting the learned policy.
It turns out that the UQF can be computed by a WFA. To show this, we first introduce the following lemma, stating that the function mapping a trajectory $h$ to $P_\pi(h)\rho(h)$, the trajectory probability times the expected immediate reward, can be computed by a WFA:
Given a POMDP of size $k$ and a sampling policy $\pi$ induced by a state-level policy, there exists a WFA $B = \langle \boldsymbol{\alpha}, \{\mathbf{B}^{ao}\}, \boldsymbol{\omega} \rangle$ with $k$ states that realizes the function $h \mapsto P_\pi(h)\rho(h)$, where $\boldsymbol{\alpha} = \boldsymbol{\mu}$ and $\boldsymbol{\omega} = \mathbf{r}$.
For each action-observation pair $(a, o)$, let

$$\mathbf{B}^{ao}_{s, s'} = \pi(a \mid s)\, \mathbf{T}^{a}_{s, s'}\, \mathbf{O}^{a}_{s', o}.$$

We can construct a WFA $B = \langle \boldsymbol{\alpha}, \{\mathbf{B}^{ao}\}, \boldsymbol{\omega} \rangle$ such that $\boldsymbol{\alpha} = \boldsymbol{\mu}$ and $\boldsymbol{\omega} = \mathbf{r}$. Then, by construction, one can check that the WFA $B$ computes the function $h \mapsto P_\pi(h)\rho(h)$, which also shows that the rank of this function is at most $k$. ∎
In fact, we can show that the UQF $\tilde{Q}$ can be computed by another WFA $\tilde{B}$, and one can easily convert $B$ to $\tilde{B}$.
Given a POMDP of size $k$, a sampling policy $\pi$ and a WFA $B = \langle \boldsymbol{\alpha}, \{\mathbf{B}^{ao}\}, \boldsymbol{\omega} \rangle$ realizing the function $h \mapsto P_\pi(h)\rho(h)$ such that the spectral radius of $\gamma \sum_{a,o} \mathbf{B}^{ao}$ is less than 1, the WFA $\tilde{B} = \langle \boldsymbol{\alpha}, \{\mathbf{B}^{ao}\}, (\mathbf{I} - \gamma \sum_{a,o} \mathbf{B}^{ao})^{-1} \boldsymbol{\omega} \rangle$ of size $k$ realizes the function $\tilde{Q}$.
By definition of the UQF, for any trajectory prefix $u$ we have:

$$\tilde{Q}(u) = \sum_{j \geq 0} \gamma^{j} \sum_{h' \in (\mathcal{A} \times \mathcal{O})^{j}} \boldsymbol{\alpha}^\top \mathbf{B}^{u} \mathbf{B}^{h'} \boldsymbol{\omega} = \boldsymbol{\alpha}^\top \mathbf{B}^{u} \sum_{j \geq 0} \gamma^{j} \Big(\sum_{a,o} \mathbf{B}^{ao}\Big)^{j} \boldsymbol{\omega} = \boldsymbol{\alpha}^\top \mathbf{B}^{u} \Big(\mathbf{I} - \gamma \sum_{a,o} \mathbf{B}^{ao}\Big)^{-1} \boldsymbol{\omega}.$$

Here we applied the Neumann identity $\sum_{j \geq 0} \mathbf{M}^{j} = (\mathbf{I} - \mathbf{M})^{-1}$, which holds when the spectral radius of $\mathbf{M}$ is less than 1. Therefore, the WFA $\tilde{B}$ realizes the function $\tilde{Q}$. ∎
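The terminal-vector conversion above can be checked numerically: solving the linear system $(\mathbf{I} - \gamma \mathbf{M})\,\tilde{\boldsymbol{\omega}} = \mathbf{r}$ must agree with the truncated Neumann series. The operators below are made-up stand-ins, scaled so that the spectral-radius condition holds.

```python
import numpy as np

# Sketch of the terminal-vector conversion: replace r by (I - gamma * sum B_ao)^{-1} r.
# The operators are made-up; the 0.05 scaling keeps the spectral radius of
# gamma * M below 1 so the Neumann series converges.
rng = np.random.default_rng(0)
k, gamma = 4, 0.9
B = {ao: rng.random((k, k)) * 0.05 for ao in [(0, 0), (0, 1), (1, 0), (1, 1)]}
r = rng.random(k)

M = sum(B.values())
omega_tilde = np.linalg.solve(np.eye(k) - gamma * M, r)

# Check against the truncated Neumann series sum_{j>=0} (gamma M)^j r.
series = np.zeros(k)
term = r.copy()
for _ in range(500):
    series += term
    term = gamma * M @ term
```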
Therefore, in order to compute the UQF, we only need to learn a WFA that computes the function $h \mapsto P_\pi(h)\rho(h)$. Following the classical spectral learning algorithm, we present our learning algorithm for POMDP planning in Algorithm 1. In fact, it has been shown that the spectral learning algorithm for WFAs is statistically consistent; therefore, our approximation of the UQF is consistent as the sample size grows.
3.3 Scalable learning of UQF
We have now established the spectral learning algorithm for the UQF. However, as for the spectral learning algorithm for TPSRs, one can immediately observe that both time and storage complexity are the bottleneck of this algorithm. For complex domains, in order to obtain a complete sub-block of the Hankel matrix, one will need a large number of prefixes and suffixes to form a basis, and the classical spectral learning algorithm becomes intractable.
By projecting matrices down to low-dimensional spaces via randomly generated bases, compressed sensing has been widely applied to matrix compression. In fact, previous works have successfully applied matrix sensing techniques to TPSRs and developed an efficient online algorithm for learning TPSRs. Here, we adopt a similar approach.
Assume that we are given a set of prefixes $\mathcal{P}$, a set of suffixes $\mathcal{S}$, and two independent random full-rank Johnson-Lindenstrauss (JL) projection matrices $\boldsymbol{\Phi}_{\mathcal{P}} \in \mathbb{R}^{d_{\mathcal{P}} \times |\mathcal{P}|}$ and $\boldsymbol{\Phi}_{\mathcal{S}} \in \mathbb{R}^{d_{\mathcal{S}} \times |\mathcal{S}|}$, where $d_{\mathcal{P}}$ and $d_{\mathcal{S}}$ are the projection dimensions for the prefixes and suffixes. In this work, we use Gaussian projection matrices for $\boldsymbol{\Phi}_{\mathcal{P}}$ and $\boldsymbol{\Phi}_{\mathcal{S}}$, which contain i.i.d. Gaussian entries.
Let us now define two injective functions over prefixes and suffixes, $\phi_{\mathcal{P}} : \mathcal{P} \to \mathbb{R}^{d_{\mathcal{P}}}$ and $\phi_{\mathcal{S}} : \mathcal{S} \to \mathbb{R}^{d_{\mathcal{S}}}$, mapping each prefix $u$ and suffix $v$ to the corresponding columns of $\boldsymbol{\Phi}_{\mathcal{P}}$ and $\boldsymbol{\Phi}_{\mathcal{S}}$. The core step of our algorithm is to obtain compressed estimates of the Hankel matrices, denoted by $\hat{\mathbf{H}}_c^{\sigma}$, associated with the function $h \mapsto P_\pi(h)\rho(h)$ for all $\sigma \in \Sigma \cup \{\lambda\}$. Formally, $\hat{\mathbf{H}}_c^{\sigma}$ is obtained by averaging, over the trajectories in the training dataset $D$ and their prefix-suffix splits, the observed immediate rewards weighted by the outer products $\phi_{\mathcal{P}}(u)\, \phi_{\mathcal{S}}(v)^\top$. Then, after performing a truncated SVD $\hat{\mathbf{H}}_c^{\lambda} \approx \mathbf{U} \mathbf{D} \mathbf{V}^\top$ of rank $n$, we can compute the transition matrices of the WFA by:

$$\mathbf{A}^\sigma = \mathbf{D}^{-1} \mathbf{U}^\top \hat{\mathbf{H}}_c^{\sigma} \mathbf{V}.$$
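The accumulation of the compressed Hankel estimate can be sketched as follows. This is an illustrative simplification: the feature maps hash each string to a random Gaussian column, the data are invented, and the full-trajectory discounted return is used as the target; the exact per-split targets and normalization follow Algorithm 2.

```python
import numpy as np

# Illustrative sketch: each (prefix, suffix) split of a sampled trajectory
# contributes a reward-weighted outer product of projected features.
rng = np.random.default_rng(0)
d_p, d_s, gamma = 16, 16, 0.9

def make_phi(dim):
    """Lazily assign a random Gaussian column to each new string (an assumption)."""
    cache = {}
    def phi(x):
        if x not in cache:
            cache[x] = rng.normal(0, 1 / np.sqrt(dim), size=dim)
        return cache[x]
    return phi

phi_P, phi_S = make_phi(d_p), make_phi(d_s)

# Trajectories: list of (symbols, rewards); made-up data.
data = [(("a", "b", "a"), (0.0, 0.0, 1.0)),
        (("b", "a"), (0.0, 1.0))]

H_c = np.zeros((d_p, d_s))
for syms, rews in data:
    ret = sum(gamma ** i * r for i, r in enumerate(rews))  # discounted return
    for t in range(len(syms) + 1):
        u, v = syms[:t], syms[t:]
        H_c += ret * np.outer(phi_P(u), phi_S(v))
H_c /= len(data)
```

The resulting small matrix `H_c` is what the truncated SVD then factorizes.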
We present the complete method in Algorithm 2. Instead of iteratively sweeping through the dataset as most planning methods do, one can build a UQF in just two passes over the data: one for building the compressed Hankel matrices, one for recovering the parameters. More precisely, letting $\bar{L}$ denote the maximum length of a trajectory in the dataset $D$, the time complexity of our algorithm is linear in $|D|$ and $\bar{L}$, and no extra planning time is needed. In contrast, the fitted-Q algorithm alone requires, for the planning stage only, time proportional to the expected number of fitted-Q iterations. Therefore, in terms of time complexity, our algorithm is linear in the number of trajectories, leading to a very efficient method.
Compute the compressed estimates of the Hankel matrices.
Perform truncated SVD on the estimated Hankel matrix with rank $n$.
Recover the WFA realizing the function $h \mapsto P_\pi(h)\rho(h)$.
Following Theorem 3.2, convert this WFA to the WFA realizing the UQF $\tilde{Q}$.
Return: a new deterministic policy defined by $\arg\max_{a} \tilde{Q}(ha)$.
3.4 Policy iteration
Policy iteration has been widely applied in both MDP and POMDP settings [5, 30], and has shown benefits from both empirical and theoretical perspectives. It is very natural to apply policy iteration to our algorithm, since we directly learn a policy from data. The policy iteration procedure is listed in Algorithm 3. Note that for re-sampling, we convert our learned deterministic policy to a probabilistic one in an ε-greedy fashion.
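The ε-greedy conversion used for re-sampling can be sketched in a few lines; the action count, ε value, and greedy action below are illustrative choices.

```python
import numpy as np

# Sketch of the re-sampling step: the learned deterministic policy is made
# epsilon-greedy so that re-sampled data keeps exploring.
def epsilon_greedy(greedy_action, n_actions, epsilon, rng):
    """With probability epsilon pick a uniform random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return greedy_action

rng = np.random.default_rng(0)
actions = [epsilon_greedy(2, 4, 0.1, rng) for _ in range(1000)]
```

Most sampled actions follow the learned policy, while a small fraction explores uniformly.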
4 Experiments
To assess the performance of our method, we conducted experiments on two domains: a toy grid world environment and the S-PocMan game. We use TPSR/CPSR + fitted-Q as our baseline methods. Our experiments show that, in terms of both sample complexity and time complexity, we indeed outperform the classical two-stage algorithms.
4.1 Grid world experiment
The first benchmark for our method is the simple grid world environment shown in Fig. 1. The agent starts in a designated start tile and must reach the green goal state. At each time step, the agent can only perceive the number of surrounding walls, and proceeds to execute one of four actions: go up, down, left or right. To make the environment stochastic, with probability 0.2 the chosen action fails, resulting instead in a random action at the current time step. The reward function in this navigation task is sparse: the agent receives no reward until it reaches the terminal state. We ran three variants of the aforementioned grid world, each corresponding to a different starting state. As one can imagine, the further the goal state is from the starting state, the harder the task becomes.
We used a random policy to generate training data, which consisted of trajectories of length up to 100. To evaluate the policy learned by the different algorithms, we let the agent execute the learned policy for 1,000 episodes and computed the average accumulated discounted reward, with a discount factor of 0.99. The maximum length of a test episode was also set to 100. Hyperparameters (i.e. the number of rows and columns of the Hankel matrices and the rank of the SVD) were selected using cross-validation.
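The evaluation metric above can be sketched as follows; the episode reward sequences are made up, standing in for the sparse goal-reaching rewards of the grid world.

```python
# Sketch of the evaluation metric: average discounted return over test episodes,
# with discount factor 0.99. The reward sequences are made-up examples.
gamma = 0.99

def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Two episodes reaching the goal after 10 and 25 steps, respectively.
episodes = [[0.0] * 10 + [1.0], [0.0] * 25 + [1.0]]
avg = sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```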
As a baseline, we use classical TPSRs and CPSRs as the learning method for the environment, and the fitted-Q algorithm as the planning algorithm. We also performed a hyperparameter search for the baseline methods using cross-validation. In addition, we report the rewards collected by a random policy as well as by the optimal policy for comparison.
Results on this toy domain (see Figure 1) highlight the sample and time efficiency achieved by our method. Indeed, our algorithm outperforms the classical CPSR+fitted-Q method in all three variants, notably achieving better performance in the small-data regime, which shows significant sample efficiency. Furthermore, our algorithm consistently converges to the optimal policy as the sample size increases. In addition, our method is much faster than the compared methods; for example, in the experiment with 800 samples, our method is substantially faster than CPSR+fitted-Q while achieving similar results.
4.2 S-PocMan domain
For the second experiment, we report results on the S-PocMan environment. The partially observable version of the classical game Pacman was first introduced by Silver and Veness and is referred to as PocMan. In this domain, the agent needs to navigate through a grid world to collect food while avoiding capture by the ghosts. It is an extremely large partially observable domain. However, Hamilton et al. showed that, due to the extensive reward information, a simple memoryless controller that treats the partially observable environment as if it were fully observable can perform extremely well. Hence, they proposed a harder version of PocMan, called S-PocMan. In this new domain, they drop the parts of the observation vector that allow the agent to sense the direction of the food and greatly sparsify the amount of food in the grid world, thereby making the environment more partially observable.
Table 1: Number of fitted-Q iterations, running time (s), and returns for each method.
In this experiment, we only used the combination CPSR+fitted-Q as our baseline algorithm, as TPSRs cannot scale to the large size of this environment. As in the grid world experiment, we selected the best hyperparameters through cross-validation. The discount factor for computing returns was set to 0.99999 in all runs. Table 1 shows the run time and average return for both our algorithm and the baseline. One can see that UQF achieves better performance than CPSR+fitted-Q. Moreover, UQF exhibits a significant reduction in running time: it is about 200 times faster than CPSR+fitted-Q. Note that building the CPSR takes a similar amount of time to our method; however, the extra iterative fitted-Q planning takes considerably more time to converge, as our analysis in Section 3.3 showed.
5 Conclusion
In this paper, we propose a novel learning and planning algorithm for partially observable environments. The main idea of our algorithm relies on the estimation of the unnormalized Q function via the spectral learning algorithm. Theoretically, we show that in a POMDP the UQF can be computed via a WFA, and consequently can be provably learned from data using the spectral learning algorithm for WFAs. Moreover, the UQF combines the learning and planning phases of reinforcement learning, yielding the corresponding policy in one step. Therefore, our method is more sample-efficient and time-efficient than traditional POMDP planning algorithms, as shown in the experiments on the grid world and S-PocMan environments.
Future work includes exploring theoretical properties of this planning approach. For example, a first step would be to obtain convergence guarantees for policy iteration based on the UQF spectral learning algorithm. In addition, our approach could be extended to the multitask setting by leveraging the multitask learning framework for WFAs proposed in previous work. Indeed, since we combine the environment dynamics and reward information, our approach should be able to deal with partially shared environment and reward structures, leading to a potentially flexible multitask RL framework.
-  Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
-  Borja Balle, Xavier Carreras, Franco M Luque, and Ariadna Quattoni. Spectral learning of weighted automata. Machine learning, 96(1-2):33–63, 2014.
-  Borja Balle, William Hamilton, and Joelle Pineau. Methods of moments for learning stochastic languages: Unified presentation and empirical comparison. In International Conference on Machine Learning, pages 1386–1394, 2014.
-  Richard G Baraniuk and Michael B Wakin. Random projections of smooth manifolds. Foundations of computational mathematics, 9(1):51–77, 2009.
-  Richard Bellman. Dynamic programming. Princeton University Press, 1957.
-  Byron Boots, Sajid M Siddiqi, and Geoffrey J Gordon. Closing the learning-planning loop with predictive state representations. The International Journal of Robotics Research, 30(7):954–966, 2011.
-  Jack W. Carlyle and Azaria Paz. Realizations by stochastic finite automata. Journal of Computer and System Sciences, 5(1):26–40, 1971.
-  Anthony R Cassandra, Leslie Pack Kaelbling, and Michael L Littman. Acting optimally in partially observable stochastic domains. 1994.
-  Manfred Droste, Werner Kuich, and Heiko Vogler. Handbook of weighted automata. Springer Science & Business Media, 2009.
-  Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(Apr):503–556, 2005.
-  Michel Fliess. Matrices de Hankel. Journal de Mathématiques Pures et Appliquées, 53(9):197–222, 1974.
-  William Hamilton, Mahdi Milani Fard, and Joelle Pineau. Efficient learning and planning with compressed predictive states. The Journal of Machine Learning Research, 15(1):3395–3439, 2014.
-  William L Hamilton, Mahdi Milani Fard, and Joelle Pineau. Modelling sparse dynamical systems with compressed predictive state representations. In International Conference on Machine Learning, pages 178–186, 2013.
-  Masoumeh T Izadi and Doina Precup. Point-based planning for predictive state representations. In Conference of the Canadian Society for Computational Studies of Intelligence, pages 126–137. Springer, 2008.
-  Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12(6):1371–1398, 2000.
-  Michael R James and Satinder Singh. Learning and discovery of predictive state representations in dynamical systems with reset. In Proceedings of the twenty-first international conference on Machine learning, page 53. ACM, 2004.
-  William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics, 26(189-206):1, 1984.
-  Biing Hwang Juang and Laurence R Rabiner. Hidden Markov models for speech recognition. Technometrics, 33(3):251–272, 1991.
-  Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. Artificial intelligence, 101(1-2):99–134, 1998.
-  Michael L Littman and Richard S Sutton. Predictive representations of state. In Advances in neural information processing systems, pages 1555–1561, 2002.
-  Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, volume 3, pages 1025–1032, 2003.
-  Joelle Pineau, Geoffrey Gordon, and Sebastian Thrun. Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27:335–380, 2006.
-  Guillaume Rabusseau, Borja Balle, and Joelle Pineau. Multitask spectral learning of weighted automata. In Advances in Neural Information Processing Systems, pages 2588–2597, 2017.
-  Matthew Rosencrantz, Geoff Gordon, and Sebastian Thrun. Learning low dimensional predictive representations. In Proceedings of the twenty-first international conference on Machine learning, page 88. ACM, 2004.
-  Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, and SVN Vishwanathan. Hash kernels for structured data. Journal of Machine Learning Research, 10(Nov):2615–2637, 2009.
-  David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, pages 2164–2172, 2010.
-  Satinder Singh, Michael R James, and Matthew R Rudary. Predictive state representations: A new theory for modeling dynamical systems. In Proceedings of the 20th conference on Uncertainty in artificial intelligence, pages 512–519. AUAI Press, 2004.
-  Satinder P Singh, Michael L Littman, Nicholas K Jong, David Pardoe, and Peter Stone. Learning predictive state representations. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 712–719, 2003.
-  Edward J Sondik. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2):282–304, 1978.
-  Richard S Sutton, Andrew G Barto, et al. Introduction to reinforcement learning, volume 135. MIT press Cambridge, 1998.
-  Michael Thon and Herbert Jaeger. Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework. The Journal of Machine Learning Research, 16(1):103–147, 2015.
-  Joel Veness, Kee Siong Ng, Marcus Hutter, William Uther, and David Silver. A Monte-Carlo AIXI approximation. Journal of Artificial Intelligence Research, 40:95–142, 2011.
-  Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, and Alex Smola. Feature hashing for large scale multitask learning. arXiv preprint arXiv:0902.2206, 2009.