Data-Efficient Reinforcement Learning in Continuous-State POMDPs

02/08/2016
by   Rowan McAllister, et al.
0

We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distribution of possible system trajectories. We additionally predict trajectories w.r.t. a filtering process, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework. Our test setup is the cartpole swing-up task with sensor noise, which involves nonlinear dynamics and requires nonlinear control.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

10/28/2021

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data,...
10/14/2021

Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

Reliable AI agents should be mindful of the limits of their knowledge an...
08/22/2019

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Off-policy evaluation (OPE) in reinforcement learning allows one to eval...
02/25/2016

Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
05/07/2017

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
10/24/2021

Off-Policy Evaluation in Partially Observed Markov Decision Processes

We consider off-policy evaluation of dynamic treatment rules under the a...
03/16/2017

Scalable Accelerated Decentralized Multi-Robot Policy Search in Continuous Observation Spaces

This paper presents the first ever approach for solving continuous-obser...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.