Data-Efficient Reinforcement Learning in Continuous-State POMDPs

02/08/2016
by Rowan McAllister, et al.

We present a data-efficient reinforcement learning algorithm that is robust to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) to partially observed Markov decision processes (POMDPs) by accounting for the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distribution of possible system trajectories. We additionally predict trajectories with respect to a filtering process, achieving significantly higher performance than combining a filter with a policy optimised by the original (unfiltered) framework. Our test setup is the cartpole swing-up task with sensor noise, which involves nonlinear dynamics and requires nonlinear control.
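
To illustrate what predicting trajectories "with respect to a filtering process" can mean, here is a minimal sketch. It is not the authors' method. PILCO's Gaussian-process dynamics and moment matching are replaced with a scalar linear-Gaussian system, an assumption made here so that the propagation is exact. The system has dynamics x' = a·x + b·u + w and observations y = x + v. The controller is a hypothetical linear policy u = -k·m that acts on a Kalman filter's belief mean m. The key idea is that, before any data arrives, the belief mean is itself a random variable. The policy's expected cost is therefore predicted from the joint Gaussian over (true state, belief mean) rather than from the state alone. All function names and parameters (`a`, `b`, `k`, `q`, `r`, `mu0`, `s0`) are hypothetical, and a Monte Carlo rollout is included to check the analytic prediction.

```python
import random

# Hypothetical toy model (not the paper's GP model):
#   x' = a*x + b*u + w,  w ~ N(0, q);   y = x + v,  v ~ N(0, r);   u = -k * m_post

def _mv(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def _mm(A, B):
    return [[sum(A[i][j] * B[j][l] for j in range(2)) for l in range(2)] for i in range(2)]

def _t(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def _add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def predict_cost_filtered(a, b, k, q, r, mu0, s0, horizon):
    """Analytic sum over t=1..T of E[x_t^2] when the policy acts on the filtered mean."""
    mean = [mu0, mu0]                 # joint mean of (true state x, filter prior mean m)
    cov = [[s0, 0.0], [0.0, 0.0]]     # m starts deterministically at mu0
    p = s0                            # filter prior variance (does not depend on data)
    cost = 0.0
    for _ in range(horizon):
        g = p / (p + r)               # Kalman gain
        c = a - b * k
        # Linear map from (x, m) and noise (v, w) to the next (x, m):
        A = [[a - b * k * g, -b * k * (1 - g)], [c * g, c * (1 - g)]]
        B = [[-b * k * g, 1.0], [c * g, 0.0]]
        N = [[r, 0.0], [0.0, q]]
        mean = _mv(A, mean)
        cov = _add(_mm(_mm(A, cov), _t(A)), _mm(_mm(B, N), _t(B)))
        p = a * a * (1 - g) * p + q   # filter's predicted variance for the next step
        cost += mean[0] ** 2 + cov[0][0]
    return cost

def predict_cost_unfiltered(a, b, k, q, r, mu0, s0, horizon):
    """Same policy gain applied directly to the noisy observation y."""
    mean, var, cost = mu0, s0, 0.0
    for _ in range(horizon):
        mean = (a - b * k) * mean
        var = (a - b * k) ** 2 * var + (b * k) ** 2 * r + q
        cost += mean ** 2 + var
    return cost

def simulate_cost_filtered(a, b, k, q, r, mu0, s0, horizon, n, seed=0):
    """Monte Carlo estimate of the filtered-loop cost, used to check the analytic value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu0, s0 ** 0.5)
        m, p = mu0, s0
        for _ in range(horizon):
            y = x + rng.gauss(0.0, r ** 0.5)
            g = p / (p + r)
            m_post, p_post = m + g * (y - m), (1 - g) * p
            u = -k * m_post
            x = a * x + b * u + rng.gauss(0.0, q ** 0.5)
            m, p = a * m_post + b * u, a * a * p_post + q
            total += x * x
    return total / n

if __name__ == "__main__":
    params = dict(a=1.0, b=1.0, k=0.8, q=0.01, r=1.0, mu0=1.0, s0=0.1, horizon=10)
    print("filtered (analytic):  ", predict_cost_filtered(**params))
    print("filtered (Monte Carlo):", simulate_cost_filtered(**params, n=10000))
    print("unfiltered (analytic):", predict_cost_unfiltered(**params))
```

In this toy setting, the same gain k yields a much lower predicted cost when it acts on the filtered mean, because the raw-observation controller injects the full sensor noise r into every action. The paper's comparison is different: it concerns policies optimised with versus without the filter in the loop, on nonlinear dynamics.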

research
10/28/2021

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data,...
research
10/14/2021

Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

Reliable AI agents should be mindful of the limits of their knowledge an...
research
08/22/2019

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Off-policy evaluation (OPE) in reinforcement learning allows one to eval...
research
02/25/2016

Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
research
05/07/2017

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
research
08/07/2020

SafePILCO: a software tool for safe and data-efficient policy synthesis

SafePILCO is a software tool for safe and data-efficient policy search w...
research
06/13/2012

Improving Gradient Estimation by Incorporating Sensor Data

An efficient policy search algorithm should estimate the local gradient ...