Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient

10/27/2020
by   Samuele Tosatto, et al.

Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency, as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods suffer from either high bias or high variance, and often deliver unreliable estimates. The price of this inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited and a very high sample cost hinders straightforward application. In this paper, we propose a nonparametric Bellman equation, which can be solved in closed form. The solution is differentiable with respect to the policy parameters and gives access to an estimate of the policy gradient. In this way, we avoid both the high variance of importance sampling approaches and the high bias of semi-gradient methods. We empirically analyze the quality of our gradient estimate against state-of-the-art methods, and show that it outperforms the baselines in terms of sample efficiency on classical control tasks.
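To make the idea of a Bellman equation that is solved in closed form and depends smoothly on the policy parameters more concrete, here is a minimal sketch. It is not the estimator from the paper. It assumes a Gaussian kernel, a one-parameter linear policy a = theta * s, and a tiny made-up batch of transitions, and it approximates the derivative with central finite differences instead of differentiating the closed-form solution analytically. All function names and data are hypothetical.

```python
import math

def gauss(x, y, bandwidth):
    """Gaussian kernel between two scalars (an assumed kernel choice)."""
    return math.exp(-0.5 * ((x - y) / bandwidth) ** 2)

def transition_matrix(data, theta, bw=0.5):
    """Row-stochastic P[i][j]: kernel weight of sample j as the successor of
    sample i when the policy a = theta * s acts in the next state of sample i."""
    P = []
    for (_, _, _, s_next) in data:
        a_next = theta * s_next
        row = [gauss(s_next, s, bw) * gauss(a_next, a, bw) for (s, a, _, _) in data]
        total = sum(row)
        P.append([w / total for w in row])
    return P

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def q_values(data, theta, gamma=0.9):
    """Closed-form solution of q = r + gamma * P(theta) q on the sample points,
    i.e. the linear system (I - gamma * P) q = r."""
    n = len(data)
    P = transition_matrix(data, theta)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    r = [sample[2] for sample in data]
    return solve(A, r)

def estimated_return(data, theta, start_states, gamma=0.9, bw=0.5):
    """Average kernel-smoothed value of the policy over the given start states."""
    q = q_values(data, theta, gamma)
    total = 0.0
    for s0 in start_states:
        a0 = theta * s0
        w = [gauss(s0, s, bw) * gauss(a0, a, bw) for (s, a, _, _) in data]
        total += sum(wi * qi for wi, qi in zip(w, q)) / sum(w)
    return total / len(start_states)

def policy_gradient(data, theta, start_states, eps=1e-5):
    """Central finite difference of the estimated return with respect to theta
    (a stand-in for differentiating the closed-form solution analytically)."""
    up = estimated_return(data, theta + eps, start_states)
    down = estimated_return(data, theta - eps, start_states)
    return (up - down) / (2 * eps)

# Hypothetical batch of (state, action, reward, next_state) transitions.
data = [
    (-1.0, 0.5, -1.0, -0.5),
    (-0.5, 0.5, -0.25, 0.0),
    (0.0, 0.0, 0.0, 0.0),
    (0.5, -0.5, -0.25, 0.0),
    (1.0, -0.5, -1.0, 0.5),
    (0.5, 0.5, -0.25, 1.0),
]

grad = policy_gradient(data, theta=-0.5, start_states=[-1.0, 1.0])
print("estimated gradient:", grad)
```

Because the value estimate comes from a single linear solve over the batch, no importance weights appear anywhere in the sketch, and the whole estimate changes with theta through the transition matrix rather than only through the action taken in the current state.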

research
01/08/2020

A Nonparametric Offpolicy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample com...
research
01/17/2013

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement le...
research
04/28/2020

Improving Sample Efficiency and Multi-Agent Communication in RL-based Train Rescheduling

We present preliminary results from our sixth placed entry to the Flatla...
research
05/07/2019

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

In importance sampling (IS)-based reinforcement learning algorithms such...
research
02/21/2020

Accelerating Reinforcement Learning with a Directional-Gaussian-Smoothing Evolution Strategy

Evolution strategy (ES) has been shown great promise in many challenging...
research
11/12/2020

Steady State Analysis of Episodic Reinforcement Learning

This paper proves that the episodic learning environment of every finite...
research
03/16/2023

Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets

Understanding and analyzing markets is crucial, yet analytical equilibri...
