A Nonparametric Off-Policy Policy Gradient

01/08/2020
by Samuele Tosatto, et al.

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function, and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions and empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
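To make the idea of a closed-form value estimate concrete, the following is a minimal sketch, not the authors' estimator. It assumes a one-dimensional state, a fixed policy, a Gaussian kernel with a hand-picked bandwidth, and plain Nadaraya-Watson weights; the abstract specifies none of these. All function names are hypothetical. Smoothing the Bellman equation with normalized kernel weights gives a linear system over the sampled next states, which can be solved directly instead of by iteration.

```python
import math


def normalized_weights(query, centers, bandwidth):
    """Nadaraya-Watson weights of each sample state for a query state."""
    w = [math.exp(-0.5 * ((query - c) / bandwidth) ** 2) for c in centers]
    total = sum(w)
    if total == 0.0:
        # Query is numerically far from all samples: fall back to uniform weights.
        return [1.0 / len(centers)] * len(centers)
    return [wi / total for wi in w]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def fit_value_function(states, rewards, next_states, gamma, bandwidth):
    """Return V(s) from the kernel-smoothed Bellman equation.

    Assumed model: V(s) = sum_j w_j(s) * (r_j + gamma * V(s'_j)).
    Evaluating at the sampled next states gives (I - gamma E) q = E r,
    where E[i][j] = w_j(s'_i) and q[i] = V(s'_i).
    """
    n = len(states)
    e = [normalized_weights(sp, states, bandwidth) for sp in next_states]
    a = [[(1.0 if i == j else 0.0) - gamma * e[i][j] for j in range(n)]
         for i in range(n)]
    b = [sum(e[i][j] * rewards[j] for j in range(n)) for i in range(n)]
    q = solve_linear_system(a, b)
    targets = [rewards[j] + gamma * q[j] for j in range(n)]

    def value(s):
        w = normalized_weights(s, states, bandwidth)
        return sum(wi * t for wi, t in zip(w, targets))

    return value


# Toy transitions (hypothetical data): the state drifts right, reward is constant.
states = [0.0, 0.25, 0.5, 0.75, 1.0]
next_states = [0.25, 0.5, 0.75, 1.0, 1.0]
rewards = [1.0] * 5
value_fn = fit_value_function(states, rewards, next_states, gamma=0.9, bandwidth=0.3)
print(value_fn(0.4))
```

With a constant reward of 1 and a discount of 0.9, the estimate should be close to 1 / (1 - 0.9) = 10 at every state, because the normalized weights sum to one. In practice the hand-written solver would likely be replaced by a library routine such as `numpy.linalg.solve`; the standard library is used here only to keep the sketch dependency-free.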

