Statistically Efficient Off-Policy Policy Gradients

02/10/2020
by Nathan Kallus, et al.

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from off-policy data, where the estimation is particularly non-trivial. We derive the asymptotic lower bound on the feasible mean-squared error in both Markov and non-Markov decision processes and show that existing estimators fail to achieve it in general settings. We propose a meta-algorithm that achieves the lower bound without any parametric assumptions and exhibits a unique 3-way double robustness property. We discuss how to estimate nuisances that the algorithm relies on. Finally, we establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
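To make the estimation target concrete, the following is a minimal sketch of a plain importance-weighted score-function estimate of a policy gradient from off-policy data. It is a common baseline, not the efficient estimator the paper proposes. It assumes a single-step (bandit) problem, a softmax policy over a small set of actions, and known behavior-policy probabilities; the function names `softmax`, `is_policy_gradient`, and `exact_policy_gradient` are hypothetical and chosen only for illustration.

```python
import math


def softmax(theta):
    """Softmax policy probabilities for a list of logits (assumed parameterization)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def is_policy_gradient(theta, actions, rewards, behavior_probs):
    """Importance-weighted estimate of grad_theta E_{a ~ pi_theta}[r].

    actions, rewards: logged data collected under the behavior policy.
    behavior_probs[a]: probability the behavior policy assigned to action a.
    """
    pi = softmax(theta)
    k = len(theta)
    grad = [0.0] * k
    for a, r in zip(actions, rewards):
        w = pi[a] / behavior_probs[a]  # density ratio pi_theta(a) / pi_b(a)
        for j in range(k):
            # Score of a softmax policy: d/d theta_j log pi(a) = 1[j == a] - pi_j
            score_j = (1.0 if j == a else 0.0) - pi[j]
            grad[j] += w * r * score_j
    n = len(actions)
    return [g / n for g in grad]


def exact_policy_gradient(theta, mean_rewards):
    """Closed-form gradient of the policy value in the same bandit, for comparison."""
    pi = softmax(theta)
    value = sum(p * r for p, r in zip(pi, mean_rewards))
    return [p * (r - value) for p, r in zip(pi, mean_rewards)]
```

In sequential problems the density ratio in this baseline becomes a product over time steps, which is one reason off-policy gradient estimation is hard and why lower bounds on the achievable mean-squared error are of interest.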

research
07/09/2021

Likelihood ratio-based policy gradient methods for distorted risk measures: A non-asymptotic analysis

We propose policy-gradient algorithms for solving the problem of control...
research
10/16/2020

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that ma...
research
02/12/2020

Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcemen...
research
04/11/2023

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

The most relevant problems in discounted reinforcement learning involve ...
research
01/31/2022

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Policy gradient (PG) estimation becomes a challenge when we are not allo...
research
02/19/2018

Fourier Policy Gradients

We propose a new way of deriving policy gradient updates for reinforceme...
research
05/17/2022

Robust Losses for Learning Value Functions

Most value function learning algorithms in reinforcement learning are ba...
