On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction

06/02/2021
by Jiawei Huang, et al.

In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction in the function approximation setting, where the objective function is formulated as a max-max-min optimization problem. We characterize the bias of the learning objective and present two strategies with finite-time convergence guarantees. In the first strategy, we present the algorithm P-SREDA, which has convergence rate O(ϵ^-3) and whose dependence on ϵ is optimal. In the second strategy, we propose a new off-policy actor-critic style algorithm named O-SPIM. We prove that O-SPIM converges to a stationary point with total complexity O(ϵ^-4), which matches the convergence rate of some recent actor-critic algorithms in the on-policy setting.
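
As a rough illustration of what "state-action density ratio correction" refers to, the sketch below reweights rewards sampled under a behavior policy by the ratio w(s, a) = d_pi(s, a) / d_mu(s, a), so that their average estimates the target policy's expected reward. This is not P-SREDA or O-SPIM: the three-element toy problem, the names `d_mu`, `d_pi`, `reward`, and `corrected_value_estimate`, and the assumption that the ratio is known exactly are all hypothetical choices made for illustration. In the setting the abstract describes, the ratio would instead be learned with function approximation.

```python
import random

# Hypothetical toy problem: three state-action pairs, indexed 0..2.
d_mu = [0.5, 0.3, 0.2]    # occupancy under the behavior policy (data source)
d_pi = [0.2, 0.3, 0.5]    # occupancy under the target policy (to be evaluated)
reward = [1.0, 0.0, 2.0]  # reward of each state-action pair

# Density ratio w(s, a) = d_pi(s, a) / d_mu(s, a), assumed known in this sketch.
w = [p / m for p, m in zip(d_pi, d_mu)]


def on_policy_value():
    """Expected reward under the target occupancy, computed exactly."""
    return sum(p * r for p, r in zip(d_pi, reward))


def corrected_value_estimate(num_samples, seed=0):
    """Monte Carlo estimate from behavior-policy samples, reweighted by w."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(d_mu)), weights=d_mu, k=num_samples)
    return sum(w[i] * reward[i] for i in samples) / num_samples


exact = on_policy_value()
estimate = corrected_value_estimate(100_000)
print(f"exact target value:       {exact:.3f}")
print(f"ratio-corrected estimate: {estimate:.3f}")
```

The reweighted average is unbiased because the expectation of w(s, a) r(s, a) under d_mu equals the expectation of r(s, a) under d_pi. An unweighted average of the same samples would instead estimate the behavior policy's expected reward.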

research
05/07/2020

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

As an important type of reinforcement learning algorithms, actor-critic ...
research
08/19/2021

Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning

Actor-critic algorithms are widely used in reinforcement learning, but a...
research
05/26/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

In this paper, we develop a novel variant of off-policy natural actor-cr...
research
02/23/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Designing off-policy reinforcement learning algorithms is typically a ve...
research
07/10/2020

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

This paper analyzes a two-timescale stochastic algorithm for a class of ...
research
07/19/2022

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner i...
research
10/29/2021

Understanding the Effect of Stochasticity in Policy Optimization

We study the effect of stochasticity in on-policy policy optimization, a...
