Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL, where the agent's objective is to compute an optimal policy from data generated by a given policy (known as the behavior policy). Because the optimal policy can be very different from the behavior policy, learning optimal behavior is much harder in the "off-policy" setting than in the "on-policy" setting, where new data collected under each updated policy is used for learning. This work proposes an off-policy natural actor-critic algorithm that uses state-action distribution correction to handle the off-policy data and the natural policy gradient for sample efficiency. Existing natural gradient-based actor-critic algorithms with convergence guarantees require fixed features for approximating both the policy and the value function, which often leads to sub-optimal learning in many RL applications. In contrast, our proposed algorithm uses compatible features, which allow arbitrary neural networks to approximate the policy and the value function while still guaranteeing convergence to a locally optimal policy. We illustrate the benefit of the proposed off-policy natural gradient algorithm by comparing it with the vanilla gradient actor-critic algorithm on benchmark RL tasks.
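To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a single-state softmax policy in place of a neural network, and it takes the advantage estimates and the state-action distribution correction weights `rho` as given inputs rather than learning them. All function names are hypothetical. It relies on a standard result for compatible function approximation: when the advantage is fitted linearly on the score features grad log pi, the fitted weight vector is an estimate of the natural policy gradient.

```python
import numpy as np

def softmax_policy(theta):
    """Action probabilities of a single-state softmax policy with logits theta."""
    z = np.exp(theta - theta.max())  # subtract max for numerical stability
    return z / z.sum()

def compatible_features(theta, actions):
    """Score function grad_theta log pi(a) = one_hot(a) - pi, one row per sampled action."""
    pi = softmax_policy(theta)
    return np.eye(theta.size)[actions] - pi

def natural_gradient(features, advantages, rho, ridge=1e-8):
    """Fit advantages ~ features @ w by rho-weighted least squares.

    rho holds the (assumed given) distribution correction weights that
    re-weight behavior-policy samples. The small ridge term is an assumption
    of this sketch: the softmax Fisher matrix is singular without it.
    """
    n = len(advantages)
    weighted = features * rho[:, None]
    fisher = weighted.T @ features / n   # weighted Fisher information estimate
    grad = weighted.T @ advantages / n   # weighted vanilla policy gradient estimate
    return np.linalg.solve(fisher + ridge * np.eye(fisher.shape[0]), grad)

# Toy data (illustrative only): three actions, one sample of each.
theta = np.zeros(3)
actions = np.array([0, 1, 2])
advantages = np.array([1.0, 0.0, -1.0])
rho = np.ones(3)  # all ones means no off-policy correction is applied

w = natural_gradient(compatible_features(theta, actions), advantages, rho)
new_theta = theta + 0.5 * w  # one natural-gradient ascent step, step size 0.5
```

In this toy case the step raises the probability of the action with positive advantage and lowers that of the action with negative advantage. With non-uniform `rho`, samples that the target policy visits more often than the behavior policy would count more in both the Fisher estimate and the gradient estimate.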

research
03/06/2017

Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the sto...
research
08/03/2021

Variational Actor-Critic Algorithms

We introduce a class of variational actor-critic algorithms based on a v...
research
07/17/2023

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

We study robust reinforcement learning (RL) with the goal of determining...
research
11/05/2021

An Algorithmic Theory of Metacognition in Minds and Machines

Humans sometimes choose actions that they themselves can identify as sub...
research
12/23/2019

Direct and indirect reinforcement learning

Reinforcement learning (RL) algorithms have been successfully applied to...
research
05/29/2020

Reinforcement Learning

Reinforcement learning (RL) is a general framework for adaptive control,...
research
04/09/2021

Learning Sampling Policy for Faster Derivative Free Optimization

Zeroth-order (ZO, also known as derivative-free) methods, which estimate...
