An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

01/17/2018
by Jiaming Song, et al.

In this technical report, we consider an approach that combines the PPO objective with K-FAC natural gradient optimization, which we call PPOKFAC. We perform a range of empirical analyses of various aspects of the algorithm, such as sample complexity, training speed, and sensitivity to batch size and number of training epochs. We observe that PPOKFAC outperforms PPO in terms of sample complexity and speed in a range of MuJoCo environments, while scaling well with batch size. Despite this, adding more epochs is not necessarily helpful for sample efficiency, and PPOKFAC appears to be worse than its A2C counterpart, ACKTR.
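For reference, the clipped surrogate objective of PPO, which PPOKFAC optimizes with a K-FAC-preconditioned update in place of a first-order optimizer, can be sketched as below. This is a minimal illustration assuming PyTorch; the function and variable names are illustrative, not the authors' implementation, and the K-FAC preconditioning step itself is not shown.

import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, to be minimized.

    Illustrative sketch only: tensor shapes and names are assumptions,
    and the K-FAC natural-gradient step is applied outside this function.
    """
    # Probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), computed in log space.
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum of the two terms; negate to obtain a loss.
    return -torch.min(unclipped, clipped).mean()

In PPOKFAC, the gradient of this loss would be preconditioned by a Kronecker-factored approximation of the Fisher information matrix (as in ACKTR) rather than passed to a standard optimizer such as Adam.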
