Batch size-invariance for policy optimization

10/01/2021
by   Jacob Hilton, et al.
21

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the learning rate. However, some policy optimization algorithms (such as PPO) do not have this property, because of how they control the size of policy updates. In this work we show how to make these algorithms batch size-invariant. Our key insight is to decouple the proximal policy (used for controlling policy updates) from the behavior policy (used for off-policy corrections). Our experiments help explain why these algorithms work, and additionally show how they can make more efficient use of stale data.

READ FULL TEXT

page 16

page 17

page 18

page 19

page 21

page 22

page 23

page 27

research
12/15/2016

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become ...
research
07/09/2019

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Increasing the batch size is a popular way to speed up neural network tr...
research
05/21/2019

Mathematical method for calculating batch fragmentations and their impacts on product recall within a FIFO assignment policy

This study explores the interactions between order sizes, batch sizes an...
research
02/26/2020

Stagewise Enlargement of Batch Size for SGD-based Learning

Existing research shows that the batch size can seriously affect the per...
research
04/06/2021

On the Optimality of Batch Policy Optimization Algorithms

Batch policy optimization considers leveraging existing data for policy ...
research
01/17/2018

An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

In this technical report, we consider an approach that combines the PPO ...
research
07/25/2023

How to Scale Your EMA

Preserving training dynamics across batch sizes is an important tool for...

Please sign up or login with your details

Forgot password? Click here to reset