Scaling laws for single-agent reinforcement learning

01/31/2023
by Jacob Hilton et al.

Recent work has shown that, in generative modeling, cross-entropy loss improves smoothly with model size and training compute, following a power-law-plus-constant scaling law. One challenge in extending these results to reinforcement learning is that the main performance objective of interest, mean episode return, need not vary smoothly. To overcome this, we introduce *intrinsic performance*, a monotonic function of the return defined as the minimum compute required to achieve the given return across a family of models of different sizes. We find that, across a range of environments, intrinsic performance scales as a power law in model size and environment interactions. Consequently, as in generative modeling, the optimal model size scales as a power law in the training compute budget. Furthermore, we study how this relationship varies with the environment and with other properties of the training setup. In particular, using a toy MNIST-based environment, we show that varying the "horizon length" of the task mostly changes the coefficient but not the exponent of this relationship.
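To make the definition of intrinsic performance concrete, here is a minimal Python sketch, not the authors' code. It assumes each model size comes with a learning curve of hypothetical (environment interactions, mean return) pairs, and it approximates training compute as model size times interactions with constant factors dropped. The second function illustrates why a power law in model size N and interactions E yields a power-law optimal model size; the two-term form (n_c/N)^alpha_n + (e_c/E)^alpha_e and the names n_c, e_c, alpha_n, alpha_e are illustrative assumptions, not the paper's fitted law.

```python
import math

# Hypothetical learning curves: model size -> list of
# (environment_interactions, mean_episode_return), sorted by interactions.
# These numbers are illustrative only, not results from the paper.
CURVES = {
    1e5: [(1e6, 0.10), (2e6, 0.30), (4e6, 0.25), (8e6, 0.45)],
    1e6: [(1e6, 0.05), (2e6, 0.35), (4e6, 0.60), (8e6, 0.80)],
}


def first_compute_reaching(model_size, curve, target):
    """Compute at which this model's curve first reaches `target` return.

    Compute is approximated as model_size * interactions (an assumption of
    this sketch; constant factors are dropped). Returns math.inf if the
    target is never reached. Using the first crossing handles returns that
    go up and down during training.
    """
    for interactions, ret in curve:
        if ret >= target:
            return model_size * interactions
    return math.inf


def intrinsic_performance(curves, target):
    """Minimum compute, over all model sizes, needed to reach `target` return.

    Non-decreasing in `target`: on every curve, a higher target is first
    reached at the same point or later (or never).
    """
    return min(first_compute_reaching(n, c, target) for n, c in curves.items())


def optimal_model_size(compute, n_c, e_c, alpha_n, alpha_e):
    """Model size N minimising (n_c/N)**alpha_n + (e_c/E)**alpha_e with
    E = compute / N (an assumed Kaplan-style form, not the paper's fit).

    Setting the derivative in N to zero gives
    N**(alpha_n + alpha_e) = (alpha_n/alpha_e) * n_c**alpha_n * (compute/e_c)**alpha_e,
    so N_opt scales as compute**(alpha_e / (alpha_n + alpha_e)).
    """
    base = (alpha_n / alpha_e) * n_c**alpha_n * (compute / e_c) ** alpha_e
    return base ** (1.0 / (alpha_n + alpha_e))
```

With these toy curves, `intrinsic_performance(CURVES, 0.5)` is 4e12, because only the larger model ever reaches a return of 0.5. Raw returns are noisy, so a real analysis would likely smooth or fit the curves first; the sketch reads off first crossings only to keep the definition visible. In `optimal_model_size`, changing `n_c` or `e_c` rescales N_opt without changing its exponent in compute, which is one way to picture a change in the coefficient but not the exponent.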
