Is the Number of Trainable Parameters All That Actually Matters?

09/24/2021
by Amélie Chatelain, et al.

Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under hardware and infrastructure constraints is no easy feat, and rapidly becomes an expensive engineering problem. We investigate ways to tentatively cheat scaling laws and train larger models more cheaply. We emulate an increase in effective parameters using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers. We find that the scaling relationship between test loss and compute depends only on the actual number of trainable parameters; scaling laws cannot be deceived by spurious parameters.
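The abstract names two ways of inflating a model's nominal parameter count without adding trainable parameters. The following is a minimal PyTorch sketch, assuming hypothetical module names (DopedLinear, LowRankLinear, count_params) and using a plain low-rank factorization only as the simplest stand-in for the fast structured transforms the paper refers to; the parameterizations actually used in the paper may differ.

import torch
import torch.nn as nn

class DopedLinear(nn.Module):
    """Dense layer 'doped' with frozen random parameters (illustrative sketch).

    The output sums a trainable linear map with a frozen, randomly initialized
    one, so the total parameter count grows while the trainable count stays
    that of a single dense layer.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.trainable = nn.Linear(in_features, out_features)
        self.frozen = nn.Linear(in_features, out_features, bias=False)
        for p in self.frozen.parameters():
            p.requires_grad = False  # counted as parameters, never updated

    def forward(self, x):
        return self.trainable(x) + self.frozen(x)


class LowRankLinear(nn.Module):
    """Stand-in for a structured replacement of a dense layer.

    A rank-r factorization W ~ U @ V needs r * (in + out) trainable weights
    instead of in * out, while keeping the layer's input/output shape.
    """
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.v = nn.Linear(in_features, rank, bias=False)
        self.u = nn.Linear(rank, out_features)

    def forward(self, x):
        return self.u(self.v(x))


def count_params(module):
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    total = sum(p.numel() for p in module.parameters())
    return trainable, total


if __name__ == "__main__":
    print(count_params(DopedLinear(512, 512)))       # (262656, 524800)
    print(count_params(LowRankLinear(512, 512, 32)))  # far fewer than a 512x512 dense layer

Counting parameters with and without requires_grad makes explicit the distinction the paper tests: the doped layer roughly doubles the total parameter count while leaving the trainable count unchanged, and the low-rank layer keeps the dense layer's shape while training far fewer weights.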

