Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

02/13/2022
by Maor Ivgi, et al.

Neural scaling laws define a predictable, power-law relationship between a model's parameter count and its performance after training. However, most research to date has not explicitly investigated whether scaling laws can be used to accelerate model development. In this work, we perform such an empirical investigation across a wide range of model scales, starting from models with as few as 10K parameters, and evaluate downstream performance across 9 language understanding tasks. We find that scaling laws emerge at finetuning time in some NLP tasks, and that they can also be exploited for debugging convergence when training large models. Moreover, for tasks where scaling laws exist, they can be used to predict the performance of larger models, which enables effective model selection. However, revealing scaling laws requires careful hyperparameter tuning and multiple runs for uncertainty estimation, which incurs additional overhead, partially offsetting the computational benefits.
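For illustration, here is a minimal sketch of how a power law fitted to small-scale runs might be extrapolated to predict a larger model's performance for model selection. The functional form err(N) ≈ a·N^(−b) + c and all parameter counts and scores below are assumptions for the example, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical downstream results from small-scale experiments
# (illustrative numbers, not taken from the paper).
param_counts = np.array([1e4, 1e5, 1e6, 1e7])   # N: parameter counts of trained models
errors = np.array([0.52, 0.41, 0.33, 0.27])     # 1 - accuracy at convergence

def power_law(n, a, b, c):
    # Saturating power law in parameter count: err(N) = a * N^(-b) + c
    return a * np.power(n, -b) + c

# Fit the law to the small-scale points.
popt, _ = curve_fit(power_law, param_counts, errors,
                    p0=[1.0, 0.1, 0.1], maxfev=10000)

# Extrapolate to a larger, not-yet-trained model to guide model selection.
predicted_error_100m = power_law(1e8, *popt)
print(f"fitted a={popt[0]:.3f}, b={popt[1]:.3f}, c={popt[2]:.3f}")
print(f"predicted error at 100M params: {predicted_error_100m:.3f}")
```

In practice, as the abstract notes, such fits require careful hyperparameter tuning at each scale and repeated runs per scale so that the uncertainty of the extrapolation can be estimated.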
