Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales

04/14/2023
by Yiqun Yao, et al.

As language models scale up, verifying research ideas becomes increasingly expensive, because conclusions drawn on small models do not trivially transfer to large ones. A possible solution is a generic system that directly predicts metrics for large models based solely on the results and hyperparameters of small models. Existing methods based on scaling laws require a hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our finding that Maximal Update parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search. Different models can therefore be compared directly at large scale through loss prediction, even before training starts. We propose this new paradigm as a first step towards reliable academic research at any model scale without heavy computation. Code will be publicly available shortly.
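The loss-prediction idea can be illustrated with a minimal sketch. It assumes a plain power law, L(N) = a * N^(-b), fitted by least squares in log-log space; the abstract does not state the functional form the authors use, so this form is an assumption. The helper names `fit_power_law` and `predict_loss` are hypothetical, and the data points are synthetic, generated only to exercise the code. They are not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-b) by least squares on log L = log a - b * log N.

    Assumed functional form; returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Extrapolate the fitted power law to a model with `size` parameters."""
    return a * size ** (-b)


# Synthetic small-model results (NOT from the paper): losses generated
# from a known power law so the fit can be checked by eye.
small_sizes = [1e7, 2e7, 4e7, 8e7]
small_losses = [5.0 * n ** -0.05 for n in small_sizes]

a, b = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a, b, 1e9)  # extrapolate to a larger model
```

In this sketch only the small models are ever trained; the large-model loss comes purely from extrapolation. Per the abstract, such a fit is accurate when the hyperparameters are set via muP and lie close to common loss basins.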

research
04/06/2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

We study recent research advances that improve large language models thr...
research
06/13/2023

Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

Hyperparameter tuning of deep learning models can lead to order-of-magni...
research
02/13/2022

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

Neural scaling laws define a predictable relationship between a model's ...
research
09/20/2023

The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

The Languini Kitchen serves as both a research collective and codebase d...
research
11/12/2019

FLO: Fast and Lightweight Hyperparameter Optimization for AutoML

Integrating ML models in software is of growing interest. Building accur...
research
02/01/2023

Deep Power Laws for Hyperparameter Optimization

Hyperparameter optimization is an important subfield of machine learning...
