A Theory on Adam Instability in Large-Scale Machine Learning

04/19/2023
by Igor Molybog et al.

We present a theory for the previously unexplained divergent behavior observed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent on the training loss landscape, leading to divergence. This artifact is more likely to occur when training a deep model with a large batch size, which is the typical setting for large-scale language model training. In support of the theory, we present observations from training runs of language models at several scales: 7 billion, 30 billion, 65 billion, and 546 billion parameters.
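The abstract hinges on the Adam update direction decoupling from the gradient. As a rough illustration (not the paper's code), the sketch below implements the standard Adam update rule in NumPy together with a simple diagnostic that measures how correlated the resulting update is with the steepest-descent direction; the hyperparameter values and the `descent_cosine` helper are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(g, m, v, t, lr=1e-4, beta1=0.9, beta2=0.95, eps=1e-8):
    """One Adam update; returns the update vector and the new moment state.

    Hyperparameter defaults here are illustrative, not the paper's settings.
    """
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    # Bias correction for the zero-initialized moment estimates (t starts at 1).
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # The parameter update vector whose norm/direction the theory is about.
    update = -lr * m_hat / (np.sqrt(v_hat) + eps)
    return update, m, v

def descent_cosine(update, g):
    """Cosine similarity between the update and the steepest-descent direction -g.

    Values near zero mean the step is essentially uncorrelated with the
    direction of descent, the state the paper associates with divergence.
    """
    denom = np.linalg.norm(update) * np.linalg.norm(g)
    return float(np.dot(update, -g) / denom) if denom > 0 else 0.0

# Minimal usage example with a synthetic gradient.
rng = np.random.default_rng(0)
dim = 1000
m, v = np.zeros(dim), np.zeros(dim)
g = rng.normal(size=dim)
update, m, v = adam_step(g, m, v, t=1)
print(descent_cosine(update, g))  # ~1.0 here; near 0 would signal trouble
```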

