Dive into Big Model Training

07/25/2022
by Qinghua Liu, et al.

The increasing scale of models and the continuous improvement of their performance herald the arrival of the Big Model era. In this report, we explore what big model training is and how it works by diving into training objectives and training methodologies. Specifically, training objectives describe how to leverage web-scale data to develop extremely capable and incredibly large models based on self-supervised learning, while training methodologies, which build on distributed training, describe how to make big model training a reality. We summarize existing training methodologies into three main categories: training parallelism, memory-saving technologies, and model sparsity design. Training parallelism can be further divided into data, pipeline, and tensor parallelism according to the dimension along which parallelism takes place. Memory-saving technologies are orthogonal and complementary to training parallelism, and model sparsity design further scales up the model size at a constant computational cost. A continuously updated paper list on big model training is provided at https://github.com/qhliu26/BM-Training.
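To ground the taxonomy, the sketch below shows the simplest of the three parallelism categories, data parallelism, in PyTorch. It is a minimal illustration rather than code from the report: the toy linear model, the synthetic batches, and all hyperparameters are assumptions chosen for brevity.

    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # One process per GPU, e.g. launched via: torchrun --nproc_per_node=4 ddp_sketch.py
        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)  # assumes a single node, so rank == local GPU index
        device = f"cuda:{rank}"

        # Toy stand-in for a big model; DDP keeps a full replica on every device.
        model = nn.Linear(1024, 1024).to(device)
        model = DDP(model, device_ids=[rank])
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        for step in range(10):
            # Synthetic data; a real job would shard a dataset with DistributedSampler.
            x = torch.randn(32, 1024, device=device)
            y = torch.randn(32, 1024, device=device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across replicas here
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Under this scheme every process holds a full model replica and only the data is partitioned; pipeline and tensor parallelism instead split the model itself across devices, and memory-saving techniques such as activation checkpointing can be layered on top of any of them.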

