Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

03/06/2023
by Xiaonan Nie, et al.

Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially Transformer models. Many products and services at Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have opted in to harness the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM can efficiently train extremely large-scale models over hierarchical memory. The key designs of Angel-PTM are the fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates computations, data movements, and communications. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements a lock-free updating mechanism to address the SSD I/O bandwidth bottleneck. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in terms of maximum model scale as well as up to 88.9% in terms of training throughput. In addition, experiments on GPT3-175B and T5-MoE-1.2T models utilizing hundreds of GPUs verify the strong scalability of Angel-PTM.
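The abstract only names the Page abstraction and the hierarchical memory tiers, so a small illustration may help. The following Python sketch (all class and method names are hypothetical, not Angel-PTM's actual API) shows one way a page-based manager could keep hot parameter pages on the GPU and spill cold ones down a GPU → host DRAM → SSD hierarchy; the real system additionally overlaps these movements with computation and communication via its unified scheduler.

```python
# Minimal sketch of page-based hierarchical memory management, assuming
# fixed-size parameter pages that migrate across GPU / host DRAM / SSD.
# All names here are hypothetical and for illustration only.
from collections import OrderedDict
from enum import Enum


class Tier(Enum):
    GPU = 0
    HOST = 1
    SSD = 2


class Page:
    def __init__(self, page_id, nbytes):
        self.page_id = page_id
        self.nbytes = nbytes
        self.tier = Tier.SSD  # pessimistically start on the slowest tier


class PageManager:
    """Keep hot pages on the GPU; evict cold ones down the hierarchy."""

    def __init__(self, gpu_capacity, host_capacity):
        self.capacity = {Tier.GPU: gpu_capacity, Tier.HOST: host_capacity}
        self.used = {Tier.GPU: 0, Tier.HOST: 0}
        self.lru = {Tier.GPU: OrderedDict(), Tier.HOST: OrderedDict()}

    def access(self, page):
        """Ensure `page` is GPU-resident before a kernel touches it."""
        if page.tier is not Tier.GPU:
            if page.tier is Tier.HOST:      # leaving host DRAM
                self.lru[Tier.HOST].pop(page.page_id)
                self.used[Tier.HOST] -= page.nbytes
            self._reserve(page, Tier.GPU)   # real system: async copy/read
            page.tier = Tier.GPU
        self.lru[Tier.GPU].move_to_end(page.page_id)  # mark most recent
        return page

    def _reserve(self, page, tier):
        """Make room on `tier`, demoting least-recently-used victims."""
        while self.used[tier] + page.nbytes > self.capacity[tier]:
            _, victim = self.lru[tier].popitem(last=False)
            self.used[tier] -= victim.nbytes
            if tier is Tier.GPU:            # GPU victims fall to host DRAM
                self._reserve(victim, Tier.HOST)
                victim.tier = Tier.HOST
            else:                           # host victims spill to SSD
                victim.tier = Tier.SSD
        self.lru[tier][page.page_id] = page
        self.used[tier] += page.nbytes


# Touching more pages than fit on the GPU forces evictions down the tiers.
mgr = PageManager(gpu_capacity=4, host_capacity=4)
pages = [Page(i, nbytes=2) for i in range(6)]
for p in pages:
    mgr.access(p)
print({p.page_id: p.tier.name for p in pages})
# -> {0: 'SSD', 1: 'SSD', 2: 'HOST', 3: 'HOST', 4: 'GPU', 5: 'GPU'}
```

The lock-free updating mentioned in the abstract would then sit on top of such a manager, presumably applying each page's optimizer update as soon as its SSD read completes rather than serializing all updates behind a single lock; that scheduling logic is beyond the scope of this sketch.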
