
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
  Scalable training of large models (like BERT and GPT-3) requires careful...
- ZeRO-Offload: Democratizing Billion-Scale Model Training
  Large-scale model training has been a playing ground for a limited few r...
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Recently, Transformer-based language models have demonstrated remarkable...
- APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm
  Adam is an important optimization algorithm for guaranteeing efficiency and...
- LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
  The effectiveness of LSTM neural networks for popular tasks such as Auto...
- ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
  Training large DL models with billions and potentially trillions of para...
- AntMan: Sparse Low-Rank Compression to Accelerate RNN inference
  Wide adoption of complex RNN-based models is hindered by their inference...
- Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory
  With the advancement of machine learning and deep learning, vector searc...
- Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
  Neural language models (NLMs) have recently gained renewed interest by...
- Learning Intrinsic Sparse Structures within Long Short-Term Memory
  Model compression is significant for the wide adoption of Recurrent Neur...