Minjia Zhang

- ZeRO-Offload: Democratizing Billion-Scale Model Training
Large-scale model training has been a playing ground for a limited few r...
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Recently, Transformer-based language models have demonstrated remarkable...
- LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
The effectiveness of LSTM neural networks for popular tasks such as Auto...
- Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning
Software-managed heterogeneous memory (HM) provides a promising solution...
- Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory
With the advancement of machine learning and deep learning, vector searc...
- Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Neural language models (NLMs) have recently gained a renewed interest by...