
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
  Scalable training of large models (like BERT and GPT-3) requires careful...
- ZeRO-Offload: Democratizing Billion-Scale Model Training
  Large-scale model training has been a playing ground for a limited few r...
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Recently, Transformer-based language models have demonstrated remarkable...
- APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm
  Adam is an important optimization algorithm for guaranteeing efficiency and...
- LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
  The effectiveness of LSTM neural networks for popular tasks such as Auto...
- ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
  Training large DL models with billions and potentially trillions of para...
- AntMan: Sparse Low-Rank Compression to Accelerate RNN inference
  Wide adoption of complex RNN-based models is hindered by their inference...
- Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory
  With the advancement of machine learning and deep learning, vector searc...
- Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
  Neural language models (NLMs) have recently gained renewed interest by...
- Learning Intrinsic Sparse Structures within Long Short-Term Memory
  Model compression is significant for the wide adoption of Recurrent Neur...