Text-to-image generation (TTI) refers to the use of models that can
...
ChatGPT-like models have revolutionized various applications in artifici...
In the complex domain of large language models (LLMs), striking a balanc...
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of ...
This study examines the impact of optimizing the Stable Diffusion (SD) g...
Collaborative filtering (CF) has been proven to be one of the most effec...
In recent years, the training requirements of many state-of-the-art Deep...
Post-training quantization (PTQ) has recently been shown to be a promising
...
The field of natural language processing (NLP) has made significant stri...
Mixture-of-Experts (MoE) is a neural network architecture that adds spar...
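The sparse activation behind MoE can be sketched minimally: a learned router sends each token to one expert, so only a fraction of the parameters is used per token. All names, shapes, and the top-1 routing choice below are illustrative assumptions, not details from any specific system.

```python
import numpy as np

# Minimal top-1 gated Mixture-of-Experts layer (illustrative sketch).
# Each token is routed to the expert with the highest gate score, so
# only that expert's weights participate in the token's forward pass.

rng = np.random.default_rng(0)
d_model, n_experts = 4, 3

W_gate = rng.standard_normal((d_model, n_experts))              # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """x: (tokens, d_model) -> (output, chosen expert per token)."""
    scores = x @ W_gate                                         # (tokens, n_experts)
    chosen = scores.argmax(axis=1)                              # top-1 expert index
    # softmax over gate scores; only the chosen expert's weight is used
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    out = np.empty_like(x)
    for t, e in enumerate(chosen):
        out[t] = probs[t, e] * (x[t] @ experts[e])              # sparse: one expert
    return out, chosen

tokens = rng.standard_normal((5, d_model))
y, routing = moe_forward(tokens)
```

Because only one of the `n_experts` weight matrices is applied per token, total parameters can grow with the number of experts while per-token compute stays roughly constant, which is the property the MoE line above refers to.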
Improving the deployment efficiency of transformer-based language models...
Recent advances in deep learning models come at the price of formidable
...
Large-scale transformer models have become the de-facto architectures fo...
Graph Neural Networks (GNNs) are a promising approach for applications wi...
The past several years have witnessed the success of transformer-based
m...
How to efficiently serve ever-larger trained natural language models in
...
Extreme compression, particularly ultra-low bit precision (binary/ternar...
In recent years, large pre-trained Transformer-based language models hav...
Pretrained general-purpose language models can achieve state-of-the-art
...
As the training of giant dense models hits the boundary of the availabil...
Mixture of Experts (MoE) models are an emerging class of sparsely
ac...
Recent works have demonstrated great success in training high-capacity
a...
In the last three years, the largest dense deep learning models have gro...
To train large models (like BERT and GPT-3) with hundreds or even thousa...
Scalable training of large models (like BERT and GPT-3) requires careful...
Large-scale model training has been a playground for a limited few
r...
Recently, Transformer-based language models have demonstrated remarkable...
Adam is an important optimization algorithm for guaranteeing efficiency and...
The effectiveness of LSTM neural networks for popular tasks such as Auto...
Training large DL models with billions and potentially trillions of
para...
Wide adoption of complex RNN-based models is hindered by their inference...
With the advancement of machine learning and deep learning, vector searc...
Neural language models (NLMs) have recently gained renewed interest by...
Model compression is important for the wide adoption of Recurrent Neur...