Gradient compression (GC) is a promising approach to addressing the
comm...
Distributed training using multiple devices (e.g., GPUs) has been widely...
Communication overhead severely hinders the scalability of distributed
m...
The scalability of Distributed Stochastic Gradient Descent (SGD) is toda...
BERT has recently attracted a lot of attention in natural language
under...
Recently there has been a surge of research on improving the communicati...
While image classification models have recently continued to advance, mo...
Recent years have witnessed the growth of large-scale distributed machin...
We present GluonCV and GluonNLP, the deep learning toolkits for computer...
With an increasing demand for training powers for deep learning algorith...
Batching is an essential technique to improve computation efficiency in ...