The rapid growth of memory and computation requirements of large languag...
The low-rank adaptation (LoRA) method can largely reduce the amount of t...
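The entry above is truncated, but the parameter saving LoRA refers to can be sketched in a few lines. This is a generic illustration, not the paper's method; all dimensions, names, and initializations here are assumptions.

```python
import numpy as np

# Assumed toy dimensions: a d_out x d_in layer adapted with rank r << d.
d_in, d_out, r = 64, 64, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable; zero init so the update starts at 0

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)  # LoRA forward pass: frozen path plus low-rank update

# Trainable parameters: r*(d_in + d_out) for LoRA vs. d_in*d_out for full fine-tuning.
full = d_in * d_out          # 4096
lora = r * (d_in + d_out)    # 512
```

With these toy sizes the trainable-parameter count drops by 8x; the ratio grows as the layer dimensions grow relative to the chosen rank.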
Second-order optimization algorithms exhibit excellent convergence prope...
Many recent practical and theoretical breakthroughs focus on adversarial...
To accelerate distributed training, many gradient compression methods ha...
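As a hedged illustration of one common family of gradient compressors alluded to above, the sketch below implements top-k sparsification: only the k largest-magnitude gradient entries are communicated. This is a generic example, not any specific method from the listed work.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a gradient vector; drop the rest.

    Returns the kept indices and values, which is all a worker would transmit.
    """
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the k largest |g_i|
    return idx, grad[idx]

g = np.array([0.1, -2.0, 0.05, 3.0, -0.2])
idx, vals = topk_sparsify(g, 2)  # keeps the entries 3.0 and -2.0
```

Sending k (index, value) pairs instead of the dense vector trades a small accuracy loss per step for a large reduction in communication volume; practical systems usually add error feedback to accumulate the dropped residual.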
Federated Learning (FL) enables collaboration among clients for train m...
Communication scheduling has been shown to be effective in accelerating ...
To enable the pre-trained models to be fine-tuned with local data on edg...
Recent studies have spent considerable human effort on optimiz...
Second-order optimization methods, notably D-KFAC (Distributed K...
In federated learning (FL), model performance typically suffers from cli...
The ever-growing model size and scale of compute have attracted increasi...
Deep neural networks (DNNs) have achieved great success in the area of c...
Distributed training with synchronous stochastic gradient descent (SGD) ...
The COVID-19 pandemic has spread globally for several months. Because it...
Distributed training techniques have been widely deployed in large-scale...
Multiplication of a sparse matrix by a dense matrix (SpDM) is widely use...
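The SpDM operation named in this entry can be sketched directly from the CSR storage format. The example below is a minimal pure-NumPy illustration with a hand-built 3x4 sparse matrix; the matrix contents and sizes are assumptions for demonstration, not data from the paper.

```python
import numpy as np

# Sparse 3x4 matrix S in CSR form: nonzero values, their column indices,
# and row pointers delimiting each row's slice of the value array.
vals = np.array([2.0, 1.0, 3.0])     # nonzero values
cols = np.array([1, 3, 0])           # column index of each nonzero
rowptr = np.array([0, 2, 2, 3])      # row i owns vals[rowptr[i]:rowptr[i+1]]

D = np.arange(8, dtype=float).reshape(4, 2)  # dense 4x2 right-hand matrix

# SpDM: each sparse row scales and accumulates the matching dense rows.
C = np.zeros((3, 2))
for i in range(3):
    for k in range(rowptr[i], rowptr[i + 1]):
        C[i] += vals[k] * D[cols[k]]

# Dense reference for checking the result.
S = np.zeros((3, 4))
S[0, 1], S[0, 3], S[2, 0] = 2.0, 1.0, 3.0
```

The CSR loop touches only the stored nonzeros, which is why SpDM cost scales with the nonzero count rather than the full matrix size; efficient GPU kernels for this access pattern are the usual optimization target.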
In recent years, distributed deep learning techniques have been widely deploye...
Deep neural networks (DNNs) have achieved great success in the area of c...
Distributed deep learning has become very common as a way to reduce the overall trai...
Distributed Deep Learning (DDL) has rapidly grown in popularity since i...
Distributed learning techniques such as federated learning have enabled ...
Distributed synchronous stochastic gradient descent has been widely used...
Distributed stochastic gradient descent (SGD) algorithms are widely depl...
To reduce the long training time of large deep neural network (DNN) mode...
Skin disease is one of the most common types of human diseases, which ma...
Deep neural networks (DNNs) have become widely used in many AI applicati...
Distributed synchronous stochastic gradient descent (S-SGD) with data pa...
Distributed synchronous stochastic gradient descent has been widely used...
Synchronized stochastic gradient descent (SGD) optimizers with data para...
With huge amounts of training data, deep learning has made great breakth...
Deep learning frameworks have been widely deployed on GPU servers for de...
With the success of deep learning techniques in a broad range of applica...