
DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning
Games are abstractions of the real world, where artificial agents learn ...

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
Scalable training of large models (like BERT and GPT-3) requires careful...

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm
Adam is the important optimization algorithm to guarantee efficiency and...

Stochastic Recursive Momentum for Policy Gradient Methods
In this paper, we propose a novel algorithm named STOchastic Recursive M...

Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Stochastic compositional optimization arises in many important machine l...

DeepSqueeze: Decentralization Meets Error-Compensated Compression
Communication is a key bottleneck in distributed training. Recently, an ...

DeepSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
Communication is a key bottleneck in distributed training. Recently, an ...

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
A standard approach in large scale machine learning is distributed stoch...

Revisit Batch Normalization: New Understanding from an Optimization View and a Refinement via Composition Optimization
Batch Normalization (BN) has been used extensively in deep learning to a...

D^2: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of ...

Asynchronous Decentralized Parallel Stochastic Gradient Descent
Recent work shows that decentralized parallel stochastic gradient decent...

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Most distributed machine learning systems nowadays, including TensorFlow...

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Asynchronous parallel implementations of stochastic gradient (SG) have b...
Xiangru Lian