- Understanding the Role of Momentum in Stochastic Gradient Methods
The use of momentum in stochastic gradient methods has become a widespre...
- OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models
We present OpenSeq2Seq -- an open-source toolkit for training sequence-t...
- Novel Prediction Techniques Based on Clusterwise Linear Regression
In this paper we explore different regression models based on Clusterwis...
- Convergence Analysis of Gradient Descent Algorithms with Proportional Updates
The rise of deep learning in recent years has brought with it increasing...
- Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
Batch normalization (BN) has become a de facto standard for training dee...
- Large Batch Training of Convolutional Networks
A common way to speed up training of large convolutional networks is to ...

Igor Gitman