On the Performance of Preconditioned Stochastic Gradient Descent
This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, revealed its relationship to equilibrated stochastic gradient descent (ESGD) and batch normalization, and provided a software package (https://github.com/lixilinx/psgd_tf) implemented in TensorFlow to compare variations of PSGD and stochastic gradient descent (SGD) on a wide range of benchmark problems with commonly used neural network models, e.g., convolutional and recurrent neural networks. Comparison results clearly demonstrate the advantages of PSGD in terms of convergence speed and generalization performance.
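As a rough illustration of the general idea of preconditioning noisy gradients (not the paper's actual preconditioner estimation, which is detailed in the full text and the psgd_tf package), the sketch below applies a diagonal, equilibration-style preconditioner built from Hessian-vector products to a toy ill-conditioned quadratic. All names, constants, and the loss function are illustrative assumptions.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w - b^T w with an ill-conditioned Hessian,
# used only to illustrate preconditioned SGD; not the paper's implementation.
rng = np.random.default_rng(0)
A = np.diag([100.0, 1.0])        # Hessian with condition number 100
b = np.array([1.0, -1.0])
w = np.zeros(2)

lr, beta = 0.5, 0.99             # illustrative step size and smoothing factor
d = np.zeros(2)                  # running estimate of E[(H v)^2] (equilibration-style)

for step in range(500):
    grad = A @ w - b + 0.01 * rng.standard_normal(2)  # stochastic (noisy) gradient
    v = rng.standard_normal(2)                        # random probe vector
    Hv = A @ v                                        # Hessian-vector product
    d = beta * d + (1 - beta) * Hv**2                 # accumulate curvature statistics
    precond = 1.0 / (np.sqrt(d / (1 - beta**(step + 1))) + 1e-8)
    w -= lr * precond * grad                          # preconditioned SGD step

print("estimate:", w, "exact minimizer:", np.linalg.solve(A, b))
```

With the preconditioner, both coordinates converge at a similar rate despite the large spread in curvature, which is the qualitative behavior the abstract attributes to PSGD on far larger models.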