RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

by Yilong Zhao, et al.

Second-order training methods can converge much faster than first-order optimizers in DNN training, because they use the inverse of the second-order information (SOI) matrix to find a more accurate descent direction and step size. However, the huge SOI matrices incur significant computational and memory overheads on traditional architectures such as GPUs and CPUs. In contrast, ReRAM-based processing-in-memory (PIM) technology is well suited to second-order training for three reasons. First, PIM computation happens in memory, which reduces data movement overheads. Second, ReRAM crossbars can compute the inverse of the SOI matrix in O(1) time. Third, if architected properly, ReRAM crossbars can perform both matrix inversion and vector-matrix multiplication, the two operations that dominate second-order training algorithms. Nevertheless, current ReRAM-based PIM techniques still face a key challenge in accelerating second-order training: the existing ReRAM-based matrix inversion circuitry supports only 8-bit-accurate matrix inversion, which is not sufficient for second-order training, where at least 16-bit-accurate matrix inversion is needed. In this work, we propose a method to achieve high-precision matrix inversion based on a proven 8-bit matrix inversion (INV) circuitry and vector-matrix multiplication (VMM) circuitry. We design RePAST, a ReRAM-based PIM accelerator architecture for second-order training. Moreover, we propose a software mapping scheme for RePAST that further optimizes performance by fusing VMM and INV crossbars. Experiments show that RePAST achieves an average of 115.8×/11.4× speedup and 41.9×/12.8× energy saving compared to a GPU counterpart and PipeLayer on large-scale DNNs.
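The abstract does not spell out how an 8-bit INV circuit and a VMM circuit are combined to reach 16-bit accuracy. One standard numerical technique with that shape is iterative refinement: compute a residual with a multiplication, solve for a correction with the low-precision inverter, and repeat. The sketch below illustrates that idea only and is not claimed to be the scheme used in RePAST. The names `quantize`, `low_precision_solve`, and `refined_solve` are hypothetical, and the analog inverter is modelled very simply as an exact solve whose output is rounded to 8 bits.

```python
import numpy as np


def quantize(x, bits=8):
    """Round x onto a signed grid with `bits` bits, scaled to its largest magnitude."""
    scale = np.max(np.abs(x))
    if scale == 0:
        return x.copy()
    levels = 2 ** (bits - 1) - 1
    return np.round(x / scale * levels) / levels * scale


def low_precision_solve(A, b, bits=8):
    """Assumed stand-in for an analog INV circuit: exact solve, output kept to `bits` bits."""
    return quantize(np.linalg.solve(A, b), bits)


def refined_solve(A, b, iters=3, bits=8):
    """Iterative refinement: each pass adds a low-precision correction to x."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        residual = b - A @ x                             # multiplication step (VMM-like)
        x = x + low_precision_solve(A, residual, bits)   # low-precision inversion step
    return x


# Small demo on a well-conditioned symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
b = rng.standard_normal(6)
exact = np.linalg.solve(A, b)

for n in (1, 2, 3):
    err = np.max(np.abs(refined_solve(A, b, iters=n) - exact)) / np.max(np.abs(exact))
    print(f"iterations={n}  relative error={err:.2e}")
```

In this simplified model each pass shrinks the error by roughly the resolution of the 8-bit solver, so a few passes are enough to go below 16-bit accuracy. A real crossbar would also have noise in the stored matrix and in the residual computation, which this sketch ignores.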




Block Mean Approximation for Efficient Second Order Optimization

Advanced optimization algorithms such as Newton method and AdaGrad benef...

KrADagrad: Kronecker Approximation-Domination Gradient Preconditioned Stochastic Optimization

Second order stochastic optimizers allow parameter update step size and ...

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning

The second-order optimization methods, notably the D-KFAC (Distributed K...

Second Order Optimization Made Practical

Optimization in machine learning, both theoretical and applied, is prese...

Modified online Newton step based on element wise multiplication

The second order method as Newton Step is a suitable technique in Online...

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is prese...

Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible

Machine learning is predicated on the concept of generalization: a model...
