Block Mean Approximation for Efficient Second Order Optimization

04/16/2018
by Yao Lu et al.

Advanced optimization algorithms such as Newton's method and AdaGrad use second-order derivatives or second-order statistics to obtain better descent directions and faster convergence rates. At their core, such algorithms need to compute the inverse or the inverse square root of a matrix whose size is quadratic in the dimensionality of the search space. For high-dimensional search spaces, exact matrix inversion or inverse square root computation becomes prohibitively expensive, which calls for approximate methods. In this work, we propose a new matrix approximation method that divides a matrix into blocks and represents each block by one or two numbers. The method allows efficient computation of the matrix inverse and inverse square root. We apply our method to AdaGrad for training deep neural networks. Experiments show encouraging results compared to the diagonal approximation.
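To give a sense of why a block-structured approximation makes the inverse square root cheap, here is a minimal NumPy sketch under simplifying assumptions: each block of the full-matrix AdaGrad accumulator is replaced by a single number (its mean), so the approximation can be written as P M P^T with a small k x k matrix M of block means, and, after adding a damping term eps*I, its inverse square root follows from a k x k eigendecomposition instead of a d x d one. The function name, the one-number-per-block choice (the paper allows one or two numbers per block), and the damping term are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def block_mean_inv_sqrt_apply(G, blocks, g, eps=1e-4):
    """Apply (eps*I + Ghat)^{-1/2} to a vector g, where Ghat replaces each
    block of the symmetric matrix G (partitioned by `blocks`) with its mean.

    blocks: list of index arrays partitioning range(d).
    Cost after forming the block means: O(d*k + k^3) instead of O(d^3).
    """
    k = len(blocks)
    sizes = np.array([len(b) for b in blocks], dtype=float)

    # k x k matrix of block means: M[a, b] = mean of G over block (a, b).
    M = np.empty((k, k))
    for a, ia in enumerate(blocks):
        for b, ib in enumerate(blocks):
            M[a, b] = G[np.ix_(ia, ib)].mean()

    # Ghat = P M P^T with P the 0/1 block-indicator matrix. With
    # Q = P N^{-1/2} (orthonormal columns, N = diag of block sizes),
    # Ghat = Q S Q^T for the small k x k matrix S = N^{1/2} M N^{1/2}.
    S = np.sqrt(sizes)[:, None] * M * np.sqrt(sizes)[None, :]
    lam, V = np.linalg.eigh(S)          # small k x k eigendecomposition

    # U = Q V has orthonormal columns; form U^T g blockwise in O(d*k).
    Utg = np.zeros(k)
    for a, ia in enumerate(blocks):
        Utg += V[a, :] * (g[ia].sum() / np.sqrt(sizes[a]))

    # (eps*I + Ghat)^{-1/2} g
    #   = eps^{-1/2} g + U [ (eps + lam)^{-1/2} - eps^{-1/2} ] U^T g
    coeff = ((eps + np.clip(lam, 0.0, None)) ** -0.5 - eps ** -0.5) * Utg
    out = g / np.sqrt(eps)
    w = V @ coeff                        # per-block correction weights
    for a, ia in enumerate(blocks):
        out[ia] += w[a] / np.sqrt(sizes[a])
    return out
```

A quick check against the dense computation (feasible only for small d) confirms that the structured routine matches an explicit eigendecomposition of the damped block-mean matrix:

```python
rng = np.random.default_rng(0)
d, k, eps = 12, 3, 1e-4
grads = rng.normal(size=(50, d))
G = grads.T @ grads                      # full-matrix AdaGrad accumulator
blocks = np.array_split(np.arange(d), k)
g = rng.normal(size=d)

Ghat = np.empty((d, d))
for a, ia in enumerate(blocks):
    for b, ib in enumerate(blocks):
        Ghat[np.ix_(ia, ib)] = G[np.ix_(ia, ib)].mean()
lam_d, V_d = np.linalg.eigh(eps * np.eye(d) + Ghat)
dense = V_d @ np.diag(lam_d ** -0.5) @ V_d.T @ g
assert np.allclose(dense, block_mean_inv_sqrt_apply(G, blocks, g, eps))
```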

