Diagonal Preconditioning: Theory and Algorithms

by Zhaonan Qu, et al.

Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix and how that translates into improvements in optimization are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number using random matrix theory. Then we identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. We then study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix and provide a bisection algorithm and a potential reduction algorithm with O(log(1/ϵ)) iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence for regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, that is essential in training machine learning models.
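As a rough illustration of the column standard deviation procedure mentioned above, the following sketch scales each column of a small design matrix by its standard deviation and compares the condition number of the Gram matrix X^T X before and after. The matrix `X`, the helper names, and the choice of the population standard deviation are assumptions made for this example and are not taken from the paper; the two-column case is used only because the eigenvalues of a 2 × 2 symmetric matrix have a closed form.

```python
import math
import statistics


def gram_2col(X):
    """Return the entries (a, b, d) of the 2x2 Gram matrix X^T X = [[a, b], [b, d]]."""
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    return a, b, d


def cond_gram(X):
    """Condition number of X^T X for a two-column, full-rank X.

    The eigenvalues of a symmetric 2x2 matrix are mean +/- radius, where
    mean = (a + d) / 2 and radius = sqrt(((a - d) / 2)^2 + b^2).
    """
    a, b, d = gram_2col(X)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean + radius) / (mean - radius)


def scale_columns_by_std(X):
    """Divide every column by its (population) standard deviation.

    This is right-multiplication of X by a diagonal matrix whose entries are
    the reciprocals of the column standard deviations.
    """
    stds = [statistics.pstdev(col) for col in zip(*X)]
    return [[value / s for value, s in zip(row, stds)] for row in X]


# Hypothetical design matrix whose two columns live on very different scales.
X = [
    [1.0, 100.0],
    [2.0, -300.0],
    [3.0, 200.0],
    [4.0, -100.0],
]

kappa_before = cond_gram(X)
kappa_after = cond_gram(scale_columns_by_std(X))

print(f"condition number before scaling: {kappa_before:.1f}")
print(f"condition number after scaling:  {kappa_after:.1f}")
```

In this toy case the badly scaled second column dominates the Gram matrix, so rescaling removes most of the ill-conditioning. This is not true of every matrix, which is the gap the analysis described in the abstract is meant to address.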




