A Novel, Scale-Invariant, Differentiable, Efficient, Scalable Regularizer
L_p-norm regularization schemes such as L_0, L_1, and L_2-norm regularization, and L_p-norm-based techniques such as weight decay and group LASSO, compute a quantity which depends on model weights considered in isolation from one another. This paper describes a novel regularizer which is not based on an L_p-norm. In contrast with L_p-norm-based regularization, this regularizer is concerned with the spatial arrangement of weights within a weight matrix. The regularizer is an additive term in the loss function. It is differentiable, simple and fast to compute, and scale-invariant; it requires only a trivial amount of additional memory and can easily be parallelized. Empirically, this method reduces the number of nonzero model parameters by approximately one order of magnitude at a given level of accuracy.
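To make the phrase "weights considered in isolation" concrete, the following minimal sketch shows that elementwise L_0, L_1, and L_2 penalties do not change when the entries of a weight matrix are shuffled, so they carry no information about spatial arrangement. It illustrates only the L_p side of the contrast; the regularizer proposed in the paper is not specified in this abstract and is not implemented here. The function names (`lp_penalty`, `l0_penalty`, `shuffle_entries`) and the example matrix are hypothetical, chosen for illustration.

```python
import random


def lp_penalty(weights, p):
    """Sum of |w|^p over every entry of a 2-D weight matrix (list of rows)."""
    return sum(abs(w) ** p for row in weights for w in row)


def l0_penalty(weights):
    """Number of nonzero entries in the weight matrix."""
    return sum(1 for row in weights for w in row if w != 0)


def shuffle_entries(weights, seed=0):
    """Return a matrix of the same shape with its entries randomly rearranged."""
    cols = len(weights[0])
    flat = [w for row in weights for w in row]
    random.Random(seed).shuffle(flat)
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


# Hypothetical 2x3 weight matrix (values chosen to be exact in binary floating point).
W = [[0.5, 0.0, -1.5],
     [0.0, 2.0, 0.0]]
W_shuffled = shuffle_entries(W)

# Each penalty is a sum over individual entries, so rearranging them changes nothing.
print(l0_penalty(W), l0_penalty(W_shuffled))
print(lp_penalty(W, 1), lp_penalty(W_shuffled, 1))
print(lp_penalty(W, 2), lp_penalty(W_shuffled, 2))
```

Because each penalty is a sum of per-entry terms, any permutation of the entries leaves it unchanged. A regularizer concerned with spatial arrangement would, by contrast, generally assign different values to `W` and `W_shuffled`.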