A stochastic optimization approach to train non-linear neural networks with a higher-order variation regularization
While highly expressive parametric models, including deep neural networks, have an advantage in modeling complicated concepts, training such highly non-linear models is known to carry a high risk of overfitting. To address this issue, this study considers a (k,q)th order variation regularization ((k,q)-VR), defined as the integral of the qth power of the absolute kth-order derivative of the parametric model to be trained; penalizing the (k,q)-VR is expected to yield a smoother trained function and thereby mitigate overfitting. In particular, the (k,q)-VR encompasses the conventional (general-order) total variation with q=1. While (k,q)-VR terms applied to general parametric models are computationally intractable due to the integration, this study provides a stochastic optimization algorithm that can efficiently train general models with the (k,q)-VR without performing explicit numerical integration. The proposed approach can be applied even to deep neural networks of arbitrary structure, as it requires only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that neural networks trained with the (k,q)-VR terms are more "resilient" than those trained with conventional parameter regularization. The proposed algorithm can also be extended to the training of physics-informed neural networks (PINNs).
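For a model $f$ with scalar input, the (k,q)-VR described above can be written as $\int |f^{(k)}(x)|^q \, dx$. The abstract indicates that this integral is handled stochastically rather than by explicit quadrature, which suggests a Monte Carlo estimate of the penalty folded into ordinary SGD with automatic differentiation. Below is a minimal sketch of that idea in JAX, assuming a 1-D regression problem on [0,1] and a small tanh MLP; the network, data, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths=(1, 32, 32, 1)):
    # Illustrative small MLP; smooth activations so kth-order derivatives exist.
    params = []
    for m, n in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (m, n)) / jnp.sqrt(m), jnp.zeros(n)))
    return params

def mlp(params, x):
    # Scalar input -> scalar output.
    h = jnp.atleast_1d(x)
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

def kth_derivative(params, x, k):
    # kth-order derivative of the model via nested automatic differentiation.
    f = lambda t: mlp(params, t)
    for _ in range(k):
        f = jax.grad(f)
    return f(x)

def loss(params, x_data, y_data, x_mc, lam, k, q):
    # Data-fit term plus a Monte Carlo estimate of the (k,q)-VR penalty:
    # the mean of |f^{(k)}(x)|^q over points sampled uniformly on [0, 1]
    # approximates the integral, so no explicit numerical quadrature is needed.
    pred = jax.vmap(lambda x: mlp(params, x))(x_data)
    fit = jnp.mean((pred - y_data) ** 2)
    dk = jax.vmap(lambda x: kth_derivative(params, x, k))(x_mc)
    vr = jnp.mean(jnp.abs(dk) ** q)
    return fit + lam * vr

def sgd_step(params, grads, lr):
    return [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]

# Toy 1-D regression data (purely illustrative).
key = jax.random.PRNGKey(0)
x_data = jnp.linspace(0.0, 1.0, 64)
key, noise_key = jax.random.split(key)
y_data = jnp.sin(4.0 * jnp.pi * x_data) + 0.1 * jax.random.normal(noise_key, (64,))

params = init_params(key)
k, q, lam, lr = 2, 1.0, 1e-3, 1e-2
for step in range(2000):
    key, sub = jax.random.split(key)
    x_mc = jax.random.uniform(sub, (32,))  # fresh "integration" points each SGD step
    grads = jax.grad(loss)(params, x_data, y_data, x_mc, lam, k, q)
    params = sgd_step(params, grads, lr)
```

Because the integration points are resampled at every step, the penalty gradient is an unbiased stochastic estimate of the gradient of the true (k,q)-VR term, which is what allows plain SGD to handle the otherwise intractable integral.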