Persistency of Excitation for Robustness of Neural Networks

by   Kamil Nar, et al.

When an online learning algorithm is used to estimate the unknown parameters of a model, the signals interacting with the parameter estimates should not decay too quickly for the optimal values to be discovered correctly. This requirement is referred to as persistency of excitation, and it arises in various contexts, such as optimization with stochastic gradient methods, exploration for multi-armed bandits, and adaptive control of dynamical systems. While training a neural network, the iterative optimization algorithm involved also creates an online learning problem, and consequently, correct estimation of the optimal parameters requires persistent excitation of the network weights. In this work, we analyze the dynamics of the gradient descent algorithm while training a two-layer neural network with two different loss functions, the squared-error loss and the cross-entropy loss; and we obtain conditions to guarantee persistent excitation of the network weights. We then show that these conditions are difficult to satisfy when a multi-layer network is trained for a classification task, for the signals in the intermediate layers of the network become low-dimensional during training and fail to remain persistently exciting. To provide a remedy, we delve into the classical regularization terms used for linear models, reinterpret them as a means to ensure persistent excitation of the model parameters, and propose an algorithm for neural networks by building an analogy. The results in this work shed some light on why adversarial examples have become a challenging problem for neural networks, why merely augmenting training data sets will not be an effective approach to address them, and why there may not exist a data-independent regularization term for neural networks, which involve only the model parameters but not the training data.


page 1

page 2

page 3

page 4


Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks

Natural gradient descent has proven effective at mitigating the effects ...

Gradient Starvation: A Learning Proclivity in Neural Networks

We identify and formalize a fundamental gradient descent phenomenon resu...

The Dynamics of Gradient Descent for Overparametrized Neural Networks

We consider the dynamics of gradient descent (GD) in overparameterized s...

MTAdam: Automatic Balancing of Multiple Training Loss Terms

When training neural models, it is common to combine multiple loss terms...

Partial local entropy and anisotropy in deep weight spaces

We refine a recently-proposed class of local entropic loss functions by ...

Vector-Valued Variation Spaces and Width Bounds for DNNs: Insights on Weight Decay Regularization

Deep neural networks (DNNs) trained to minimize a loss term plus the sum...

Please sign up or login with your details

Forgot password? Click here to reset