Rethink the Connections among Generalization, Memorization and the Spectral Bias of DNNs
Over-parameterized deep neural networks (DNNs) with enough capacity to memorize random noise can still achieve excellent generalization performance on normal datasets, challenging the bias-variance trade-off of classical learning theory. Recent studies claimed that DNNs first learn simple patterns and then memorize noise; other works showed that DNNs have a spectral bias, learning target functions from low to high frequencies during training. These observations suggest a connection among generalization, memorization and the spectral bias of DNNs: the low-frequency components in the input space represent the patterns that generalize, whereas the high-frequency components represent the noise that must be memorized. However, we show that this is not the case: under the experimental setup of deep double descent, the high-frequency components of DNNs begin to diminish in the second descent, whereas the examples with random labels are still being memorized. Moreover, we find that the spectrum of DNNs can be used to monitor the test behavior, e.g., it can indicate when the second descent of the test error starts, even though the spectrum is computed from the training set only.
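The abstract does not specify how the spectrum of a trained network is computed from the training set; a common approach in spectral-bias analyses is to project the network's outputs on the training inputs onto complex exponentials (a non-uniform discrete Fourier transform). The sketch below follows that idea; the function name `estimate_spectrum`, the choice of frequency probes, and the low/high-frequency bands are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def estimate_spectrum(model_fn, X_train, freqs):
    """Estimate the frequency content of a trained network's outputs,
    evaluated on the training inputs only (non-uniform DFT).

    model_fn : callable mapping an (n, d) array of inputs to an (n,) array
               of scalar outputs (e.g. the logit of the labeled class).
    X_train  : (n, d) training inputs, assumed scaled to [0, 1]^d.
    freqs    : (m, d) array of frequency vectors k to probe.
    Returns an (m,) array of magnitudes |F(k)|.
    """
    y = model_fn(X_train)                    # network outputs on the training data
    n = X_train.shape[0]
    # Non-uniform DFT: F(k) = (1/n) * sum_i y_i * exp(-2*pi*j * k . x_i)
    phases = np.exp(-2j * np.pi * (X_train @ freqs.T))   # shape (n, m)
    return np.abs(phases.T @ y) / n

# Hypothetical usage: track the ratio of high- to low-frequency energy
# over training epochs to see when the high-frequency components diminish.
# d = X_train.shape[1]
# low_band  = np.random.randn(64, d) * 1.0    # "low" frequencies (assumed scale)
# high_band = np.random.randn(64, d) * 10.0   # "high" frequencies (assumed scale)
# ratio = estimate_spectrum(f, X_train, high_band).sum() / \
#         estimate_spectrum(f, X_train, low_band).sum()
```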