A number of competing hypotheses have been proposed to explain why
small...
The mechanisms by which certain training interventions, such as increasi...
We empirically demonstrate that full-batch gradient descent on neural ne...
For a standard convolutional neural network, optimizing over the input p...