The training process of ReLU neural networks often exhibits complicated
...
The observation that stochastic gradient descent (SGD) favors flat minim...
This paper presents a novel approach that supports natural language voic...
Generalization error bounds for deep neural networks trained by stochast...
The convergence of GD and SGD when training mildly parameterized neural
...