Mini-batch SGD with momentum is a fundamental algorithm for learning lar...
A memory-efficient approach to ensembling neural networks is to share mo...
The performance of optimization on quadratic problems depends sensitively on...
Current theoretical results on optimization trajectories of neural netwo...