DeepAI AI Chat
Log In Sign Up

MaSS: an Accelerated Stochastic Method for Over-parametrized Learning

by   Chaoyue Liu, et al.
The Ohio State University

In this paper we introduce MaSS (Momentum-added Stochastic Solver), an accelerated SGD method for optimizing over-parameterized networks. Our method is simple and efficient to implement and does not require changing parameters or computing full gradients in the course of optimization. We provide a detailed theoretical analysis for convergence and parameter selection including their dependence on the mini-batch size in the quadratic case. We also provide theoretical convergence results for a more general convex setting. We provide an experimental evaluation showing strong performance of our method in comparison to Adam and SGD for several standard architectures of deep networks including ResNet, convolutional and fully connected networks. We also show its performance for convex kernel machines.


page 1

page 2

page 3

page 4


Is Local SGD Better than Minibatch SGD?

We study local SGD (also known as parallel SGD and federated averaging),...

Escaping Saddle Points Faster with Stochastic Momentum

Stochastic gradient descent (SGD) with stochastic momentum is popular in...

EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD

We present a generic framework for trading off fidelity and cost in comp...

Accelerated Sparsified SGD with Error Feedback

We study a stochastic gradient method for synchronous distributed optimi...

Accelerated Zeroth-Order Momentum Methods from Mini to Minimax Optimization

In the paper, we propose a new accelerated zeroth-order momentum (Acc-ZO...

Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

To improve the efficiency and sustainability of learning deep models, we...

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds

There has been a strong demand for algorithms that can execute machine l...