Exploiting Explainable Metrics for Augmented SGD

03/31/2022
by Mahdi S. Hosseini, et al.

Explaining the generalization characteristics of deep learning is an emerging topic in advanced machine learning. There are several unanswered questions about how learning under stochastic optimization really works and why certain strategies are better than others. In this paper, we address the following question: can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer? With this question in mind, we propose new explainability metrics that measure the redundant information in a network's layers using a low-rank factorization framework and quantify a complexity measure that is highly correlated with the generalization performance of a given optimizer, network, and dataset. We subsequently exploit these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by adaptively adjusting the learning rate in each layer to improve generalization performance. Our augmented SGD – dubbed RMSGD – introduces minimal computational overhead compared to SOTA methods and outperforms them by exhibiting strong generalization characteristics across applications, architectures, and datasets.
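To make the layer-probing idea concrete, the sketch below is a minimal illustration, not the paper's exact RMSGD algorithm: it computes an SVD-based low-rank redundancy measure for each convolutional and fully connected layer and uses it to rescale that layer's SGD learning rate. The function names, the 0.9 energy threshold, and the one-parameter-group-per-layer convention are assumptions made for illustration only.

```python
import torch


def low_rank_metric(weight: torch.Tensor, energy: float = 0.9) -> float:
    """Fraction of singular values needed to retain `energy` of the squared
    spectral energy of the flattened weight matrix; smaller values indicate
    more redundant (lower effective rank) weights. Illustrative metric only."""
    mat = weight.detach().reshape(weight.shape[0], -1).float()
    s = torch.linalg.svdvals(mat)                      # singular values, descending
    cumulative = torch.cumsum(s ** 2, dim=0) / (s ** 2).sum()
    rank_needed = int((cumulative < energy).sum().item()) + 1
    return rank_needed / s.numel()


def rescale_layer_lrs(model: torch.nn.Module,
                      optimizer: torch.optim.Optimizer,
                      base_lr: float = 0.1) -> None:
    """Give each weight layer its own learning rate, scaled by its low-rank
    metric (assumes one optimizer parameter group per probed layer)."""
    layers = [m for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    for group, layer in zip(optimizer.param_groups, layers):
        group["lr"] = base_lr * low_rank_metric(layer.weight)
```

A per-layer optimizer can be built by passing one parameter group per probed layer to torch.optim.SGD and calling rescale_layer_lrs between epochs; the sketch only illustrates the mechanics of per-layer rate adjustment, while the paper's metrics determine how each layer's rate should actually evolve.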

Related research

04/11/2020
Exploit Where Optimizer Explores via Residuals
To train neural networks faster, many research efforts have been devoted...

02/05/2021
Evaluating Deep Learning in SystemML using Layer-wise Adaptive Rate Scaling (LARS) Optimizer
Increasing the batch size of a deep learning model is a challenging task...

06/11/2020
AdaS: Adaptive Scheduling of Stochastic Gradients
The choice of step-size used in Stochastic Gradient Descent (SGD) optimi...

02/27/2018
Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Stochastic gradient descent (SGD) has achieved great success in training...

03/23/2023
The Probabilistic Stability of Stochastic Gradient Descent
A fundamental open problem in deep learning theory is how to define and ...

06/19/2018
Faster SGD training by minibatch persistency
It is well known that, for most datasets, the use of large-size minibatc...

10/02/2019
Towards Unifying Neural Architecture Space Exploration and Generalization
In this paper, we address a fundamental research question of significant...