Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)

11/25/2021
by Umangi Jain, et al.

Despite their widespread success, deep neural networks are still largely trained by experimentally choosing an architecture, hyper-parameters, an initialization, and a training mechanism. In this work, we focus on determining the success of the standard gradient descent method for training a deep neural network on a specified dataset, architecture, and initialization (DAI) combination. Through extensive systematic experiments, we show that the evolution of the singular values of the matrices obtained from the hidden layers of a DNN can help determine whether gradient descent will successfully train a DAI, even in the absence of validation labels in the supervised learning paradigm. This enables early give-up: halting, early in the training process, the training of networks that are predicted not to generalize well. Our experiments across multiple datasets, architectures, and initializations reveal that the proposed scores predict the success of a DAI more accurately than simply relying on the validation accuracy at earlier epochs.
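The central idea, tracking how the singular-value spectrum of hidden-layer matrices evolves during training and summarizing it as a score, can be sketched as follows. This is a minimal illustration in PyTorch, not the paper's actual method: the stable-rank score, the toy network, the random stand-in data, and the stagnation threshold used for early give-up are all assumptions made for demonstration.

```python
# Minimal sketch: track the singular-value spectrum of hidden-layer weight
# matrices across epochs and derive a score from its evolution.
# The specific score below (mean stable rank) is an illustrative assumption,
# not the score proposed in the paper.
import torch
import torch.nn as nn

def hidden_layer_singular_values(model: nn.Module):
    """Return the singular values of each linear layer's weight matrix."""
    spectra = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # torch.linalg.svdvals returns singular values in descending order.
            spectra.append(torch.linalg.svdvals(module.weight.detach()))
    return spectra

def spectral_score(spectra):
    """Illustrative summary: mean stable rank across layers.
    Stable rank = ||W||_F^2 / ||W||_2^2; higher values suggest the layer
    spreads signal over many directions rather than collapsing."""
    ranks = [(s.pow(2).sum() / s[0].pow(2)).item() for s in spectra]
    return sum(ranks) / len(ranks)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

history = []
for epoch in range(20):
    x = torch.randn(128, 784)             # stand-in batch; use a real loader
    y = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    history.append(spectral_score(hidden_layer_singular_values(model)))

# Early give-up (illustrative rule): stop if the score has stagnated over
# the last 10 epochs, i.e. the spectrum is no longer evolving.
if len(history) >= 10 and abs(history[-1] - history[-10]) < 1e-3:
    print("score stagnant; this DAI is predicted not to train well")
```

Note that no validation labels are consulted anywhere above, which mirrors the abstract's claim that the decision can be made from the hidden-layer spectra alone.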


