Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

10/16/2018
by Zhibin Liao, et al.

In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNets) with respect to training convergence and generalisation, as a function of mini-batch size and learning rate, for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be efficiently computed even for high-capacity deep models. Our proposed measurements can help practitioners monitor and control the training process (by actively tuning the mini-batch size and learning rate) to achieve good training convergence and generalisation. Furthermore, the proposed measurements enable a new dynamic sampling training approach that continuously and automatically changes the mini-batch size and learning rate during training. Finally, we show that the proposed dynamic sampling approach trains faster while achieving classification accuracy competitive with the current state of the art.
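The abstract does not spell out the authors' efficient estimator, so the sketch below is only a rough illustration of the underlying quantity: a naive empirical Fisher approximation F ≈ (1/N) Σ g_i g_iᵀ built from per-example gradients g_i, whose eigen-spectrum is then inspected. The tiny model, synthetic data, and the dense O(P²) construction are all placeholder assumptions; a real ResNet/DenseNet would require the paper's efficient approximation instead.

    # Minimal sketch (not the authors' estimator): empirical Fisher
    # F ~= (1/N) * sum_i g_i g_i^T from per-example gradients, followed
    # by a dense eigendecomposition. Feasible only for tiny models.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Tiny classifier so the full Fisher matrix is cheap to materialise.
    model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 3))
    loss_fn = nn.CrossEntropyLoss()

    # Synthetic mini-batch standing in for image features and labels.
    x = torch.randn(32, 10)
    y = torch.randint(0, 3, (32,))

    params = list(model.parameters())
    num_params = sum(p.numel() for p in params)

    fisher = torch.zeros(num_params, num_params)
    for i in range(x.size(0)):
        loss = loss_fn(model(x[i:i + 1]), y[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        g = torch.cat([g_.reshape(-1) for g_ in grads])  # flat per-example grad
        fisher += torch.outer(g, g) / x.size(0)          # rank-1 update

    # Eigen-spectrum of the approximate Fisher; its extremes are the kind
    # of measurement used to monitor convergence and generalisation.
    eigvals = torch.linalg.eigvalsh(fisher)  # ascending order
    print(f"largest eigenvalue:  {eigvals[-1].item():.4e}")
    print(f"smallest eigenvalue: {eigvals[0].item():.4e}")

In practice one would track such spectral summaries across training iterations and adjust the mini-batch size and learning rate in response, which is the role the proposed measurements play in the dynamic sampling approach.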

