RandomOut: Using a convolutional gradient norm to rescue convolutional filters

02/18/2016
by Joseph Paul Cohen, et al.

Filters in convolutional neural networks are sensitive to their initialization. The random numbers used to initialize filters are a bias and determine whether you will "win" and converge to a satisfactory local minimum, so we call this The Filter Lottery. We observe that the 28x28 Inception-V3 model without Batch Normalization fails to train 26% of the time when varying the random seed alone. This is a problem that affects the trial-and-error process of designing a network: because random seeds have a large impact, it is hard to evaluate a network design without trying many different random starting weights. This work aims to reduce the bias imposed by the initial weights so that a network converges more consistently. We propose to evaluate and replace specific convolutional filters that have little impact on the prediction. We use the gradient norm to evaluate the impact of a filter on the error, and re-initialize a filter when the gradient norm of its weights falls below a specific threshold. This consistently improves accuracy on the 28x28 Inception-V3 model, with a median increase of +3.3%, and increases the number of filters explored without increasing the size of the network. We observe that the RandomOut method has more consistent generalization performance, with a standard deviation of 1.3% when varying random seeds, and does so faster and with fewer parameters.
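The procedure described above is simple enough to sketch: after each backward pass, compute a per-filter norm of the gradient of the convolution weights and re-draw any filter whose norm falls below a threshold. The snippet below is only an illustrative sketch of that idea, not the paper's implementation; the framework (PyTorch), the threshold value, the re-initialization scheme (kaiming_normal_), and the function name randomout_step are all assumptions made for the example.

```python
import torch
import torch.nn as nn

def randomout_step(model: nn.Module, threshold: float = 1e-4) -> int:
    """Re-initialize conv filters whose weight-gradient norm falls below `threshold`.

    Illustrative sketch only: threshold and re-init scheme are assumptions.
    Intended to be called after loss.backward() and before optimizer.step().
    Returns the number of filters that were re-drawn.
    """
    replaced = 0
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
                # One L2 norm per output filter: flatten each (in_ch, kH, kW) slice.
                grad_norms = module.weight.grad.flatten(1).norm(dim=1)
                dead = grad_norms < threshold          # filters with little impact on the error
                if dead.any():
                    fresh = torch.empty_like(module.weight)
                    nn.init.kaiming_normal_(fresh)     # illustrative choice of re-init scheme
                    module.weight[dead] = fresh[dead]  # replace only the low-impact filters
                    module.weight.grad[dead] = 0.0     # avoid applying the stale gradient
                    replaced += int(dead.sum())
    return replaced
```

Called between loss.backward() and optimizer.step(), this replaces low-impact filters with fresh random weights while leaving the rest of the network untouched; a fuller implementation would also reset any optimizer state (e.g. momentum buffers) associated with the re-drawn filters.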

Related research

11/01/2018  Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration
Previous works utilized "smaller-norm-less-important" criterion to prune...

01/15/2020  Filter Grafting for Deep Neural Networks
This paper proposes a new learning paradigm called filter grafting, whic...

10/07/2022  Understanding the Covariance Structure of Convolutional Filters
Neural network weights are typically initialized at random from univaria...

03/18/2023  ExplainFix: Explainable Spatially Fixed Deep Networks
Is there an initialization for deep networks that requires no learning? ...

03/22/2021  Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse
Normalization operations are essential for state-of-the-art neural netwo...

01/23/2020  DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform
Convolutional neural networks are based on a huge number of trained weig...

02/21/2019  Convolutional Analysis Operator Learning: Dependence on Training Data
Convolutional analysis operator learning (CAOL) enables the unsupervised...
