MSE-Optimal Neural Network Initialization via Layer Fusion

01/28/2020
by Ramina Ghods, et al.

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation have been proposed in the past. In this paper, we propose FuseInit, a novel method to initialize shallower networks by fusing neighboring layers of deeper networks that are trained with random initialization. We develop theoretical results and efficient algorithms for mean-square error (MSE)-optimal fusion of neighboring dense-dense, convolutional-dense, and convolutional-convolutional layers. We show experiments for a range of classification and regression datasets, which suggest that deeper neural networks are less sensitive to initialization and that shallower networks initialized with FuseInit can outperform their randomly initialized counterparts (sometimes matching the deeper networks they were fused from).
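To make the core idea concrete: the paper derives closed-form MSE-optimal fusion rules, but the dense-dense case can be approximated with a simple data-driven least-squares fit, regressing the two-layer stack's pre-activation output onto the raw input. The sketch below is a minimal NumPy illustration of that idea, not the paper's algorithm; the function name fuse_dense_dense, the layer shapes, and the tanh activation are assumptions made for the example.

import numpy as np

def fuse_dense_dense(W1, b1, W2, b2, X, act=np.tanh):
    """Fuse two trained dense layers into one via least squares.

    A data-driven sketch of MSE-optimal fusion (the paper gives
    closed-form expressions instead). X holds sample inputs of
    shape (n, d_in); W1 is (d_hidden, d_in), W2 is (d_out, d_hidden).
    """
    # Target: pre-activation output of the two-layer stack on X.
    H = act(X @ W1.T + b1)          # hidden activations, (n, d_hidden)
    T = H @ W2.T + b2               # layer-2 pre-activations, (n, d_out)

    # Augment X with a ones column so the bias is fitted jointly.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    # Least-squares solution minimizes ||Xa @ P - T||_F^2.
    P, *_ = np.linalg.lstsq(Xa, T, rcond=None)

    W_fused = P[:-1].T              # (d_out, d_in)
    b_fused = P[-1]                 # (d_out,)
    return W_fused, b_fused

In the FuseInit workflow, the fused (W_fused, b_fused) would initialize the single dense layer that replaces the pair in the shallower network, which is then fine-tuned as usual.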


