Persistent Neurons

07/02/2020
by Yimeng Min et al.

Most algorithms used in neural network (NN)-based learning tasks are strongly affected by the choice of initialization. Good initialization can avoid sub-optimal solutions and alleviate saturation during training. However, designing improved initialization strategies is a difficult task, and our understanding of what makes an initialization good is still very primitive. Here, we propose persistent neurons, a strategy that optimizes the learning trajectory using information from previously converged solutions. More precisely, we let the parameters explore new regions of the loss landscape by penalizing the model for converging to previously found solutions under the same initialization. We show that, under certain data distributions, persistent neurons converge to better solutions where popular initialization schemes find bad local minima. We further demonstrate that persistent neurons help improve the model's performance under both good and poor initializations. Moreover, we evaluate both full and partial persistent models and show that they can boost performance on a range of NN architectures, such as AlexNet and residual neural networks. Saturation of activation functions during persistent training is also studied.
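The abstract describes the method only at a high level: restart training from the same initialization while penalizing the trajectory for converging to previously found solutions. A minimal sketch of that idea in PyTorch follows; the inverse-distance penalty, the hyperparameters `lam` and `eps`, and the toy regression setup are illustrative assumptions, not the paper's actual formulation.

```python
import copy

import torch
import torch.nn as nn

def train(model, data, target, prev_solutions=(), lam=1e-2, eps=1e-6,
          lr=1e-2, steps=1000):
    """Fit `model` to (data, target); optionally repel its parameters
    from previously converged solutions (a sketch of the persistent penalty)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(data), target)
        # Repulsive term: grows as the parameters approach a previous
        # solution, pushing the trajectory toward a new part of the landscape.
        for prev in prev_solutions:
            dist2 = sum(((p - q) ** 2).sum()
                        for p, q in zip(model.parameters(), prev))
            loss = loss + lam / (dist2 + eps)
        loss.backward()
        opt.step()
    return [p.detach().clone() for p in model.parameters()]

# Two runs from the *same* initialization: the second penalizes the first
# run's converged solution, so it is pushed toward a different one.
torch.manual_seed(0)
data, target = torch.randn(64, 8), torch.randn(64, 1)
init = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
sol1 = train(copy.deepcopy(init), data, target)
sol2 = train(copy.deepcopy(init), data, target, prev_solutions=[sol1])
```

The repulsive (inverse-distance) form is one of several reasonable choices for "penalizing convergence to previous solutions"; the paper's exact penalty may differ.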

Related research

11/02/2020
Reducing Neural Network Parameter Initialization Into an SMT Problem
Training a neural network (NN) depends on multiple factors, including bu...

09/20/2021
Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks
Small neural networks with a constrained number of trainable parameters,...

01/31/2023
Archetypal Analysis++: Rethinking the Initialization Strategy
Archetypal analysis is a matrix factorization method with convexity cons...

09/07/2021
Self-adaptive deep neural network: Numerical approximation to functions and PDEs
Designing an optimal deep neural network for a given task is important a...

12/04/2018
Parameter Re-Initialization through Cyclical Batch Size Schedules
Optimal parameter initialization remains a crucial problem for neural ne...

03/28/2020
Memorizing Gaussians with no over-parameterization via gradient descent on neural networks
We prove that a single step of gradient descent over depth two network, w...

09/29/2012
Self-Delimiting Neural Networks
Self-delimiting (SLIM) programs are a central concept of theoretical com...
