Pretraining a Neural Network before Knowing Its Architecture

07/20/2022
by Boris Knyazev et al.

Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN), trained this way on one million smaller ImageNet architectures, is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks. We study whether fine-tuning based on the same GHN is still useful for novel strong architectures that were published after the GHN had been trained. We find that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50. One potential reason is the increased distribution shift of novel architectures from those used to train the GHN. We also find that the predicted parameters lack the diversity necessary to successfully fine-tune them with gradient descent. We alleviate this limitation by applying simple post-processing techniques to the predicted parameters before fine-tuning them on a target task, improving the fine-tuning of both ResNet-50 and ConvNeXt.
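As a rough illustration of the last point, the sketch below loads hypernetwork-predicted parameters into a model and lightly perturbs them before fine-tuning. This is a minimal sketch under stated assumptions, not the paper's method: the `ghn.predict_parameters` call is a hypothetical placeholder (stubbed with the model's own initialization so the snippet runs), and Gaussian perturbation is only one simple post-processing choice that could restore diversity to predicted parameters.

import torch
import torchvision

# Target architecture for which a GHN would predict parameters.
model = torchvision.models.resnet50(weights=None)

# Hypothetical call: a trained GHN returns a {name: tensor} dict of predicted
# parameters for this architecture. Stubbed here with the model's own random
# initialization so the snippet is self-contained and runnable.
# predicted = ghn.predict_parameters(model)
predicted = {name: p.detach().clone() for name, p in model.named_parameters()}

# Post-processing (an assumption, not necessarily the paper's technique):
# perturb each predicted tensor with small Gaussian noise scaled by its own
# spread, so gradient descent does not start from low-diversity parameters.
noise_std = 1e-3
postprocessed = {
    name: w + noise_std * w.std() * torch.randn_like(w)
    for name, w in predicted.items()
}
model.load_state_dict(postprocessed, strict=False)

# Fine-tune on the target task as usual.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)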


Related research

11/28/2017 - Gradual Tuning: a better way of Fine Tuning the parameters of a Deep Neural Network
In this paper we present an alternative strategy for fine-tuning the par...

10/19/2022 - lo-fi: distributed fine-tuning without communication
When fine-tuning large neural networks, it is common to use multiple nod...

09/16/2022 - Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods
The impressive performances of deep learning architectures is associated...

03/07/2023 - Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Pretraining a neural network on a large dataset is becoming a cornerston...

02/12/2023 - Sparse Mutation Decompositions: Fine Tuning Deep Neural Networks with Subspace Evolution
Neuroevolution is a promising area of research that combines evolutionar...

10/25/2021 - Parameter Prediction for Unseen Deep Architectures
Deep learning has been successful in automating the design of features i...

03/07/2022 - Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Hyperparameter (HP) tuning in deep learning is an expensive process, pro...
