Parameter Prediction for Unseen Deep Architectures

10/25/2021
by Boris Knyazev, et al.

Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study whether we can use deep learning to directly predict these parameters by exploiting past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures - DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving a 60% accuracy on CIFAR-10. On ImageNet, top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, can potentially lead to a new, more computationally efficient paradigm of training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.
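The abstract describes a hypernetwork that takes an architecture's computational graph as input and emits that architecture's parameters in one forward pass. The following is a minimal sketch of that idea, not the authors' GHN implementation: node features encode layer types, a small message-passing GNN propagates context along the graph, and a linear decoder maps each node embedding to a toy parameter vector. All shapes, encodings, and function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def message_passing(node_feats, adj, W, steps=2):
    """Toy GNN: mean-neighbor aggregation, linear map, ReLU (assumed design)."""
    h = node_feats
    deg = adj.sum(1, keepdims=True) + 1e-8  # avoid division by zero
    for _ in range(steps):
        h = np.maximum(((adj @ h) / deg) @ W, 0.0)
    return h

def predict_params(node_feats, adj, W_gnn, W_dec):
    """Single forward pass: graph embeddings -> per-node parameter vectors."""
    h = message_passing(node_feats, adj, W_gnn)
    return h @ W_dec  # each row: predicted parameters for one layer/node

# Toy architecture graph: 4 nodes in a chain (e.g. conv -> bn -> relu -> fc).
adj = np.zeros((4, 4))
adj[[0, 1, 2], [1, 2, 3]] = 1
adj = adj + adj.T                # undirected adjacency
node_feats = np.eye(4)           # one-hot layer-type encoding (assumed)
W_gnn = rng.standard_normal((4, 4)) * 0.1
W_dec = rng.standard_normal((4, 8)) * 0.1  # 8 toy parameters per node

params = predict_params(node_feats, adj, W_gnn, W_dec)
print(params.shape)  # (4, 8): parameters for every node in a single pass
```

In the paper's setting the decoder would emit full weight tensors per operation (e.g. all 24M ResNet-50 parameters), and the hypernetwork itself is trained across many architectures; this sketch only shows the graph-in, parameters-out dataflow.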

Related research

- 03/07/2023: Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
- 10/05/2022: Meta-Ensemble Parameter Learning
- 07/20/2022: Pretraining a Neural Network before Knowing Its Architecture
- 05/03/2021: OpTorch: Optimized deep learning architectures for resource limited environments
- 06/01/2018: TAPAS: Train-less Accuracy Predictor for Architecture Search
- 11/16/2018: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- 02/08/2018: Practical Issues of Action-conditioned Next Image Prediction
