Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

09/29/2022
by Konstantin Schürholt, et al.

Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search and knowledge distillation. Recently, an autoencoder trained on a model zoo was shown to learn a hyper-representation that captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use, sampling new model weights. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, along with several sampling methods based on the topology of hyper-representations. The models generated with our methods are diverse and performant, and they outperform strong baselines on several downstream tasks: initialization, ensemble sampling, and transfer learning. Our results indicate the potential of aggregating knowledge from model zoos into new models via hyper-representations, paving the way for novel research directions.
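To make the two key ingredients concrete, the sketch below shows one plausible reading in PyTorch (with scikit-learn for density estimation): a reconstruction loss that normalizes each layer's error by a per-layer scale, and a sampler that fits a density model to the zoo's latent embeddings and decodes new draws into weights. All names (`layerwise_normalized_mse`, `layer_slices`, `sample_new_weights`, the encoder/decoder modules) and the specific choice of a kernel density estimator are illustrative assumptions, not the paper's exact implementation:

```python
import torch
from sklearn.neighbors import KernelDensity

def layerwise_normalized_mse(pred, target, layer_slices, eps=1e-8):
    """Reconstruction loss on flattened weight vectors, normalized per layer.

    Without normalization, layers with large weight magnitudes dominate the
    loss; scaling each layer's error by that layer's standard deviation keeps
    all layers on a comparable footing (a hypothetical variant of the paper's
    layer-wise loss normalization).
    """
    loss = 0.0
    for start, end in layer_slices:  # (start, end) index each layer's slice
        p, t = pred[:, start:end], target[:, start:end]
        scale = t.std() + eps  # per-layer scale estimated from the batch
        loss = loss + torch.mean(((p - t) / scale) ** 2)
    return loss / len(layer_slices)

def sample_new_weights(encoder, decoder, zoo_weights, n_samples=10, bandwidth=0.1):
    """Fit a density model to the embeddings of a trained model zoo, draw
    new latent codes from it, and decode them into fresh weight vectors."""
    with torch.no_grad():
        z = encoder(zoo_weights).cpu().numpy()  # embed the zoo
    kde = KernelDensity(bandwidth=bandwidth).fit(z)  # density over latent space
    z_new = torch.from_numpy(kde.sample(n_samples)).float()
    with torch.no_grad():
        return decoder(z_new)  # decoded weight vectors
```

Sampled weight vectors can then be reshaped into the target architecture and used directly, e.g. as initializations or ensemble members, which is how the downstream evaluations above are framed.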

Related research

07/22/2022
Hyper-Representations for Pre-Training and Transfer Learning
Learning representations of neural network weights given a model zoo is ...

10/07/2021
Conceptual Expansion Neural Architecture Search (CENAS)
Architecture search optimizes the structure of a neural network for some...

10/28/2021
Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Self-Supervised Learning (SSL) has been shown to learn useful and inform...

05/24/2022
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Massively multilingual models are promising for transfer learning across...

09/04/2019
Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks
Knowledge distillation (KD) is a very popular method for model size redu...

02/18/2023
RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness
Self-supervised speech pre-training enables deep neural network models t...

02/05/2023
Using Intermediate Forward Iterates for Intermediate Generator Optimization
Score-based models have recently been introduced as a richer framework t...
