1 Introduction
Humans usually design and train neural networks themselves. Since neural networks are capable of outperforming humans on various learning tasks, a natural question arises: can they also be better at creating new neural networks? This question has already been asked a few times, but we do not have a clear answer yet.
There are two common approaches to the above question. The first direction focuses on generating the whole network architecture; it has been attacked by employing reinforcement learning [Zoph and Le2016], [Maziarz et al.2018]. Although this allows creating novel architectures from scratch, it is very time- and resource-consuming. In the second approach, we assume a predefined architecture and focus on generating correct weights. For this purpose, one can design a hypernetwork that acts as a generator, returning weights for other networks (target networks) [Ha et al.2016]. Hypernetworks have been successfully applied to reducing the number of trainable parameters [Ha et al.2016], generative models for neural networks, maximum likelihood estimation
[Sheikh et al.2017], the Bayesian setting [Krueger et al.2017], [Louizos and Welling2017], and more [Brock et al.2017], [Zhang et al.2018a], [Lorraine and Duvenaud2018]. The basic problem is that a single hypernetwork generates weights that solve only one specific problem. In this paper, we focus on designing a hypernetwork that can return multiple target networks customized to serve different purposes.
We present two applications of this model. In the first one, we use hypernetworks to create a functional, continuous representation of images. More precisely, the hypernetwork takes an image and produces the weights of a target network, which approximates the RGB intensities at each coordinate pair. Due to the continuity of the representation, we can look at the image at different scales, which is experimentally verified in the case of super-resolution, Figure 1. In contrast to typical super-resolution approaches, we have a single model responsible for upscaling the image to any size. Moreover, we can create nonstandard sizes at test time, Figure 2. Since a single neural network is responsible for creating the individual image models, similar images are described by similar target networks. In consequence, it is possible to interpolate between the weights of target networks and produce natural images, Figure 3. We also demonstrate that hypernetworks can create generative models for new data sets at test time. To show its usefulness, we design a hypernetwork for describing distributions of 2D point clouds.
2 Hypernetwork model
We start by recalling the basic hypernetwork model. Next, we show on two practical examples how to adapt it to create multiple target networks for different tasks.
Let T(·; θ): X → Y be a target neural network, where θ is a vector of trainable parameters and X, Y are the input and output domains, respectively. Our objective is to find a weight vector θ which solves a given learning problem. As an alternative to a typical backpropagation procedure, the hypernetwork mechanism can be used. In this framework, an additional neural network (hypernetwork) H: Z → Θ is employed, where Z is some input domain and Θ is the space of weight vectors. Given an instance z ∈ Z, the hypernetwork returns the weights θ = H(z) of the corresponding target network T(·; θ). Thus the solution is given by the network T(·; H(z)). In the following subsections, we show how to train a hypernetwork which returns weights for multiple target networks used for different purposes.
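A minimal sketch of this mechanism in code may help fix the notation. All sizes below, and the single linear map playing the role of the hypernetwork, are illustrative assumptions, not the architecture used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target network: a tiny MLP T((x, y); theta) -> RGB, whose weights theta
# are NOT trained directly but produced by the hypernetwork.
IN_DIM, HID, OUT_DIM = 2, 8, 3
N_THETA = IN_DIM * HID + HID + HID * OUT_DIM + OUT_DIM  # total target weights

def target_forward(coords, theta):
    """Evaluate the target MLP at coordinate pairs using a flat weight vector."""
    i = 0
    W1 = theta[i:i + IN_DIM * HID].reshape(IN_DIM, HID); i += IN_DIM * HID
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + HID * OUT_DIM].reshape(HID, OUT_DIM); i += HID * OUT_DIM
    b2 = theta[i:]
    h = np.tanh(coords @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid -> RGB in (0, 1)

# Hypernetwork H: maps a (flattened) image to the target weights theta.
# A single random linear map here, purely for illustration.
IMG_DIM = 16 * 16
W_hyper = rng.normal(scale=0.1, size=(IMG_DIM, N_THETA))

def hypernetwork(image_flat):
    return image_flat @ W_hyper

image = rng.random(IMG_DIM)
theta = hypernetwork(image)  # weights of this image's target network
rgb = target_forward(np.array([[0.5, 0.5]]), theta)
print(rgb.shape)  # (1, 3)
```

The key point is that gradients flow through `theta` back into the hypernetwork's own parameters, so only the hypernetwork is trained.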
2.1 Functional image representation
An image is usually represented as a two-dimensional pixel matrix. This representation is discrete and, in consequence, it is difficult to look at the image at different scales. Moreover, it is impossible to perform typical mathematical operations such as differentiation, applying continuous filters, etc. As a remedy, one can create a functional, continuous representation of images. More precisely, we aim at creating a function T(x, y) which approximates the RGB intensities of the image at each coordinate pair (x, y).
In the simplest case, this function can be obtained by linear or quadratic interpolation. However, restricting ourselves to a given class of functions may be insufficient. On the other hand, one could represent an image with a neural network, but there is no point in training a separate model for each image. We approach this task by introducing a hypernetwork H which, for an image I, returns the weights of the corresponding target network. In consequence, the image is represented as the function T(·; H(I)), which for any coordinates (x, y) returns the corresponding RGB intensities of image I.
The above model can be trained by minimizing the classical MSE loss. More precisely, we take an input image I, generate the weights H(I) of the target network, and compare the obtained representation with the input image pixel by pixel. We minimize the expected mean squared error over the training set of images:

min_H  E_I [ Σ_{(x,y)} ‖ T((x, y); H(I)) − I(x, y) ‖² ],

where I(x, y) denotes the RGB intensities of image I at the pixel with coordinates (x, y).
Observe that we train only a single neural model (the hypernetwork), which can produce a great variety of functions at test time. We expect the target networks of similar images to be similar. In consequence, interpolation between the weights of target networks should lead to reasonably looking images. In contrast, if we created an individual network for every image, such an interpolation would be misleading (see the experimental section for details).
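A toy version of this pixel-by-pixel objective might look as follows. The 8x8 "image", the one-hidden-layer target network, and the random vector standing in for the hypernetwork output H(I) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "image": 8x8 grayscale, values in [0, 1].
H, W = 8, 8
image = rng.random((H, W))

# Coordinate grid normalized to [0, 1]^2 -- the input of the target network.
ys, xs = np.mgrid[0:H, 0:W]
coords = np.stack([xs.ravel() / (W - 1), ys.ravel() / (H - 1)], axis=1)

def target_forward(coords, theta):
    """A one-hidden-layer target network; theta is a flat weight vector."""
    W1 = theta[:16].reshape(2, 8)
    b1 = theta[16:24]
    W2 = theta[24:32].reshape(8, 1)
    b2 = theta[32:33]
    h = np.tanh(coords @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

theta = rng.normal(scale=0.1, size=33)  # stand-in for the hypernetwork output H(I)
pred = target_forward(coords, theta).reshape(H, W)
mse = np.mean((pred - image) ** 2)  # pixel-by-pixel comparison with the input
print(round(float(mse), 4))
```

During actual training, this MSE would be averaged over a batch of images and back-propagated into the hypernetwork.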
2.2 Generative model
Generative models allow learning the underlying data distribution and creating new objects, e.g. images, texts, etc. However, an individual model has to be trained from scratch for every new data set.
We present how to train a hypernetwork to produce a generative model for a new data set at test time. The hypernetwork operates on a family of data sets and, for each data set, returns the weights of the corresponding generative model. As a proof of concept, we describe how to realize this idea in low-dimensional spaces.
Let the empirical probability distribution of a given data set X ⊂ R² be denoted by P_X. We would like to model P_X as a transformation T(·; θ) of some simple probability distribution P defined on a latent space Z. Thus we look for the weights θ of a neural network T: Z → R². Generative properties are verified by comparing X with a data set generated from the transformed distribution. In low-dimensional spaces, one can use typical kernel density estimation to approximate the transformed density. More precisely, we generate a sample z_1, …, z_n from P and transform it by T, obtaining y_i = T(z_i; θ), for i = 1, …, n. Next, we create the kernel density estimator

q(y) = (1/n) Σ_{i=1}^{n} N(y; y_i, γ),

where the bandwidth γ is a model parameter and N(·; y_i, γ) denotes the Gaussian density centered at y_i. Finally, we compare q with P_X by minimizing the cross-entropy:

− Σ_{x ∈ X} log q(x),   (1)

where the sum is taken over the elements of the data set X.
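A sketch of this loss computation follows. The ring-shaped toy data set, the identity map standing in for the target network T, and the isotropic Gaussian kernel are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data set X in R^2: points scattered around a unit ring.
angles = rng.uniform(0, 2 * np.pi, size=200)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)
X += rng.normal(scale=0.05, size=X.shape)

# Sample z_1..z_n from the prior (uniform on the unit circle) and transform
# it; here the target network T is replaced by the identity map.
n = 100
phi = rng.uniform(0, 2 * np.pi, size=n)
Y = np.stack([np.cos(phi), np.sin(phi)], axis=1)  # y_i = T(z_i; theta)

def kde_log_density(points, samples, gamma=0.1):
    """log q(x) for a Gaussian KDE with bandwidth gamma built on `samples`."""
    d2 = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    log_kernel = -d2 / (2 * gamma ** 2) - np.log(2 * np.pi * gamma ** 2)
    m = log_kernel.max(axis=1, keepdims=True)  # log-mean-exp, stabilized
    return m.squeeze(1) + np.log(np.exp(log_kernel - m).mean(axis=1))

# Cross-entropy (1): the quantity minimized with respect to theta in training.
loss = -kde_log_density(X, Y).mean()
print(round(float(loss), 3))
```

In the real model, `Y` depends on the hypernetwork's output, so this loss drives the hypernetwork's training.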
The advantage of our approach is that, at test time, the hypernetwork returns a complete generative model for a given data set. Clearly, this approach can be successful only in low dimensions, because we use kernel density estimation and compare probability distributions in the original space. To adapt this model to higher dimensions, one should use an autoencoder structure and compare distributions using e.g. the MMD distance
[Li et al.2017], see [Tolstikhin et al.2017]. In the experiments, we demonstrate that the model in this form has the potential to be used for object modeling.
3 Experiments
We present potential applications of the proposed methodology. First, we apply the model to functional image representation and examine its performance in super-resolution. Next, we use it to describe 2D shapes. In both cases, we show some interesting geometrical properties of the space of target networks. We start by presenting the architecture used in the experiments.
3.1 Architecture
Designing correct architectures of the target network and the hypernetwork for a given task is very important. In particular, optimal architectures for super-resolution differ from those used in generative models. Since this paper is intended as a proof of concept, we use a single architecture in all experiments.
Table 1: PSNR (dB) on standard super-resolution benchmarks.
| Data set | Scale | Bicubic | SRCNN [Dong et al.2016a] | RDN [Zhang et al.2018b] | Ours |
| Set5 | 2x | 33.64 | 36.66 | 38.30 | 36.09 |
| Set5 | 3x | 30.41 | 32.75 | 34.78 | 32.85 |
| Set5 | 4x | 28.42 | 30.49 | 32.61 | 30.69 |
| Set14 | 2x | 30.33 | 32.45 | 34.10 | 32.30 |
| Set14 | 3x | 27.63 | 29.30 | 30.67 | 29.37 |
| Set14 | 4x | 26.08 | 27.50 | 28.92 | 27.61 |
| B100 | 2x | 29.48 | 31.36 | 32.40 | 31.11 |
| B100 | 3x | 27.12 | 28.41 | 29.33 | 28.31 |
| B100 | 4x | 25.87 | 26.90 | 27.80 | 26.86 |
| Urban100 | 2x | 26.85 | 29.50 | 33.09 | 29.43 |
| Urban100 | 3x | 24.43 | 26.24 | 29.00 | 26.26 |
| Urban100 | 4x | 23.11 | 24.52 | 26.82 | 24.56 |
Table 2: SSIM on standard super-resolution benchmarks.
| Data set | Scale | Bicubic | SRCNN [Dong et al.2016a] | RDN [Zhang et al.2018b] | Ours |
| Set5 | 2x | 0.930 | 0.9542 | 0.9616 | 0.9505 |
| Set5 | 3x | 0.869 | 0.9090 | 0.9300 | 0.9095 |
| Set5 | 4x | 0.812 | 0.8628 | 0.9003 | 0.8691 |
| Set14 | 2x | 0.869 | 0.9067 | 0.9218 | 0.899 |
| Set14 | 3x | 0.775 | 0.8215 | 0.8482 | 0.8164 |
| Set14 | 4x | 0.703 | 0.7513 | 0.7893 | 0.7506 |
| B100 | 2x | 0.843 | 0.8879 | 0.9022 | 0.879 |
| B100 | 3x | 0.738 | 0.7863 | 0.8105 | 0.778 |
| B100 | 4x | 0.666 | 0.7101 | 0.7434 | 0.706 |
| Urban100 | 2x | 0.839 | 0.8946 | 0.9368 | 0.891 |
| Urban100 | 3x | 0.733 | 0.7989 | 0.8683 | 0.798 |
| Urban100 | 4x | 0.656 | 0.7221 | 0.8069 | 0.723 |
Target network.
The architecture of the target network should be simple and small. Since the target network is not trained directly, keeping it small keeps the training phase efficient. Moreover, small networks can be easily reused for other applications.
The target network consists of five fully-connected layers, see Figure 4. The layers' dimensions gradually increase up to the middle layer and then decrease, because steep transitions between the dimensions of consecutive layers negatively affect the learning ability of a neural network. Additionally, batch normalization [Ioffe and Szegedy2015] is applied between layers. For the activation between two consecutive layers we chose a function [Goodfellow et al.2016] which worked much better than ReLU for our purpose; the activation of the last layer is a sigmoid. Since the size of the hypernetwork's output depends on the number of trainable parameters in the target network, we used residual connections in the target network. No convolutions were used, because the input of the target network is too simple.
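The coupling between the two networks' sizes can be made concrete: the hypernetwork's output dimension equals the target network's parameter count, so every extra target weight enlarges the hypernetwork. The layer widths below are hypothetical, chosen only to match the described shape (growing to the middle layer, then shrinking):

```python
# Hypothetical widths of a five-layer fully-connected target network:
# input (x, y) -> hidden layers -> RGB. The paper's exact widths may differ.
dims = [2, 32, 64, 32, 16, 3]

# One weight matrix plus one bias vector per layer.
n_params = sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))
print(n_params)  # 4867 -- the hypernetwork must emit a vector of this length
```

Keeping `dims` small is what makes generating the whole weight vector feasible.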
Hypernetwork.
The hypernetwork is a convolutional neural network with some modifications, see Figure 5. We created an eight-layer network with one residual connection. To reduce the number of trainable parameters, we adapted an approach used in the inception network [Szegedy et al.2015]: instead of a single large convolution, we used a pair of smaller convolutions applied one after the other. To generate the weights for each layer of the target network, we designed the following process. The first few layers of the hypernetwork are shared and take part in generating the weights for every layer of the target network. Next, the network splits into separate branches, one for each layer of the target network. The purpose of the initial layers is to extract features from the input image; the following layers find the weights of the target network based on these features. This process led to faster training than creating a separate hypernetwork for each layer of the target network. ReLU was selected as the activation function for every layer of the hypernetwork. Additionally, batch normalization is used after each layer.
3.2 Super-resolution
Since the target network gives a functional, continuous representation of the input image, we can upscale the image to any size and, in consequence, apply our approach to super-resolution.
To make this approach successful, we feed the hypernetwork with low-resolution images and evaluate the MSE loss on high-resolution ones. More precisely, we take the original image, downscale it by the given scale factor using bicubic interpolation, and input it to the hypernetwork. The hypernetwork produces the weights of the target network, which defines the functional representation of the input image. To evaluate its quality, we take a grid of the original resolution on the image returned by the target network and compare the values with the pixel intensities of the original image using the MSE loss.
Since input images can have different resolutions, we split them into overlapping parts of fixed size. In consequence, the value at each coordinate is described by multiple target networks. To produce a single output for every coordinate at test time, we take the (weighted) average of the values returned by all target networks covering this coordinate. This also smooths the output function.
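The averaging over overlapping parts can be sketched in one dimension. The patch boundaries and the constant per-patch predictions below are purely illustrative:

```python
import numpy as np

# A 1-D "image" of 10 coordinates covered by two overlapping fixed-size
# patches, each with its own target network.
width = 10
pred = np.zeros(width)
weight = np.zeros(width)

patches = [(0, 6), (4, 10)]  # index ranges covered by each patch
for k, (lo, hi) in enumerate(patches):
    patch_pred = np.full(hi - lo, float(k))  # stand-in for the k-th target net
    pred[lo:hi] += patch_pred                # accumulate predictions...
    weight[lo:hi] += 1.0                     # ...and the patch coverage count

pred /= weight  # uniform average wherever patches overlap
print(pred.tolist())  # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]
```

A weighted average (e.g. down-weighting patch borders) follows the same accumulation scheme with non-uniform weights.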
To test our approach, we trained the model on examples from the DIV2K data set [Agustsson and Timofte2017]. Its performance was evaluated on Set5 [Bevilacqua et al.2012], Set14 [Zeyde et al.2010], B100 [Martin et al.2001], and Urban100 [Huang et al.2015]. As quality measures we used PSNR and SSIM [Wang et al.2004], two measures typically applied in super-resolution tasks; higher values indicate better performance. We considered scale factors of 2, 3, and 4.
As a baseline, we used bicubic interpolation. Moreover, we compared our approach with SRCNN [Dong et al.2016b], which was a state-of-the-art method in 2016. We also included a recent state-of-the-art model, RDN [Zhang et al.2018c]. Our goal is to train a single hypernetwork model to generate images at various scales. This is a more general solution than typical super-resolution approaches, where every model is responsible for upscaling the image to only one resolution. In consequence, it is expected that both SRCNN and RDN will perform better than our method. We trained our network once on images downscaled 2, 3, and 4 times. Moreover, we supplied the target network with an additional input indicating the scale factor.
The results presented in Tables 1 and 2 demonstrate that our model performed significantly better than the bicubic-interpolation baseline (see also Figure 1 for a sample result). Surprisingly, a single hypernetwork trained on all scales achieved performance comparable to SRCNN, which creates a separate model for each scale factor. This shows the high potential of our model as a method of image representation. Nevertheless, it was not able to reach the scores of the recent state of the art in super-resolution, which might be caused by an insufficiently powerful hypernetwork architecture. In our opinion, designing a hypernetwork architecture similar to RDN should lead to comparable performance. The main advantage of our approach is its generality: we trained a single model for various scale factors.
3.3 Target networks geometry
It is believed that high-dimensional data, e.g. images, are embedded in low-dimensional manifolds [Goodfellow et al.2016]. In consequence, direct linear interpolation between images does not produce natural-looking pictures.
In this experiment, we inspect the space of target-network weights. In particular, we verify whether linear interpolation between the weights of two target networks produces realistic images. For this purpose, we train the hypernetwork model presented in the previous subsection on images at a single scale. In other words, we are not interested in rescaling images, but only in creating their functional representation. We use the CelebA data set [Liu et al.2015] with a trivial preprocessing of cropping the central 128x128 pixels of each image and resizing them to 64x64 pixels.
At test time, we generate target networks for two images and take the linear interpolation between the weights of these networks. Figure 3 presents the images returned by the interpolated target networks. The interpolation evidently produces images from the true data distribution. It means that we transformed the manifold of images into a more compact structure (a set of weights) where linear operations can be applied. This suggests that similar images have similar target-network weights. For comparison, we generated a classical pixel-wise interpolation between the same examples. As can be seen in Figure 6, the results are much worse, because this interpolation gives a superimposition of the images.
Going further, we verified a layer-wise interpolation. Namely, we took the weights of one target network and gradually changed the weights of its first layers toward the corresponding weights of the second target network. As can be seen in Figure 7, each layer may be understood as having a different functionality, e.g. the third layer is responsible for the general shape, while the last layer corrects the colors of the image. In future work, it may be interesting to use the hypernetwork mechanism to obtain a disentangled representation.
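Both interpolation schemes reduce to simple operations on flat weight vectors. The vector size and the layer split point below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)

# Flat weight vectors of two target networks; in the experiment they would
# be produced by the hypernetwork from two different images.
theta_a = rng.normal(size=51)
theta_b = rng.normal(size=51)

# Full linear interpolation: each intermediate vector parameterizes a valid
# target network, so decoding it yields an in-between image.
ts = np.linspace(0.0, 1.0, 5)
path = [(1 - t) * theta_a + t * theta_b for t in ts]

# Layer-wise variant: replace only the first `split` coordinates (the early
# layers) with those of theta_b, keeping the remaining layers fixed.
split = 24
layerwise = theta_a.copy()
layerwise[:split] = theta_b[:split]

print(len(path))  # 5
```

Decoding each element of `path` (or `layerwise`) through the target network produces the interpolated images shown in Figures 3 and 7.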
3.4 Generating models of objects
Generative models give an opportunity to create new examples from the underlying data set. We demonstrate that hypernetworks can be trained to return an individual generative model for a new data set at test time.
As a proof of concept, we tested our method on 2-dimensional data sets. We used the MNIST database [LeCun et al.1998] and interpreted each image as an individual probability distribution on the 2D plane. Namely, the probability of generating a point (x, y) for a given image is proportional to the brightness of the pixel at (x, y). We would like to create an individual generative model for every image (understood as a 2D data set). For this purpose, we need to find a transformation of some simple probability distribution which reproduces a given image. In this example, we look for a transformation of the uniform distribution on the unit circle in the 2D plane. Intuitively, we would like to transform the circle into the shape of a given image. In every batch, we take a sample from the circle, transform it by the target network, and use kernel density estimation to model the resulting distribution. The similarity between this distribution and the empirical one is verified by computing the cross-entropy, as described in Section 2.2.
Sample results are presented in Figure 8. Blue points show the values produced by a target network, while the line is traced between subsequent points mapped from the circle. As can be seen, using the target network we obtain an outline of a given digit if we iterate over the circle in small steps. Despite the simplicity of this experiment, its extensions may find applications in object modeling. For example, this approach could be used in 3D printing, where finding intermediary steps between sampled points is a crucial task [Zarzar Gandler2017]. Since it is essential that such algorithms run fast [Morse et al.2005], methods that work in constant time (such as a neural network) may be beneficial. In the future, we plan to use reversible generative networks [Kingma and Dhariwal2018] to remove the redundant loops visible in the image of the circle.
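The sampling step above can be sketched as follows. The smooth deformation standing in for the target network is an invented example; in the experiment, its weights would come from the hypernetwork:

```python
import numpy as np

rng = np.random.default_rng(5)

# Prior: uniform distribution on the unit circle in the 2D plane.
n = 64
phi = np.sort(rng.uniform(0, 2 * np.pi, size=n))  # sorted, to trace an outline
Z = np.stack([np.cos(phi), np.sin(phi)], axis=1)

def T(z):
    """Stand-in for the target network: a smooth radial deformation."""
    r = 1.0 + 0.3 * np.sin(3 * np.arctan2(z[:, 1], z[:, 0]))
    return z * r[:, None]

Y = T(Z)  # points whose kernel density estimate should match the digit's shape
print(Y.shape)  # (64, 2)
```

Iterating over the sorted angles in small steps and connecting consecutive points of `Y` traces the outline, as in Figure 8.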
Analogously to the previous experiment, we also examined the interpolation between two generative models (the weights of two target networks). Sample results shown in Figure 9 demonstrate that the obtained shapes change gradually. It means that target networks with similar weights produce similar images. This effect could not be achieved with a separate network trained for every image.
4 Conclusion
We presented an extension of the hypernetwork mechanism which allows creating target networks for various purposes using a single hypernetwork model. This approach was applied to create functional image representations and to construct generative models for new data sets at test time. Due to the continuity of the representation, we were able to upscale an image to any resolution. We also observed that the constructed hypernetwork transforms the manifold of images into a more compact space, where linear interpolation between images can be performed; namely, we can traverse linearly from one image to another without falling out of the true data distribution. Our experiments suggest that hypernetworks can be used to produce generative models. In the future, we plan to use our approach to create generative models for higher-dimensional data.
References

[Agustsson and Timofte2017] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 126–135, 2017.
[Bevilacqua et al.2012] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie Line Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2012.
[Brock et al.2017] Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. CoRR, abs/1708.05344, 2017.
[Dong et al.2016a] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.
[Dong et al.2016b] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.
 [Goodfellow et al.2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
 [Ha et al.2016] David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016.
[Huang et al.2015] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.
 [Ioffe and Szegedy2015] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
 [Kingma and Dhariwal2018] Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pages 10236–10245, 2018.
[Krueger et al.2017] David Krueger, Chin-Wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, and Aaron Courville. Bayesian hypernetworks. arXiv preprint arXiv:1710.04759, 2017.
[LeCun et al.1998] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Li et al.2017] Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabás Póczos. MMD GAN: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, pages 2203–2213, 2017.
[Liu et al.2015] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
 [Lorraine and Duvenaud2018] Jonathan Lorraine and David Duvenaud. Stochastic hyperparameter optimization through hypernetworks. CoRR, abs/1802.09419, 2018.

[Louizos and Welling2017] Christos Louizos and Max Welling. Multiplicative normalizing flows for variational Bayesian neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2218–2227. JMLR.org, 2017.
[Martin et al.2001] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision, page 416. IEEE, 2001.
 [Maziarz et al.2018] Krzysztof Maziarz, Andrey Khorlin, Quentin de Laroussilhe, and Andrea Gesmundo. Evolutionaryneural hybrid agents for architecture search. arXiv preprint arXiv:1811.09828, 2018.

[Morse et al.2005] Bryan S Morse, Terry S Yoo, Penny Rheingans, David T Chen, and Kalpathi R Subramanian. Interpolating implicit surfaces from scattered surface data using compactly supported radial basis functions. In ACM SIGGRAPH 2005 Courses, page 78. ACM, 2005.
[Sheikh et al.2017] Abdul-Saboor Sheikh, Kashif Rasul, Andreas Merentitis, and Urs Bergmann. Stochastic maximum likelihood optimization via hypernetworks. arXiv preprint arXiv:1712.01141, 2017.
 [Szegedy et al.2015] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
 [Tolstikhin et al.2017] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein autoencoders. arXiv preprint arXiv:1711.01558, 2017.
 [Wang et al.2004] Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
 [Zarzar Gandler2017] Gabriela Zarzar Gandler. Evaluation of probabilistic representations for modeling and understanding shape based on synthetic and real sensory data, 2017.
[Zeyde et al.2010] Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, pages 711–730. Springer, 2010.
 [Zhang et al.2018a] Chris Zhang, Mengye Ren, and Raquel Urtasun. Graph hypernetworks for neural architecture search. CoRR, abs/1810.05749, 2018.
[Zhang et al.2018b] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[Zhang et al.2018c] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018.
 [Zoph and Le2016] Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.