
Are wider nets better given the same number of parameters?
Empirical studies demonstrate that the performance of neural networks improves with an increasing number of parameters. In most of these studies, the number of parameters is increased by increasing the network width. This raises the question: Is the observed improvement due to the larger number of parameters, or is it due to the larger width itself? We compare different ways of increasing model width while keeping the number of parameters constant. We show that for models initialized with a random, static sparsity pattern in the weight tensors, network width is the determining factor for good performance, while the number of weights is secondary, as long as trainability is ensured. As a step towards understanding this effect, we analyze these models in the framework of Gaussian Process kernels. We find that the distance between the sparse finite-width model kernel and the infinite-width kernel at initialization is indicative of model performance.
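The comparison described in the abstract relies on widening a layer while holding the parameter count fixed via a random, static sparsity mask. The sketch below illustrates one way this can be set up; the function name, the uniform mask placement, and the fan-in scaled initialization are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def sparse_wide_layer(in_dim, base_width, widen_factor, rng):
    """Build a widened weight matrix whose number of nonzero (trainable)
    weights equals that of a dense (in_dim x base_width) layer.

    The sparsity mask is random and static: fixed at initialization and
    never updated during training. (Illustrative sketch, not the paper's
    exact construction.)
    """
    wide_width = base_width * widen_factor
    n_params = in_dim * base_width   # parameter budget of the dense base layer
    n_slots = in_dim * wide_width    # weight slots in the wider tensor

    # Choose which slots remain trainable, uniformly at random.
    mask = np.zeros(n_slots, dtype=bool)
    mask[rng.choice(n_slots, size=n_params, replace=False)] = True
    mask = mask.reshape(in_dim, wide_width)

    # Fan-in scaled Gaussian initialization, with pruned slots zeroed out.
    weights = rng.standard_normal((in_dim, wide_width)) / np.sqrt(in_dim)
    return weights * mask, mask

rng = np.random.default_rng(0)
w, mask = sparse_wide_layer(in_dim=64, base_width=32, widen_factor=4, rng=rng)
# The 4x-wider layer keeps exactly the dense base layer's parameter count.
assert mask.sum() == 64 * 32
assert w.shape == (64, 128)
```

Keeping the mask static means the comparison isolates width from parameter count: the wide sparse model and the narrow dense model train the same number of weights, differing only in the shape of the weight tensor.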