Are wider nets better given the same number of parameters?

10/27/2020
by Anna Golubeva, et al.

Empirical studies demonstrate that the performance of neural networks improves with increasing number of parameters. In most of these studies, the number of parameters is increased by increasing the network width. This begs the question: Is the observed improvement due to the larger number of parameters, or is it due to the larger width itself? We compare different ways of increasing model width while keeping the number of parameters constant. We show that for models initialized with a random, static sparsity pattern in the weight tensors, network width is the determining factor for good performance, while the number of weights is secondary, as long as trainability is ensured. As a step towards understanding this effect, we analyze these models in the framework of Gaussian Process kernels. We find that the distance between the sparse finite-width model kernel and the infinite-width kernel at initialization is indicative of model performance.
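As a minimal sketch of the scheme the abstract describes (increasing layer width while holding the number of weights fixed via a random, static sparsity mask), the snippet below is illustrative only: the function name sparse_wide_layer, the NumPy setup, and the He-style initialization are assumptions, not the paper's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_wide_layer(in_dim, base_width, widen_factor):
        """Weights and mask for a layer of width base_width * widen_factor whose
        number of nonzero weights equals that of the dense base-width layer."""
        width = base_width * widen_factor
        n_params_dense = in_dim * base_width      # parameter budget of the dense layer
        n_entries_wide = in_dim * width           # entries in the wider weight matrix

        # Random, static mask: keep exactly n_params_dense of the wider matrix's entries.
        keep = rng.choice(n_entries_wide, size=n_params_dense, replace=False)
        mask = np.zeros(n_entries_wide, dtype=bool)
        mask[keep] = True
        mask = mask.reshape(in_dim, width)

        # He-style initialization; masked entries are zero and stay fixed during training.
        weights = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, width)) * mask
        return weights, mask

    w_wide, mask_wide = sparse_wide_layer(in_dim=128, base_width=64, widen_factor=4)
    # Same number of trainable (nonzero) weights as a dense 128 x 64 layer, but 4x the width.
    assert np.count_nonzero(mask_wide) == 128 * 64
    assert w_wide.shape == (128, 256)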

Related research

02/08/2022  Width is Less Important than Depth in ReLU Neural Networks
We solve an open question from Lu et al. (2017), by showing that any tar...

11/16/2022  An Empirical Analysis of the Advantages of Finite- v.s. Infinite-Width Bayesian Neural Networks
Comparing Bayesian neural networks (BNNs) with different widths is chall...

05/14/2020  Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?
One of the generally accepted views of modern deep learning is that incr...

03/09/2021  Enhancing sensor resolution improves CNN accuracy given the same number of parameters or FLOPS
High image resolution is critical to obtain a good performance in many c...

02/21/2008  Testing the number of parameters with multidimensional MLP
This work concerns testing the number of parameters in one hidden layer ...

07/13/2021  How many degrees of freedom do we need to train deep networks: a loss landscape perspective
A variety of recent works, spanning pruning, lottery tickets, and traini...

11/30/2022  Average Path Length: Sparsification of Nonlinearities Creates Surprisingly Shallow Networks
We perform an empirical study of the behaviour of deep networks when pus...
