Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

06/20/2023
by Gage DeZoort, et al.

This article derives and validates three principles for initialization and architecture selection in finite-width graph neural networks (GNNs) with ReLU activations. First, we theoretically derive what is essentially the unique generalization to ReLU GNNs of the well-known He initialization. Our initialization scheme guarantees that the average scale of network outputs and gradients remains order one at initialization. Second, we prove that in finite-width vanilla ReLU GNNs, oversmoothing is unavoidable at large depth when using a fixed aggregation operator, regardless of initialization. We then prove that residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, alleviate oversmoothing at initialization. Finally, we show that the common practice of combining residual connections with a Fixup-type initialization provably avoids correlation collapse in final-layer features at initialization. Through ablation studies, we find that using the correct initialization, residual aggregation operators, and residual connections in the forward pass significantly and reliably speeds up early training dynamics in deep ReLU GNNs on a variety of tasks.
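For concreteness, below is a minimal sketch of two of the ingredients described in the abstract, written in PyTorch-style Python. The function names, the interpolation weight alpha, and the plain 2/fan_in variance factor are illustrative assumptions for a dense ReLU layer, not the paper's prescribed GNN-specific constants, which are derived in the full text.

import torch

def residual_aggregation(A_hat: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # Interpolate a fixed aggregation operator (e.g. a normalized adjacency matrix)
    # with the identity: A_res = alpha * I + (1 - alpha) * A_hat.
    # The weight alpha = 0.5 is an illustrative choice, not the paper's prescription.
    n = A_hat.shape[0]
    eye = torch.eye(n, dtype=A_hat.dtype, device=A_hat.device)
    return alpha * eye + (1.0 - alpha) * A_hat

def he_init_(weight: torch.Tensor) -> None:
    # Fan-in He-style initialization for a weight of shape (out_features, in_features):
    # W_ij ~ N(0, 2 / in_features). The paper derives the GNN-specific correction;
    # the standard dense-network factor below is used here only as a stand-in.
    fan_in = weight.shape[1]
    with torch.no_grad():
        weight.normal_(mean=0.0, std=(2.0 / fan_in) ** 0.5)

A vanilla GNN layer would then compute something like relu(residual_aggregation(A_hat) @ H @ W.T), with W initialized by he_init_; the residual connection in the forward pass and its Fixup-type initialization are separate ingredients not shown in this sketch.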

