Globally Gated Deep Linear Networks

10/31/2022
by Qianyi Li, et al.

The recently proposed Gated Linear Networks present a tractable nonlinear network architecture and exhibit interesting capabilities such as learning with local error signals and reduced forgetting in sequential learning. In this work, we introduce a novel gating architecture, named Globally Gated Deep Linear Networks (GGDLNs), in which gating units are shared among all processing units in each layer, thereby decoupling the architecture of the nonlinear but unlearned gating from that of the learned linear processing motifs. We derive exact equations for the generalization properties of these networks in the finite-width thermodynamic limit, defined by P, N → ∞ with P/N ∼ O(1), where P is the training sample size and N the network width. We find that the statistics of the network predictor can be expressed in terms of kernels that undergo shape renormalization through a data-dependent matrix, relative to the GP kernels. Our theory accurately captures the behavior of finite-width GGDLNs trained with gradient descent dynamics. We show that kernel shape renormalization gives rise to rich generalization properties with respect to network width, depth, and L2 regularization amplitude. Interestingly, networks with a sufficient number of gating units behave similarly to standard ReLU networks. Although the gating units in the model do not participate in supervised learning, we show the utility of unsupervised learning of the gating parameters. Additionally, our theory allows us to evaluate the network's ability to learn multiple tasks by incorporating task-relevant information into the gating units. In summary, our work is the first exact theoretical solution of learning in a family of nonlinear networks of finite width. The rich and diverse behavior of GGDLNs suggests that they are useful, analytically tractable models for studying learning of single and multiple tasks in finite-width nonlinear deep networks.
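
To make the layer-wise shared gating concrete, the sketch below illustrates one plausible forward pass for a GGDLN: each layer applies a gate-weighted mixture of linear maps, where the gates are fixed nonlinear functions of the input shared by all processing units in the layer. This is only a minimal illustration under assumptions about the exact parameterization; names such as `gate_fn`, `num_gates`, and `ggdln_forward` are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of a globally gated deep linear network forward pass.
# Gating units are fixed (unlearned) nonlinear functions of the input,
# shared by all processing units in a layer; only the linear weights
# would be learned. Names and shapes here are illustrative assumptions.

rng = np.random.default_rng(0)

def make_gates(input_dim, num_gates):
    """Fixed gating: random projections of the input passed through a step nonlinearity."""
    V = rng.standard_normal((num_gates, input_dim)) / np.sqrt(input_dim)
    return lambda x: np.heaviside(V @ x, 0.0)

def ggdln_forward(x, weights, gate_fn):
    """weights[l] has shape (num_gates, width_out, width_in):
    each layer is a gate-weighted mixture of linear maps.
    The gates are computed once from the raw input and reused at every
    layer (a per-layer gate_fn would work the same way)."""
    g = gate_fn(x)
    h = x
    for W in weights:
        # sum over gates m of g_m * (W_m @ h): linear in h, nonlinear in x only through g
        h = np.einsum('m,mij,j->i', g, W, h) / np.sqrt(W.shape[-1])
    return h

# Example: two hidden layers of width N, M gating units, scalar readout
D, N, M = 10, 50, 4
gate_fn = make_gates(D, M)
weights = [rng.standard_normal((M, N, D)),
           rng.standard_normal((M, N, N)),
           rng.standard_normal((M, 1, N))]
x = rng.standard_normal(D)
print(ggdln_forward(x, weights, gate_fn))
```

In this picture only the linear weights participate in supervised training, while the gate parameters stay fixed or, as the abstract suggests, could be adapted by unsupervised learning of the input statistics.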


Related research

12/07/2020 · Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group
The success of deep learning in many real-world tasks has triggered an e...

07/21/2022 · The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Our theoretical understanding of deep learning has not kept pace with it...

02/05/2022 · The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks
Understanding the asymptotic behavior of gradient-descent training of de...

10/05/2022 · The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks
It is unclear how changing the learning rule of a deep neural network al...

06/15/2020 · Layer-wise Learning of Kernel Dependence Networks
We propose a greedy strategy to train a deep network for multi-class cla...

06/18/2021 · The Principles of Deep Learning Theory
This book develops an effective theory approach to understanding deep ne...

11/30/2022 · Average Path Length: Sparsification of Nonlinearities Creates Surprisingly Shallow Networks
We perform an empirical study of the behaviour of deep networks when pus...
