Entangled Residual Mappings

06/02/2022
by Mathias Lechner, et al.

Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on gradient norms, enables the training of very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of residual connections and evaluate their role in iteratively learning representations. An entangled residual mapping replaces the identity skip connection with a specialized entangled mapping, such as an orthogonal, sparse, or structural correlation matrix, that shares key attributes (eigenvalues, structure, and Jacobian norm) with the identity mapping. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than in attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers, entangled sparse mappings can help generalization, while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.
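The core idea of replacing the identity skip connection with a mapping that shares its key spectral properties can be illustrated with a minimal NumPy sketch. The specific constructions below (a QR-based random orthogonal matrix, and a lightly perturbed identity for the sparse case) are illustrative assumptions for exposition, not the paper's exact recipes:

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal_mapping(d):
    """Random orthogonal matrix via QR decomposition.

    Like the identity, it has all eigenvalues on the unit circle and a
    spectral (Jacobian) norm of 1, but it mixes feature channels
    instead of copying them through unchanged.
    """
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix makes the factorization unique

def sparse_mapping(d, density=0.25):
    """Hypothetical sparse skip mapping: identity-like structure with a
    small fraction of entries perturbed, so the skip path stays mostly
    pass-through while entangling a few channels."""
    m = np.eye(d)
    mask = rng.random((d, d)) < density
    m[mask] += 0.1 * rng.standard_normal(mask.sum())
    return m

def entangled_residual_block(x, w, m):
    """One residual block y = m @ x + f(x) with an entangled skip mapping m.

    The standard ResNet block is recovered with m = np.eye(d), i.e.
    y = x + f(x).
    """
    return m @ x + np.tanh(w @ x)

d = 8
x = rng.standard_normal(d)
w = rng.standard_normal((d, d)) / np.sqrt(d)
q = orthogonal_mapping(d)
y = entangled_residual_block(x, w, q)  # skip path rotates x instead of copying it
```

Because the orthogonal mapping preserves norms exactly, the skip path contributes the same gradient-stabilizing effect as the identity while forcing the refinement signal to be expressed in a rotated feature basis.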


Related research

10/13/2017 | Residual Connections Encourage Iterative Inference
Residual networks (Resnets) have become a prominent architecture in deep...

05/14/2019 | Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the str...

11/04/2016 | Learning Identity Mappings with Residual Gates
We propose a new layer design by adding a linear gating mechanism to sho...

05/27/2019 | Identity Connections in Residual Nets Improve Noise Stability
Residual Neural Networks (ResNets) achieve state-of-the-art performance ...

12/08/2017 | Chaining Identity Mapping Modules for Image Denoising
We propose to learn a fully-convolutional network model that consists of...

07/19/2017 | Orthogonal and Idempotent Transformations for Learning Deep Neural Networks
Identity transformations, used as skip-connections in residual networks,...

10/05/2020 | Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Traditional (unstructured) pruning methods for a Transformer model focus...
