Rethinking Skip Connection with Layer Normalization in Transformers and ResNets

05/15/2021
by Fenglin Liu, et al.

Skip connection is a widely used technique for improving the performance and convergence of deep neural networks. It is believed to ease the difficulty of optimization caused by non-linearity, since it propagates a linear component through the network layers. From another point of view, however, it can also be seen as a modulating mechanism between the input and the output, with the input scaled by a pre-defined value of one. In this work, we investigate how the scale factor affects the effectiveness of the skip connection and reveal that a trivial adjustment of the scale leads to spurious gradient explosion or vanishing as the model grows deeper, which can be addressed by normalization, in particular layer normalization, and which yields consistent improvements over the plain skip connection. Inspired by these findings, we further propose to adaptively adjust the scale of the input by recursively applying skip connection with layer normalization, which substantially improves performance and generalizes well across diverse tasks, including both machine translation and image classification datasets.
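The abstract does not spell out the formulation, but the underlying mechanism is easy to state: with a scaled skip connection x_{l+1} = lambda * x_l + F(x_l), the identity path contributes a factor of lambda to the gradient at every layer, so after L layers the accumulated lambda^L explodes when lambda > 1 and vanishes when lambda < 1; normalizing the sum removes this dependence on a fixed scale. Below is a minimal PyTorch sketch of a recursive skip connection with layer normalization, written under those assumptions; the module name, the num_recursions parameter, and the sublayer interface are illustrative choices, not the paper's exact design.

import torch
import torch.nn as nn

class RecursiveSkipLayerNorm(nn.Module):
    # Hypothetical sketch: instead of the plain residual x + F(x), whose
    # input scale is fixed at one, the input is re-injected and layer-
    # normalized several times, so normalization adaptively rescales the
    # skip path. The name and num_recursions are illustrative assumptions.
    def __init__(self, d_model, sublayer, num_recursions=2):
        super().__init__()
        self.sublayer = sublayer
        self.norms = nn.ModuleList(
            nn.LayerNorm(d_model) for _ in range(num_recursions))

    def forward(self, x):
        out = self.sublayer(x)  # non-linear transformation F(x)
        for norm in self.norms:
            out = norm(x + out)  # add the input back, then renormalize
        return out

# Usage: wrap a Transformer-style feed-forward sublayer of width 512.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = RecursiveSkipLayerNorm(512, ffn, num_recursions=2)
y = block(torch.randn(8, 16, 512))  # (batch, seq_len, d_model)

The design choice in this sketch is that each recursion re-injects the raw input x rather than only the previous normalized output, which is one way of reading "recursively applying skip connection with layer normalization"; the recursion used in the paper may differ.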

Related research

06/10/2020
Is the Skip Connection Provable to Reform the Neural Network Loss Landscape?
The residual network is now one of the most effective structures in deep...

07/04/2023
Free energy of Bayesian Convolutional Neural Network with Skip Connection
Since the success of Residual Network (ResNet), many architectures of ...

10/23/2017
Investigating the feature collection for semantic segmentation via single skip connection
Since the study of deep convolutional neural network became prevalent, o...

08/29/2019
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
The general trend in NLP is towards increasing model capacity and perfor...

11/07/2018
Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures
We introduce a principled approach, requiring only mild assumptions, for...

10/09/2022
SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
In light of the smoothness property brought by skip connections in ResNe...

10/21/2019
Universal flow approximation with deep residual networks
Residual networks (ResNets) are a deep learning architecture with the re...
