Lessons on Parameter Sharing across Layers in Transformers

04/13/2021
by Sho Takase, et al.

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), to improve computational efficiency. We propose three strategies for assigning parameters to layers: Sequence, Cycle, and Cycle (rev). Experimental results show that the proposed strategies are efficient in both parameter size and computational time. Moreover, we show that the proposed strategies remain effective in settings with large amounts of training data, such as the recent WMT competitions.
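The three strategies can be sketched as layer-to-parameter assignment rules. The rules below are assumptions inferred from the strategy names in the abstract (the paper defines the exact assignments): with M shared parameter sets and N layers, Sequence assigns each set to a run of consecutive layers, Cycle cycles through the sets, and Cycle (rev) cycles but reverses the order in the final cycle.

```python
def assign_layers(strategy, n_layers, n_params):
    """Return, for each of n_layers layers, the index (0..n_params-1)
    of the shared parameter set it uses.

    The assignment rules are a sketch inferred from the strategy names
    in the abstract, not the authors' exact definitions.
    """
    assert n_layers % n_params == 0, "assume layers divide evenly into sets"
    repeat = n_layers // n_params
    if strategy == "sequence":
        # consecutive layers share a set: 0,0,...,1,1,...,2,2,...
        return [i // repeat for i in range(n_layers)]
    if strategy == "cycle":
        # cycle through the sets: 0,1,2,0,1,2,...
        return [i % n_params for i in range(n_layers)]
    if strategy == "cycle_rev":
        # cycle, but reverse the order in the last cycle: 0,1,2,...,2,1,0
        out = [i % n_params for i in range(n_layers - n_params)]
        out += list(reversed(range(n_params)))
        return out
    raise ValueError(f"unknown strategy: {strategy}")

print(assign_layers("sequence", 6, 3))   # [0, 0, 1, 1, 2, 2]
print(assign_layers("cycle", 6, 3))      # [0, 1, 2, 0, 1, 2]
print(assign_layers("cycle_rev", 6, 3))  # [0, 1, 2, 2, 1, 0]
```

Under these rules, a 6-layer model with 3 parameter sets keeps only half the parameters of an unshared model, while each strategy distributes the sets differently across depth.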


