1 Introduction
Modern deep learning often trains millions or even billions of parameters
(Devlin et al., 2018; Shoeybi et al., 2019; Raffel et al., 2019; Brown et al., 2020) to deliver good performance for a model. Recently, Frankle and Carbin (2018) and Frankle et al. (2020) demonstrated that these overparameterized networks contain sparse subnetworks that, when trained in isolation, can achieve performance similar to or better than the original model. Furthermore, recent studies revisit the initialization stage of finding these subnetworks in vision models (Zhou et al., 2019; Ramanujan et al., 2020). Such a mask, which masks out part of the entire network to obtain those subnetworks, is referred to as a “Supermask.” That is to say, subnetworks of a randomly weighted neural network (NN) can achieve competitive performance, which may act as a good “prior” (Gaier and Ha, 2019) and connects to the long history of leveraging random features (Gamba et al., 1961; Baum, 1988) and/or random kernel methods (Rahimi and Recht, 2008, 2009) in machine learning. Here, we examine the following question: how does a fully randomized natural language processing (NLP) model perform in the multi-layer setting, and particularly in the (so far underexplored) one-layer setting?
In this work, we first validate that there exist subnetworks of standard randomly weighted Transformers (Reservoir Transformers in Shen et al. (2021)) that can perform competitively with fully-weighted alternatives on machine translation and natural language understanding tasks. With 50% of the randomized weights remaining, we found a subnetwork that reaches 29.45/17.29 BLEU on IWSLT14/WMT14, respectively. We also investigate the special case of finding subnetworks in one-layer randomly weighted Transformers (see Fig. 1). To obtain these subnetworks, we repeatedly apply the same randomized Transformer layer several times with different Supermasks. The resulting subnetwork of a one-layer randomly weighted Transformer performs similarly to its multi-layer counterparts with a 30% lower memory footprint. We also study the impact of different depths/widths of Transformers along with the effectiveness of two initialization methods. Finally, using pretrained embedding layers, we find that the subnetworks hidden in a one-layer randomly weighted Transformer are smaller than, but can match 98%/92% of the performance of, a trained Transformer on IWSLT14/WMT14. We hope our findings can offer new insights for understanding Transformers.
2 Related Work
Lottery Ticket Hypothesis. Frankle and Carbin (2018) found that NNs for computer vision contain subnetworks that can be effectively trained from scratch when reset to their initialization. Subsequent works (Zhou et al., 2019; Ramanujan et al., 2020; Wortsman et al., 2020) demonstrated that so-called winning tickets can achieve competitive performance without training, where the mask for finding such a subnetwork at initialization is called a “Supermask.” In NLP, previous works find that matching subnetworks exist early in training with Transformers (Yu et al., 2019), LSTMs (Renda et al., 2020), and fully-weighted pretrained BERT (Chen et al., 2020; Prasanna et al., 2020) or Vision-and-Language models (Gan et al., 2021), but not at initialization.

Random Features. In the early days of neural networks, fixed random layers (Baum, 1988; Schmidt et al., 1992; Pao et al., 1994) were studied in reservoir computing (Maass et al., 2002; Jaeger, 2003; Lukoševičius and Jaeger, 2009), “random kitchen sink” kernel machines (Rahimi and Recht, 2008, 2009), and so on. Recently, random features have also been extensively explored for modern neural networks in deep reservoir computing networks (Scardapane and Wang, 2017; Gallicchio and Micheli, 2017; Shen et al., 2021), random kernel features (Peng et al., 2021; Choromanski et al., 2020), and applications in text classification (Conneau et al., 2017; Wieting and Kiela, 2019), summarization (Pilault et al., 2020), and probing (Voita and Titov, 2020).
Compressing Transformers. A wide range of neural network compression techniques have been applied to Transformers, including pruning (Fan et al., 2019; Michel et al., 2019; Sanh et al., 2020; Yao et al., 2021), where parts of the model weights are dropped; parameter sharing (Lan et al., 2020; Dehghani et al., 2018; Bai et al., 2019), where the same parameters are reused in different parts of a model; quantization (Shen et al., 2020; Li et al., 2020), where the weights of the Transformer are represented with fewer bits; and distillation (Sun et al., 2020; Jiao et al., 2020), where a compact student model is trained to mimic a larger teacher model. To find the proposed subnetworks at initialization, we develop our method in the spirit of parameter sharing and pruning.
3 Methodology
Finding a Supermask for a Randomly Weighted Transformer. In a general pruning framework, denote the weight matrix as $W$ ($W$ could be a non-square matrix), the input as $x$, and the network as $f(x; W)$. A subnetwork is defined as $f(x; W \odot M)$, where $M$ is a binary matrix and $\odot$ is the element-wise product. To find the subnetwork of a randomly weighted network, $M$ is trained while $W$ is kept at its random initialization. Following Ramanujan et al. (2020), denote $S$ as the importance score matrix associated with $W$, which is learnable during training. We keep the top-$k$ percent of weights ranked by the importance score $S$ to compute $M$, i.e.,
$$M = \mathrm{Top}_k(S).$$
Note that $\mathrm{Top}_k(\cdot)$ is a non-differentiable function. To enable the training of $S$, we use the straight-through gradient estimator (Bengio et al., 2013), in which $\mathrm{Top}_k(\cdot)$ is treated as the identity in backpropagation. During inference, we can simply construct and store the binary Supermask $M$ and the floating-point $W$ for future usage, while dropping $S$.

One-layer randomly weighted Transformer. We use the Transformer architecture (see Vaswani et al. (2017) for more details). For a general randomly weighted $L$-layer Transformer model with Supermasks, there exist $W_l$s and $M_l$s for all layers $l \in \{1, \dots, L\}$. Due to the natural layer-stacking property of Transformers, all $W_l$s have the same shape with the same initialization method. This leads to an unexplored question: “What’s hidden in a one-layer (instead of $L$-layer) randomly weighted Transformer?”
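As a minimal sketch of the top-$k$ selection above, consider the following NumPy toy example for a single linear layer; the function name, shapes, and score values here are hypothetical illustrations, not the paper's actual Transformer implementation:

```python
import numpy as np

def supermask_forward(x, W, S, k=0.5):
    """Toy linear layer f(x; W ⊙ M): keep the top-k fraction of the
    randomly initialized weights W, ranked by importance scores S.
    In training, S would be updated via the straight-through estimator."""
    flat = np.sort(S, axis=None)               # scores in ascending order
    thresh = flat[int((1 - k) * flat.size)]    # cutoff so ~k of weights survive
    M = (S >= thresh).astype(W.dtype)          # binary Supermask
    return x @ (W * M)
```

During backpropagation, the non-differentiable thresholding would be treated as the identity, so gradients reach the scores `S` even though only the binary mask affects the forward pass.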
Let us use a toy example to explain why there is no need for redundant $W_l$s. Assume that, for a randomly weighted matrix $W_l$, the probability that it has a “good” subnetwork is $p$ (here, “good” can be any defined metric, e.g., achieving error at most a predefined $\epsilon$ for all inputs). Furthermore, assume that, for two different layers, the events of having “good” subnetworks are independent. Then for $L$ different layers, the probability that all $W_l$s have “good” subnetworks is $p^L$. Meanwhile, since $W_1$ has the same initialization method as $W_l$, the probability that $W_1$ has a “good” subnetwork for the $l$-th layer is also $p$. Thus, for $L$ different layers, the probability of using $W_1$ to generate all $L$ “good” subnetworks is also $p^L$.

In this paper, we investigate the scenario where one randomized layer is applied $L$ times repeatedly with different Supermasks. As a result, this reduces the memory footprint, since all Supermasks can be stored in binary format.
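The memory saving can be made concrete with a back-of-the-envelope calculation: an $L$-layer model stores $L$ float32 weight tensors, whereas the one-layer variant stores a single float32 tensor plus $L$ one-bit Supermasks. The helper and parameter counts below are illustrative assumptions, not the paper's measured numbers:

```python
def footprint_mb(params_per_layer, n_layers, one_layer=False):
    """Rough weight-storage cost in MB: float32 weights cost 32 bits each;
    each Supermask costs 1 bit per weight."""
    if one_layer:
        # one shared float layer + one binary mask per repeated application
        bits = params_per_layer * 32 + n_layers * params_per_layer * 1
    else:
        # independent float weights for every layer
        bits = n_layers * params_per_layer * 32
    return bits / 8 / 1e6
```

For instance, with 1M parameters per layer and $L = 6$, the shared-layer variant needs about 4.75 MB versus 24 MB for the full stack; the binary masks add little on top of the single float layer.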
4 Experiments
Model Architecture. For model architectures, we experiment with Transformer_small and Transformer_base, following the same setting as in Ott et al. (2018): 6 encoder layers and 6 decoder layers on IWSLT14 and WMT14. We also vary the depth and width of the Transformer model on the machine translation tasks. On IWSLT14, we use 3 different random seeds and plot the mean accuracy ± one standard deviation. All the embedding layers (including the final output projection layer) are also randomized and pruned unless otherwise specified. Moreover, in all figures, the “fully-weighted model” denotes the standard full model (all weights remaining).
Machine Translation results. In Fig. 2, we present results for directly pruning a randomly weighted Transformer on IWSLT14 and WMT14 tasks. Specifically, we vary the ratio of remaining parameters in the randomized model.
As can be seen, there is no significant performance difference between a one-layer random Transformer and a 6-layer standard random Transformer across different percentages of remaining weights on IWSLT14 and WMT14. We also observe that letting the percentage of remaining randomized weights approach 0 or 100 leads to the worst performance across settings. This is expected: the outputs are random when 100% of the randomized weights remain, and the model cannot perform well when only limited weights are left unpruned (close to 0%). The best-performing subnetwork of a one-layer randomized Transformer retains 50% of the weights. This matches the search space of the employed method: when choosing k% out of 100% of the randomized weights, k = 50 yields the largest search space.
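The search-space claim can be sanity-checked directly: the number of binary masks that keep a fraction k of n weights is the binomial coefficient C(n, kn), which peaks at k = 50%. The helper below is our own illustration, not code from the paper:

```python
from math import comb

def n_masks(n_weights, keep_frac):
    """Number of distinct Supermasks retaining keep_frac of n_weights."""
    return comb(n_weights, round(keep_frac * n_weights))
```

For example, with 20 weights there are 184,756 possible masks at 50% retention but only 38,760 at 30% (or 70%), mirroring why intermediate retention ratios leave the mask search the most room.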
Effectiveness of Pretrained Embedding Layers. Embedding layers are critical since they can be viewed as the inputs of an NLP model, analogous to the image pixels in vision. Many prior studies have explored how to obtain pretrained embeddings in an unsupervised way (Mikolov et al., 2013; Pennington et al., 2014). We experiment with the practical setting where we have access to encoder/decoder embedding layers pretrained from the public checkpoint in fairseq (https://github.com/pytorch/fairseq), and we present the results in Fig. 3. We observe a significant performance boost for the one-layer randomized Transformer across different fractions of remaining weights. The difference is much larger for the bigger WMT14 dataset (around +3.0 BLEU for WMT14 versus +1.0 BLEU for IWSLT14). The best one-layer randomized Transformer reaches 89%/74% of the fully-weighted Transformer performance on IWSLT14/WMT14, respectively.
Task  | Model                 | BLEU          | Memory | Weights remaining (%) | # Params
------|-----------------------|---------------|--------|-----------------------|---------
IWSLT | Trans_small           | 34.66 (0.11)  | 148MB  | 100.0                 | 39M
IWSLT | one-layer Trans_small | 30.95 (0.12)  | 28MB   | 50.0                  | 7M
IWSLT | one-layer Trans_wide  | 34.14 (0.08)  | 71MB   | 50.0                  | 18M
IWSLT | one-layer Trans_deep  | 31.51 (0.10)  | 29MB   | 50.0                  | 7M
WMT   | Trans_base            | 27.51         | 328MB  | 100.0                 | 86M
WMT   | one-layer Trans_base  | 20.35         | 96MB   | 50.0                  | 25M
WMT   | one-layer Trans_wide  | 25.24         | 227MB  | 50.0                  | 57M
WMT   | one-layer Trans_deep  | 21.76         | 98MB   | 50.0                  | 25M
Effectiveness of Depth and Width. In Tab. 1, we report the BLEU score, memory footprint, and parameter count of different one-layer randomized Transformers with 50% remaining weights, where Trans_deep is a 12-encoder/decoder-layer variant and Trans_wide has 2x the hidden size. The results are gathered with pretrained encoder/decoder embedding layers. (We use the checkpoints from fairseq for Trans_base on WMT14 and Trans_small on IWSLT14 to obtain the pretrained embedding layers for one-layer Trans_base and one-layer Trans_small. For one-layer Trans_wide on IWSLT14, we pretrain the fully-weighted model and then dump its embedding layer; Trans_deep shares the same embedding as Trans_small.)
Either increasing the depth or enlarging the width improves the performance of our one-layer random Transformer. In particular, the deeper Transformer already achieves 79%/90% of the fully-weighted baseline models on WMT14/IWSLT14, respectively. For wider models, those numbers increase to 92%/98%. This is mainly due to the larger search space introduced by the larger weight matrix. Another important point is that even when we increase the depth or enlarge the width of the model, the total memory consumption is actually smaller than that of the standard baseline, since we store only one repeated layer and all the masks can be stored at 1 bit per weight.
Furthermore, we explore the effect of different ratios of remaining parameters for different models on IWSLT14 in Fig. 4. As can be seen, the wider model always outperforms the standard one across all settings. However, for the deeper model, there is a sharp transition at 50%–60% remaining parameters. The reason is that our deeper model is twice as deep as the original; when we retain more random parameters (beyond 50%), the probability $p$ that the layer has a good subnetwork for each application decreases significantly. This drives the final probability $p^{2L}$ far below $p^{L}$ (see Section 3).
Different Initialization. Weight initialization is one of the critical components for the success of random features (Wieting and Kiela, 2019; Ramanujan et al., 2020; Shen et al., 2021). We experiment with the Kaiming uniform (Ramanujan et al., 2020) and Xavier uniform (Vaswani et al., 2017) initialization methods, scaling the standard deviation according to the fraction of randomized weights retained. As shown in Fig. 5, the performance of the one-layer randomized Transformer decreases when we switch to Xavier uniform, and the degradation grows as more randomized weights are retained in the network.
QQP and MNLI results.
On QQP and MNLI, we experiment with RoBERTa and a wider RoBERTa variant, following Liu et al. (2019). We use the pretrained embedding layer of RoBERTa (Liu et al., 2019). In Fig. 6 and 7, we show consistent results on QQP and MNLI, except that the best-performing one-layer randomly weighted RoBERTa is achieved when we retain 70% of the randomized weights; it reaches 79%/91% of the fully-weighted RoBERTa accuracy on QQP and MNLI, respectively. The performance approaches 84%/92% of the fully-weighted model when using the larger hidden size with the one-layer randomly weighted RoBERTa.
Implementation Details.
We evaluate on IWSLT14 de-en (Cettolo et al., 2015) and WMT14 en-de (Bojar et al., 2014) for machine translation, and on QQP (Iyer et al., 2017) and MultiNLI-matched (MNLI) (Williams et al., 2017) for natural language understanding. (For IWSLT, we follow the preprocessing steps in Edunov et al. (2018); the train/val/test split is 129k/10k/6.8k sentences. For WMT, we follow the preprocessing in Ott et al. (2018), with 4.5M/16.5k/3k sentences in train/val/test.) We use 8 Volta V100 GPUs for WMT, and one V100 for IWSLT, QQP, and MNLI. The hyperparameters on IWSLT14 and WMT14 for training a one-layer randomized Transformer were set to the best-performing values from Ott et al. (2018) for training the fully-weighted Transformer. The QQP and MNLI experiments followed Liu et al. (2019).

5 Conclusions
In this paper, we validate the existence of effective subnetworks in a one-layer randomly weighted Transformer on translation tasks. Hidden within a one-layer randomly weighted Transformer with fixed pretrained embedding layers, there exist subnetworks that are smaller than, but can competitively match, the performance of a trained Transformer on IWSLT14/WMT14.
Acknowledgements
We thank anonymous reviewers for their comments and suggestions. SS and KK were supported by grants from Samsung, Facebook, and the Berkeley Deep Drive Consortium. We would like to acknowledge DARPA, IARPA, NSF, and ONR for providing partial support of this work.
References
- Deep equilibrium models. In Advances in Neural Information Processing Systems 32, pp. 690–701.
- On the capabilities of multilayer perceptrons. Journal of Complexity 4(3), pp. 193–215.
- Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432.
- Findings of the 2014 Workshop on Statistical Machine Translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland, USA.
- Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
- Report on the 11th IWSLT evaluation campaign, IWSLT 2014. In Proceedings of IWSLT.
- The lottery ticket hypothesis for pre-trained BERT networks. arXiv preprint arXiv:2007.12223.
- Masked language modeling for proteins via linearly scalable long-context transformers. arXiv preprint arXiv:2006.03555.
- Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364.
- Universal Transformers. In International Conference on Learning Representations.
- BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Classical structured prediction losses for sequence to sequence learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana.
- Reducing transformer depth on demand with structured dropout. In International Conference on Learning Representations.
- The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.
- Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning, pp. 3259–3269.
- Weight agnostic neural networks. arXiv preprint arXiv:1906.04358.
- Echo state property of deep reservoir computing networks. Cognitive Computation 9(3), pp. 337–350.
- Further experiments with PAPA. Il Nuovo Cimento (1955–1965) 20(2), pp. 112–115.
- Playing lottery tickets with vision and language. arXiv preprint arXiv:2104.11832.
- First Quora dataset release: question pairs, 2017. URL https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs.
- Adaptive nonlinear system identification with echo state networks. In Advances in Neural Information Processing Systems.
- TinyBERT: distilling BERT for natural language understanding. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 4163–4174.
- ALBERT: a lite BERT for self-supervised learning of language representations.
- Train big, then compress: rethinking model size for efficient training and inference of transformers. In International Conference on Machine Learning.
- RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Reservoir computing approaches to recurrent neural network training. Computer Science Review 3(3).
- Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Computation 14(11), pp. 2531–2560.
- Are sixteen heads really better than one? In Advances in Neural Information Processing Systems 32, pp. 14014–14024.
- Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- Scaling neural machine translation. arXiv preprint arXiv:1806.00187.
- Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2), pp. 163–180.
- Random feature attention. In International Conference on Learning Representations.
- GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532–1543.
- On the impressive performance of randomly weighted encoders in summarization tasks. arXiv preprint arXiv:2002.09084.
- When BERT plays the lottery, all tickets are winning. arXiv preprint arXiv:2005.00561.
- Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
- Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pp. 1177–1184.
- Weighted sums of random kitchen sinks: replacing minimization with randomization in learning. In Advances in Neural Information Processing Systems, pp. 1313–1320.
- What's hidden in a randomly weighted neural network? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11893–11902.
- Comparing rewinding and fine-tuning in neural network pruning. arXiv preprint arXiv:2003.02389.
- Movement pruning: adaptive sparsity by fine-tuning. Advances in Neural Information Processing Systems 33.
- Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7(2), e1200.
- Feedforward neural networks with random weights. In Proceedings of the 11th International Conference on Pattern Recognition, Vol. II, pp. 1–4.
- Reservoir transformers. In ACL.
- Q-BERT: Hessian based ultra low precision quantization of BERT. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 8815–8821.
- Megatron-LM: training multi-billion parameter language models using GPU model parallelism. arXiv preprint arXiv:1909.08053.
- MobileBERT: a compact task-agnostic BERT for resource-limited devices. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2158–2170.
- Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- Information-theoretic probing with minimum description length. arXiv preprint arXiv:2003.12298.
- No training required: exploring random encoders for sentence classification. arXiv preprint arXiv:1901.10444.
- A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
- Supermasks in superposition for continual learning. Advances in Neural Information Processing Systems (NeurIPS).
- MLPruning: a multilevel structured pruning framework for transformer-based models. arXiv preprint arXiv:2105.14636.
- Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP. arXiv preprint arXiv:1906.02768.
- Deconstructing lottery tickets: zeros, signs, and the supermask. In Advances in Neural Information Processing Systems, pp. 3597–3607.