Reservoir Transformers

12/30/2020
by Sheng Shen, et al.

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
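As a concrete illustration of the idea, here is a minimal sketch (PyTorch, written for this summary; the class name, layer placement, and hyperparameters are assumptions, and the exact reservoir layer types used in the paper may differ). It interleaves frozen, randomly initialized encoder layers with regular trainable ones, so the optimizer only updates the non-reservoir parameters.

```python
import torch
import torch.nn as nn

class ReservoirEncoder(nn.Module):
    """Hypothetical sketch: a stack of transformer encoder layers in which
    some layers are "reservoirs" (randomly initialized, never updated)."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, reservoir_every=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        # Freeze the parameters of the designated reservoir layers so the
        # optimizer never updates them; their random projections stay fixed.
        for i, layer in enumerate(self.layers):
            if i % reservoir_every == 1:  # e.g. layers 1, 3, 5 act as reservoirs
                for p in layer.parameters():
                    p.requires_grad = False

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Usage: only the trainable (non-reservoir) parameters are handed to the optimizer.
model = ReservoirEncoder()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
x = torch.randn(2, 10, 512)  # (batch, sequence length, d_model)
out = model(x)
```

Note that gradients still flow through the frozen layers to earlier trainable layers during backpropagation; what is saved is the computation and storage of parameter gradients (and optimizer state) for the reservoir layers themselves, which is one way to read the paper's reported wall-clock savings.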


Related research:

- 09/08/2021 - What's Hidden in a One-layer Randomly Weighted Transformer?
- 09/28/2020 - Deep Transformers with Latent Depth
- 12/24/2022 - Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
- 02/14/2023 - Energy Transformer
- 10/16/2019 - Injecting Hierarchy with U-Net Transformers
- 05/07/2019 - Performance boost of time-delay reservoir computing by non-resonant clock cycle
