Supplementary Material: Implementation and Experiments for GAU-based Model

05/12/2022
by Zhenjie Liu, et al.

In February 2022, Google proposed a new Transformer variant called FLASH, which is faster, has a lower VRAM footprint, and achieves better performance. This is accomplished by designing a performant layer named GAU (Gated Attention Unit), which combines the attention layer and the FFN. In this paper, some implementation details are re-analyzed both theoretically and practically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. Results on the CLUE benchmark show that our model achieves a dev average score of 75.02, 1% higher than RoFormerV1, while being 45% faster than RoFormerV2.
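To make the combined attention-plus-FFN idea concrete, the following is a minimal sketch of a single GAU forward pass as described in the FLASH paper: two gated projections U and V, a shared low-dimensional projection Z from which queries and keys are derived by cheap per-dimension scale/offset, and a squared-ReLU attention matrix. All weight names, shapes, and the lack of masking/normalization here are simplifying assumptions for illustration, not the exact model used in our experiments.

```python
import numpy as np

def silu(x):
    # SiLU activation used for the gating branches
    return x / (1.0 + np.exp(-x))

def gau(x, Wu, Wv, Wz, Wo, gamma, beta):
    """One Gated Attention Unit forward pass (single head, no mask).

    x: (n, d) token representations; Wu/Wv: (d, e) gate/value projections;
    Wz: (d, s) shared low-dim projection; Wo: (e, d) output projection;
    gamma, beta: (2, s) per-dimension scale/offset for queries and keys.
    """
    n = x.shape[0]
    u = silu(x @ Wu)                       # gate branch, (n, e)
    v = silu(x @ Wv)                       # value branch, (n, e)
    z = silu(x @ Wz)                       # shared projection, (n, s)
    q = z * gamma[0] + beta[0]             # queries from z, (n, s)
    k = z * gamma[1] + beta[1]             # keys from z, (n, s)
    a = np.maximum(q @ k.T / n, 0.0) ** 2  # squared-ReLU attention, (n, n)
    return (u * (a @ v)) @ Wo              # gate the attended values, (n, d)

# tiny usage example with random weights (shapes only, not trained values)
rng = np.random.default_rng(0)
n, d, e, s = 4, 8, 16, 4
x = rng.standard_normal((n, d))
params = (rng.standard_normal((d, e)), rng.standard_normal((d, e)),
          rng.standard_normal((d, s)), rng.standard_normal((e, d)),
          rng.standard_normal((2, s)), rng.standard_normal((2, s)))
y = gau(x, *params)
print(y.shape)  # (4, 8)
```

Because queries and keys are derived from one small shared projection and the FFN-style gating is folded into the same block, a GAU layer needs fewer parameters than a separate attention layer plus FFN.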
