AutoTrans: Automating Transformer Design via Reinforced Architecture Search

09/04/2020
by Wei Zhu, et al.

Although transformer architectures dominate many natural language understanding tasks, several issues in training transformer models remain unresolved, notably the need for a principled warm-up schedule, which has proved important for stable transformer training, and the question of whether a given task prefers a scaled attention product. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to place layer normalization, whether to scale the attention product, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture better suited to the task at hand. Reinforcement learning (RL) is employed to navigate the search space, and special parameter-sharing strategies are designed to accelerate the search. We show that sampling a proportion of the training data per epoch during the search helps to improve search quality. Experiments on CoNLL03, Multi-30k, IWSLT14, and WMT-14 show that the searched transformer models can outperform standard transformers. In particular, we show that our learned models can be trained more robustly with large learning rates and without warm-up.
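The abstract describes an RL controller that samples transformer design choices (layer-norm placement, attention scaling, depth, heads, activation) and is rewarded by the quality of the sampled architecture. Below is a minimal sketch of that idea using a REINFORCE policy over independent categorical distributions; the option sets, the `dummy_reward` function, and all hyperparameters are illustrative assumptions, not the paper's actual search space or reward.

```python
import math
import random

# Hypothetical search space mirroring the design choices named in the
# abstract; the exact options used in the paper may differ.
SEARCH_SPACE = {
    "layer_norm": ["pre", "post"],
    "scale_attention": [True, False],
    "num_layers": [4, 6, 8],
    "num_heads": [4, 8, 16],
    "activation": ["relu", "gelu", "swish"],
}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class ReinforceController:
    """One categorical policy per design choice, trained with REINFORCE."""

    def __init__(self, space, lr=0.1):
        self.space = space
        self.lr = lr
        self.logits = {k: [0.0] * len(v) for k, v in space.items()}
        self.baseline = 0.0  # moving-average baseline reduces gradient variance

    def sample(self):
        arch, idx = {}, {}
        for k, options in self.space.items():
            probs = softmax(self.logits[k])
            i = random.choices(range(len(options)), weights=probs)[0]
            arch[k], idx[k] = options[i], i
        return arch, idx

    def update(self, idx, reward):
        self.baseline = 0.9 * self.baseline + 0.1 * reward
        advantage = reward - self.baseline
        for k, i in idx.items():
            probs = softmax(self.logits[k])
            for j in range(len(probs)):
                # Gradient of log pi w.r.t. each logit of the sampled category.
                grad = (1.0 if j == i else 0.0) - probs[j]
                self.logits[k][j] += self.lr * advantage * grad

def dummy_reward(arch):
    """Stand-in for training the sampled architecture (with shared parameters)
    on a sampled fraction of the data and measuring validation quality."""
    r = 0.0
    r += 0.3 if arch["layer_norm"] == "pre" else 0.0
    r += 0.2 if arch["activation"] == "gelu" else 0.0
    r += 0.1 * (arch["num_layers"] / 8)
    return r + random.gauss(0.0, 0.05)  # noisy evaluation

random.seed(0)
ctrl = ReinforceController(SEARCH_SPACE)
for step in range(300):
    arch, idx = ctrl.sample()
    ctrl.update(idx, dummy_reward(arch))

best_arch, _ = ctrl.sample()
print(best_arch)
```

In a real run, `dummy_reward` would be replaced by training the sampled child model (with shared weights across candidates) on a sampled proportion of the data and returning a validation metric.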

Related research

10/14/2022 · AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
Neural architecture search (NAS) has demonstrated promising results on i...

01/30/2019 · The Evolved Transformer
Recent works have highlighted the strengths of the Transformer architect...

05/25/2022 · BiT: Robustly Binarized Multi-distilled Transformer
Modern pre-trained transformers have rapidly advanced the state-of-the-a...

10/12/2022 · Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers
This paper studies the curious phenomenon for machine learning models wi...

12/16/2021 · Trees in transformers: a theoretical analysis of the Transformer's ability to represent trees
Transformer networks are the de facto standard architecture in natural l...

08/09/2021 · Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Neural painting refers to the procedure of producing a series of strokes...

03/23/2022 · Training-free Transformer Architecture Search
Recently, Vision Transformer (ViT) has achieved remarkable success in se...
