The Evolved Transformer

01/30/2019
by David R. So, et al.

Recent works have highlighted the strengths of the Transformer architecture for sequence tasks. At the same time, neural architecture search has advanced to the point where it can outperform human-designed models. The goal of this work is to use architecture search to find a better Transformer architecture. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run evolutionary architecture search, seeding our initial population with the Transformer. To effectively run this search on the computationally expensive WMT 2014 English-German translation task, we develop the progressive dynamic hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer, demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech, and LM1B. At a big model size, the Evolved Transformer is twice as efficient as the Transformer in FLOPS without loss of quality. At a much smaller, mobile-friendly model size of 7M parameters, the Evolved Transformer outperforms the Transformer by 0.7 BLEU on WMT 2014 English-German.
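
To make the resource-allocation idea concrete, here is a minimal sketch of how a progressive-dynamic-hurdles-style loop might work: candidates are trained in stages, and only those whose fitness clears a hurdle computed from the population seen so far receive further training budget. The function names, the staged step schedule, and the toy fitness score are illustrative assumptions, not the paper's exact implementation.

```python
"""Toy sketch of progressive dynamic hurdles (PDH)-style budget allocation.

Assumptions (not from the paper): a fixed stage schedule, a mean-fitness
hurdle per stage, and a random stand-in fitness function.
"""

import random
import statistics


def train_and_evaluate(candidate, num_steps):
    # Placeholder for real training: would train `candidate` for `num_steps`
    # steps and return a fitness score (e.g. negative validation perplexity).
    return random.random()  # stand-in fitness


def progressive_dynamic_hurdles(candidates, stage_steps=(10_000, 30_000, 60_000)):
    """Allocate training budget in stages, dropping weak candidates early.

    The hurdle for each stage is the mean fitness of the candidates that
    reached that stage; only candidates at or above the hurdle earn the
    next (larger) training budget.
    """
    fitness = {}
    survivors = list(candidates)
    for steps in stage_steps:
        # Train every surviving candidate for this stage's budget.
        for cand in survivors:
            fitness[cand] = train_and_evaluate(cand, steps)
        # Compute this stage's hurdle from the current population.
        hurdle = statistics.mean(fitness[c] for c in survivors)
        # Keep only the candidates that cleared the hurdle.
        survivors = [c for c in survivors if fitness[c] >= hurdle]
    return sorted(survivors, key=lambda c: fitness[c], reverse=True)


# Example: a population of named candidate architectures, seeded with the
# Transformer as in the paper's evolutionary search.
population = ["transformer"] + [f"mutant_{i}" for i in range(31)]
best = progressive_dynamic_hurdles(population)
print("Most promising candidates:", best[:3])
```

The point of the staged schedule is that weak candidates consume only the small early budgets, so the expensive later stages are spent almost entirely on promising architectures.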

Related research

05/31/2021
Memory-Efficient Differentiable Transformer Architecture Search
Differentiable architecture search (DARTS) is successfully applied in ma...

04/24/2020
Lite Transformer with Long-Short Range Attention
Transformer has become ubiquitous in natural language processing (e.g., ...

09/04/2020
AutoTrans: Automating Transformer Design via Reinforced Architecture Search
Though the transformer architectures have shown dominance in many natura...

06/05/2019
Learning Deep Transformer Models for Machine Translation
Transformer is the state-of-the-art model in recent machine translation ...

10/01/2019
Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Neural sequence-to-sequence models, particularly the Transformer, are th...

06/01/2020
Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
We conduct in this work an evaluation study comparing offline and online...

09/30/2021
SCIMAT: Science and Mathematics Dataset
In this work, we announce a comprehensive well curated and opensource da...
