Memory-Efficient Differentiable Transformer Architecture Search

05/31/2021
by Yuekai Zhao, et al.

Differentiable architecture search (DARTS) has been successfully applied to many vision tasks. However, applying DARTS directly to Transformers is memory-intensive, which renders the search process infeasible. To address this, we propose a multi-split reversible network and combine it with DARTS. Specifically, we devise a backpropagation-with-reconstruction algorithm so that only the last layer's outputs need to be stored. Relieving the memory burden of DARTS allows us to search with a larger hidden size and more candidate operations. We evaluate the searched architecture on three sequence-to-sequence datasets: WMT'14 English-German, WMT'14 English-French, and WMT'14 English-Czech. Experimental results show that our network consistently outperforms standard Transformers across these tasks. Moreover, our method compares favorably with big-size Evolved Transformers while reducing search computation by an order of magnitude.
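The reconstruction idea can be illustrated with a small sketch. The abstract does not spell out the coupling used in the multi-split reversible network, so the code below assumes a generic additive coupling in which each split is updated from the other splits. The names `make_sublayer`, `forward`, and `reconstruct` are hypothetical, and the toy sublayers stand in for real Transformer operations. It only shows why a layer's inputs can be recomputed from its outputs, which is what makes it possible to discard intermediate activations; gradient computation is omitted.

```python
import math
import random


def make_sublayer(seed, width):
    """Return a fixed random nonlinear map (a toy stand-in for a real sublayer)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(width)]

    def f(context):
        # context is a list of splits; pool them elementwise, then apply tanh(W @ pooled).
        pooled = [sum(column) for column in zip(*context)]
        return [math.tanh(sum(w * v for w, v in zip(row, pooled))) for row in weights]

    return f


def forward(splits, sublayers):
    """Assumed coupling: y_k = x_k + f_k(y_1..y_{k-1}, x_{k+1}..x_n)."""
    state = [list(s) for s in splits]
    for k, f in enumerate(sublayers):
        delta = f(state[:k] + state[k + 1:])  # every split except the k-th
        state[k] = [a + b for a, b in zip(state[k], delta)]
    return state


def reconstruct(outputs, sublayers):
    """Recover the layer inputs from its outputs by undoing updates in reverse order."""
    state = [list(s) for s in outputs]
    for k in reversed(range(len(sublayers))):
        # At this point state[:k] still holds outputs and state[k+1:] holds
        # recovered inputs, which is exactly the context seen in forward().
        delta = sublayers[k](state[:k] + state[k + 1:])
        state[k] = [a - b for a, b in zip(state[k], delta)]
    return state


def max_abs_diff(a, b):
    """Largest elementwise difference between two lists of splits."""
    return max(abs(p - q) for s, t in zip(a, b) for p, q in zip(s, t))


if __name__ == "__main__":
    n_splits, width, n_layers = 3, 4, 2
    layers = [[make_sublayer(10 * l + k, width) for k in range(n_splits)]
              for l in range(n_layers)]
    x = [[0.1 * (i + j) for j in range(width)] for i in range(n_splits)]

    y = x
    for layer in layers:            # keep only the most recent activations
        y = forward(y, layer)

    recovered = y
    for layer in reversed(layers):  # walk back from the last layer's outputs
        recovered = reconstruct(recovered, layer)

    print("max reconstruction error:", max_abs_diff(x, recovered))
```

In a real backward pass, each layer's recovered inputs would be used to recompute local gradients before moving on to the previous layer. Because the subtraction is done in floating point, the recovered values match the originals only up to rounding error.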


