GLiT: Neural Architecture Search for Global and Local Image Transformer

07/07/2021
by Boyu Chen, et al.

We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition. Recently, transformers without CNN-based backbones have been found to achieve impressive performance for image recognition. However, the transformer was designed for NLP tasks and thus could be sub-optimal when directly applied to image recognition. To improve the visual representation ability of transformers, we propose a new search space and search algorithm. Specifically, we introduce a locality module that explicitly models the local correlations in images at a lower computational cost. With the locality module, our search space is defined to let the search algorithm freely trade off between global and local information, as well as optimize the low-level design choices in each module. To tackle the problem caused by the huge search space, we propose a hierarchical neural architecture search method that searches for the optimal vision transformer at two levels separately, using an evolutionary algorithm. Extensive experiments on the ImageNet dataset demonstrate that our method can find more discriminative and efficient transformer variants than the ResNet family (e.g., ResNet101) and the baseline ViT for image classification.
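The two-level evolutionary search described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not say what each level searches, so the split used here (level 1 picks how many heads per block are local, level 2 picks a kernel size per block) is an assumption, and every name below (`toy_fitness`, `evolve`, `hierarchical_search`, the constants) is hypothetical. The fitness function is a made-up proxy standing in for validation accuracy.

```python
import random

# All constants are illustrative assumptions, not values from the paper.
NUM_BLOCKS = 4            # toy network depth
HEADS_PER_BLOCK = 3       # heads per block, each either global or local
KERNEL_SIZES = (3, 5, 7)  # assumed low-level choice for the local module


def toy_fitness(local_counts, kernels):
    """Hypothetical proxy for validation accuracy (higher is better)."""
    score = 0.0
    for n_local, k in zip(local_counts, kernels):
        score -= abs(n_local - 1)   # toy preference: one local head per block
        score -= abs(k - 5) / 2     # toy preference: kernel size 5
    return score


def evolve(seed_individual, mutate, fitness, rng,
           generations=30, pop_size=16, keep=4):
    """Simple evolutionary loop: keep the top `keep`, refill by mutation."""
    population = [seed_individual]
    population += [mutate(seed_individual, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]  # elitism: the best individuals survive
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - keep)]
        population = parents + children
    return max(population, key=fitness)


def mutate_counts(counts, rng):
    """Resample the number of local heads in one randomly chosen block."""
    child = list(counts)
    child[rng.randrange(len(child))] = rng.randint(0, HEADS_PER_BLOCK)
    return tuple(child)


def mutate_kernels(kernels, rng):
    """Resample the kernel size of one randomly chosen block."""
    child = list(kernels)
    child[rng.randrange(len(child))] = rng.choice(KERNEL_SIZES)
    return tuple(child)


def hierarchical_search(seed=0):
    rng = random.Random(seed)
    default_kernels = (3,) * NUM_BLOCKS
    # Level 1 (assumed): search the global/local split with fixed low-level defaults.
    best_counts = evolve((0,) * NUM_BLOCKS, mutate_counts,
                         lambda c: toy_fitness(c, default_kernels), rng)
    # Level 2 (assumed): fix that split and search the low-level choices.
    best_kernels = evolve(default_kernels, mutate_kernels,
                          lambda k: toy_fitness(best_counts, k), rng)
    return best_counts, best_kernels


if __name__ == "__main__":
    counts, kernels = hierarchical_search(seed=0)
    print("local heads per block:", counts)
    print("kernel size per block:", kernels)
```

Searching the two levels separately keeps each search small: in this toy setting the joint space has 4^4 × 3^4 candidates, while the two stages explore 4^4 and 3^4 candidates respectively. In a real setting the fitness call would be an expensive evaluation of a trained or weight-shared network rather than a closed-form score.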

research
07/28/2022

Neural Architecture Search on Efficient Transformers and Beyond

Recently, numerous efficient Transformers have been proposed to reduce t...
research
04/12/2021

Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search

Recently, neural architecture search (NAS) has been successfully used in i...
research
12/16/2020

AutoCaption: Image Captioning with Neural Architecture Search

Image captioning transforms complex visual information into abstract nat...
research
10/21/2021

3D-ANAS v2: Grafting Transformer Module on Automatically Designed ConvNet for Hyperspectral Image Classification

Hyperspectral image (HSI) classification has been a hot topic for decide...
research
12/07/2021

RSBNet: One-Shot Neural Architecture Search for A Backbone Network in Remote Sensing Image Recognition

Recently, a massive number of deep learning based approaches have been s...
research
03/14/2020

Efficient Backbone Search for Scene Text Recognition

Scene text recognition (STR) is very challenging due to the diversity of...
research
11/18/2019

ImmuNeCS: Neural Committee Search by an Artificial Immune System

Current Neural Architecture Search techniques can suffer from a few shor...
