Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

11/15/2021
by Huaijin Pi, et al.

Recently, self-attention operators have shown strong performance as stand-alone building blocks for vision models. However, existing self-attention architectures are usually hand-designed, adapted from CNNs, and built by stacking a single operator type. The broader architecture space that combines different self-attention operators with convolution remains largely unexplored. In this paper, we explore this novel space with weight-sharing Neural Architecture Search (NAS) algorithms. The resulting architecture, named TrioNet, combines convolution, local self-attention, and global (axial) self-attention operators. To search this huge space effectively, we propose Hierarchical Sampling for better supernet training, together with Multi-head Sharing, a novel weight-sharing strategy designed specifically for multi-head self-attention operators. On ImageNet classification, where self-attention outperforms convolution, the searched TrioNet surpasses all stand-alone models while using fewer FLOPs. On various small datasets, where self-attention models perform worse, TrioNet still matches the best single operator, which in that setting is convolution. Our code is available at https://github.com/phj128/TrioNet.
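To make the searched operator space concrete, below is a minimal PyTorch sketch (not the authors' implementation; class names, window sizes, and head counts are illustrative assumptions) of a weight-sharing supernet "choice block" holding the three TrioNet candidate operators: convolution, windowed local self-attention, and global axial self-attention. For simplicity it samples paths uniformly at random, standing in for the paper's Hierarchical Sampling, and gives each operator independent weights rather than applying Multi-head Sharing.

```python
# Sketch of a single-path weight-sharing supernet layer over the three
# candidate operators. All details below are illustrative assumptions.
import torch
import torch.nn as nn


class LocalSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping local windows
    (an assumed simplification of local self-attention)."""

    def __init__(self, dim, heads=8, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W), H and W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into (H//w * W//w) windows of w*w tokens.
        t = x.reshape(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        t, _ = self.attn(t, t, t)
        # Undo the window partition back to (B, C, H, W).
        t = t.reshape(B, H // w, W // w, w, w, C)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


class AxialSelfAttention(nn.Module):
    """Global attention factorized along the width axis, then the height axis."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.row = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Attend along W within each row.
        r = x.permute(0, 2, 3, 1).reshape(B * H, W, C)
        r, _ = self.row(r, r, r)
        x = r.reshape(B, H, W, C).permute(0, 3, 1, 2)
        # Attend along H within each column.
        c = x.permute(0, 3, 2, 1).reshape(B * W, H, C)
        c, _ = self.col(c, c, c)
        return c.reshape(B, W, H, C).permute(0, 3, 2, 1)


class ChoiceBlock(nn.Module):
    """One supernet layer: each forward pass runs exactly one of the three
    candidate operators (single-path weight-sharing NAS)."""

    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            LocalSelfAttention(dim),
            AxialSelfAttention(dim),
        ])

    def forward(self, x, op_idx=None):
        if op_idx is None:
            # Supernet training: sample a random operator for this step.
            op_idx = int(torch.randint(len(self.ops), (1,)))
        return self.ops[op_idx](x)


x = torch.randn(2, 64, 14, 14)
block = ChoiceBlock(64)
print(block(x, op_idx=2).shape)  # torch.Size([2, 64, 14, 14])
```

After supernet training, a weight-sharing NAS search would evaluate candidate architectures by fixing op_idx per layer and then retrain the best-found operator sequence from scratch; the search procedure itself is a standard assumption here, not taken from the paper.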

Related research

11/14/2021
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Since the Transformer architecture was introduced in 2017 there has been...

04/18/2020
Adaptive Attention Span in Computer Vision
Recent developments in Transformers for language modeling have opened ne...

01/07/2021
Self-Attention Based Context-Aware 3D Object Detection
Most existing point-cloud based 3D object detectors use convolution-like...

11/11/2019
A Hybrid Text Normalization System Using Multi-head Self-attention for Mandarin
In this paper, we propose a hybrid text normalization system using multi...

12/03/2019
Multiscale Self Attentive Convolutions for Vision and Language Modeling
Self attention mechanisms have become a key building block in many state...

06/04/2021
X-volution: On the Unification of Convolution and Self-attention
Convolution and self-attention are acting as two fundamental building bl...

12/28/2020
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention
In this paper, we propose a novel deep learning architecture to improvin...
