Transformed CNNs: recasting pre-trained convolutional layers with self-attention

06/10/2021
by Stéphane d'Ascoli, et al.

Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a strong computational bottleneck, especially at large spatial resolutions. In this work, we explore the idea of reducing the time spent training these layers by initializing them as convolutional layers. This enables us to transition smoothly from any pre-trained CNN to its functionally identical hybrid model, called Transformed CNN (T-CNN). With only 50 epochs of fine-tuning, the resulting T-CNNs demonstrate significant performance gains over the CNN (+2.2% top-1 on ImageNet-1k for a ResNet50-RS), as well as substantially improved robustness (+11% top-1 on ImageNet-C). We analyze the representations learnt by the T-CNN, providing deeper insights into the fruitful interplay between convolutions and self-attention. Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.
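The central construction is a self-attention layer that can be initialized to behave like a convolution. Below is a minimal PyTorch-style sketch of this idea, assuming a gated positional self-attention layer in the spirit of the abstract; the class name ConvInitSelfAttention, the locality_strength parameter, and the per-head gate are illustrative choices, not the authors' released code. Each head's positional attention logits are peaked on one offset of a k × k neighbourhood, so at initialization the layer performs an almost purely local aggregation, and the gates let it shift toward content-based attention during fine-tuning. The full construction described in the abstract additionally maps the pre-trained convolution's filters into the value/output projections so that the hybrid layer is functionally identical to the original CNN layer at initialization; this sketch omits that weight-copying step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvInitSelfAttention(nn.Module):
    """Multi-head self-attention over a CNN feature map whose attention is
    initialized to be almost purely positional and peaked on a k x k
    neighbourhood, so the layer starts out as a local (convolution-style)
    aggregation. Hypothetical sketch, not the authors' released code."""

    def __init__(self, dim, kernel_size=3, head_dim=32, locality_strength=1.0):
        super().__init__()
        self.num_heads = kernel_size ** 2          # one head per kernel offset
        self.kernel_size = kernel_size
        self.head_dim = head_dim
        self.locality_strength = locality_strength
        inner = self.num_heads * head_dim
        self.qk = nn.Linear(dim, 2 * inner, bias=False)   # content attention
        self.v = nn.Linear(dim, inner, bias=False)
        self.proj = nn.Linear(inner, dim)
        # per-head gate: sigmoid(-5) ~ 0, i.e. almost purely positional at init
        self.gate = nn.Parameter(torch.full((self.num_heads,), -5.0))

    def positional_logits(self, h, w, device):
        # Logits are largest where the key's offset from the query equals the
        # head's designated kernel offset (dy, dx).
        ys, xs = torch.meshgrid(torch.arange(h, device=device),
                                torch.arange(w, device=device), indexing="ij")
        coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()   # (N, 2)
        rel = coords[None, :, :] - coords[:, None, :]                   # (N, N, 2)
        half = self.kernel_size // 2
        dy, dx = torch.meshgrid(torch.arange(-half, half + 1, device=device),
                                torch.arange(-half, half + 1, device=device),
                                indexing="ij")
        offsets = torch.stack([dy, dx], dim=-1).reshape(-1, 2).float()  # (heads, 2)
        dist = ((rel[None] - offsets[:, None, None]) ** 2).sum(-1)      # (heads, N, N)
        return -self.locality_strength * dist

    def forward(self, x):
        # x: (B, C, H, W) feature map taken from a pre-trained CNN stage
        b, c, h, w = x.shape
        n = h * w
        tokens = x.flatten(2).transpose(1, 2)                           # (B, N, C)
        q, k = self.qk(tokens).chunk(2, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v(tokens).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        content = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        positional = F.softmax(self.positional_logits(h, w, x.device), dim=-1)
        g = torch.sigmoid(self.gate).view(1, -1, 1, 1)                  # (1, heads, 1, 1)
        attn = (1.0 - g) * positional + g * content                     # (B, heads, N, N)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.proj(out).transpose(1, 2).view(b, c, h, w)
```

In a T-CNN-style experiment, one would substitute such a layer for a late-stage convolution of a pre-trained ResNet and fine-tune the whole network for a few tens of epochs, which is roughly the regime the abstract reports (50 epochs).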

Related research

03/19/2021 · ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Convolutional architectures have proven extremely successful for vision ...

05/16/2023 · Mimetic Initialization of Self-Attention Layers
It is notoriously difficult to train Transformers on small datasets; typ...

06/05/2022 · U(1) Symmetry-breaking Observed in Generic CNN Bottleneck Layers
We report on a significant discovery linking deep convolutional neural n...

02/28/2019 · CircConv: A Structured Convolution with Low Complexity
Deep neural networks (DNNs), especially deep convolutional neural networ...

06/09/2021 · CoAtNet: Marrying Convolution and Attention for All Data Sizes
Transformers have attracted increasing interests in computer vision, but...

11/08/2019 · On the Relationship between Self-Attention and Convolutional Layers
Recent trends of incorporating attention mechanisms in vision have led r...

03/17/2020 · Hyperplane Arrangements of Trained ConvNets Are Biased
We investigate the geometric properties of the functions learned by trai...
