Studying inductive biases in image classification task

10/31/2022
by Nana Arizumi, et al.

Self-attention (SA) structures have recently become popular in computer vision. They use locally independent filters and can employ large kernels, in contrast to the previously dominant convolutional neural networks (CNNs), whose success has been attributed to the hard-coded inductive biases of locality and spatial invariance. However, recent studies have shown that the inductive biases in CNNs are too restrictive. On the other hand, relative position encodings, which resemble depthwise (DW) convolution, are necessary for local SA networks, indicating that SA structures are not entirely spatially variant. Hence, we aim to determine which inductive biases contribute to the success of local SA structures. To do so, we introduce context-aware decomposed attention (CADA), which decomposes attention maps into multiple trainable base kernels and accumulates them using context-aware (CA) parameters. In this way, we can identify the link between CNNs and SA networks. We conducted ablation studies using ResNet50 on the ImageNet classification task. DW convolution can use large kernels without increasing computational cost compared to CNNs, but its accuracy saturates as the kernel grows; CADA follows this locality characteristic. We show that context awareness is the crucial property, although large local information is not necessary to construct the CA parameters. Even though removing spatial invariance entirely makes training difficult, relaxed spatial invariance gives better accuracy than strict spatial invariance, and adding strong spatial invariance through relative position encoding is preferable. We extended these experiments to downsampling filters and found that the locality bias is more critical for downsampling, but the strong locality bias can be removed by using relaxed spatial invariance.
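The abstract does not give the exact formulation of CADA, but the core idea it describes, building a per-position filter as a context-weighted sum of a few trainable base kernels and applying it like a depthwise convolution, can be sketched roughly as below. This is a minimal illustration under assumptions: the module name `CADASketch`, the 1x1 convolution used to produce the context-aware (CA) weights, and hyperparameters such as `num_bases` and `kernel_size` are illustrative choices, not the paper's actual design.

```python
# Minimal sketch of the CADA idea: per-position kernels are built as a
# context-weighted sum of trainable base kernels, then applied to local
# patches like a depthwise convolution. Illustrative only; names and
# hyperparameters are assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CADASketch(nn.Module):
    def __init__(self, channels, kernel_size=7, num_bases=4):
        super().__init__()
        self.k = kernel_size
        self.m = num_bases
        # Trainable base kernels shared across all spatial positions,
        # one set per channel (depthwise-style): (C, M, k*k).
        self.bases = nn.Parameter(
            torch.randn(channels, num_bases, kernel_size * kernel_size) * 0.02
        )
        # Context-aware weights alpha(x) predicted from the feature at each
        # position; a 1x1 conv here, i.e. no large local context is needed.
        self.to_alpha = nn.Conv2d(channels, channels * num_bases, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # alpha: (B, C, M, H*W), normalized over the M base kernels.
        alpha = self.to_alpha(x).view(b, c, self.m, h * w).softmax(dim=2)
        # Effective per-position kernel: sum_m alpha_m * base_m -> (B, C, k*k, H*W).
        kernels = torch.einsum('bcml,cmk->bckl', alpha, self.bases)
        # Extract local k x k patches: (B, C*k*k, H*W) -> (B, C, k*k, H*W).
        patches = F.unfold(x, self.k, padding=self.k // 2).view(b, c, -1, h * w)
        # Aggregate each patch with its position-dependent kernel.
        out = (kernels * patches).sum(dim=2).view(b, c, h, w)
        return out


if __name__ == "__main__":
    layer = CADASketch(channels=64)
    y = layer(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

In this reading, strict spatial invariance corresponds to making the CA weights constant over positions (recovering a plain DW convolution), while fully position-dependent attention corresponds to predicting an unconstrained kernel at every location; CADA sits between the two.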
