Learning to ignore: rethinking attention in CNNs

11/10/2021
by Firas Laakom et al.

Recently, there has been increasing interest in applying attention mechanisms in Convolutional Neural Networks (CNNs) to solve computer vision tasks. Most of these methods learn to explicitly identify and highlight relevant parts of the scene and pass the attended image to further layers of the network. In this paper, we argue that such an approach might not be optimal. Arguably, explicitly learning which parts of the image are relevant is typically harder than learning which parts are less relevant and should therefore be ignored. In fact, in the vision domain there are many easy-to-identify patterns of irrelevant features; for example, image regions close to the borders are less likely to contain useful information for a classification task. Based on this idea, we propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend. Specifically, we propose to explicitly learn the irrelevant information in the scene and suppress it in the produced representation, keeping only the important attributes. This implicit attention scheme can be incorporated into any existing attention mechanism. In this work, we validate the idea using two recent attention methods: the Squeeze-and-Excitation (SE) block and the Convolutional Block Attention Module (CBAM). Experimental results on different datasets and model architectures show that learning to ignore, i.e., implicit attention, yields superior performance compared to the standard approaches.
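To make the idea concrete, below is a minimal PyTorch sketch of one plausible "learn to ignore" variant of an SE-style block: the gating branch predicts a per-channel irrelevance score, and the feature map is scaled by one minus that score so that channels judged irrelevant are suppressed. This is only an illustration under stated assumptions; the class name LearnToIgnoreSE and all implementation details are assumptions of this sketch, not the authors' code or exact formulation.

```python
# Hypothetical sketch (not the authors' implementation): an SE-style block that
# learns per-channel *irrelevance* and suppresses it, instead of learning which
# channels to emphasize.
import torch
import torch.nn as nn


class LearnToIgnoreSE(nn.Module):
    """SE-style block reinterpreted as 'learning to ignore':
    the gating branch predicts how irrelevant each channel is,
    and the feature map is scaled by (1 - irrelevance)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global context per channel
        self.fc = nn.Sequential(             # excitation: predict per-channel irrelevance
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # irrelevance score in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        irrelevance = self.fc(self.pool(x).view(b, c))  # (b, c)
        keep = 1.0 - irrelevance                        # suppress what is deemed irrelevant
        return x * keep.view(b, c, 1, 1)


# Usage: drop the block into a CNN after a convolutional stage.
if __name__ == "__main__":
    block = LearnToIgnoreSE(channels=64)
    features = torch.randn(2, 64, 32, 32)
    out = block(features)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The same inversion could in principle be applied to the spatial branch of CBAM (suppressing attended-as-irrelevant locations rather than highlighting relevant ones), but that is likewise an assumption of this sketch rather than the paper's exact construction.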

