Differentiable Patch Selection for Image Recognition

04/07/2021
by Jean-Baptiste Cordonnier, et al.

Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is informative for the task at hand. We propose a method based on a differentiable Top-K operator that selects the most relevant parts of the input, so that high-resolution images can be processed efficiently. Our method can be interfaced with any downstream neural network, aggregates information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object or part bounding box annotations during training.
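
The abstract does not say how the Top-K operator is made differentiable, so the following is only a minimal NumPy sketch of one possible relaxation: averaging hard top-k indicator matrices over Gaussian-perturbed scores, then using the resulting soft weights to gather patches. The function names (`hard_topk_indicators`, `perturbed_topk`, `extract_patches`, `select_patches`) and the parameters `num_samples` and `sigma` are hypothetical choices for illustration, and only the forward pass is shown.

```python
import numpy as np


def hard_topk_indicators(scores, k):
    """Return a (k, n) matrix whose rows are one-hot vectors marking the
    k highest-scoring entries, ordered by their original index."""
    n = scores.shape[0]
    top = np.sort(np.argsort(-scores, kind="stable")[:k])
    out = np.zeros((k, n))
    out[np.arange(k), top] = 1.0
    return out


def perturbed_topk(scores, k, num_samples=100, sigma=0.05, seed=0):
    """Assumed relaxation: average the hard indicators over noisy copies
    of the scores. Each row of the result is a distribution over patches."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(num_samples, scores.shape[0]))
    samples = [hard_topk_indicators(scores + sigma * z, k) for z in noise]
    return np.mean(samples, axis=0)


def extract_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping (patch, patch, C) tiles,
    returned in row-major order with shape (num_patches, patch, patch, C)."""
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    tiles = image[: rows * patch, : cols * patch]
    tiles = tiles.reshape(rows, patch, cols, patch, -1)
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * cols, patch, patch, -1)


def select_patches(patches, weights):
    """Combine patches with the (k, n) weights; with hard one-hot weights
    this reduces to picking k patches."""
    return np.einsum("kn,nhwc->khwc", weights, patches)
```

In this sketch, a small scoring network would produce one score per patch from a downsampled image, and the `k` selected patches would be passed to the downstream network. As `sigma` approaches zero the weights reduce to a hard selection, while larger values spread weight over neighbouring candidates, which is what would let gradients reach the scores in a framework with automatic differentiation.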

08/19/2021

Generating Superpixels for High-resolution Images with Decoupled Patch Calibration

Superpixel segmentation has recently seen important progress benefiting ...
11/29/2016

Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition

Research on fine-grained recognition has recently shifted from multistag...
02/22/2022

Bag of Visual Words (BoVW) with Deep Features – Patch Classification Model for Limited Dataset of Breast Tumours

Currently, the computational complexity limits the training of high reso...
01/25/2021

Gigapixel Histopathological Image Analysis using Attention-based Neural Networks

Although CNNs are widely considered as the state-of-the-art models in va...
11/20/2018

Finding a Needle in the Haystack: Attention-Based Classification of High Resolution Microscopy Images

Deep learning for classification of microscopy images is challenging bec...
08/10/2022

PatchDropout: Economizing Vision Transformers Using Patch Dropout

Vision transformers have demonstrated the potential to outperform CNNs i...
09/10/2018

Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks

We introduce a saliency-based distortion layer for convolutional neural ...