
Deep is a Luxury We Don't Have

by Ahmed Taha, et al.

Medical images come in high resolutions. A high resolution is vital for finding malignant tissues at an early stage. Yet, this resolution presents a challenge in terms of modeling long-range dependencies. Shallow transformers eliminate this problem, but they suffer from quadratic complexity. In this paper, we tackle this complexity by leveraging a linear self-attention approximation. Through this approximation, we propose an efficient vision model called HCT, which stands for High resolution Convolutional Transformer. HCT brings transformers' merits to high resolution images at a significantly lower cost. We evaluate HCT using a high resolution mammography dataset. HCT is significantly superior to its CNN counterpart. Furthermore, we demonstrate HCT's fitness for medical images by evaluating its effective receptive field. Code available at
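To illustrate the core idea behind the linear self-attention approximation the abstract mentions, here is a minimal NumPy sketch of kernelized linear attention in the style of Katharopoulos et al. (a common choice for such approximations; the specific feature map `phi(x) = elu(x) + 1` is an assumption, not necessarily what HCT uses). By associating the matrix product as `phi(Q) (phi(K)^T V)` instead of `(Q K^T) V`, the cost drops from O(N^2) to O(N) in the sequence length N, which is what makes attention tractable for high-resolution images:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: O(N d^2) instead of O(N^2 d).

    Q, K, V: arrays of shape (N, d), where N is the number of
    tokens (e.g. image patches) and d the feature dimension.
    """
    def phi(x):
        # elu(x) + 1: a positive feature map, so attention weights
        # stay non-negative and normalizable (illustrative choice).
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qp, Kp = phi(Q), phi(K)                    # (N, d) each
    KV = Kp.T @ V                              # (d, d): linear in N
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T   # (N, 1): normalizer
    return (Qp @ KV) / (Z + eps)               # (N, d)
```

Because each output row is a convex combination of the rows of `V`, a constant `V` passes through unchanged, which is a quick sanity check on the normalization.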




Glance-and-Gaze Vision Transformer

Recently, there emerges a series of vision Transformers, which show supe...

StyleSwin: Transformer-based GAN for High-resolution Image Generation

Despite the tantalizing success in a broad range of vision tasks, transformers...

Taming Transformers for High-Resolution Image Synthesis

Designed to learn long-range interactions on sequential data, transforme...

Memory transformers for full context and high-resolution 3D Medical Segmentation

Transformer models achieve state-of-the-art results for image segmentati...

ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

We present ASSET, a neural architecture for automatically modifying an i...

Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation

It is critical to obtain high resolution features with long range depend...

HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction

In accelerated MRI reconstruction, the anatomy of a patient is recovered...