MULLER: Multilayer Laplacian Resizer for Vision

04/06/2023
by Zhengzhong Tu, et al.

Image resizing is a fundamental preprocessing operation in modern computer vision. Throughout the deep learning revolution, researchers have overlooked the potential of alternative resizing methods beyond the commonly used, readily available resizers such as nearest-neighbor, bilinear, and bicubic. The key question of interest is whether the front-end resizer affects the performance of deep vision models. In this paper, we present an extremely lightweight multilayer Laplacian resizer with only a handful of trainable parameters, dubbed the MULLER resizer. MULLER has a bandpass nature: it learns to boost details in certain frequency subbands that benefit the downstream recognition models. We show that MULLER can be easily plugged into various training pipelines, and it effectively boosts the performance of the underlying vision task with little to no extra cost. Specifically, we select a state-of-the-art vision Transformer, MaxViT, as the baseline, and show that, if trained with MULLER, MaxViT gains up to 0.6% top-1 accuracy on ImageNet-1k, as compared to the standard training scheme. Notably, MULLER's performance also scales with model size and training data size (e.g., ImageNet-21k and JFT), and it is widely applicable to multiple vision tasks, including image classification, object detection and segmentation, as well as image quality assessment.
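The abstract's core idea (a base resize whose Laplacian detail bands are boosted by a few learnable parameters) can be illustrated with a minimal NumPy/SciPy sketch. This is not the authors' implementation: the number of layers, the per-band scale/bias parameterization, and the blur scales are illustrative assumptions, and the function name `muller_resize` is hypothetical.

```python
# Minimal sketch of a multilayer Laplacian resizer in the spirit of MULLER.
# Assumptions (not from the paper text): two detail layers, each with a
# learnable scalar weight and bias; Gaussian blurs define the subbands.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def muller_resize(img, out_shape, weights=(0.5, 0.25),
                  biases=(0.0, 0.0), sigmas=(1.0, 2.0)):
    """Resize a 2D image, then boost bandpass detail layers.

    `weights` and `biases` stand in for the handful of trainable
    parameters; in training they would be learned end to end with
    the downstream recognition model.
    """
    factors = (out_shape[0] / img.shape[0], out_shape[1] / img.shape[1])
    base = zoom(img, factors, order=1)  # bilinear base resize
    out = base.copy()
    for w, b, s in zip(weights, biases, sigmas):
        blurred = gaussian_filter(base, sigma=s)
        band = base - blurred        # bandpass detail layer (Laplacian residual)
        out = out + w * band + b     # learned boost of this subband
    return out
```

With all weights and biases set to zero, the sketch reduces to a plain bilinear resize, which matches the paper's claim that the resizer adds little to no extra cost on top of standard preprocessing.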


Related research:
03/17/2021

Learning to Resize Images for Computer Vision Tasks

For all the ways convolutional neural nets have revolutionized computer ...
04/04/2022

MaxViT: Multi-Axis Vision Transformer

Transformers have recently gained significant attention in the computer ...
03/15/2023

DeepMIM: Deep Supervision for Masked Image Modeling

Deep supervision, which involves extra supervisions to the intermediate ...
07/10/2017

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

The success of deep learning in vision can be attributed to: (a) models ...
11/18/2021

Swin Transformer V2: Scaling Up Capacity and Resolution

We present techniques for scaling Swin Transformer up to 3 billion param...
02/03/2023

Offloading Deep Learning Powered Vision Tasks from UAV to 5G Edge Server with Denoising

Offloading computationally heavy tasks from an unmanned aerial vehicle (...
05/03/2022

Better plain ViT baselines for ImageNet-1k

It is commonly accepted that the Vision Transformer model requires sophi...
