SNIPER: Efficient Multi-Scale Training

05/23/2018
by   Bharat Singh, et al.

We present SNIPER, an algorithm for performing efficient multi-scale training in instance-level visual recognition tasks. Instead of processing every pixel in an image pyramid, SNIPER processes context regions around ground-truth instances (referred to as chips) at the appropriate scale. For background sampling, these context regions are generated using proposals extracted from a region proposal network trained with a short learning schedule. Hence, the number of chips generated per image during training changes adaptively with the scene complexity. SNIPER only processes 30% more pixels compared to the commonly used single-scale training at 800x1333 pixels on the COCO dataset. But it also observes samples from extreme resolutions of the image pyramid, like 1400x2000 pixels. As SNIPER operates on resampled low-resolution chips (512x512 pixels), it can have a batch size as large as 20 on a single GPU even with a ResNet-101 backbone. Therefore it can benefit from batch normalization during training without the need for synchronizing batch-normalization statistics across GPUs. SNIPER brings training of instance-level recognition tasks like object detection closer to the protocol for image classification, and suggests that the commonly accepted guideline that it is important to train on high-resolution images for instance-level visual recognition tasks might not be correct. Our implementation based on Faster-RCNN with a ResNet-101 backbone obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference with a single GPU.


