Patch Gradient Descent: Training Neural Networks on Very Large Images

01/31/2023
by   Deepak K. Gupta, et al.

Traditional CNN models are trained and tested on relatively low-resolution images (<300 px) and cannot directly operate on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows existing CNN architectures to be trained on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that, instead of performing gradient-based updates on an entire image at once, a good solution can be reached by updating the model on only a small part of the image at a time, ensuring that most of the image is covered over the course of iterations. As a result, PatchGD offers substantially better memory and compute efficiency when training models on large-scale images. PatchGD is thoroughly evaluated on two datasets, PANDA and UltraMNIST, with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, especially when compute memory is limited.
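
The abstract describes the idea only at a high level. The following is a minimal, hypothetical PyTorch sketch of a patch-wise update loop in that spirit, not the authors' exact algorithm: the latent grid `z`, the mean-pooling aggregation, and names such as `extract_patches`, `patchgd_step`, `patch_size`, `patches_per_update`, and `inner_iters` are illustrative assumptions.

```python
"""Hypothetical sketch of a patch-wise gradient update loop, assuming a
PyTorch encoder/head split; names and hyperparameters are illustrative."""
import torch
import torch.nn as nn
import torchvision.models as models


def extract_patches(images, patch_size):
    """Split a batch of large images (B, C, H, W) into non-overlapping patches."""
    B, C, H, W = images.shape
    p = patch_size
    patches = images.unfold(2, p, p).unfold(3, p, p)        # (B, C, H//p, W//p, p, p)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C, p, p)


def patchgd_step(encoder, head, optimizer, images, labels,
                 patch_size=256, patches_per_update=4, inner_iters=8):
    """One outer step: cache a per-patch latent grid, then run several
    gradient updates, each re-encoding only a small subset of patches."""
    criterion = nn.CrossEntropyLoss()
    patches = extract_patches(images, patch_size)            # (B, N, C, p, p)
    N = patches.shape[1]

    # Fill the latent grid once without gradients (cheap on memory).
    with torch.no_grad():
        z = torch.stack([encoder(patches[:, i]) for i in range(N)], dim=1)  # (B, N, D)

    for _ in range(inner_iters):
        z = z.detach()
        idx = set(torch.randperm(N)[:patches_per_update].tolist())
        # Re-encode only the sampled patches with gradients; reuse cached latents elsewhere.
        cols = [encoder(patches[:, i]) if i in idx else z[:, i] for i in range(N)]
        z = torch.stack(cols, dim=1)
        logits = head(z.mean(dim=1))                         # aggregate latents, then classify
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage on random data; a 1024x1024 image yields a 4x4 grid of 256-px patches.
    backbone = models.resnet50(weights=None)
    encoder = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())  # outputs (B, 2048)
    head = nn.Linear(2048, 10)
    optimizer = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    images, labels = torch.randn(2, 3, 1024, 1024), torch.randint(0, 10, (2,))
    print(patchgd_step(encoder, head, optimizer, images, labels))
```

In the paper's setting, the encoder would be an existing backbone such as ResNet50 or MobileNetV2; because only a few patches contribute gradients per update, peak memory is governed by `patches_per_update` rather than by the full image size.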


