Exploring shared memory architectures for end-to-end gigapixel deep learning

04/24/2023
by Lucas W. Remedios, et al.

Deep learning has made great strides in medical imaging, enabled by hardware advances in GPUs. One major constraint on the development of new models has been the saturation of GPU memory during training. This is especially true in computational pathology, where images regularly contain more than 1 billion pixels. Due to hardware limitations, these pathology images are traditionally divided into small patches to enable deep learning. In this work, we explore whether the shared GPU/CPU memory architecture of the M1 Ultra systems-on-a-chip (SoCs) recently released by Apple, Inc. may provide a solution. These affordable systems (less than $5,000) provide access to 128 GB of unified memory (Mac Studio with M1 Ultra SoC).

As a proof of concept for gigapixel deep learning, we segmented tissue from background over gigapixel regions of whole slide images (WSIs). The model was a modified U-Net (4,492 parameters) leveraging large kernels and large strides. Using TensorFlow 2/Keras, the M1 Ultra SoC trained the model directly on gigapixel images (16,000 × 64,000 pixels, 1.024 billion pixels) with a batch size of 1, consuming over 100 GB of unified memory at an average speed of 1 minute and 21 seconds per batch. As expected, the model converged, reaching a high Dice score of 0.989 ± 0.005; training to this point took 111 hours and 24 minutes over 4,940 steps. By comparison, other high-memory GPUs such as the NVIDIA A100 (the largest commercially accessible at 80 GB, ∼$15,000) are not yet widely available (in preview for select regions on Amazon Web Services at $40.96/hour for a group of 8). This study is a promising step towards WSI-wise end-to-end deep learning with prevalent network architectures.
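To make the memory constraint concrete, the back-of-envelope arithmetic below shows why a 16,000 × 64,000-pixel input saturates conventional GPU memory. The encoder stack used here is a hypothetical stand-in chosen for illustration (the stage counts and channel widths are assumptions, not the paper's actual modified U-Net); only the input dimensions and pixel count come from the abstract.

```python
# Back-of-envelope memory arithmetic for end-to-end gigapixel training.
# The stride-4 encoder below is a hypothetical illustration, not the
# paper's actual 4,492-parameter modified U-Net.

def tensor_bytes(h, w, channels, dtype_bytes=4):
    """Bytes needed for one float32 feature map of shape (h, w, channels)."""
    return h * w * channels * dtype_bytes

H, W = 16_000, 64_000            # WSI region size from the abstract
pixels = H * W
print(f"pixels: {pixels:,}")     # 1,024,000,000 -> 1.024 billion

# A single-channel float32 input alone occupies ~4 GB.
input_gb = tensor_bytes(H, W, 1) / 1e9
print(f"input tensor: {input_gb:.1f} GB")

# Hypothetical encoder: each stride-4 stage shrinks spatial dims 4x,
# but activations from every stage must be kept alive for backprop.
stages = [(H // 4**i, W // 4**i, c) for i, c in enumerate([8, 16, 32], start=1)]
act_gb = sum(tensor_bytes(h, w, c) for h, w, c in stages) / 1e9
print(f"encoder activations (forward pass only): {act_gb:.1f} GB")
```

Even this toy stack holds several gigabytes of activations before counting gradients, optimizer state, or the decoder, which is why large strides early in the network and a batch size of 1 are needed to fit within 128 GB of unified memory.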


Related research

01/08/2019 · CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Deep learning models are trained on servers with many GPUs, and training...

11/11/2020 · Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
The use of GPUs has proliferated for machine learning workflows and is n...

04/16/2015 · Caffe con Troll: Shallow Ideas to Speed Up Deep Learning
We present Caffe con Troll (CcT), a fully compatible end-to-end version ...

08/19/2020 · A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
We propose ParDNN, an automatic, generic, and non-intrusive partitioning...

04/19/2020 · Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms
The widely-adopted practice is to train deep learning models with specia...

10/29/2020 · Ink Marker Segmentation in Histopathology Images Using Deep Learning
Due to the recent advancements in machine vision, digital pathology has ...

11/18/2017 · DLTK: State of the Art Reference Implementations for Deep Learning on Medical Images
We present DLTK, a toolkit providing baseline implementations for effici...
