SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates

11/01/2022
by Baixi Sun, et al.

CNN-based surrogates have become prevalent in scientific applications as replacements for conventional, time-consuming physical approaches. Although these surrogates can yield satisfactory results at significantly lower computational cost on small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates on large datasets. In practice, surrogates are usually trained on high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that increases loading throughput during training. It builds on three key observations from our benchmarking and contains three novel designs. First, SOLAR generates a pre-determined shuffled index list and uses it to optimize the global access order and the buffer eviction scheme, maximizing data reuse and the buffer hit rate. Second, it trades off lightweight computational imbalance against heavyweight loading-workload imbalance to speed up overall training. Third, it optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs shows that SOLAR achieves up to 24.4X speedup over the PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
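To illustrate why a pre-determined shuffled index list can help the buffer hit rate, here is a minimal sketch, not SOLAR's actual implementation. It assumes a single in-memory buffer holding a fixed number of samples, and it uses Belady-style eviction (evict the sample whose next access is farthest away) as a stand-in for a future-aware eviction scheme; the abstract does not specify the scheme SOLAR uses. All function names here are hypothetical.

```python
import random
from collections import OrderedDict


def make_epoch_orders(num_samples, num_epochs, seed=0):
    """Pre-generate the shuffled sample order of every epoch from a fixed seed."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_epochs):
        order = list(range(num_samples))
        rng.shuffle(order)
        orders.append(order)
    return orders


def hits_lru(accesses, capacity):
    """Count buffer hits with LRU eviction (no knowledge of future accesses)."""
    buf = OrderedDict()
    hits = 0
    for sample in accesses:
        if sample in buf:
            hits += 1
            buf.move_to_end(sample)      # mark as most recently used
        else:
            if len(buf) >= capacity:
                buf.popitem(last=False)  # evict least recently used
            buf[sample] = True
    return hits


def hits_future_aware(accesses, capacity):
    """Count buffer hits when the full access list is known in advance.

    Assumption: Belady-style eviction, used only as an illustration.
    Requires capacity >= 1.
    """
    n = len(accesses)
    next_use = [float("inf")] * n
    last_seen = {}
    # Scan backwards to find, for each access, when that sample is used next.
    for i in range(n - 1, -1, -1):
        next_use[i] = last_seen.get(accesses[i], float("inf"))
        last_seen[accesses[i]] = i

    buf = {}  # sample -> position of its next access
    hits = 0
    for i, sample in enumerate(accesses):
        if sample in buf:
            hits += 1
        elif len(buf) >= capacity:
            victim = max(buf, key=buf.get)  # next access farthest in the future
            del buf[victim]
        buf[sample] = next_use[i]
    return hits


if __name__ == "__main__":
    orders = make_epoch_orders(num_samples=1000, num_epochs=5, seed=42)
    accesses = [s for order in orders for s in order]
    print("LRU hits:         ", hits_lru(accesses, capacity=200))
    print("Future-aware hits:", hits_future_aware(accesses, capacity=200))
```

Because the shuffle is seeded and generated up front, every epoch's access order is known before training starts, so the eviction decision can look ahead instead of guessing from past accesses.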

