Hyperparameter optimization of data-driven AI models on HPC systems

03/02/2022
by Eric Wulff, et al.

In the European Center of Excellence in Exascale computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers develop novel, scalable AI technologies towards Exascale. This work leverages High Performance Computing (HPC) resources to perform large-scale hyperparameter optimization using distributed training across multiple compute nodes. It is part of RAISE's work on data-driven use cases, which leverage the AI and HPC cross-methods developed within the project. In response to the demand for parallelizable and resource-efficient hyperparameter optimization methods, advanced hyperparameter search algorithms are benchmarked and compared. The evaluated algorithms, including Random Search, Hyperband, and ASHA, are compared in terms of both accuracy and accuracy per compute resources spent. As an example use case, a graph neural network model known as MLPF, developed for the task of Machine-Learned Particle-Flow reconstruction in High Energy Physics, acts as the base model for optimization. Results show that hyperparameter optimization significantly improved the performance of MLPF and that this would not have been possible without access to large-scale HPC resources. It is also shown that, in the case of MLPF, the ASHA algorithm combined with Bayesian optimization yields the largest performance increase per compute resources spent among the investigated algorithms.
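The mechanism that lets Hyperband and ASHA spend less compute per trial than Random Search is successive halving: many configurations are evaluated at a small training budget, only the top fraction survive to be trained at a larger budget, and so on. The sketch below shows one synchronous bracket of this idea in plain Python; it is an illustrative toy, not the paper's actual setup, and `toy_evaluate` together with the `lr` search space are invented stand-ins for a real training run.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """One bracket of successive halving (the core of Hyperband/ASHA):
    evaluate all configs at a small budget, keep the top 1/eta fraction,
    then repeat with eta times the budget until one config remains."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank configs by score at the current budget (higher is better).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Promote only the top 1/eta to the next, larger budget.
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]

# Hypothetical stand-in for a training run: "accuracy" improves with
# budget and peaks when the learning rate is near 0.1.
def toy_evaluate(config, budget):
    return (1 - abs(config["lr"] - 0.1)) * (1 - 1 / (budget + 1))

random.seed(0)
candidates = [{"lr": random.uniform(0.001, 1.0)} for _ in range(27)]
best = successive_halving(candidates, toy_evaluate)
```

With `eta=3` and 27 starting configurations, only 27 + 9 + 3 cheap-to-expensive evaluations are spent instead of 27 full-budget ones; ASHA additionally makes the promotions asynchronous so parallel workers on an HPC system never wait for a full rung to finish.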


