Performance Optimization using Multimodal Modeling and Heterogeneous GNN

04/25/2023
by   Akash Dutta, et al.
0

Growing heterogeneity and configurability in HPC architectures has made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application specific solutions, a common approach is to use general purpose search strategies, which often might not identify the best configurations or their time to convergence is a significant barrier. There is, thus, a need for a general purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning based approach that adapts Heterogeneous Graph Neural Networks and Denoizing Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax, semantics, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We extensively experiment on OpenMP and OpenCL code regions/kernels obtained from PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of i) optimizing the number of threads, scheduling policy and chunk size in OpenMP loops and, ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/25/2022

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

To enable heterogeneous computing systems with autonomous programming an...
research
01/16/2023

PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks

Relational graph neural networks (RGNNs) are graph neural networks (GNNs...
research
02/15/2018

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Efficient implementations of HPC applications for parallel architectures...
research
03/01/2022

Learning Intermediate Representations using Graph Neural Networks for NUMA and Prefetchers Optimization

There is a large space of NUMA and hardware prefetcher configurations th...
research
08/30/2020

Performance portability through machine learning guided kernel selection in SYCL libraries

Automatically tuning parallel compute kernels allows libraries and frame...
research
07/27/2021

Yet Another Combination of IR- and Neural-based Comment Generation

Code comment generation techniques aim to generate natural language desc...
research
03/23/2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

The increasing complexity of computing systems places a tremendous burde...

Please sign up or login with your details

Forgot password? Click here to reset