Performance Optimizations of Recursive Electronic Structure Solvers targeting Multi-Core Architectures (LA-UR-20-26665)

02/17/2021
by Adetokunbo A. Adedoyin, et al.

As we rapidly approach the frontiers of ultra-large computing resources, software optimization is becoming paramount for scientific application developers who want to efficiently leverage all available on-node computing capabilities and thereby improve the requisite science-per-watt metric. The scientific application of interest here is the Basic Math Library (BML), which provides a single interface for the linear algebra operations frequently used in the Quantum Molecular Dynamics (QMD) community. A single interface implies an abstraction layer, which in turn suggests commonalities in the code base; any optimization or tuning introduced in the core of the code base can therefore improve the performance of the library as a whole. With that in mind, we survey the entire BML code base and extract common snippets of code in the form of micro-kernels. We introduce several optimization strategies into these micro-kernels: (1) strength reduction, (2) memory alignment for large arrays, (3) Non-Uniform Memory Access (NUMA) aware allocations to enforce data locality, and (4) appropriate thread affinity and bindings to enhance overall multi-threaded performance. After introducing these optimizations, we benchmark the micro-kernels and compare the run time before and after optimization on several target architectures. Finally, we use the results as a guide for propagating the optimization strategies into the BML code base. As a demonstration, we test the efficacy of these optimization strategies by comparing the baseline and optimized versions of the code.
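The first three strategies named in the abstract can be illustrated on a toy kernel. The following is a minimal sketch and is not taken from the BML code base: the function names (`alloc_aligned`, `first_touch_init`, `scale_rows_baseline`, `scale_rows_reduced`), the row-scaling kernel itself, and the 64-byte alignment are all assumptions chosen for illustration. It assumes a C11 toolchain; the OpenMP pragma only takes effect when compiled with OpenMP enabled (for example `-fopenmp` with GCC) and is otherwise ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64 /* assumed cache-line size in bytes (hypothetical choice) */

/* Memory alignment: allocate n doubles on an ALIGNMENT-byte boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Returns NULL on failure. */
static double *alloc_aligned(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    return aligned_alloc(ALIGNMENT, rounded);
}

/* NUMA-aware initialization via first touch: on systems with a
 * first-touch page placement policy, each page tends to land on the NUMA
 * node of the thread that first writes it. Using the same static schedule
 * here and in later compute loops keeps data close to its thread. */
static void first_touch_init(double *a, size_t n, double value)
{
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = value;
}

/* Baseline: one index multiply and one division per element. */
static void scale_rows_baseline(double *a, const double *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] = a[i * n + j] / s[i];
}

/* Strength-reduced: the row pointer advances by addition, and the
 * division is replaced by a multiply with a reciprocal computed once per
 * row. Results can differ from the baseline in the last bit unless the
 * scale factors are powers of two. */
static void scale_rows_reduced(double *a, const double *s, size_t n)
{
    double *row = a;
    for (size_t i = 0; i < n; i++, row += n) {
        const double inv = 1.0 / s[i];
        for (size_t j = 0; j < n; j++)
            row[j] *= inv;
    }
}
```

Optimizing compilers often perform the index part of this transformation on their own, so any gain should be measured rather than assumed. The fourth strategy, thread affinity, is typically set outside the code for OpenMP programs, for example with the standard environment variables `OMP_PROC_BIND=close` and `OMP_PLACES=cores`; which settings suit a given architecture is exactly the kind of question the micro-kernel benchmarks are meant to answer.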


