Multi-level analysis of compiler-induced variability and performance tradeoffs
Floating-point arithmetic is the computational foundation of numerical scientific software. Compiler optimizations that affect floating-point arithmetic can have a significant impact on the integrity, reproducibility, and performance of HPC scientific applications. Unfortunately, the interplay between compiler-induced variability and the runtime performance gained from compiler optimizations is not well understood by programmers, a problem aggravated by the lack of analysis tools in this domain. In this paper, we present a novel set of techniques, as part of a multi-level analysis, that allow programmers to automatically search the space of compiler-induced variability and performance using their own inputs and metrics. Using a bisection algorithm, these techniques allow programmers to pinpoint the root cause of non-reproducible behavior down to function granularity across multiple compilers and platforms. We have demonstrated our methods on real-world code bases. We provide a performance and reproducibility analysis of the MFEM library, as well as a study of compiler characterization in which we attempt to isolate all 1,086 instances of result variability that were found. We also analyze the Laghos proxy app and identify a significant divergent floating-point variability in its code base; our bisect algorithm pinpointed the problematic function with as few as 14 program executions. Furthermore, an evaluation with 4,376 controlled injections of floating-point perturbations into the LULESH proxy application found that our framework is 100% accurate in identifying the file and function location of the injected problem, with an average of 15 program executions.
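To illustrate why a bisection search can localize variability in so few program executions, the following is a minimal Python sketch of a generic bisection over source files. It is not the paper's implementation. The names `find_one`, `bisect_all`, and `make_oracle` are hypothetical, and the sketch assumes that culprits act independently: a subset of files shows variability if and only if it contains at least one culprit. In a real setting, each call to `differs` would correspond to one build-and-run of the program with only that subset compiled under the optimization being studied.

```python
from typing import Callable, List, Sequence, Set


def find_one(items: Sequence[str], differs: Callable[[Sequence[str]], bool]) -> str:
    """Binary-search for a single culprit.

    Precondition (assumed, not checked): differs(items) is True.
    """
    candidates = list(items)
    while len(candidates) > 1:
        half = len(candidates) // 2
        left, right = candidates[:half], candidates[half:]
        # If the left half alone reproduces the difference, search there;
        # otherwise, under the independence assumption, a culprit is on the right.
        candidates = left if differs(left) else right
    return candidates[0]


def bisect_all(items: Sequence[str], differs: Callable[[Sequence[str]], bool]) -> List[str]:
    """Find every culprit by repeatedly isolating one and removing it."""
    remaining = list(items)
    culprits: List[str] = []
    while remaining and differs(remaining):
        culprit = find_one(remaining, differs)
        culprits.append(culprit)
        remaining.remove(culprit)
    return culprits


def make_oracle(true_culprits: Set[str]):
    """Hypothetical stand-in for 'compile this subset with the optimization
    under test, run the program, and compare against the baseline result'."""
    calls = {"count": 0}

    def differs(subset: Sequence[str]) -> bool:
        calls["count"] += 1  # each call models one program execution
        return any(name in true_culprits for name in subset)

    return differs, calls


# Usage: 16 hypothetical files, two of which cause result variability.
files = [f"file{i}.cpp" for i in range(16)]
differs, calls = make_oracle({"file3.cpp", "file11.cpp"})
found = bisect_all(files, differs)
print(sorted(found), "executions:", calls["count"])
```

Each culprit costs roughly log2(n) executions to isolate, plus one execution per round to check whether any variability remains. The same search can then be repeated over the functions inside each culprit file.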