Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

04/03/2017
by Istvan Z Reguly, et al.

The key common bottleneck in most stencil codes is data movement, and prior research has shown that optimisations which improve data locality by scheduling across loops do particularly well. However, in many large PDE applications it is not possible to apply such optimisations through compilers because there are many options, execution paths and data per grid point, many dependent on run-time parameters, and the code is distributed across different compilation units. In this paper, we adapt the data-locality-improving optimisation called iteration space slicing for use in large OPS applications on both shared-memory and distributed-memory systems, relying on run-time analysis and delayed execution. We evaluate our approach on a number of applications, observing speedups of 2× on the Cloverleaf 2D/3D proxy applications, which contain 83 and 141 loops respectively, 3.5× on the linear solver TeaLeaf, and 1.7× on the compressible Navier-Stokes solver OpenSBLI. We demonstrate strong and weak scalability up to 4608 cores of CINECA's Marconi supercomputer. We also evaluate our algorithms on Intel's Knights Landing, demonstrating maintained throughput as the problem size grows beyond 16 GB, and we perform scaling studies up to 8704 cores. The approach is generally applicable to any stencil DSL that provides per-loop data access information.
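The idea of combining delayed execution with tiling across loops can be illustrated with a small sketch. The code below is not the OPS API or the paper's algorithm; it is a minimal 1D illustration under stated assumptions: a hypothetical `LoopQueue` records loops instead of running them, every loop reads one array with a symmetric stencil of known radius, and each array is written by exactly one loop (so only read-after-write dependencies arise). At execution time, a backward pass over the recorded loops extends each earlier loop's tile end by the radius its consumers need, which yields skewed tiles.

```python
class LoopQueue:
    """Hypothetical delayed-execution queue: loops are recorded, not run."""

    def __init__(self, n):
        self.n = n          # number of interior grid points
        self.loops = []     # entries: (kernel, out, inp, radius)

    def par_loop(self, kernel, out, inp, radius):
        # Record the loop together with its data access information.
        self.loops.append((kernel, out, inp, radius))

    def execute(self, tile_size):
        n = self.n
        done = [0] * len(self.loops)  # next unexecuted index of each loop
        for tile_end in range(tile_size, n + tile_size, tile_size):
            tile_end = min(tile_end, n)
            # Backward pass: how far must each loop run in this tile so that
            # later loops find all the values they read?
            ends = [0] * len(self.loops)
            need = {}  # id(array) -> exclusive end index required by readers
            for k in reversed(range(len(self.loops))):
                _, out, inp, radius = self.loops[k]
                end = min(max(tile_end, need.get(id(out), 0)), n)
                ends[k] = end
                need[id(inp)] = max(need.get(id(inp), 0), end + radius)
            # Forward pass: run every loop on its (skewed) slice of this tile.
            for k, (kernel, out, inp, _) in enumerate(self.loops):
                kernel(out, inp, done[k], ends[k])
                done[k] = ends[k]


def three_point_sum(out, inp, lo, hi):
    # Arrays carry one halo cell on each side, hence the +1 index offset.
    for i in range(lo, hi):
        out[i + 1] = inp[i] + inp[i + 1] + inp[i + 2]


def run(n, tile_size):
    a = [0] + list(range(n)) + [0]
    b = [0] * (n + 2)
    c = [0] * (n + 2)
    q = LoopQueue(n)
    q.par_loop(three_point_sum, b, a, 1)  # b depends on a
    q.par_loop(three_point_sum, c, b, 1)  # c depends on b
    q.execute(tile_size)
    return c[1:-1]


if __name__ == "__main__":
    # A tile size equal to n degenerates to ordinary loop-by-loop execution.
    print(run(50, 7) == run(50, 50))
```

With a tile size of 7, the first loop runs one point ahead of the second within each tile, so the values of `b` consumed by the second loop were produced within the current or the preceding tile. A real implementation would also have to handle write-after-read dependencies, multiple dimensions, and distributed-memory halo exchanges, none of which this sketch attempts.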

research
06/07/2019

Towards Run Time Estimation of the Gaussian Chemistry Code for SEAGrid Science Gateway

Accurate estimation of the run time of computational codes has a number ...
research
08/10/2017

Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modelling

Sparse tiling is a technique to fuse loops that access common data, thus...
research
06/30/2017

Applying the Polyhedral Model to Tile Time Loops in Devito

The run time of many scientific computation applications for numerical m...
research
08/02/2018

Synapse: Synthetic Application Profiler and Emulator

Motivated by the need to emulate workload execution characteristics on h...
research
10/25/2022

Specialization of Run-time Configuration Space at Compile-time: An Exploratory Study

Numerous software systems are highly configurable through run-time optio...
research
02/18/2022

Migration-Based Synchronization

A fundamental challenge in multi- and many-core systems is the correct e...
