Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

03/16/2021
by Kun Li, et al.

Stencil computation is one of the most important kernels in many scientific and engineering applications. A large body of work has focused on vectorization techniques that exploit in-core data parallelism. However, these techniques either incur data alignment conflicts or hurt data locality when integrated with tiling. In this paper, we devise a novel transpose layout that preserves data locality for tiling in the data space while also reducing the data reorganization overhead of vectorization. We then propose temporal computation folding, an approach that further reduces redundant arithmetic by exploiting register reuse, alleviating the increased register pressure, and deriving a generalization with a linear regression model. Experimental results on AVX-2 and AVX-512 CPUs show that our approach obtains competitive performance.
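To make the terms concrete, the following is a minimal sketch of a 1D stencil sweep and of one simple way to fold two time steps into a single wider update. It is an illustration under stated assumptions, not the authors' implementation: the names `step` and `fold`, the periodic boundary handling, and the example weights are all hypothetical, and the sketch ignores the transpose layout and vectorization entirely.

```python
def step(u, w):
    """One Jacobi-style sweep of a 1D stencil with periodic boundaries.

    `w` holds 2r+1 weights for neighbor offsets -r..r (hypothetical helper).
    """
    n, r = len(u), len(w) // 2
    return [
        sum(w[k] * u[(i + k - r) % n] for k in range(len(w)))
        for i in range(n)
    ]


def fold(w):
    """Weights of the single stencil equivalent to applying `w` twice.

    Because the update is linear, two sweeps compose into one sweep whose
    weights are the discrete self-convolution of `w` (assumed illustration).
    """
    out = [0.0] * (2 * len(w) - 1)
    for i, a in enumerate(w):
        for j, b in enumerate(w):
            out[i + j] += a * b
    return out


u = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
w = [0.25, 0.5, 0.25]                 # 3-point stencil, example weights

two_sweeps = step(step(u, w), w)      # two time steps, two passes over the data
one_folded = step(u, fold(w))         # one pass with a 5-point stencil
```

Under these assumptions the folded 5-point update produces the same values as two 3-point sweeps while reading the grid once instead of twice; the paper's approach additionally addresses register reuse and register pressure, which this sketch does not model.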


03/16/2021

An Efficient Vectorization Scheme for Stencil Computation

Stencil computation is one of the most important kernels in various scie...
10/10/2020

Temporal Vectorization for Stencils

Stencil computations represent a very common class of nested loops in sc...
04/25/2019

Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms

Machine learning (ML) is probably the first and foremost used technique ...
01/09/2020

Guidelines for enhancing data locality in selected machine learning algorithms

To deal with the complexity of the new bigger and more complex generatio...
04/01/2021

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Machine learning frameworks adopt iterative optimizers to train neural n...
01/23/2023

SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Sequence alignment forms an important backbone in many sequencing applic...
05/18/2017

Spin Summations: A High-Performance Perspective

Besides tensor contractions, one of the most pronounced computational bo...
