Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

03/16/2021
by Kun Li, et al.

Stencil computation is one of the most important kernels in scientific and engineering applications. A variety of work has focused on vectorization techniques that aim to exploit in-core data parallelism. However, these techniques either incur data alignment conflicts or hurt data locality when integrated with tiling. In this paper, a novel transpose layout is devised that preserves data locality for tiling in the data space and, at the same time, reduces the data reorganization overhead of vectorization. We then propose temporal computation folding, an approach designed to further reduce redundant arithmetic calculations by exploiting register reuse, alleviating the resulting increase in register pressure, and generalizing the method with a linear regression model. Experimental results on AVX2 and AVX-512 CPUs show that our approach achieves competitive performance.
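To make the notion of redundant arithmetic across time steps concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not give their algorithm, and the function names (`stencil_step`, `fold_weights`, `folded_two_steps`), the weights, and the periodic boundary are all assumptions chosen for illustration. It only shows the general arithmetic fact that two consecutive steps of a linear 3-point stencil can be combined into a single 5-point update, which is one plausible reading of "folding" time steps.

```python
# Illustrative sketch only; names, weights, and boundary handling are
# hypothetical and not taken from the paper.

def stencil_step(u, w=(0.25, 0.5, 0.25)):
    """Apply one time step of a 1D 3-point stencil on a periodic grid."""
    n = len(u)
    return [
        w[0] * u[(i - 1) % n] + w[1] * u[i] + w[2] * u[(i + 1) % n]
        for i in range(n)
    ]


def fold_weights(w):
    """Compose a 3-point stencil with itself, giving 5-point weights.

    The result is the discrete convolution of w with w, so applying it
    once is equivalent to applying the original stencil twice.
    """
    folded = [0.0] * (2 * len(w) - 1)
    for a, wa in enumerate(w):
        for b, wb in enumerate(w):
            folded[a + b] += wa * wb
    return folded


def folded_two_steps(u, w=(0.25, 0.5, 0.25)):
    """Advance two time steps in one pass using the folded weights."""
    f = fold_weights(w)
    n = len(u)
    radius = len(f) // 2  # 2 for a folded 3-point stencil
    return [
        sum(f[k] * u[(i + k - radius) % n] for k in range(len(f)))
        for i in range(n)
    ]


if __name__ == "__main__":
    grid = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
    print(stencil_step(stencil_step(grid)))
    print(folded_two_steps(grid))
```

Both calls in the `__main__` block should print the same values up to floating-point rounding. A vectorized implementation would additionally have to manage data layout and register pressure, which this sketch does not attempt.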


