Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

03/16/2021
by Kun Li, et al.

Stencil computation is one of the most important kernels in scientific and engineering applications. A variety of work has focused on vectorization techniques that aim to exploit in-core data parallelism. However, these techniques either incur data alignment conflicts or hurt data locality when integrated with tiling. In this paper, we devise a novel transpose layout that preserves data locality for tiling in the data space while also reducing the data reorganization overhead of vectorization. We then propose temporal computation folding, an approach designed to further reduce redundant arithmetic calculations by exploiting register reuse, alleviating the resulting increase in register pressure, and deriving a generalization with a linear regression model. Experimental results on AVX-2 and AVX-512 CPUs show that our approach obtains competitive performance.
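
As a rough illustration of what a stencil kernel looks like, and of why several time steps might be merged into one pass, the sketch below applies a 1D 3-point stencil twice and compares the result with a single 5-point stencil whose weights combine both steps. This is only an assumed reading of the intuition behind merging time steps: the abstract does not describe the actual folding scheme, and the function names, the weights (0.25, 0.5, 0.25), and the copied-boundary handling are all hypothetical choices made for the example.

```python
def stencil_step(a, w=(0.25, 0.5, 0.25)):
    """One sweep of a 1D 3-point stencil; the two boundary cells are copied."""
    out = list(a)
    for i in range(1, len(a) - 1):
        # Each output reads three neighbouring inputs: i-1, i, i+1.
        out[i] = w[0] * a[i - 1] + w[1] * a[i] + w[2] * a[i + 1]
    return out


def folded_weights(w):
    """Convolve the weights with themselves: the 5-point weights of two sweeps."""
    f = [0.0] * (2 * len(w) - 1)
    for i, wi in enumerate(w):
        for j, wj in enumerate(w):
            f[i + j] += wi * wj
    return f


def folded_two_steps(a, w=(0.25, 0.5, 0.25)):
    """Two time steps in one pass, for cells at least two away from the edges.

    Cells closer to the boundary are left as copies of the input, so they
    do NOT match two calls to stencil_step; only the interior does.
    """
    f = folded_weights(w)
    out = list(a)
    for i in range(2, len(a) - 2):
        out[i] = sum(f[k] * a[i - 2 + k] for k in range(5))
    return out


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(12)]
    two_sweeps = stencil_step(stencil_step(grid))
    one_pass = folded_two_steps(grid)
    interior = range(2, len(grid) - 2)
    print(all(abs(two_sweeps[i] - one_pass[i]) < 1e-12 for i in interior))
```

The comparison is restricted to interior cells because the merged stencil reaches two cells in each direction, while the step-by-step version sees the fixed boundary after the first sweep.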

