An Efficient Vectorization Scheme for Stencil Computation

03/16/2021
by   Kun Li, et al.
0

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data locality respectively. In this paper, the downsides of existing vectorization schemes are analyzed. Briefly, they either incur data alignment conflicts or hurt the data locality when integrated with tiling. Then we propose a novel transpose layout to preserve the data locality for tiling and reduce the data reorganization overhead for vectorization simultaneously. To further improve the data reuse at the register level, a time loop unroll-and-jam strategy is designed to perform multistep stencil computation along the time dimension. Experimental results on the AVX-2 and AVX-512 CPUs show that our approach obtains a competitive performance.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

03/16/2021

Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

Stencil computation is one of the most important kernels in various scie...
04/01/2021

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Machine learning frameworks adopt iterative optimizers to train neural n...
10/10/2020

Temporal Vectorization for Stencils

Stencil computations represent a very common class of nested loops in sc...
01/16/2013

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burden...
02/11/2018

Locality Optimized Unstructured Mesh Algorithms on GPUs

Unstructured-mesh based numerical algorithms such as finite volume and f...
01/09/2020

Guidelines for enhancing data locality in selected machine learning algorithms

To deal with the complexity of the new bigger and more complex generatio...
04/27/2021

Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations

Computed Tomography (CT) is a key 3D imaging technology that fundamental...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.