An Efficient Vectorization Scheme for Stencil Computation

by Kun Li, et al.

Stencil computation is one of the most important kernels in scientific and engineering applications. Much prior work has focused on vectorization and tiling, which exploit in-core data parallelism and data locality, respectively. In this paper, we analyze the shortcomings of existing vectorization schemes: they either incur data alignment conflicts or hurt data locality when integrated with tiling. We then propose a novel transpose layout that simultaneously preserves data locality for tiling and reduces the data reorganization overhead of vectorization. To further improve data reuse at the register level, we design a time-loop unroll-and-jam strategy that performs multistep stencil computation along the time dimension. Experimental results on AVX2 and AVX-512 CPUs show that our approach achieves competitive performance.
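The abstract does not spell out the layout, so the following is only an illustrative sketch under stated assumptions. It is not the authors' implementation. It assumes a 1D three-point stencil with fixed end points and a hypothetical vector length `VL = 4` (four doubles per AVX2 register), and it models SIMD registers as Python lists of lanes. Each contiguous block of `VL*VL` elements is transposed as a `VL x VL` matrix, so lane `r` of vector `j` holds original element `base + r*VL + j`. For interior vectors, the left and right neighbors are then simply the previous and next vectors in the same lanes, which means aligned loads and no shuffles. Only the first and last vector of each block need a one-lane shift. Because the transpose is confined to a small block, neighboring elements stay close in memory. The names `to_transposed`, `stencil_naive`, and `stencil_transposed` are hypothetical.

```python
VL = 4  # hypothetical vector length: 4 doubles per AVX2 register


def stencil_naive(a, w):
    """Reference 1D 3-point stencil sweep; the two end points are kept fixed."""
    wl, wc, wr = w
    b = list(a)
    for i in range(1, len(a) - 1):
        b[i] = wl * a[i - 1] + wc * a[i] + wr * a[i + 1]
    return b


def to_transposed(a, vl=VL):
    """Transpose every contiguous block of vl*vl elements as a vl x vl matrix.

    Afterwards, lane r of "vector" j in a block holds original element
    base + r*vl + j. The transform is its own inverse.
    """
    n = len(a)
    assert n % (vl * vl) == 0, "sketch assumes len(a) is a multiple of vl*vl"
    out = [0.0] * n
    for base in range(0, n, vl * vl):
        for r in range(vl):
            for c in range(vl):
                out[base + c * vl + r] = a[base + r * vl + c]
    return out


def stencil_transposed(t, w, vl=VL):
    """One stencil sweep on a block-transposed array t (result is also transposed)."""
    wl, wc, wr = w
    n = len(t)
    nblocks = n // (vl * vl)

    def vec(blk, j):
        # Aligned load of vector j in block blk, modeled as a list of vl lanes.
        start = blk * vl * vl + j * vl
        return t[start:start + vl]

    out = [0.0] * n
    for blk in range(nblocks):
        for j in range(vl):
            center = vec(blk, j)
            if j > 0:
                left = vec(blk, j - 1)  # same lanes, previous vector: no shuffle
            else:
                # First vector: shift the last vector up one lane; lane 0 needs the
                # last element of the previous block (placeholder at the global edge).
                carry = vec(blk - 1, vl - 1)[-1] if blk > 0 else 0.0
                left = [carry] + vec(blk, vl - 1)[:-1]
            if j < vl - 1:
                right = vec(blk, j + 1)  # same lanes, next vector: no shuffle
            else:
                # Last vector: shift the first vector down one lane; the last lane
                # needs the first element of the next block (placeholder at the edge).
                carry = vec(blk + 1, 0)[0] if blk + 1 < nblocks else 0.0
                right = vec(blk, 0)[1:] + [carry]
            start = blk * vl * vl + j * vl
            for lane in range(vl):
                out[start + lane] = wl * left[lane] + wc * center[lane] + wr * right[lane]
    # Original indices 0 and n-1 lie on block diagonals, so the transpose leaves
    # them in place; restore the fixed boundary values there.
    out[0] = t[0]
    out[n - 1] = t[n - 1]
    return out
```

For example, `to_transposed(stencil_transposed(to_transposed(a), w))` should equal `stencil_naive(a, w)` for any `a` whose length is a multiple of `VL*VL`. In this model, only two of every `VL` vectors require a lane shift. A real AVX2 or AVX-512 kernel would implement those shifts with permute or align instructions.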
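The time-loop unroll-and-jam idea can likewise be sketched on a hypothetical 1D three-point stencil with fixed end points. This is a scalar illustration under that assumption, with an unroll factor of 2 chosen for simplicity. It is not the paper's vectorized kernel. The time loop is unrolled by two and jammed into the space loop. As a result, each step-1 value is consumed by step 2 while it is still held in a local variable, which stands in for a register. Without the fusion, that value would be written to memory and reloaded in a second sweep. The names `stencil_naive` and `two_steps_fused` are hypothetical.

```python
def stencil_naive(a, w):
    """Reference 1D 3-point stencil sweep; the two end points are kept fixed."""
    wl, wc, wr = w
    b = list(a)
    for i in range(1, len(a) - 1):
        b[i] = wl * a[i - 1] + wc * a[i] + wr * a[i + 1]
    return b


def two_steps_fused(a, w):
    """Two stencil time steps in a single spatial sweep (time loop unrolled by 2).

    Step-1 values b[i] are produced left to right. As soon as b[i] exists,
    the step-2 value c[i-1] can be computed from b[i-2], b[i-1], b[i], which
    are held in a rolling window of locals instead of a temporary array.
    """
    n = len(a)
    assert n >= 3, "sketch assumes at least one interior point"
    wl, wc, wr = w
    c = list(a)  # end points stay fixed at every time step
    b_m2 = b_m1 = a[0]  # b[i-2], b[i-1]; b_m2 is a placeholder until i == 2
    for i in range(1, n):
        # Time step 1 at position i (the right end point is fixed).
        b_i = wl * a[i - 1] + wc * a[i] + wr * a[i + 1] if i < n - 1 else a[i]
        # Time step 2 at position i-1, reusing step-1 values from the window.
        if i >= 2:
            c[i - 1] = wl * b_m2 + wc * b_m1 + wr * b_i
        b_m2, b_m1 = b_m1, b_i  # slide the window right by one position
    return c
```

Calling `two_steps_fused(a, w)` should return exactly the same list as `stencil_naive(stencil_naive(a, w), w)`, since both evaluate the same expressions in the same order. The fused version, however, reads the input array only once and never materializes the intermediate time step.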



Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

Stencil computation is one of the most important kernels in various scie...

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Machine learning frameworks adopt iterative optimizers to train neural n...

Temporal Vectorization for Stencils

Stencil computations represent a very common class of nested loops in sc...

Gamify Stencil Dwarf on Cloud for Democratizing Scientific Computing

Stencil computation is one of the most important kernels in various scie...

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burden...

SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Sequence alignment forms an important backbone in many sequencing applic...

Guidelines for enhancing data locality in selected machine learning algorithms

To deal with the complexity of the new bigger and more complex generatio...