FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs

05/03/2023
by Shixun Wu, et al.

General matrix-matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the growing scale of computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM capable of tolerating soft errors on the fly. We incorporate fault tolerance at the algorithmic level by fusing the memory-intensive operations into the GEMM assembly kernels. We also design a cache-friendly scheme for parallel FT-GEMM. Experimental results on Intel Cascade Lake demonstrate that FT-GEMM offers both high reliability and high performance: it is 3.50% to 22.14% faster than Intel MKL, OpenBLAS, and BLIS for both serial and parallel GEMM, even with hundreds of errors injected per minute.
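
The abstract does not spell out how fault tolerance is added at the algorithmic level. As a rough illustration only, the sketch below assumes the classic checksum approach to algorithm-based fault tolerance: the row and column sums of C = AB are predicted from A and B, compared with the sums of the computed C, and a single corrupted element is located and repaired from the mismatch. The function names `matmul` and `abft_check_and_correct` and the tolerance `tol` are hypothetical. This is plain Python for readability, not the fused assembly kernels the poster describes.

```python
def matmul(a, b):
    """Naive reference GEMM on lists of lists: returns a @ b."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_check_and_correct(a, b, c, tol=1e-9):
    """Check c against checksums derived from a and b.

    Returns None if c is consistent, or the (row, col) index of a single
    corrupted element after repairing it in place. Raises ValueError if
    the mismatch pattern does not look like a single-element error.
    """
    rows, cols, k = len(c), len(c[0]), len(b)

    # Predicted row sums of C: A @ (B e), where e is the all-ones vector.
    b_row_sums = [sum(row) for row in b]
    expected_row = [sum(a[i][p] * b_row_sums[p] for p in range(k))
                    for i in range(rows)]

    # Predicted column sums of C: (e^T A) @ B.
    a_col_sums = [sum(a[i][p] for i in range(rows)) for p in range(k)]
    expected_col = [sum(a_col_sums[p] * b[p][j] for p in range(k))
                    for j in range(cols)]

    # Differences between the observed and predicted checksums.
    row_diff = [sum(c[i]) - expected_row[i] for i in range(rows)]
    col_diff = [sum(c[i][j] for i in range(rows)) - expected_col[j]
                for j in range(cols)]

    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]

    if not bad_rows and not bad_cols:
        return None  # no error detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] -= row_diff[i]  # subtract the injected error
        return (i, j)
    raise ValueError("mismatch is not a single-element error; recompute")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(a, b)
    c[1][0] += 100  # simulate a soft error in one element of C
    print(abft_check_and_correct(a, b, c), c)
```

The checksum work is matrix-vector sized, so it is memory-bound compared with the multiplication itself, which suggests why fusing such operations into the GEMM kernel could keep the overhead low. With floating-point data, `tol` would need to scale with the magnitude of the entries to avoid flagging rounding error as a fault.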


research
04/02/2021

FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance

Basic Linear Algebra Subprograms (BLAS) is a core library in scientific ...
research
05/01/2023

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

General Matrix Multiplication (GEMM) is a crucial algorithm for various ...
research
02/09/2020

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

Hardware platforms in high performance computing are constantly getting ...
research
03/26/2019

Matrix multiplication and universal scalability of the time on the Intel Scalable processors

Matrix multiplication is one of the core operations in many areas of sci...
research
04/02/2018

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures : Algorithms and Experiments

Architectures with multiple classes of memory media are becoming a commo...
research
09/01/2016

BLISlab: A Sandbox for Optimizing GEMM

Matrix-matrix multiplication is a fundamental operation of great importa...
research
03/21/2019

Fault-Tolerant Nanosatellite Computing on a Budget

Micro- and nanosatellites have become popular platforms for a variety of...
