FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance

04/02/2021
by Yujia Zhai, et al.

Basic Linear Algebra Subprograms (BLAS) is a core library in scientific computing and machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines that not only tolerates soft errors on the fly but also delivers performance comparable to modern state-of-the-art BLAS libraries on widely used processors such as Intel Skylake and Cascade Lake. Because BLAS contains both memory-bound and compute-bound routines, we propose a hybrid strategy for incorporating fault tolerance into our new BLAS implementation: duplicating computing instructions for the memory-bound Level-1 and Level-2 BLAS routines, and incorporating an Algorithm-Based Fault Tolerance (ABFT) mechanism for the compute-bound Level-3 BLAS routines. The high performance and low overhead come from careful assembly-level optimization and a kernel-fusion approach to the computing kernels. Experimental results demonstrate that FT-BLAS offers both high reliability and high performance, running faster than Intel MKL, OpenBLAS, and BLIS by up to 3.50 … across the three levels of BLAS we benchmarked, even with hundreds of errors injected per minute.
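A toy sketch may make the two strategies concrete. It assumes pure-Python list-of-lists matrices and uses hypothetical function names (`dot_dmr`, `matmul`, `gemm_abft`); it is not FT-BLAS's implementation, which duplicates instructions inside hand-tuned assembly kernels and fuses checksum work into the GEMM kernel. The usual rationale for the split: in a memory-bound routine the redundant arithmetic is largely hidden behind memory traffic, while for GEMM the checksums cost O(n^2) work against O(n^3) for the product itself. The ABFT part relies on the identities that the row sums of C = AB equal A(Be) and its column sums equal (e^T A)B, where e is the all-ones vector; a single corrupted element shows up as exactly one mismatched row and one mismatched column, whose intersection locates it and whose difference corrects it.

```python
# Toy illustration only (hypothetical helpers, not FT-BLAS code).
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def dot_dmr(x: List[float], y: List[float], max_retries: int = 3) -> float:
    """Level-1 style: compute twice and compare (duplication / DMR)."""
    for _ in range(max_retries):
        a = sum(xi * yi for xi, yi in zip(x, y))
        b = sum(xi * yi for xi, yi in zip(x, y))  # duplicated computation
        if a == b:  # both copies agree, so accept the result
            return a
    raise RuntimeError("persistent mismatch between duplicated computations")


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain triple-loop product C = A @ B."""
    k, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(m)] for row in A]


def gemm_abft(A: Matrix, B: Matrix,
              inject: Optional[Tuple[int, int, float]] = None,
              tol: float = 1e-9) -> Matrix:
    """Level-3 style: checksum-based detection and correction of one error."""
    C = matmul(A, B)
    if inject is not None:  # simulate a soft error as (row, col, delta)
        i, j, delta = inject
        C[i][j] += delta
    n, k, m = len(A), len(B), len(B[0])
    # Reference checksums computed from the inputs only.
    Be = [sum(row) for row in B]                                # B @ e
    row_ref = [sum(A[i][p] * Be[p] for p in range(k)) for i in range(n)]
    eA = [sum(A[i][p] for i in range(n)) for p in range(k)]     # e^T @ A
    col_ref = [sum(eA[p] * B[p][j] for p in range(k)) for j in range(m)]
    # Compare against checksums of the computed C.
    row_diff = [row_ref[i] - sum(C[i]) for i in range(n)]
    col_diff = [col_ref[j] - sum(C[i][j] for i in range(n)) for j in range(m)]
    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        C[i][j] += row_diff[i]  # undo the error at the intersection
    elif bad_rows or bad_cols:
        raise RuntimeError("error pattern not correctable by single-error ABFT")
    return C
```

For example, `gemm_abft([[1, 2], [3, 4]], [[5, 6], [7, 8]], inject=(1, 0, 100.0))` corrupts C[1][0], flags row 1 and column 0, and returns the corrected product `[[19, 22], [43.0, 50]]`. The fixed absolute `tol` is a simplification; with general floating-point inputs the tolerance would need to scale with the magnitudes involved.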
