Acceleration of multiple precision matrix multiplication based on multi-component floating-point arithmetic using AVX2

01/17/2021
by Tomonori Kouya, et al.

In this paper, we report the results of accelerating multi-binary64-type multiple precision matrix multiplication with AVX2. We target double-double (DD), triple-double (TD), and quad-double (QD) precision arithmetic, all of which are built on error-free transformation (EFT) arithmetic. We implement SIMDized EFT functions that operate on four binary64 numbers simultaneously in an x86_64 computing environment, and use them to develop SIMDized DD, TD, and QD additions and multiplications. In addition, we adopt AVX2 load/store functions to efficiently read matrix elements from memory and store them back. Owing to these combined techniques, our multiple precision matrix multiplications run more than three times faster than the non-accelerated versions, as illustrated by the sketch below. The accelerated matrix multiplication also influences the performance of parallelization with OpenMP.
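As an illustration of the approach, the following C sketch shows SIMDized TwoSum and TwoProd EFT primitives built from AVX2/FMA intrinsics, plus a double-double addition that processes four DD values per call. This is a minimal sketch of the technique, assuming a split hi/lo storage layout; the function names (two_sum_pd, dd_add_pd, etc.) are illustrative and are not the paper's actual implementation.

```c
#include <stdio.h>
#include <immintrin.h>  /* AVX2 + FMA intrinsics */

/* TwoSum (Knuth): s + e == a + b exactly, on four binary64 lanes. */
static inline __m256d two_sum_pd(__m256d a, __m256d b, __m256d *e)
{
    __m256d s = _mm256_add_pd(a, b);
    __m256d v = _mm256_sub_pd(s, a);
    *e = _mm256_add_pd(_mm256_sub_pd(a, _mm256_sub_pd(s, v)),
                       _mm256_sub_pd(b, v));
    return s;
}

/* QuickTwoSum: same invariant, but assumes |a| >= |b| lane-wise. */
static inline __m256d quick_two_sum_pd(__m256d a, __m256d b, __m256d *e)
{
    __m256d s = _mm256_add_pd(a, b);
    *e = _mm256_sub_pd(b, _mm256_sub_pd(s, a));
    return s;
}

/* TwoProd via FMA: p + e == a * b exactly. */
static inline __m256d two_prod_pd(__m256d a, __m256d b, __m256d *e)
{
    __m256d p = _mm256_mul_pd(a, b);
    *e = _mm256_fmsub_pd(a, b, p);   /* exact rounding error of a*b */
    return p;
}

/* DD addition on four double-double values at once; the high and low
   parts of each operand live in separate registers. */
static inline void dd_add_pd(__m256d ah, __m256d al,
                             __m256d bh, __m256d bl,
                             __m256d *ch, __m256d *cl)
{
    __m256d sh, sl, th, tl;
    sh = two_sum_pd(ah, bh, &sl);        /* high parts, with error */
    th = two_sum_pd(al, bl, &tl);        /* low parts, with error  */
    sl = _mm256_add_pd(sl, th);
    sh = quick_two_sum_pd(sh, sl, &sl);  /* renormalize            */
    sl = _mm256_add_pd(sl, tl);
    *ch = quick_two_sum_pd(sh, sl, cl);
}

int main(void)
{
    /* Four DD numbers in separate hi/lo arrays, read with the AVX2
       unaligned load and written back with the matching store. */
    double ah[4] = {1.0, 2.0, 3.0, 4.0}, al[4] = {1e-20, 0, 0, 0};
    double bh[4] = {1.0, 1.0, 1.0, 1.0}, bl[4] = {0, 0, 0, 0};
    double ch[4], cl[4];
    __m256d vch, vcl;

    dd_add_pd(_mm256_loadu_pd(ah), _mm256_loadu_pd(al),
              _mm256_loadu_pd(bh), _mm256_loadu_pd(bl), &vch, &vcl);
    _mm256_storeu_pd(ch, vch);
    _mm256_storeu_pd(cl, vcl);

    for (int i = 0; i < 4; i++)
        printf("c[%d] = %.17g + %.17g\n", i, ch[i], cl[i]);
    return 0;
}
```

Compiled with, e.g., `gcc -O2 -mavx2 -mfma`, each call performs a DD operation on four elements at once, which is the source of the speedup over a scalar implementation; TD and QD additions and multiplications can be composed from the same TwoSum/TwoProd primitives with additional renormalization steps.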


