MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations

12/18/2018
by Bangtian Liu, et al.

We present MatRox, a novel model-based algorithm and implementation of Hierarchically Semi-Separable (HSS) matrix computations on parallel architectures. MatRox uses a novel storage format to improve the data locality and scalability of HSS matrix-matrix multiplications on shared-memory multicore processors. We build a performance model for HSS matrix-matrix multiplications and, based on it, introduce a mixed-rank heuristic that finds an optimal HSS-tree depth for faster HSS matrix evaluation. Uniform sampling is used to improve the performance of HSS compression. MatRox outperforms the state-of-the-art HSS matrix multiplication codes GOFMM and STRUMPACK, with average speedups of 2.8x and 6.1x, respectively, on the target multicore processors.
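
To make the idea of a hierarchical low-rank matrix-matrix multiplication concrete, here is a minimal sketch in Python with NumPy. It is not MatRox's algorithm or storage format. It assumes a simplified HODLR-style tree (dense diagonal leaves, off-diagonal blocks compressed by truncated SVD) without the nested bases of a true HSS matrix, and it uses a fixed rank rather than the mixed-rank heuristic. The names `compress`, `build`, `matmul`, and `demo` and all parameter values are hypothetical. The `leaf_size` parameter controls the tree depth, the quantity the abstract says the performance model is used to tune.

```python
import numpy as np


def compress(block, rank):
    """Truncated SVD: return (U, V) such that block is approximately U @ V."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def build(K, leaf_size, rank):
    """Recursively split K into dense leaves and low-rank off-diagonal blocks."""
    n = K.shape[0]
    if n <= leaf_size:
        return {"dense": K.copy()}
    h = n // 2
    return {
        "split": h,
        "left": build(K[:h, :h], leaf_size, rank),
        "right": build(K[h:, h:], leaf_size, rank),
        "upper": compress(K[:h, h:], rank),  # block coupling rows [:h] to cols [h:]
        "lower": compress(K[h:, :h], rank),  # block coupling rows [h:] to cols [:h]
    }


def matmul(node, W):
    """Approximate K @ W using the compressed tree."""
    if "dense" in node:
        return node["dense"] @ W
    h = node["split"]
    U12, V12 = node["upper"]
    U21, V21 = node["lower"]
    # Multiply the thin factors right-to-left so no dense block is re-formed.
    top = matmul(node["left"], W[:h]) + U12 @ (V12 @ W[h:])
    bottom = matmul(node["right"], W[h:]) + U21 @ (V21 @ W[:h])
    return np.vstack([top, bottom])


def demo(n=256, leaf_size=32, rank=12, cols=4, seed=0):
    """Return the relative error of the compressed product on a Gaussian kernel."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))
    K = np.exp(-((x[:, None] - x[None, :]) ** 2))  # smooth kernel matrix (assumed test input)
    W = rng.standard_normal((n, cols))
    approx = matmul(build(K, leaf_size, rank), W)
    exact = K @ W
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


if __name__ == "__main__":
    print("relative error:", demo())
```

A smaller `leaf_size` yields a deeper tree with more low-rank blocks and smaller dense leaves, while a larger one does the opposite. That trade-off is what a depth-selection heuristic has to balance.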

research
12/23/2011

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse matrix-vector multiplication (spMVM) is the dominant operation in...
research
03/26/2023

Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices

In deep learning inference, model parameters are pruned and quantized to...
research
03/26/2019

Matrix multiplication and universal scalability of the time on the Intel Scalable processors

Matrix multiplication is one of the core operations in many areas of sci...
research
04/11/2017

Strassen's Algorithm for Tensor Contraction

Tensor contraction (TC) is an important computational kernel widely used...
research
04/05/2018

High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...
research
05/10/2022

The spatial computer: A model for energy-efficient parallel computation

We present a new parallel model of computation suitable for spatial arch...
research
07/07/2020

A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

The QR algorithm is one of the three phases in the process of computing ...
