FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

11/07/2020
by   Md. Khaledur Rahman, et al.
0

We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from the low-level vectorized kernels, a suitable load balancing scheme and an efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and perform equally well on Intel, AMD and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to 28x on different processors.

READ FULL TEXT

page 6

page 9

research
11/18/2019

General Matrix-Matrix Multiplication Using SIMD features of the PIII

Generalised matrix-matrix multiplication forms the kernel of many mathem...
research
03/26/2019

Matrix multiplication and universal scalability of the time on the Intel Scalable processors

Matrix multiplication is one of the core operations in many areas of sci...
research
03/15/2022

Distributed-Memory Sparse Kernels for Machine Learning

Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times...
research
04/27/2023

Co-Design of the Dense Linear AlgebravSoftware Stack for Multicore Processors

This paper advocates for an intertwined design of the dense linear algeb...
research
07/14/2021

A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU

The Word Movers Distance (WMD) measures the semantic dissimilarity betwe...
research
03/14/2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization

Performance optimization is an increasingly challenging but often repeti...
research
06/30/2021

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) ar...

Please sign up or login with your details

Forgot password? Click here to reset