Layer Based Partition for Matrix Multiplication on Heterogeneous Processor Platforms

12/15/2018
by Yang Liu, et al.

While many approaches have been proposed for parallel matrix multiplication, few of them address heterogeneous processor platforms. On such platforms, finding the optimal schedule that balances the load across the heterogeneous processor set while minimizing the amount of communication remains an open question. Many studies are based on rectangular partition, yet the optimality of rectangular partition as a basis has not been well justified. In this paper, we propose a new method that schedules matrix multiplication on heterogeneous processor platforms with the joint goal of minimizing the total communication volume and the multiplication completion time. We first present the schema of our layer based partition (LBP) method. We then demonstrate that our approach guarantees minimal communication volume, which is smaller than what rectangular partition can reach. We further analyze the problem of minimizing the task completion time with network topologies taken into account, and solve it in both the single-neighbor and the multi-neighbor network case. In the single-neighbor case, we propose an equality based method to solve LBP, and simulation shows that the total communication volume is reduced by 75% relative to the lower bound of rectangular partition. In the multi-neighbor case, we formulate LBP as a Mixed Integer Programming problem and reduce the total communication volume by 81%. LBP thus offers a promising perspective for tackling matrix multiplication problems on heterogeneous processor platforms.
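The abstract does not spell out the LBP schema, so the following is only an illustrative sketch of the general idea of load balancing on heterogeneous processors, not the paper's algorithm. It assumes that a "layer" is a slice of the inner (summation) dimension of the product, and that each processor receives a number of layers proportional to its relative speed. The names `split_layers`, `partial_product`, `layered_matmul`, and the `speeds` parameter are hypothetical and introduced here for illustration.

```python
def split_layers(n, speeds):
    """Split n inner-dimension indices into contiguous (start, end) ranges,
    sized roughly in proportion to the given relative speeds (assumed input)."""
    total = sum(speeds)
    exact = [n * s / total for s in speeds]
    counts = [int(x) for x in exact]
    # Hand out the indices lost to rounding, largest fractional part first.
    leftover = n - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


def partial_product(A, B, lo, hi):
    """Contribution to A*B from inner indices lo..hi-1 (one processor's share)."""
    rows, cols = len(A), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(lo, hi))
             for j in range(cols)] for i in range(rows)]


def layered_matmul(A, B, speeds):
    """Sum the partial products of all processors to obtain the full product."""
    rows, cols = len(A), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for lo, hi in split_layers(len(B), speeds):
        P = partial_product(A, B, lo, hi)
        for i in range(rows):
            for j in range(cols):
                C[i][j] += P[i][j]
    return C
```

In this sketch, a processor that is three times faster than another receives about three times as many layers, so both would finish at roughly the same time if computation dominated. The sketch ignores communication cost and network topology, which the paper treats explicitly.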

research
08/26/2019

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

We propose COSMA: a parallel matrix-matrix multiplication algorithm that...
research
06/03/2022

Root of Unity for Secure Distributed Matrix Multiplication: Grid Partition Case

We consider the problem of secure distributed matrix multiplication (SDM...
research
11/13/2019

Improving the Space-Time Efficiency of Processor-Oblivious Matrix Multiplication Algorithms

Classic cache-oblivious parallel matrix multiplication algorithms achiev...
research
07/17/2023

Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids

We propose an algorithm that aims at minimizing the inter-node communica...
research
04/06/2017

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most suc...
research
04/19/2023

Baugh-Wooley Multiplication for the RISCV Processor

This article describes an efficient way to implement the multiplication ...
research
05/30/2011

Ethane: A Heterogeneous Parallel Search Algorithm for Heterogeneous Platforms

In this paper we present Ethane, a parallel search algorithm specificall...
