Distributing Sparse Matrix/Graph Applications in Heterogeneous Clusters – an Experimental Study
Many problems in scientific and engineering applications take sparse matrices or graphs as their main input objects, e.g. numerical simulations on meshes. Large inputs are common today and require parallel processing, both to fit into memory and to run fast enough. To optimize the execution of such simulations on cluster systems, the input problem must be distributed suitably onto the processing units (PUs). Increasingly, such clusters contain different types of CPUs or a combination of CPUs and GPUs. This heterogeneity makes the load distribution problem considerably more challenging. Our study is motivated by the observation that established partitioning tools do not handle such heterogeneous distribution problems as well as homogeneous ones. In this paper, we first formulate the problem of balanced load distribution for heterogeneous architectures as a multi-objective, single-constraint optimization problem. We then split the problem into two phases and propose a greedy approach to determine optimal block sizes for each PU. We feed these block sizes into a number of existing graph partitioners to examine how well they handle the problem. One of the tools we consider is Geographer, an extension of our own previous work (von Looz et al., ICPP'18). Our experiments on well-known benchmark meshes indicate that only two of the tools considered yield good quality: ParMETIS (both its geometric and its combinatorial variant) and Geographer. While ParMETIS is faster, Geographer yields better quality on average.
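The abstract does not spell out how the greedy block-size step works. As a hedged illustration only, the Python sketch below assumes that each PU has a relative speed and a memory capacity (both measured in the same load units), and that the goal is to minimize the largest per-PU time, load divided by speed, without exceeding any PU's memory. The function `block_sizes` and its parameters are hypothetical, and the greedy "water-filling" scheme shown is a standard technique for this simplified model, not necessarily the authors' algorithm.

```python
from typing import List


def block_sizes(total: float, speeds: List[float], mem_caps: List[float]) -> List[float]:
    """Hypothetical sketch, not the paper's method: split `total` load units
    across PUs so that max(size[i] / speeds[i]) is minimized subject to
    size[i] <= mem_caps[i]. Greedy water-filling: share the remaining load
    proportionally to speed among PUs that still have room, saturate every PU
    whose share would exceed its memory cap, and repeat with the rest."""
    if len(speeds) != len(mem_caps):
        raise ValueError("speeds and mem_caps must have the same length")
    if sum(mem_caps) < total:
        raise ValueError("total load exceeds combined memory capacity")

    sizes = [0.0] * len(speeds)
    active = set(range(len(speeds)))  # PUs not yet saturated at their memory cap
    remaining = total

    while active:
        speed_sum = sum(speeds[i] for i in active)
        # Speed-proportional share of the remaining load for each active PU.
        share = {i: remaining * speeds[i] / speed_sum for i in active}
        overflow = [i for i in active if share[i] > mem_caps[i]]
        if not overflow:
            for i in active:
                sizes[i] = share[i]
            break
        # Any PU that overflows at this level also overflows in the optimum,
        # so it is safe to saturate all of them at once.
        for i in overflow:
            sizes[i] = mem_caps[i]
            remaining -= mem_caps[i]
            active.remove(i)

    return sizes
```

For example, with a total load of 100 units, speeds `[1, 1, 2]` and memory capacities `[50, 50, 40]`, the fastest PU would receive 50 units but is capped at 40, and the remaining 60 units are split evenly between the other two, giving `[30.0, 30.0, 40.0]`. Real heterogeneous settings may need a richer cost model than a single speed value per PU.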