Communication-Optimal Parallel Standard and Karatsuba Integer Multiplication in the Distributed Memory Model

09/30/2020
by   Lorenzo De Stefani, et al.
0

We present COPSIM a parallel implementation of standard integer multiplication for the distributed memory setting, and COPK a parallel implementation of Karatsuba's fast integer multiplication algorithm for a distributed memory setting. When using 𝒫 processors, each equipped with a local non-shared memory, to compute the product of tho n-digits integer numbers, under mild conditions, our algorithms achieve optimal speedup of the computational time. That is, 𝒪(n^2/𝒫) for COPSIM, and 𝒪(n^log_2 3/𝒫) for COPK. The total amount of memory required across the processors is 𝒪(n), that is, within a constant factor of the minimum space required to store the input values. We rigorously analyze the Input/Output (I/O) cost of the proposed algorithms. We show that their bandwidth cost (i.e., the number of memory words sent or received by at least one processors) matches asymptotically corresponding known I/O lower bounds, and their latency (i.e., the number of messages sent or received in the algorithm's critical execution path) is asymptotically within a multiplicative factor 𝒪(log^2_2 𝒫) of the corresponding known I/O lower bounds. Hence, our algorithms are asymptotically optimal with respect to the bandwidth cost and almost asymptotically optimal with respect to the latency cost.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/15/2019

On the I/O complexity of hybrid algorithms for Integer Multiplication

Almost asymptotically tight lower bounds are derived for the I/O complex...
research
02/18/2020

Connecting MapReduce Computations to Realistic Machine Models

We explain how the popular, highly abstract MapReduce model of parallel ...
research
09/29/2020

Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data

Mass spectrometry based omics data analysis require significant time and...
research
05/10/2022

The spatial computer: A model for energy-efficient parallel computation

We present a new parallel model of computation suitable for spatial arch...
research
05/14/2018

A 3D Parallel Algorithm for QR Decomposition

Interprocessor communication often dominates the runtime of large matrix...
research
02/05/2019

Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries

On common processors, integer multiplication is many times faster than i...
research
09/03/2022

Ridgeline: A 2D Roofline Model for Distributed Systems

In this short paper, we introduce the Ridgeline model, an extension of t...

Please sign up or login with your details

Forgot password? Click here to reset