Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds

by Grace Dinh, et al.

Reducing communication, either between levels of a memory hierarchy or between processors over a network, is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra, particle interactions, and machine learning. For these problems, which can be represented as nested-loop computations, previous tiling-based approaches have been used to find both lower bounds on the communication required to execute them and optimal rearrangements, or blockings, that attain those lower bounds. However, such general approaches have typically assumed that the problem sizes are large, an assumption that is often not met in practice. For instance, the classical (# arithmetic operations)/(cache size)^(1/2) lower bound for matrix multiplication is not tight for matrix-vector multiplications, which must read in at least Ω(# arithmetic operations) words of memory; similar issues occur for almost all convolutions in machine learning applications, which use extremely small filter sizes (and therefore small loop bounds). In this paper, we provide an efficient way to find communication lower bounds, together with efficiently constructible tilings (blockings) that attain them, for nested-loop programs with arbitrary loop bounds that operate on multidimensional arrays in the projective case, where the array indices are subsets of the loop indices. Our approach works on all such problems, regardless of dimensionality, size, memory access patterns, or number of arrays, and directly applies to (among other examples) matrix multiplication and similar dense linear algebra operations, tensor contractions, n-body pairwise interactions, pointwise convolutions, and fully connected layers.
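To make the matrix-vector example concrete, here is a minimal Python sketch of why the classical bound stops being informative when one loop bound is small. It is not the paper's algorithm: the function names, the simple per-tile traffic model (every tile reloads its three blocks, constants ignored), and the brute-force search over tile shapes are all assumptions made only for illustration. The loop nest modeled is C[i,k] += A[i,j] * B[j,k] with loop bounds n1, n2, n3.

```python
import math
from itertools import product


def classical_bound(n1, n2, n3, cache_words):
    """Classical (# arithmetic operations) / (cache size)^(1/2) bound, constants omitted."""
    return n1 * n2 * n3 / math.sqrt(cache_words)


def compulsory_traffic(n1, n2, n3):
    """Words in A (n1 x n2), B (n2 x n3) and C (n1 x n3); each must move at least once."""
    return n1 * n2 + n2 * n3 + n1 * n3


def best_tile(n1, n2, n3, cache_words):
    """Brute-force search for the rectangular tile with the least modeled traffic.

    Assumed model (illustrative only): a b1 x b2 x b3 tile keeps its blocks of
    A, B and C in cache at once, and every tile reloads all three blocks.
    Returns (traffic, (b1, b2, b3)), or None if no tile fits in cache.
    """
    best = None
    for b1, b2, b3 in product(range(1, n1 + 1), range(1, n2 + 1), range(1, n3 + 1)):
        footprint = b1 * b2 + b2 * b3 + b1 * b3
        if footprint > cache_words:
            continue  # this tile does not fit in cache
        tiles = math.ceil(n1 / b1) * math.ceil(n2 / b2) * math.ceil(n3 / b3)
        traffic = tiles * footprint
        if best is None or traffic < best[0]:
            best = (traffic, (b1, b2, b3))
    return best


if __name__ == "__main__":
    # Matrix-vector product: the third loop bound is 1.
    n1, n2, n3, cache_words = 64, 64, 1, 256
    print("classical bound  :", classical_bound(n1, n2, n3, cache_words))
    print("compulsory words :", compulsory_traffic(n1, n2, n3))
    print("best tile (model):", best_tile(n1, n2, n3, cache_words))
```

For a 64 x 64 matrix-vector product with a 256-word cache, the classical expression evaluates to 256 words, while simply touching the inputs and output once already costs 4224 words, so the classical bound is far from tight. The brute-force search is only practical for small loop bounds; it is meant to show what a tiling constrained by a small loop bound looks like, not how to find one efficiently.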




