HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics

07/27/2020
by Markus Götz, et al.
To cope with the rapid growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although great advancements have been made in traditional array-based computations, most are limited by the resources available on a single computation node. Consequently, novel approaches are needed to exploit distributed resources, e.g. distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI. It provides both low-level array computations and assorted higher-level algorithms. With HeAT, a NumPy user can take full advantage of their available resources, which significantly lowers the barrier to distributed data analysis. When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.
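The abstract describes a pattern in which each node computes on its own chunk of an array and the partial results are then combined across nodes. The sketch below illustrates that pattern in plain Python only; it is not HeAT's API or implementation. The function names `chunk_bounds` and `distributed_mean` are hypothetical, the "ranks" are simulated in a loop rather than run as MPI processes, and the balanced block split is an assumption chosen for illustration.

```python
def chunk_bounds(n_items, n_ranks, rank):
    """Return the [start, stop) range owned by `rank` under a balanced block split.

    Hypothetical helper: the first `n_items % n_ranks` ranks get one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_mean(data, n_ranks):
    """Simulate computing a mean over `data` split across `n_ranks` processes.

    Assumes `data` is non-empty. Each loop iteration stands in for one process.
    """
    partials = []
    for rank in range(n_ranks):
        start, stop = chunk_bounds(len(data), n_ranks, rank)
        local = data[start:stop]                    # chunk held by this rank
        partials.append((sum(local), len(local)))   # node-local computation

    # Reduction step: in a real MPI program this would be a collective
    # operation (for example an allreduce) instead of a Python loop.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(10)), n_ranks=3))
```

In a framework of the kind the abstract describes, the node-local step would run on a tensor library and the reduction would go over the network, while the user would only see a single NumPy-like call.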

research
10/24/2016

Large Scale Parallel Computations in R through Elemental

Even though in recent years the scale of statistical analysis problems h...
research
04/20/2021

ds-array: A Distributed Data Structure for Large Scale Machine Learning

Machine learning has proved to be a useful tool for extracting knowledge...
research
10/17/2018

Asynchronous Execution of Python Code on Task Based Runtime Systems

Despite advancements in the areas of parallel and distributed computing,...
research
11/28/2022

High-performance xPU Stencil Computations in Julia

We present an efficient approach for writing architecture-agnostic paral...
research
06/03/2018

Alchemist: An Apache Spark <=> MPI Interface

The Apache Spark framework for distributed computation is popular in the...
research
04/12/2017

Parallelized Kendall's Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Pairwise association measure is an important operation in data analytics...
research
05/30/2018

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Apache Spark is a popular system aimed at the analysis of large data set...
