Tensor Relational Algebra for Machine Learning System Design

09/01/2020
by Binhang Yuan, et al.

Machine learning (ML) systems have to support a wide variety of tensor operations. However, such systems were largely developed without asking: what are the foundational abstractions necessary for building machine learning systems? We believe that proper computational and implementation abstractions will allow for the construction of self-configuring, declarative ML systems, especially when the goal is to execute tensor operations in a distributed environment or partitioned across multiple AI accelerators (ASICs). To this end, we first introduce a tensor relational algebra (TRA), which is expressive enough to encode any tensor operation that can be written in Einstein notation. We then consider how TRA expressions can be rewritten into an implementation algebra (IA) that enables effective implementation in a distributed environment, and how expressions in the IA can be optimized. Our empirical study shows that the optimized implementation provided by the IA can match or even outperform carefully engineered HPC or ML systems for large-scale tensor manipulations and ML workflows in distributed clusters.
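
To make the relational view of a tensor operation concrete, the sketch below treats a matrix as a relation that maps block keys to sub-matrices, and computes the Einstein-notation product C[i][j] = sum over k of A[i][k] * B[k][j] as a join on the shared key k followed by an aggregation over (i, j). This is only an illustration under stated assumptions: the helper names (`to_relation`, `join_aggregate`, `from_relation`) and the dictionary representation are hypothetical and are not the TRA or IA operators defined in the paper.

```python
def einsum_matmul(a, b):
    """Reference product in Einstein notation: C[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def to_relation(m, block):
    """Tile a matrix into a relation {(row_key, col_key): sub-matrix} (hypothetical layout)."""
    return {
        (i // block, j // block): [row[j:j + block] for row in m[i:i + block]]
        for i in range(0, len(m), block)
        for j in range(0, len(m[0]), block)
    }


def add_blocks(x, y):
    """Element-wise sum of two equally shaped blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def join_aggregate(rel_a, rel_b):
    """Join blocks on the shared key k, multiply each matched pair, sum by (i, j)."""
    out = {}
    for (i, k), a_blk in rel_a.items():
        for (k2, j), b_blk in rel_b.items():
            if k != k2:
                continue  # join predicate: column key of A equals row key of B
            prod = einsum_matmul(a_blk, b_blk)
            out[(i, j)] = add_blocks(out[(i, j)], prod) if (i, j) in out else prod
    return out


def from_relation(rel):
    """Reassemble a dense matrix from a keyed-block relation."""
    n_rows = max(i for i, _ in rel) + 1
    n_cols = max(j for _, j in rel) + 1
    rows = []
    for i in range(n_rows):
        for r in range(len(rel[(i, 0)])):
            rows.append([v for j in range(n_cols) for v in rel[(i, j)][r]])
    return rows


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0], [0, 1], [2, 0], [0, 2]]
C = from_relation(join_aggregate(to_relation(A, 2), to_relation(B, 2)))
```

Because each (key, block) pair is an independent tuple, the join and the aggregation could in principle be spread across machines or accelerators, which is the setting the abstract targets. How such a plan is actually chosen and optimized is the subject of the paper and is not modeled here.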

07/28/2022

SpDISTAL: Compiling Distributed Sparse Tensor Computations

We introduce SpDISTAL, a compiler for sparse tensor algebra that targets...
01/02/2018

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Many large-scale machine learning (ML) systems allow specifying custom M...
03/08/2022

TTML: tensor trains for general supervised machine learning

This work proposes a novel general-purpose estimator for supervised mach...
10/09/2020

A Tensor Compiler for Unified Machine Learning Prediction Serving

Machine Learning (ML) adoption in the enterprise requires simpler and mo...
12/03/2015

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

MXNet is a multi-language machine learning (ML) library to ease the deve...
10/29/2014

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

As Machine Learning (ML) applications increase in data size and model co...