Multi-layer Optimizations for End-to-End Data Analytics

01/10/2020
by Amir Shaikhha, et al.

We consider the problem of training machine learning models over multi-relational data. The mainstream approach is to first construct the training dataset using a feature extraction query over the input database and then train the model with a statistical software package of choice. In this paper we introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program written in IFAQ's domain-specific language, which captures a subset of Python commonly used in Jupyter notebooks for rapid prototyping of machine learning applications. The program is subject to several layers of IFAQ optimizations: algebraic transformations, loop transformations, schema specialization, data layout optimizations, and finally compilation into efficient low-level C++ code specialized for the given workload and data. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.
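To make the contrast between the two approaches concrete, the following minimal Python sketch computes one aggregate of the kind that linear regression needs, a sum of products of two features, in two ways: after materializing the join, and with the sums pushed past the join. It is an illustration only, not IFAQ's DSL or its generated code; the relations `sales` and `items`, their columns, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy relations, joined on the item key.
sales = [("a", 2.0), ("a", 3.0), ("b", 1.0)]   # (item, units)
items = [("a", 10.0), ("b", 4.0), ("b", 6.0)]  # (item, price)


def aggregate_after_join(sales, items):
    """Mainstream style: enumerate every joined tuple, then aggregate."""
    total = 0.0
    for s_item, units in sales:
        for i_item, price in items:
            if s_item == i_item:
                total += units * price
    return total


def aggregate_pushed_past_join(sales, items):
    """Algebraic rewrite: sum each relation per key, then combine per key.

    Valid because, for a fixed key, the join is the cross product of the
    matching tuples, so the sum of products factorizes into a product of sums.
    """
    units_by_item = defaultdict(float)
    for item, units in sales:
        units_by_item[item] += units

    price_by_item = defaultdict(float)
    for item, price in items:
        price_by_item[item] += price

    # Keys missing from one side contribute nothing, as in an inner join.
    return sum(
        units * price_by_item.get(item, 0.0)
        for item, units in units_by_item.items()
    )


if __name__ == "__main__":
    print(aggregate_after_join(sales, items))
    print(aggregate_pushed_past_join(sales, items))
```

Both functions return the same value, but the second never builds the joined dataset: it makes one pass over each input relation and one pass over the distinct keys. This is only one example of an algebraic transformation; the other optimization layers listed in the abstract are not shown here.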


