The Relational Data Borg is Learning

by Dan Olteanu, et al.

This paper overviews an approach that treats machine learning over relational data as a database problem. Two observations justify this view. First, the input to the learning task is commonly the result of a feature extraction query over the relational data. Second, the learning task requires the computation of group-by aggregates. This approach has already been investigated for a number of supervised and unsupervised learning tasks, including ridge linear regression, factorisation machines, support vector machines, decision trees, principal component analysis, and k-means, and also for linear algebra over data matrices. The main message of this work is that the runtime performance of machine learning can be dramatically boosted by a toolbox of techniques that exploit knowledge of the underlying data. The toolbox includes theoretical developments on the algebraic, combinatorial, and statistical structure of relational data processing, and systems developments on code specialisation, low-level computation sharing, and parallelisation. These techniques aim to lower both the complexity and the constant factors of the learning time.

This work is the outcome of extensive collaboration of the author with colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref, Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, Jakub Zavodny, and Haozhe Zhang. The author would also like to thank the members of the FDB project for the figures and examples used in this paper. The author is grateful for support from industry (Amazon Web Services, Google, Infor, LogicBlox, Microsoft Azure, and RelationalAI) and from the funding agencies EPSRC and ERC. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 682588.
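The two observations in the abstract can be made concrete with a small sketch. It is an illustration only, not code from the paper or from any of the systems it describes. The relations `sales` and `stores`, their columns, and both function names are hypothetical. The sketch assumes a one-feature linear regression, whose training needs only a handful of sums over the feature extraction query (here, a join). It computes those sums once over the materialised join and once by aggregating each relation per join key and then combining the partial results.

```python
from collections import defaultdict

# Hypothetical toy relations (not from the paper).
sales = [("a", 3.0), ("a", 5.0), ("b", 2.0)]   # (store, units): units is the label y
stores = [("a", 10.0), ("b", 4.0)]             # (store, size): size is the feature x


def aggregates_via_join(sales, stores):
    """Materialise the join first, then aggregate over its tuples."""
    joined = [(size, units)
              for (s1, units) in sales
              for (s2, size) in stores
              if s1 == s2]
    return {
        "count": len(joined),
        "sum_x": sum(x for x, _ in joined),
        "sum_y": sum(y for _, y in joined),
        "sum_xx": sum(x * x for x, _ in joined),
        "sum_xy": sum(x * y for x, y in joined),
    }


def aggregates_pushed_down(sales, stores):
    """Aggregate each relation per join key, then combine per key."""
    s_cnt, s_sum = defaultdict(int), defaultdict(float)
    for store, units in sales:
        s_cnt[store] += 1
        s_sum[store] += units

    t_cnt, t_sum, t_sq = defaultdict(int), defaultdict(float), defaultdict(float)
    for store, size in stores:
        t_cnt[store] += 1
        t_sum[store] += size
        t_sq[store] += size * size

    out = {"count": 0, "sum_x": 0.0, "sum_y": 0.0, "sum_xx": 0.0, "sum_xy": 0.0}
    # Only keys present in both relations contribute to the join.
    for k in s_cnt.keys() & t_cnt.keys():
        out["count"] += s_cnt[k] * t_cnt[k]
        out["sum_x"] += t_sum[k] * s_cnt[k]
        out["sum_y"] += s_sum[k] * t_cnt[k]
        out["sum_xx"] += t_sq[k] * s_cnt[k]
        out["sum_xy"] += t_sum[k] * s_sum[k]
    return out


if __name__ == "__main__":
    print(aggregates_via_join(sales, stores))
    print(aggregates_pushed_down(sales, stores))
```

Both functions return the same aggregates, but the second never builds the join result. On this toy input the saving is negligible; the sketch is only meant to show why the aggregates can be computed directly over the input relations.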


