SQLFlow: A Bridge between SQL and Machine Learning

01/19/2020
by Yi Wang, et al.

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their own SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended SQL syntax carefully so that the extension works with various SQL dialects, and we implement the extension with a collaborative parsing algorithm that we devised. SQLFlow is efficient and expressive enough to cover a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; and data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
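
The abstract does not spell out the collaborative parsing algorithm, so the following is only a toy sketch of the division of labor it implies: a dialect-specific parser handles the standard SQL part of a statement, and a separate parser handles the ML extension clause. The `TO TRAIN`, `TO PREDICT`, and `TO EXPLAIN` keywords, the function name `split_statement`, and the sample statement are assumptions made for illustration and are not taken from the text above. A plain keyword scan like this one would also misfire on string literals that contain those keywords, which is one reason a real implementation would rely on the dialect's own parser instead.

```python
import re

# Assumed keywords that open the ML extension clause (illustrative only).
EXTENSION_START = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


def split_statement(stmt: str) -> tuple[str, str]:
    """Split a statement into (standard_sql, extension_clause).

    The standard part would be handed to the database's own parser;
    the extension clause would be handed to the ML-specific parser.
    """
    match = EXTENSION_START.search(stmt)
    if match is None:
        # No extension clause: the whole statement is ordinary SQL.
        return stmt.strip().rstrip(";").strip(), ""
    standard = stmt[: match.start()].strip()
    extension = stmt[match.start():].strip().rstrip(";").strip()
    return standard, extension


if __name__ == "__main__":
    sql = "SELECT * FROM iris.train TO TRAIN DNNClassifier LABEL class INTO my_model;"
    standard, extension = split_statement(sql)
    print("standard SQL:", standard)
    print("extension   :", extension)
```

Because the standard part is left untouched, the same extension clause could in principle follow a query written in MySQL, Hive, or MaxCompute syntax, which is the bridging idea the abstract describes.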
