DiffML: End-to-end Differentiable ML Pipelines

07/04/2022
by Benjamin Hilprecht, et al.

In this paper, we present our vision of differentiable ML pipelines, called DiffML, to automate the construction of ML pipelines in an end-to-end fashion. DiffML allows joint training of not just the ML model itself but the entire pipeline, including data preprocessing steps such as data cleaning and feature selection. Our core idea is to formulate all pipeline steps in a differentiable way so that the entire pipeline can be trained using backpropagation. However, this is a non-trivial problem and opens up many new research questions. To show the feasibility of this direction, we demonstrate initial ideas and a general principle for formulating typical preprocessing steps, such as data cleaning, feature selection, and dataset selection, as differentiable programs that are learned jointly with the ML model. Moreover, we discuss a research roadmap and core challenges that have to be tackled systematically to enable fully differentiable ML pipelines.
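To make the core idea concrete, the following is a minimal sketch of one way feature selection could be written as a differentiable step and trained jointly with a model. It is an illustration under stated assumptions, not the formulation used in the paper: each feature gets a sigmoid gate, the model is a plain linear regressor, the loss is mean squared error plus a small penalty on the gates, and gradients are derived by hand. All names (`make_data`, `train`, `sparsity`) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n=200, seed=0):
    """Synthetic data (assumption): only feature 0 carries signal."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        y = 2.0 * x[0] + rng.gauss(0, 0.1)  # features 1 and 2 are pure noise
        xs.append(x)
        ys.append(y)
    return xs, ys


def train(xs, ys, epochs=300, lr=0.1, sparsity=0.05):
    """Jointly learn feature gates (preprocessing) and linear weights (model).

    Prediction: sum_j sigmoid(a[j]) * w[j] * x[j] + b
    Loss:       mean squared error + sparsity * sum_j sigmoid(a[j])
    """
    d, n = len(xs[0]), len(xs)
    a = [0.0] * d  # gate logits: parameters of the "feature selection" step
    w = [0.1] * d  # weights of the downstream model
    b = 0.0
    for _ in range(epochs):
        g = [sigmoid(v) for v in a]
        grad_a, grad_w, grad_b = [0.0] * d, [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            pred = sum(g[j] * w[j] * x[j] for j in range(d)) + b
            err = 2.0 * (pred - y) / n  # d(MSE)/d(pred) for this sample
            for j in range(d):
                grad_w[j] += err * g[j] * x[j]
                # chain rule through the sigmoid gate
                grad_a[j] += err * w[j] * x[j] * g[j] * (1.0 - g[j])
            grad_b += err
        for j in range(d):
            grad_a[j] += sparsity * g[j] * (1.0 - g[j])  # gate penalty term
            a[j] -= lr * grad_a[j]
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return [sigmoid(v) for v in a], w, b


if __name__ == "__main__":
    gates, weights, bias = train(*make_data())
    print("learned gates:", [round(v, 2) for v in gates])
```

Because the gate and the model weight of a feature receive gradients from the same loss, the selection step and the model are updated together in each pass. In this toy setup the gate of the informative feature should end up open while the gates of the noise features drift toward closed. A framework with automatic differentiation would remove the need for the hand-written gradients.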


research
06/10/2019

Making Classical Machine Learning Pipelines Differentiable: A Neural Translation Approach

Classical Machine Learning (ML) pipelines often comprise of multiple ML ...
research
04/17/2023

eTOP: Early Termination of Pipelines for Faster Training of AutoML Systems

Recent advancements in software and hardware technologies have enabled t...
research
11/17/2016

DSAC - Differentiable RANSAC for Camera Localization

RANSAC is an important algorithm in robust optimization and a central bu...
research
12/15/2020

Amazon SageMaker Autopilot: a white box AutoML solution at scale

AutoML systems provide a black-box solution to machine learning problems...
research
07/27/2022

Learning with Combinatorial Optimization Layers: a Probabilistic Approach

Combinatorial optimization (CO) layers in machine learning (ML) pipeline...
research
06/07/2022

SubStrat: A Subset-Based Strategy for Faster AutoML

Automated machine learning (AutoML) frameworks have become important too...
research
08/11/2021

Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems

The industrial machine learning pipeline requires iterating on model fea...
