DiffML: End-to-end Differentiable ML Pipelines

by Benjamin Hilprecht, et al.

In this paper, we present our vision of differentiable ML pipelines, called DiffML, to automate the construction of ML pipelines in an end-to-end fashion. DiffML jointly trains not just the ML model itself but the entire pipeline, including data preprocessing steps such as data cleaning and feature selection. Our core idea is to formulate all pipeline steps in a differentiable way so that the entire pipeline can be trained using backpropagation. This is a non-trivial problem that opens up many new research questions. To show the feasibility of this direction, we demonstrate initial ideas and a general principle for formulating typical preprocessing steps, such as data cleaning, feature selection, and dataset selection, as differentiable programs that are learned jointly with the ML model. Moreover, we discuss a research roadmap and the core challenges that have to be systematically tackled to enable fully differentiable ML pipelines.
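
To make the idea of a differentiable preprocessing step concrete, here is a minimal sketch of feature selection relaxed into a differentiable operation and trained jointly with a linear model by gradient descent. It is an illustration under stated assumptions, not the formulation used in the paper: the sigmoid gates, the sparsity penalty `lam`, the toy dataset, and all function names are hypothetical choices made for this example.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy data (an assumption for illustration): every combination of {-1, 0, 1}
# for three features. Only features 0 and 1 influence the target.
X = [list(p) for p in itertools.product([-1.0, 0.0, 1.0], repeat=3)]
y = [2.0 * x[0] - 1.0 * x[1] for x in X]


def predict(w, a, x):
    # Each feature is scaled by a soft gate sigmoid(a_j) in (0, 1), a
    # continuous relaxation of the discrete keep/drop decision.
    return sum(wj * sigmoid(aj) * xj for wj, aj, xj in zip(w, a, x))


def mse(w, a):
    return sum((predict(w, a, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def train(steps=500, lr=0.1, lam=0.01):
    w = [0.1, 0.1, 0.1]  # model weights
    a = [0.0, 0.0, 0.0]  # gate logits (the "preprocessing" parameters)
    n = len(X)
    for _ in range(steps):
        g = [sigmoid(aj) for aj in a]
        grad_w = [0.0, 0.0, 0.0]
        grad_a = [0.0, 0.0, 0.0]
        for x, t in zip(X, y):
            # Derivative of the mean squared error w.r.t. this prediction.
            err = 2.0 * (predict(w, a, x) - t) / n
            for j in range(3):
                # Chain rule through the gated linear model.
                grad_w[j] += err * g[j] * x[j]
                grad_a[j] += err * w[j] * x[j] * g[j] * (1.0 - g[j])
        for j in range(3):
            # Sparsity penalty lam * sum(gates) pushes unused gates down.
            grad_a[j] += lam * g[j] * (1.0 - g[j])
            # Model and gate parameters are updated in the same step.
            w[j] -= lr * grad_w[j]
            a[j] -= lr * grad_a[j]
    return w, a


mse_before = mse([0.1, 0.1, 0.1], [0.0, 0.0, 0.0])
w, a = train()
mse_after = mse(w, a)
gates = [sigmoid(aj) for aj in a]
```

In this sketch the gate of the irrelevant third feature shrinks while the gates of the two informative features grow, because the gate logits receive gradients from the same loss as the model weights. In practice an automatic differentiation framework would replace the hand-written gradients.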


Making Classical Machine Learning Pipelines Differentiable: A Neural Translation Approach

Classical Machine Learning (ML) pipelines often comprise of multiple ML ...

DSAC - Differentiable RANSAC for Camera Localization

RANSAC is an important algorithm in robust optimization and a central bu...

Modeling Quality and Machine Learning Pipelines through Extended Feature Models

The recently increased complexity of Machine Learning (ML) methods, led ...

Amazon SageMaker Autopilot: a white box AutoML solution at scale

AutoML systems provide a black-box solution to machine learning problems...

Learning with Combinatorial Optimization Layers: a Probabilistic Approach

Combinatorial optimization (CO) layers in machine learning (ML) pipeline...

SubStrat: A Subset-Based Strategy for Faster AutoML

Automated machine learning (AutoML) frameworks have become important too...

Towards Personalized Preprocessing Pipeline Search

Feature preprocessing, which transforms raw input features into numerica...