Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

11/07/2021
by Michael Kuchnik, et al.

Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, implementing efficient input pipelines is challenging, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over two million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis indicates that most jobs do not saturate host hardware, which points to software-based bottlenecks. Motivated by these findings, we propose Plumber, a tool for finding bottlenecks in ML input pipelines. Plumber uses an extensible and interpretable analytical model, based on operational analysis, to automatically tune parallelism, prefetching, and caching under host resource constraints. Across five representative ML pipelines, Plumber obtains speedups of up to 47x for misconfigured pipelines. By automating caching, Plumber obtains end-to-end speedups of over 50


