XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

by   Cheng Li, et al.

There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results. This paper proposes XSP — an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.


Across-Stack Profiling and Characterization of Machine Learning Models on GPUs

The world sees a proliferation of machine learning/deep learning (ML) mo...

CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs

We present CFU Playground, a full-stack open-source framework that enabl...

What Causes Exceptions in Machine Learning Applications? Mining Machine Learning-Related Stack Traces on Stack Overflow

Machine learning (ML), including deep learning, has recently gained trem...

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale

Machine Learning (ML) and Deep Learning (DL) innovations are being intro...

DeepFlow: A Cross-Stack Pathfinding Framework for Distributed AI Systems

Over the past decade, machine learning model complexity has grown at an ...

Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters

Understanding and visualizing the full-stack performance trade-offs and ...

BB-ML: Basic Block Performance Prediction using Machine Learning Techniques

Recent years have seen the adoption of Machine Learning (ML) techniques ...

Please sign up or login with your details

Forgot password? Click here to reset