Cross-Stack Workload Characterization of Deep Recommendation Systems

10/10/2020
by Samuel Hsia, et al.

Deep learning-based recommendation systems form the backbone of most personalized cloud services. Though the computer architecture community has recently started to take notice of deep recommendation inference, the resulting solutions have taken wildly different approaches, ranging from near-memory processing to at-scale optimizations. To better design future hardware systems for deep recommendation inference, we must first systematically examine and characterize the underlying systems-level impact of design decisions across the different levels of the execution stack. In this paper, we characterize eight industry-representative deep recommendation models at three different levels of the execution stack: algorithms and software, systems platforms, and hardware microarchitectures. Through this cross-stack characterization, we first show that system deployment choices (i.e., CPUs or GPUs, batch size granularity) can yield up to a 15x speedup. To better understand the bottlenecks for further optimization, we look at both the software operator usage breakdown and CPU frontend and backend microarchitectural inefficiencies. Finally, we model the correlation between key algorithmic model architecture features and hardware bottlenecks, revealing that no single dominant algorithmic component lies behind each hardware bottleneck.


research
01/08/2020

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Neural personalized recommendation is the corner-stone of a wide collect...
research
11/07/2022

DeepFlow: A Cross-Stack Pathfinding Framework for Distributed AI Systems

Over the past decade, machine learning model complexity has grown at an ...
research
08/19/2019

Across-Stack Profiling and Characterization of Machine Learning Models on GPUs

The world sees a proliferation of machine learning/deep learning (ML) mo...
research
12/04/2021

Understanding the Limits of Conventional Hardware Architectures for Deep-Learning

Deep learning and hardware for it has garnered immense academic and indu...
research
01/20/2020

Towards Digital Twins for the Description of Automotive Software Systems

We present models for automotive software that capture quantitative and ...
research
04/14/2023

SpChar: Characterizing the Sparse Puzzle via Decision Trees

Sparse matrix computation is crucial in various modern applications, inc...
research
10/25/2020

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

Personalized recommendations are one of the most widely deployed machine...
