Accelerating Recommender Systems via Hardware "scale-in"

09/11/2020
by Suresh Krishna, et al.

In today's era of "scale-out", this paper makes the case for a specialized hardware architecture based on "scale-in": packing as many specialized processors as possible, together with their memory systems and interconnect links, into one or two boards in a rack. Such an architecture offers the potential to boost large recommender system throughput by 12-62x for inference and 12-45x for training relative to the state-of-the-art NVIDIA DGX-2 AI platform, while minimizing the performance penalty of distributing large models across multiple processors. By analyzing Facebook's representative model, the Deep Learning Recommendation Model (DLRM), from a hardware architecture perspective, we quantify how hardware parameters such as memory system design, collective-communications latency and bandwidth, and interconnect topology affect throughput. By focusing on conditions that stress the hardware, our analysis reveals limitations of existing AI accelerators and hardware platforms.
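To give a feel for how collective-communications latency and bandwidth feed into a throughput estimate, the sketch below applies a standard alpha-beta (latency + bandwidth) cost model to the all-to-all exchange of embedding lookups that a model-parallel DLRM-style system performs. This is not the paper's model or its numbers; the cost formula, function names, and all constants are illustrative assumptions.

```python
# Hypothetical sketch: alpha-beta cost model for the all-to-all exchange of
# embedding lookups in a model-parallel DLRM-style recommender.
# All constants are illustrative assumptions, not figures from the paper.

def all_to_all_time(num_procs, bytes_per_pair, alpha, beta):
    """Time for one all-to-all step: each processor sends a distinct
    message to every peer. alpha = per-message latency (s),
    beta = seconds per byte (inverse link bandwidth)."""
    return (num_procs - 1) * (alpha + bytes_per_pair * beta)

def batch_throughput(batch, num_procs, emb_dim, tables_per_proc,
                     compute_time, alpha, beta):
    """Samples/second for one batch, pessimistically assuming the
    communication and compute phases do not overlap."""
    # Each processor sends tables_per_proc fp16 embedding vectors per
    # sample to every peer.
    bytes_per_pair = batch * tables_per_proc * emb_dim * 2  # fp16 = 2 bytes
    comm = all_to_all_time(num_procs, bytes_per_pair, alpha, beta)
    return batch / (comm + compute_time)

if __name__ == "__main__":
    # Compare a "scale-out" link (higher latency, lower bandwidth) against
    # a short on-board "scale-in" link. Numbers are purely illustrative.
    common = dict(batch=2048, num_procs=16, emb_dim=128,
                  tables_per_proc=4, compute_time=200e-6)
    scale_out = batch_throughput(alpha=5e-6, beta=1 / 25e9, **common)
    scale_in = batch_throughput(alpha=0.5e-6, beta=1 / 200e9, **common)
    print(f"scale-out: {scale_out:,.0f} samples/s")
    print(f"scale-in : {scale_in:,.0f} samples/s")
```

Even this toy model shows all-to-all time growing with processor count and shrinking with shorter, faster links; the paper's full analysis additionally accounts for the memory-system and interconnect-topology effects named in the abstract.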

