Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

05/12/2020
by Ranggi Hwang, et al.

Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads and e-commerce) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendation models, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which shows a 1.7-17.2x performance speedup and a 1.7-19.5x energy-efficiency improvement over conventional approaches.
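To make the two performance limiters concrete, the following is a minimal sketch of the sparse and dense stages of a recommendation model, written in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`embedding_gather_reduce`, `dense_layer`, `recommend_score`), the sum pooling, and the simple concatenation of dense and pooled features are all hypothetical choices made here for clarity.

```python
def embedding_gather_reduce(table, indices):
    """Sparse stage (assumed sum pooling): gather rows of `table` and add them.

    Each lookup touches a different row, so the cost is dominated by
    irregular memory reads rather than arithmetic.
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:
        row = table[idx]
        for j in range(dim):
            pooled[j] += row[j]
    return pooled


def dense_layer(x, weights, bias):
    """Dense stage: one fully connected layer with ReLU.

    `weights` has one row per output neuron; the cost is dominated by
    multiply-accumulate operations.
    """
    return [
        max(0.0, sum(w * v for w, v in zip(w_row, x)) + b)
        for w_row, b in zip(weights, bias)
    ]


def recommend_score(tables, sparse_ids, dense_features, weights, bias):
    """Hypothetical end-to-end pass: pool each table, concatenate, apply one layer."""
    pooled = []
    for table, ids in zip(tables, sparse_ids):
        pooled.extend(embedding_gather_reduce(table, ids))
    return dense_layer(dense_features + pooled, weights, bias)


# Tiny worked example with hand-picked values.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = embedding_gather_reduce(table, [0, 2])           # rows 0 and 2 summed
hidden = dense_layer(pooled, [[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0])
score = recommend_score([table], [[0, 2]], [1.0], [[1.0, 1.0, 1.0]], [0.0])
```

In this sketch the gather loop performs one addition per element fetched, while the dense layer performs many multiply-accumulates per input value, which is one way to see why the two stages stress memory throughput and compute, respectively.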
