First-Generation Inference Accelerator Deployment at Facebook

07/08/2021 ∙ by Michael Anderson, et al.

In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, and high compute, memory, and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through the Open Compute Project (OCP), and software framework and tooling, through PyTorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both the platform and accelerator level, that enable this platform to serve production traffic at Facebook. We also share deployment challenges and lessons learned during performance optimization, and provide guidance for future inference hardware co-design.
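To make the workload shape concrete, the sketch below is a minimal, hypothetical PyTorch illustration: sparse embedding lookups (the irregular memory accesses the abstract mentions) feed a small dense network, with one model replica per accelerator card and a simple round-robin dispatcher. The names `TinyRecModel`, `NUM_CARDS`, and `serve` are illustrative assumptions, not the paper's implementation, and the replicas run on CPU here rather than being lowered to the accelerator through Glow.

```python
# Hypothetical sketch, not Facebook's production code: a sparse
# embedding-heavy model replicated across six accelerator cards,
# with round-robin request dispatch.
import itertools
import torch
import torch.nn as nn

NUM_CARDS = 6  # six low-power accelerator cards per host, per the paper

class TinyRecModel(nn.Module):
    """Toy stand-in for a recommendation model: sparse embedding
    lookups feed a small dense MLP."""
    def __init__(self, num_embeddings: int = 10_000, dim: int = 16):
        super().__init__()
        # EmbeddingBag performs the sparse, irregular memory accesses
        # identified as a key workload characteristic.
        self.table = nn.EmbeddingBag(num_embeddings, dim, mode="sum")
        self.mlp = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.table(ids, offsets))

# One replica per card; a real deployment would lower each replica to
# its accelerator (e.g., via Glow) instead of keeping it on the host CPU.
replicas = [TinyRecModel().eval() for _ in range(NUM_CARDS)]
next_card = itertools.cycle(range(NUM_CARDS))

def serve(ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    card = next(next_card)  # round-robin load balancing across cards
    with torch.no_grad():
        return replicas[card](ids, offsets)

# Example request: two bags of sparse feature IDs.
ids = torch.tensor([1, 42, 7, 9, 3])
offsets = torch.tensor([0, 3])  # bag 0 = ids[0:3], bag 1 = ids[3:]
print(serve(ids, offsets).shape)  # torch.Size([2, 1])
```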

Related research

03/20/2020 ∙ Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
  Large-scale training is important to ensure high performance and accurac...

03/19/2021 ∙ Performance Analysis of Deep Learning Workloads on a Composable System
  A composable infrastructure is defined as resources, such as compute, st...

10/07/2019 ∙ Impact of Inference Accelerators on hardware selection
  As opportunities for AI-assisted healthcare grow steadily, model deploym...

05/26/2021 ∙ Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale
  Tremendous success of machine learning (ML) and the unabated growth in M...

11/24/2018 ∙ Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
  The application of deep learning techniques resulted in remarkable impro...

06/24/2021 ∙ A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
  Implementing embedded neural network processing at the edge requires eff...

05/08/2023 ∙ Cheshire: A Lightweight, Linux-Capable RISC-V Host Platform for Domain-Specific Accelerator Plug-In
  Power and cost constraints in the internet-of-things (IoT) extreme-edge ...
