Optimizing Memory-Access Patterns for Deep Learning Accelerators

02/27/2020
by Hongbin Zheng, et al.

Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of an accelerator's compute power, since data must be properly staged in a software-managed scratchpad memory. Failing to do so can result in significant performance loss. This paper proposes a systematic approach that leverages the polyhedral model to analyze all operators of a DL model together in order to minimize the number of memory accesses. Experiments show that our approach can substantially reduce the memory-access overhead of common neural-network models on Inferentia, a homegrown AWS machine-learning inference chip available through Amazon EC2 Inf1 instances.
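The core idea of analyzing operators together rather than in isolation can be illustrated with a back-of-the-envelope traffic count. The sketch below is a hypothetical illustration, not the paper's implementation: it compares the scratchpad transfers needed when two element-wise operators each stage their full input and output separately, versus when the operators are considered jointly and fused so each tile makes a single round trip. The function names and the 4-vs-2 transfer model are assumptions for illustration only.

```python
# Hypothetical model of scratchpad (DMA) traffic for two chained
# element-wise operators, y = op2(op1(x)); counts are transfers per element.

def traffic_unfused(n_elems: int) -> int:
    # op1: load x, store intermediate t; op2: load t, store y
    # -> 4 transfers per element between main memory and scratchpad.
    return 4 * n_elems

def traffic_fused(n_elems: int) -> int:
    # Fused op1+op2 on each tile: load x, store y; the intermediate
    # never leaves the scratchpad -> 2 transfers per element.
    return 2 * n_elems

n = 1 << 20  # one million elements
print(traffic_unfused(n), traffic_fused(n))
```

Under this simple model, joint analysis halves the memory traffic for the chain; longer operator chains save proportionally more, which is why a whole-model view can beat per-operator scheduling.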

