Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips

08/10/2023
by Ismet Dagli, et al.

Two distinguishing features of state-of-the-art mobile and autonomous systems are that 1) multiple workloads, mainly deep neural network (DNN) inference, often run concurrently and continuously, and 2) they operate on shared-memory systems-on-chip (SoCs) that embed heterogeneous accelerators tailored for specific operations. The state of the art lacks the efficient performance and resource management techniques needed to either maximize total system throughput or minimize end-to-end workload latency. In this work, we propose HaX-CoNN, a novel scheme that characterizes and maps layers of concurrently executing DNN inference workloads onto a diverse set of accelerators within a SoC. Our scheme uniquely takes per-layer execution characteristics, shared memory (SM) contention, and inter-accelerator transitions into account to find optimal schedules. We evaluate HaX-CoNN on NVIDIA Orin, NVIDIA Xavier, and Qualcomm Snapdragon 865 SoCs. Our experimental results indicate that HaX-CoNN minimizes memory contention by up to 45% and improves latency and total throughput by up to 32% compared to state-of-the-art approaches.
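To make the scheduling idea concrete, the sketch below shows a toy contention-aware layer-to-accelerator search in Python. It is illustrative only, not the authors' implementation: the accelerator names, per-layer timings, contention factor, and transition cost are hypothetical placeholders, and the brute-force search merely stands in for HaX-CoNN's actual optimization over per-layer schedules.

```python
from itertools import product

# Hypothetical per-layer execution times (ms) for one DNN's layer groups
# on each accelerator. A real system would obtain these by profiling.
LAYER_TIME = {
    "GPU": [1.2, 0.8, 2.1, 0.9],
    "DLA": [2.0, 1.1, 1.6, 1.5],
}
TRANSITION_COST = 0.4    # assumed cost (ms) of moving activations between accelerators
CONTENTION_FACTOR = 1.3  # assumed slowdown from shared-memory contention

def schedule_cost(assignment_a, assignment_b):
    """Estimate the makespan of two concurrent DNNs given per-layer
    accelerator assignments, e.g. ('GPU', 'DLA', 'DLA', 'GPU')."""
    def dnn_latency(assignment, other):
        total, prev = 0.0, None
        for i, acc in enumerate(assignment):
            t = LAYER_TIME[acc][i]
            if other[i] == acc:
                t *= 2.0                  # naive serialization on a shared accelerator
            else:
                t *= CONTENTION_FACTOR    # shared-memory contention across accelerators
            if prev is not None and prev != acc:
                t += TRANSITION_COST      # inter-accelerator transition penalty
            prev = acc
            total += t
        return total
    return max(dnn_latency(assignment_a, assignment_b),
               dnn_latency(assignment_b, assignment_a))

def best_schedule(num_layers=4, accelerators=("GPU", "DLA")):
    """Exhaustively search joint assignments; feasible only for tiny
    layer counts, unlike HaX-CoNN's principled search."""
    best, best_cost = None, float("inf")
    for a in product(accelerators, repeat=num_layers):
        for b in product(accelerators, repeat=num_layers):
            cost = schedule_cost(a, b)
            if cost < best_cost:
                best, best_cost = (a, b), cost
    return best, best_cost

if __name__ == "__main__":
    (sched_a, sched_b), cost = best_schedule()
    print("DNN A:", sched_a)
    print("DNN B:", sched_b)
    print(f"Estimated makespan: {cost:.2f} ms")
```

The toy model captures the three terms the abstract highlights: raw per-layer speed on each accelerator, slowdown from shared-memory contention between concurrently active workloads, and the cost of transitioning a DNN between accelerators mid-inference.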


