Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips

08/10/2023
by Ismet Dagli, et al.

Two distinguishing features of state-of-the-art mobile and autonomous systems are that 1) multiple workloads, mainly deep neural network (DNN) inference, often run concurrently and continuously, and 2) they operate on shared-memory systems-on-chip (SoCs) that embed heterogeneous accelerators tailored for specific operations. The state of the art lacks the efficient performance and resource management techniques needed to either maximize total system throughput or minimize end-to-end workload latency. In this work, we propose HaX-CoNN, a novel scheme that characterizes and maps layers of concurrently executing DNN inference workloads onto a diverse set of accelerators within a SoC. Our scheme uniquely takes per-layer execution characteristics, shared memory (SM) contention, and inter-accelerator transitions into account to find optimal schedules. We evaluate HaX-CoNN on NVIDIA Orin, NVIDIA Xavier, and Qualcomm Snapdragon 865 SoCs. Our experimental results indicate that HaX-CoNN minimizes memory contention by up to 45% and improves latency and total throughput by up to 32% compared to state-of-the-art approaches.
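To make the scheduling idea concrete, the sketch below shows a toy contention-aware layer-to-accelerator search in Python. It is illustrative only, not the authors' implementation: the accelerator names, per-layer timings, contention factor, and transition cost are hypothetical placeholders, and the brute-force search merely stands in for HaX-CoNN's actual optimization over per-layer schedules.

```python
from itertools import product

# Hypothetical per-layer execution times (ms) for one DNN's layer groups
# on each accelerator. A real system would obtain these by profiling.
LAYER_TIME = {
    "GPU": [1.2, 0.8, 2.1, 0.9],
    "DLA": [2.0, 1.1, 1.6, 1.5],
}
TRANSITION_COST = 0.4    # assumed cost (ms) of moving activations between accelerators
CONTENTION_FACTOR = 1.3  # assumed slowdown from shared-memory contention

def schedule_cost(assignment_a, assignment_b):
    """Estimate the makespan of two concurrent DNNs given per-layer
    accelerator assignments, e.g. ('GPU', 'DLA', 'DLA', 'GPU')."""
    def dnn_latency(assignment, other):
        total, prev = 0.0, None
        for i, acc in enumerate(assignment):
            t = LAYER_TIME[acc][i]
            if other[i] == acc:
                t *= 2.0                  # naive serialization on a shared accelerator
            else:
                t *= CONTENTION_FACTOR    # shared-memory contention across accelerators
            if prev is not None and prev != acc:
                t += TRANSITION_COST      # inter-accelerator transition penalty
            prev = acc
            total += t
        return total
    return max(dnn_latency(assignment_a, assignment_b),
               dnn_latency(assignment_b, assignment_a))

def best_schedule(num_layers=4, accelerators=("GPU", "DLA")):
    """Exhaustively search joint assignments; feasible only for tiny
    layer counts, unlike HaX-CoNN's principled search."""
    best, best_cost = None, float("inf")
    for a in product(accelerators, repeat=num_layers):
        for b in product(accelerators, repeat=num_layers):
            cost = schedule_cost(a, b)
            if cost < best_cost:
                best, best_cost = (a, b), cost
    return best, best_cost

if __name__ == "__main__":
    (sched_a, sched_b), cost = best_schedule()
    print("DNN A:", sched_a)
    print("DNN B:", sched_b)
    print(f"Estimated makespan: {cost:.2f} ms")
```

The toy model captures the three terms the abstract highlights: raw per-layer speed on each accelerator, slowdown from shared-memory contention between concurrently active workloads, and the cost of transitioning a DNN between accelerators mid-inference.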


