Partitioning and Placement of Deep Neural Networks on Distributed Edge Devices to Maximize Inference Throughput

10/21/2022
by Arjun Parthasarathy, et al.

Edge inference has become increasingly widespread, with applications ranging from retail to wearable technology. Clusters of networked, resource-constrained edge devices are becoming common, yet no system exists to split a DNN across these clusters while maximizing the inference throughput of the system. We present an algorithm that partitions DNNs and distributes the partitions across a set of edge devices with the goal of minimizing the bottleneck latency and therefore maximizing inference throughput. The algorithm scales well across different node memory capacities and numbers of nodes. We find that it reduces the bottleneck latency by 10x over a random algorithm and by 35% over a greedy joint partitioning-placement algorithm. Furthermore, we find empirically that for the set of representative models we tested, the algorithm produces results within 9.2% of the optimal bottleneck latency.
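The abstract does not describe the partitioning-placement algorithm itself, but its objective is concrete enough to illustrate. The sketch below is a minimal brute-force baseline, not the authors' method: it assumes a linear (chain-structured) DNN, a fixed stage-to-device ordering, a single shared link bandwidth, and hypothetical Layer/Device parameters. It shows the quantity being optimized: split the layers into contiguous pipeline stages subject to each device's memory capacity, and pick the split whose slowest stage (the bottleneck) is fastest, since pipeline throughput at steady state is the reciprocal of the bottleneck latency.

```python
# Illustrative sketch only -- a brute-force baseline for the objective
# described in the abstract, not the paper's algorithm. All parameters
# (Layer, Device, link_bandwidth) are hypothetical.

from dataclasses import dataclass
from itertools import combinations

@dataclass
class Layer:
    compute_cost: float   # e.g., FLOPs for this layer
    output_bytes: float   # activation size sent to the next stage
    weight_bytes: float   # memory footprint of the layer's parameters

@dataclass
class Device:
    flops_per_sec: float  # compute speed
    memory_bytes: float   # capacity constraint on resident weights

def stage_latency(layers, device, link_bandwidth):
    """Compute time plus activation-transfer time for one pipeline stage."""
    compute = sum(l.compute_cost for l in layers) / device.flops_per_sec
    transfer = layers[-1].output_bytes / link_bandwidth
    return compute + transfer

def fits(layers, device):
    return sum(l.weight_bytes for l in layers) <= device.memory_bytes

def best_partition(layers, devices, link_bandwidth):
    """Brute-force search over contiguous splits of the layer chain.

    Returns (bottleneck_latency, stages). Minimizing the slowest stage
    maximizes steady-state inference throughput (= 1 / bottleneck).
    """
    n, k = len(layers), len(devices)
    best = (float("inf"), None)
    # choose k-1 cut points among the n-1 gaps between layers
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        stages = [layers[bounds[i]:bounds[i + 1]] for i in range(k)]
        if not all(fits(s, d) for s, d in zip(stages, devices)):
            continue  # violates a device's memory capacity
        bottleneck = max(stage_latency(s, d, link_bandwidth)
                         for s, d in zip(stages, devices))
        if bottleneck < best[0]:
            best = (bottleneck, stages)
    return best

if __name__ == "__main__":
    # Hypothetical 4-layer model split across 2 devices.
    layers = [Layer(1e9, 4e6, 2e7), Layer(2e9, 2e6, 5e7),
              Layer(2e9, 1e6, 5e7), Layer(1e9, 4e4, 1e7)]
    devices = [Device(5e9, 8e7), Device(1e10, 8e7)]
    bottleneck, stages = best_partition(layers, devices, link_bandwidth=1e8)
    print(f"bottleneck latency: {bottleneck:.3f} s, "
          f"throughput ~ {1/bottleneck:.2f} inferences/s")
```

This enumeration is exponential in the number of devices and fixes the device order, which is why a practical system needs the kind of scalable joint partitioning-and-placement algorithm the paper proposes; the sketch only makes the bottleneck-latency objective concrete.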

