Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices

02/01/2023
by Zhang Runhua, et al.

Edge computing has emerged as a popular scenario for model inference. However, inference on edge devices (e.g., multi-core DSPs, FPGAs, etc.) is inefficient because highly optimized inference frameworks are lacking. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. In addition, operator-centric frameworks incur significant costs for continuous development and maintenance. In this paper, we propose Xenos, which automatically conducts dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of the vertical and horizontal dataflow optimizations, which reduce the inference time by 21.2%–84.9% and 17.9%–96.2%, respectively. Xenos also outperforms the widely used TVM by 3.22×–17.92×. Moreover, we extend Xenos to a distributed solution, which we call d-Xenos. d-Xenos employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68×–3.78× compared with a single device.
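The horizontal dimension can be illustrated with a small sketch. This is not Xenos's implementation. It only shows the general idea of splitting one operator into independent slices that separate compute units could run in parallel. The dense (matrix-vector) operator, the helper names `dense`, `split_rows`, and `dense_split`, and the use of Python threads as stand-ins for DSP units are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def dense(weights, x):
    """Unsplit operator: y[i] = sum_j weights[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_rows(num_rows, num_units):
    """Partition row indices into contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(num_rows, num_units)
    ranges, start = [], 0
    for unit in range(num_units):
        stop = start + base + (1 if unit < extra else 0)
        if stop > start:  # skip empty slices when units outnumber rows
            ranges.append((start, stop))
        start = stop
    return ranges


def dense_split(weights, x, num_units):
    """Run one dense operator as independent row slices, one slice per unit.

    Threads are hypothetical stand-ins for DSP units; they illustrate the
    partitioning only and do not model real hardware parallelism.
    """
    ranges = split_rows(len(weights), num_units)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        parts = pool.map(lambda r: dense(weights[r[0]:r[1]], x), ranges)
        # pool.map preserves input order, so concatenation restores row order.
        return [y for part in parts for y in part]


if __name__ == "__main__":
    w = [[1, 2], [3, 4], [5, 6]]
    v = [1, 1]
    print(dense(w, v))
    print(dense_split(w, v, num_units=2))
```

Each slice reads the same input vector but writes a disjoint part of the output, so the slices need no synchronization until the final concatenation. A hardware-aware split would also have to account for per-unit memory and data layout, which this sketch ignores.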


