Multi-user Co-inference with Batch Processing Capable Edge Server

06/03/2022
by Wenqi Shi, et al.

Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, in which multiple tasks are processed concurrently. We focus on a novel scenario in which energy-constrained mobile devices offload inference tasks to an edge server equipped with a GPU. Each inference task is partitioned into sub-tasks for finer-grained offloading and scheduling, and we investigate the problem of minimizing user energy consumption under inference latency constraints. To deal with the coupling between offloading and scheduling introduced by concurrent batch processing, we first consider an offline problem in which the edge inference latency is constant and all users share the same latency constraint. We prove that it is optimal to optimize each user's offloading policy independently and to aggregate all identical sub-tasks into one batch, which motivates the independent partitioning and same sub-task aggregating (IP-SSA) algorithm. Further, the optimal grouping (OG) algorithm is proposed to group tasks optimally when the latency constraints differ. Finally, when future task arrivals cannot be precisely predicted, a deep deterministic policy gradient (DDPG) agent is trained to call OG. Experiments show that IP-SSA reduces user energy consumption by up to 94.9% in the offline setting, while DDPG-OG outperforms DDPG-IP-SSA by up to 8.92% in the online setting.
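
The idea of independent partitioning followed by same sub-task aggregation can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes a hypothetical cost model in which each user picks a single partition point `p`, runs the first `p` sub-tasks locally, uploads the intermediate data, and lets the edge server run the rest with a fixed per-sub-task latency. It also ignores any waiting time before a batch starts. The names `User`, `best_partition`, and `aggregate_batches` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    # All fields are hypothetical per-user cost profiles for an N-sub-task model.
    local_energy: List[float]    # joules to run sub-task i on the device (length N)
    local_latency: List[float]   # seconds to run sub-task i on the device (length N)
    upload_energy: List[float]   # joules to upload the input of sub-task p (length N + 1)
    upload_latency: List[float]  # seconds for that upload (length N + 1)
    # Index N in the upload lists means "nothing is offloaded" and should be 0.


def best_partition(user: User, edge_latency: List[float], deadline: float) -> Optional[int]:
    """Return the partition point with the lowest user energy that meets the deadline.

    Assumes the edge latency of each sub-task is constant (independent of batch size).
    Returns None if no partition point is feasible.
    """
    n = len(edge_latency)
    best_p, best_energy = None, float("inf")
    for p in range(n + 1):
        latency = (sum(user.local_latency[:p])
                   + user.upload_latency[p]
                   + sum(edge_latency[p:]))
        if latency > deadline:
            continue  # this split violates the latency constraint
        energy = sum(user.local_energy[:p]) + user.upload_energy[p]
        if energy < best_energy:
            best_p, best_energy = p, energy
    return best_p


def aggregate_batches(partitions: List[Optional[int]], n: int) -> List[List[int]]:
    """Group users by sub-task: batch i holds every user who offloads sub-task i."""
    return [[u for u, p in enumerate(partitions) if p is not None and p <= i]
            for i in range(n)]
```

Each user's partition point is chosen without looking at the other users, and the server then forms one batch per sub-task index from whoever offloaded that sub-task. Handling different deadlines or unknown future arrivals, which the OG and DDPG-OG methods address, is outside the scope of this sketch.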

research
10/30/2017

Device-centric Energy Optimization for Edge Cloud Offloading

A wireless system is considered, where, computationally complex algorith...
research
08/03/2022

Joint Optimization of DNN Inference Delay and Energy under Accuracy Constraints for AR Applications

The high computational complexity and high energy consumption of artific...
research
06/26/2023

Cost-Effective Task Offloading Scheduling for Hybrid Mobile Edge-Quantum Computing

In this paper, we aim to address the challenge of hybrid mobile edge-qua...
research
10/04/2022

Energy Consumption of Neural Networks on NVIDIA Edge Boards: an Empirical Model

Recently, there has been a trend of shifting the execution of deep learn...
research
01/22/2020

Energy-Efficient Offloading in Delay-Constrained Massive MIMO Enabled Edge Network Using Data Partitioning

We study a wireless edge-computing system which allows multiple users to...
research
12/13/2019

Queueing Analysis of GPU-Based Inference Servers with Dynamic Batching: A Closed-Form Characterization

GPU-accelerated computing is a key technology to realize high-speed infe...
research
06/15/2022

Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading

This paper proposes Mandheling, the first system that enables highly res...
