Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads

06/07/2022
by Jung Hoon Kim, et al.

As artificial intelligence (AI) and machine learning (ML) technologies disrupt a wide range of industries, cloud datacenters face ever-increasing demand for inference workloads. However, conventional CPU-based servers cannot handle the excessive computational requirements of deep neural network (DNN) models, while GPU-based servers suffer from high power consumption and operating cost. In this paper, we present a scalable systolic-vector architecture that can cope with dynamically changing DNN workloads in cloud datacenters. We first devise a lightweight DNN model description format called the unified model format (UMF), which enables general model representation and fast decoding in the hardware accelerator. Based on this model format, we propose a heterogeneous architecture that features a load balancer for high-level workload distribution and multiple systolic-vector clusters, in which each cluster consists of a programmable scheduler, throughput-oriented systolic arrays, and function-oriented vector processors. We also propose a heterogeneity-aware scheduling algorithm that enables concurrent execution of multiple DNN workloads while maximizing heterogeneous hardware utilization based on computation and memory access time estimation. Finally, we build an architecture simulation framework based on actual synthesis and place-and-route implementation results and conduct design space exploration for the proposed architecture. As a result, the proposed systolic-vector architecture achieves 10.9x higher throughput and 30.17x higher energy efficiency than a comparable GPU on realistic ML workloads. The proposed heterogeneity-aware scheduling algorithm improves throughput and energy efficiency by 81% and 20%, respectively.
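The abstract describes a heterogeneity-aware scheduler that dispatches concurrent DNN work across systolic arrays and vector processors using estimated computation and memory access times. The paper's actual algorithm is not reproduced here; the sketch below is a minimal illustration under assumed names and cost models (all classes, parameters, and numbers are hypothetical), showing a roofline-style time estimate and a greedy earliest-finish-time assignment across heterogeneous units.

```python
# Hypothetical sketch of heterogeneity-aware scheduling in the spirit of the
# abstract: GEMM-like layers go to systolic arrays, element-wise layers to
# vector processors, picking the unit with the earliest estimated finish time.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Unit:
    name: str
    kind: str               # "systolic" or "vector"
    flops_per_cycle: float
    bytes_per_cycle: float
    busy_until: float = 0.0  # cycle at which the unit becomes free

@dataclass
class Layer:
    name: str
    kind: str                # "gemm" maps to systolic, "vector" to vector units
    flops: float
    bytes_moved: float

def estimate_cycles(layer: Layer, unit: Unit) -> float:
    """Roofline-style estimate: bounded by the slower of compute time
    and memory access time on this unit."""
    compute = layer.flops / unit.flops_per_cycle
    memory = layer.bytes_moved / unit.bytes_per_cycle
    return max(compute, memory)

def schedule(layers: List[Layer], units: List[Unit]) -> List[Tuple[str, str, float]]:
    """Greedy earliest-finish-time assignment across heterogeneous units."""
    plan = []
    for layer in layers:
        candidates = [u for u in units
                      if (u.kind == "systolic") == (layer.kind == "gemm")]
        # Choose the candidate unit that would finish this layer earliest.
        best = min(candidates,
                   key=lambda u: u.busy_until + estimate_cycles(layer, u))
        finish = best.busy_until + estimate_cycles(layer, best)
        best.busy_until = finish
        plan.append((layer.name, best.name, finish))
    return plan

if __name__ == "__main__":
    units = [Unit("sa0", "systolic", 512.0, 64.0),
             Unit("sa1", "systolic", 512.0, 64.0),
             Unit("vp0", "vector", 32.0, 128.0)]
    layers = [Layer("conv1", "gemm", 2.0e6, 1.0e5),
              Layer("relu1", "vector", 5.0e4, 2.0e5),
              Layer("fc1", "gemm", 8.0e5, 4.0e5)]
    for name, unit, finish in schedule(layers, units):
        print(f"{name:>6} -> {unit:>4} (finish cycle {finish:,.0f})")
```

This greedy policy is only one plausible interpretation of "maximizing heterogeneous hardware utilization based on computation and memory access time estimation"; the paper's scheduler also coordinates with the cluster-level load balancer, which is not modeled here.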


