Heterogeneous Dual-Core Overlay Processor for Light-Weight CNNs

10/03/2021
by Tiandong Zhao, et al.

Light-weight convolutional neural networks (CNNs) have low complexity and are good candidates for low-power, high-throughput inference. Such networks are heterogeneous in terms of computation-to-communication (CTC) ratios and computation patterns between layers, especially for different layer types. Yet existing AI processors either use homogeneous processing elements (PEs), resulting in low runtime PE efficiency, or run different layers on heterogeneous PEs sequentially, introducing resource redundancy. This paper proposes a heterogeneous dual-core architecture (dual-OPU), where one core is optimized for regular convolution layers and the other for depthwise convolution layers. PEs are homogeneous within each core. To make full use of dual-core parallelism, we develop a scheduling algorithm that concurrently executes layers for different input images on the two cores and balances the parallel workload. Meanwhile, we automatically tune the number of PEs for each core and the input size for each PE to maximize throughput. Compared with a single-core processor with the same area for a single network, heterogeneous dual-OPU on average improves runtime PE efficiency and throughput by 11 respectively. For a workload of multiple networks, dual-OPU improves average throughput by 11 same area. To the best of our knowledge, this is the first in-depth study of a heterogeneous dual-core processor for light-weight CNNs.
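To illustrate why regular and depthwise convolution layers differ so much in CTC ratio, the following is a minimal sketch, not the paper's cost model. The function name `conv_ctc` and its parameters are hypothetical. It assumes stride 1, "same" padding, and that each input, weight, and output element is moved exactly once, ignoring bit widths and on-chip buffering.

```python
def conv_ctc(h, w, c_in, c_out, k, depthwise=False):
    """Return (macs, elements_moved, ctc) for one convolution layer.

    Assumptions (illustrative only): stride 1, 'same' padding, so the
    output feature map is h x w; every element is transferred once.
    """
    if depthwise:
        # One k x k filter per input channel; output channels == input channels.
        macs = h * w * c_in * k * k
        weights = c_in * k * k
        out_elems = h * w * c_in
    else:
        # Every output channel sums over all input channels.
        macs = h * w * c_in * c_out * k * k
        weights = c_in * c_out * k * k
        out_elems = h * w * c_out
    in_elems = h * w * c_in
    moved = in_elems + weights + out_elems
    return macs, moved, macs / moved


if __name__ == "__main__":
    regular = conv_ctc(14, 14, 64, 64, 3)
    dw = conv_ctc(14, 14, 64, 64, 3, depthwise=True)
    print("regular conv   CTC:", round(regular[2], 2))
    print("depthwise conv CTC:", round(dw[2], 2))
```

Under these assumptions, the depthwise layer performs far fewer multiply-accumulates per element moved than the regular layer of the same shape, which is the kind of per-layer imbalance that motivates giving each layer type its own core.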

