Accelerating CNN inference on long vector architectures via co-design

12/22/2022
by Sonia Rani Gupta, et al.

CPU-based inference can be a viable alternative to off-chip accelerators, and vector architectures are a promising option due to their efficiency. However, the large design space of convolutional algorithms and hardware implementations makes it challenging to select the best options. This paper presents ongoing research into co-designing vector architectures for CPU-based CNN inference, focusing on the im2col+GEMM and Winograd kernels. Using the gem5 simulator, we examine the impact of various hardware microarchitectural features on the RISC-V Vector and ARM-SVE ISAs. We also study the impact of several BLIS-like algorithmic optimizations on im2col+GEMM. Our co-design study shows that, with our optimized CNN kernels, longer vector lengths and larger caches can improve performance by 5x over a 512-bit vector length and a 1 MB L2 cache. For Winograd, we present a novel inter-tile parallelization approach that exploits longer vector lengths and offers high memory reuse, resulting in up to a 2.4x performance improvement for non-strided convolutional layers with 3x3 kernel size. Our study also shows that Winograd requires smaller cache sizes than im2col+GEMM.
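As a rough illustration of the first kernel the paper targets, the sketch below lowers a small non-strided 3x3 convolution to a matrix multiplication via im2col followed by a naive GEMM. The layer shape, function names, and plain scalar loops are illustrative assumptions, not the paper's implementation, which applies BLIS-like cache blocking and RVV/SVE vectorization to this loop nest.

/* Minimal im2col+GEMM sketch for a stride-1, unpadded convolution.
 * Shapes and names are assumptions for illustration only. */
#include <stdio.h>
#include <stdlib.h>

/* im2col: copy each KxK input patch into one column of a
 * (C*K*K) x (H_out*W_out) matrix. */
static void im2col(const float *in, int C, int H, int W, int K,
                   float *col, int H_out, int W_out) {
    for (int c = 0; c < C; ++c)
        for (int kh = 0; kh < K; ++kh)
            for (int kw = 0; kw < K; ++kw) {
                int row = (c * K + kh) * K + kw;
                for (int oh = 0; oh < H_out; ++oh)
                    for (int ow = 0; ow < W_out; ++ow)
                        col[row * (H_out * W_out) + oh * W_out + ow] =
                            in[c * H * W + (oh + kh) * W + (ow + kw)];
            }
}

/* Naive GEMM: out[M x N] = weights[M x Kdim] * col[Kdim x N].
 * This is the loop nest that benefits from longer vectors and blocking. */
static void gemm(const float *A, const float *B, float *out,
                 int M, int N, int Kdim) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < Kdim; ++k)
                acc += A[m * Kdim + k] * B[k * N + n];
            out[m * N + n] = acc;
        }
}

int main(void) {
    /* Illustrative 3x3 layer: 2 input channels, 4 output channels, 8x8 input. */
    int C = 2, H = 8, W = 8, K = 3, M = 4;
    int H_out = H - K + 1, W_out = W - K + 1;
    float *in  = calloc((size_t)C * H * W, sizeof *in);
    float *wts = calloc((size_t)M * C * K * K, sizeof *wts);
    float *col = calloc((size_t)C * K * K * H_out * W_out, sizeof *col);
    float *out = calloc((size_t)M * H_out * W_out, sizeof *out);
    for (int i = 0; i < C * H * W; ++i) in[i] = (float)i * 0.01f;
    for (int i = 0; i < M * C * K * K; ++i) wts[i] = 1.0f;

    im2col(in, C, H, W, K, col, H_out, W_out);
    gemm(wts, col, out, M, H_out * W_out, C * K * K);

    printf("out[0] = %f\n", out[0]);
    free(in); free(wts); free(col); free(out);
    return 0;
}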

