Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

03/08/2023
by Victor Ferrari, et al.

Convolution is one of the most computationally intensive operations in machine-learning model inference. A traditional approach to computing convolutions is the Im2Col + BLAS method. This paper proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. The algorithm introduces: (a) Convolution Slicing Analysis (CSA) - a convolution-specific 3D cache-blocking analysis pass that focuses on tile reuse over the cache hierarchy; (b) Convolution Slicing Optimization (CSO) - a code-generation pass that uses CSA to generate a tiled direct-convolution macro-kernel; and (c) Vector-Based Packing (VBP) - an architecture-specific optimized input-tensor packing solution based on vector-register shift instructions for convolutions with unitary stride. Experiments conducted on 393 convolutions from full ONNX-MLIR machine-learning models indicate that eliminating the Im2Col transformation and using fast packing routines reduce total packing time, on full model inference, by 2.0x - 3.9x on Intel x86 and 3.6x - 7.2x on IBM POWER10. The end-to-end model-inference speed-up over an Im2Col + BLAS method based on current BLAS implementations is at least 9% on Intel x86 and 10% on IBM POWER10, and the total convolution speed-up on full model inference is at least 12%. SConv also outperforms BLAS GEMM, when computing pointwise convolutions, in more than 83% of the tested instances.
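For readers unfamiliar with the baseline, the Im2Col + BLAS method lowers a convolution to a single matrix multiplication by first materializing a large patch matrix - exactly the transformation and packing overhead that SConv's direct algorithm eliminates. Below is a minimal NumPy sketch of that baseline (not the paper's code; function names are illustrative):

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    # Lower a (C, H, W) input into a (C*kh*kw, P) patch matrix,
    # where P is the number of output positions. Each column is
    # one flattened receptive field; overlapping windows are
    # duplicated, which is the memory cost Im2Col pays for GEMM.
    C, H, W = x.shape
    oh = (H - kh) // stride + 1
    ow = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, oh * ow), dtype=x.dtype)
    p = 0
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, p] = patch.ravel()
            p += 1
    return cols

def conv_im2col(x, w, stride=1):
    # Convolution as one GEMM: (M, C*kh*kw) @ (C*kh*kw, P),
    # where w has shape (M, C, kh, kw) for M output channels.
    M, C, kh, kw = w.shape
    cols = im2col(x, kh, kw, stride)
    out = w.reshape(M, -1) @ cols
    oh = (x.shape[1] - kh) // stride + 1
    ow = (x.shape[2] - kw) // stride + 1
    return out.reshape(M, oh, ow)
```

A direct-convolution algorithm such as SConv instead iterates over input tiles chosen by the cache-blocking analysis and packs them on the fly, so the duplicated patch matrix is never materialized.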


