Analytical Characterization and Design Space Exploration for Optimization of CNNs

01/24/2021
by Rui Li et al.

Moving data through the memory hierarchy is a fundamental bottleneck that can limit the performance of core machine learning algorithms such as convolutional neural networks (CNNs). Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement. However, the search space of possible loop-level optimization configurations is prohibitively large. This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs. Experimental evaluation shows that this approach achieves performance comparable to or better than state-of-the-art libraries and auto-tuning-based optimizers for CNNs.
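To make the kind of transformation the abstract refers to concrete, here is a minimal C sketch of a tiled and permuted convolution loop nest. The layer dimensions, tile sizes (T_OC, T_Y, T_X), and loop order are illustrative assumptions chosen for this sketch; selecting such parameters is precisely what the paper's analytical model does, and this code does not implement that model.

/* Minimal sketch of loop tiling and loop permutation for one convolution
 * layer. Tile sizes and loop order are hypothetical, not the configurations
 * the paper's analytical model would select. */
#include <stdio.h>

#define N_OC 64   /* output channels */
#define N_IC 32   /* input channels  */
#define N_Y  56   /* output height   */
#define N_X  56   /* output width    */
#define K    3    /* kernel size     */

#define T_OC 16   /* tile sizes: assumed values, normally chosen by a model */
#define T_Y  14
#define T_X  14

static float in[N_IC][N_Y + K - 1][N_X + K - 1];
static float wt[N_OC][N_IC][K][K];
static float out[N_OC][N_Y][N_X];

static void conv_tiled(void)
{
    /* Outer loops iterate over tiles so that each tile's working set
     * (a block of outputs, the matching input window, and a weight slice)
     * can stay resident in cache, reducing traffic to main memory. */
    for (int oc0 = 0; oc0 < N_OC; oc0 += T_OC)
      for (int y0 = 0; y0 < N_Y; y0 += T_Y)
        for (int x0 = 0; x0 < N_X; x0 += T_X)
          /* Inner loops cover one tile; this particular permutation is
           * just one point in the optimization search space. */
          for (int ic = 0; ic < N_IC; ic++)
            for (int oc = oc0; oc < oc0 + T_OC; oc++)
              for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                  for (int y = y0; y < y0 + T_Y; y++)
                    for (int x = x0; x < x0 + T_X; x++)
                      out[oc][y][x] += wt[oc][ic][ky][kx] * in[ic][y + ky][x + kx];
}

int main(void)
{
    /* Deterministic dummy inputs, then run the tiled kernel once. */
    for (int ic = 0; ic < N_IC; ic++)
      for (int y = 0; y < N_Y + K - 1; y++)
        for (int x = 0; x < N_X + K - 1; x++)
          in[ic][y][x] = (float)((ic + y + x) % 7);
    for (int oc = 0; oc < N_OC; oc++)
      for (int ic = 0; ic < N_IC; ic++)
        for (int ky = 0; ky < K; ky++)
          for (int kx = 0; kx < K; kx++)
            wt[oc][ic][ky][kx] = 0.01f * (float)(oc - ic);

    conv_tiled();
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}

Changing the #define'd tile sizes or reordering the inner loops changes how much data is reused from cache between iterations; the number of such valid combinations grows multiplicatively with the loop depth, which is why exhaustive search is impractical and an analytical model is attractive.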

