swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

04/16/2019

by Changxi Liu, et al.

The flourishing of deep learning frameworks and hardware platforms has created demand for an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor has emerged as a competitive candidate thanks to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves average speedups of 6.71x and 2.45x over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architectures with both productivity and efficiency in mind. We plan to open-source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.


