Performance tuning for deep learning on a many-core processor (master thesis)

05/04/2018
by Philippos Papaphilippou, et al.

Convolutional neural networks (CNNs) are becoming very successful and popular for a variety of applications. The Loki many-core processor architecture is a promising way to achieve the performance and efficiency of specialised hardware while remaining a general-purpose solution. Loki combines many simple cores with increased control for the programmer. This freedom can be exploited to produce much more efficient code than on conventional multiprocessors, but it also creates a very large design space of possible optimisations. In this project, I explore possible optimisations for a CNN application and their portability across different Loki-specific configurations, convolution parameters and inputs. Finally, I investigate the potential of adaptive algorithms for further performance gains.

research
03/30/2021

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based ...
research
12/04/2017

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

Deep convolutional neural networks (CNNs) obtain outstanding results in ...
research
04/16/2019

swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

The flourish of deep learning frameworks and hardware platforms has been...
research
11/23/2020

End-to-End Framework for Efficient Deep Learning Using Metasurfaces Optics

Deep learning using Convolutional Neural Networks (CNNs) has been shown ...
research
05/20/2022

ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning

Analog in-memory computing (AIMC) cores offer significant performance a...
research
10/03/2021

Heterogeneous Dual-Core Overlay Processor for Light-Weight CNNs

Light-weight convolutional neural networks (CNNs) have small complexity ...
research
07/11/2023

MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor

As the core of artificial intelligence applications, the research of con...
