cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL

06/15/2016
by Hugh Perkins, et al.

This paper presents cltorch, a hardware-agnostic backend for the Torch neural network framework. cltorch enables training of deep neural networks on GPUs from diverse hardware vendors, including AMD, NVIDIA, and Intel. cltorch contains sufficient implementation to run models such as AlexNet, VGG, Overfeat, and GoogleNet. It is written in OpenCL, a portable compute language governed by the Khronos Group. cltorch is the top-ranked hardware-agnostic machine learning framework on Chintala's convnet-benchmarks page. This paper presents the technical challenges encountered whilst creating the cltorch backend for Torch, and looks in detail at the challenges of obtaining a fast hardware-agnostic implementation. The convolutional layers are identified as the key area of focus for accelerating hardware-agnostic frameworks. Possible approaches to accelerating the convolutional implementation are identified, including: implementing the convolutions using the implicitgemm or Winograd algorithms, using a GEMM implementation adapted to the geometries associated with the convolutional algorithm, or using a pluggable hardware-specific convolutional implementation.
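To make the GEMM-based approach mentioned in the abstract concrete, here is a minimal sketch of convolution via im2col followed by a single matrix multiply, written in Python/NumPy. This is an illustration of the general technique, not cltorch's actual OpenCL implementation; the function names are mine, and padding, stride, and batching are omitted for brevity.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input; unfold every kh x kw patch into one column,
    # so that convolution becomes a dense matrix multiply (GEMM).
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_gemm(x, w):
    # w: (F, C, kh, kw) filter bank; computes cross-correlation
    # (the "convolution" used in deep learning) as a single GEMM.
    F, C, kh, kw = w.shape
    _, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = im2col(x, kh, kw)          # (C*kh*kw, out_h*out_w)
    out = w.reshape(F, -1) @ cols     # one big matrix multiply
    return out.reshape(F, out_h, out_w)
```

The resulting GEMM has tall-skinny or wide-flat shapes quite unlike the large square matrices that generic GEMM libraries are tuned for, which is why the abstract singles out "a GEMM implementation adapted to the geometries associated with the convolutional algorithm" as a possible optimization.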


Related research

04/08/2019: Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN
Over the past few years machine learning has seen a renewed explosion of...

06/20/2022: Deep Learning Models on CPUs: A Methodology for Efficient Training
GPUs have been favored for training deep learning models due to their hi...

12/24/2014: Fast Convolutional Nets With fbfft: A GPU Performance Evaluation
We examine the performance profile of Convolutional Neural Network train...

09/03/2021: Impact of GPU uncertainty on the training of predictive deep neural networks
[retracted] We found out that the difference was dependent on the Chaine...

12/11/2019: Array Languages Make Neural Networks Fast
Modern machine learning frameworks are complex: they are typically organ...

06/01/2017: CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks
Accelerating the inference of a trained DNN is a well studied subject. I...
