Reducing Memory Requirements for the IPU using Butterfly Factorizations

09/16/2023
by S.-Kazem Shekofteh, et al.

High Performance Computing (HPC) has benefited from many improvements over the last decades, especially in hardware platforms that deliver more processing power while keeping power consumption at a reasonable level. The Intelligence Processing Unit (IPU) is a new type of massively parallel processor, designed to speed up parallel computations with a huge number of processing cores and on-chip memory components connected by high-speed fabrics. IPUs mainly target machine learning applications; however, due to the architectural differences between GPUs and IPUs, in particular the significantly smaller memory capacity of an IPU, methods for reducing model size by sparsification have to be considered. Butterfly factorizations are well-known replacements for fully-connected and convolutional layers. In this paper, we examine how butterfly structures can be implemented on an IPU and study their behavior and performance compared to a GPU. Experimental results indicate that these methods can provide a compression ratio of up to 98.5%, and that the IPU implementation can benefit from 1.3x and 1.6x performance improvements for butterfly and pixelated butterfly, respectively. We also reach a 1.62x training-time speedup on a real-world dataset such as CIFAR10.
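To make the memory-saving idea concrete, below is a minimal NumPy sketch of a butterfly-factorized linear map: instead of storing a dense n-by-n weight matrix (n^2 parameters), the map is a product of log2(n) sparse FFT-like stages of 2x2 blocks, for 2*n*log2(n) parameters in total. This is an illustrative reconstruction of the general butterfly structure under standard assumptions, not the paper's IPU or GPU implementation; the function and variable names are hypothetical.

import numpy as np

def butterfly_apply(x, stages):
    # Apply a butterfly-factorized linear map to x (length n = 2**m).
    # `stages` holds m weight arrays, one per FFT-like stage; stage s is
    # an (n/2, 2, 2) array of 2x2 mixing blocks, so the whole map costs
    # 2*n*log2(n) parameters instead of the n**2 of a dense layer.
    n = x.shape[0]
    y = x.astype(np.float64).copy()
    for s, w in enumerate(stages):
        stride = 1 << s                  # pairing distance doubles per stage
        out = np.empty_like(y)
        pair = 0
        for block in range(0, n, 2 * stride):
            for i in range(block, block + stride):
                a, b = y[i], y[i + stride]
                out[i]          = w[pair, 0, 0] * a + w[pair, 0, 1] * b
                out[i + stride] = w[pair, 1, 0] * a + w[pair, 1, 1] * b
                pair += 1
        y = out
    return y

n = 8                                    # must be a power of two
m = int(np.log2(n))
rng = np.random.default_rng(0)
stages = [rng.standard_normal((n // 2, 2, 2)) for _ in range(m)]
print(butterfly_apply(rng.standard_normal(n), stages))

For n = 8 this stores 3 stages of 4 blocks (48 weights) in place of a 64-entry dense matrix; the gap widens quickly with n, which is what makes the structure attractive on a memory-constrained IPU.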


