Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures

08/08/2023
by   Mohamed Assem Ibrahim, et al.
0

This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions to accelerate fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations of FFT on modern GPUs are memory bandwidth bound. As such, the memory bandwidth boost availed by commercial PIM solutions makes a case for PIM to accelerate FFT. To this end, we first deduce a mapping of FFT computation to a strawman PIM architecture representative of recent commercial designs. We observe that even with careful data mapping, PIM is not effective in accelerating FFT. To address this, we make a case for collaborative acceleration of FFT with PIM and GPU. Further, we propose software and hardware innovations which lower PIM operations necessary for a given FFT. Overall, our optimized PIM FFT mapping, termed Pimacolaba, delivers performance and data movement savings of up to 1.38× and 2.76×, respectively, over a range of FFT sizes.

READ FULL TEXT

page 5

page 7

page 8

page 9

research
09/14/2023

Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures

Continual demand for memory bandwidth has made it worthwhile for memory ...
research
05/08/2021

PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Commodity DRAM

Deep Neural Networks (DNNs) have transformed the field of machine learni...
research
01/12/2023

Neural Shadow Mapping

We present a neural extension of basic shadow mapping for fast, high qua...
research
05/06/2023

ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration

Processing-in-memory (PIM) architectures are emerging to reduce data mov...
research
12/03/2020

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs

Homomorphic encryption (HE) draws huge attention as it provides a way of...
research
11/15/2019

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

To satisfy the compute and memory demands of deep neural networks, neura...
research
11/30/2020

Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators

DL inference queries play an important role in diverse internet services...

Please sign up or login with your details

Forgot password? Click here to reset