MultPIM: Fast Stateful Multiplication for Processing-in-Memory

08/30/2021
by Orian Leitersdorf, et al.

Processing-in-memory (PIM) seeks to eliminate data transfer between computation and memory by using devices that support both storage and logic. Stateful logic techniques such as IMPLY, MAGIC, and FELIX can perform logic gates within memristive crossbar arrays with massive parallelism. Multiplication via stateful logic is an active field of research due to its wide implications. Recently, RIME became the state-of-the-art algorithm for stateful single-row multiplication by using memristive partitions, reducing the latency of the previous state of the art by 5.1x. In this paper, we begin by proposing novel partition-based computation techniques for broadcasting and shifting data. Then, we design an in-memory multiplication algorithm based on the carry-save add-shift (CSAS) technique. Finally, we develop a novel stateful full-adder that significantly improves on the state-of-the-art (FELIX) design. These contributions constitute MultPIM, a multiplier that reduces the state-of-the-art time complexity from quadratic to linear-log. For 32-bit numbers, MultPIM improves latency by an additional 4.2x over RIME while also slightly reducing area overhead. Furthermore, we optimize MultPIM for full-precision matrix-vector multiplication and improve latency by 25.5x over FloatPIM matrix-vector multiplication.
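To make the carry-save add-shift (CSAS) idea concrete, the following is a minimal software sketch of the general technique, not the in-memory MultPIM algorithm itself. It assumes unsigned n-bit operands and models the row of n full adders with bitwise operations on Python integers; the function name `csas_multiply` and its parameters are hypothetical and do not come from the paper.

```python
def csas_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers with a carry-save add-shift model.

    Illustrative sketch only: each bit position of `s` and `c` stands for one
    full adder holding a stored sum bit and a stored carry bit.
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit integers")

    s = 0        # stored sum bits, one per full adder
    c = 0        # stored carry bits, one per full adder
    product = 0  # product bits are emitted from least to most significant

    # n steps consume the bits of b; n more steps flush the stored carries.
    for i in range(2 * n):
        b_i = (b >> i) & 1 if i < n else 0
        partial = a if b_i else 0  # partial product: a AND b_i, bit by bit

        # All n full adders operate in parallel on (s, c, partial).
        new_s = s ^ c ^ partial
        new_c = (s & c) | (s & partial) | (c & partial)

        # The lowest sum bit is final, so emit it as product bit i.
        product |= (new_s & 1) << i

        # Shift the sums right by one position. The carries stay in place:
        # their doubled weight is cancelled by the shift.
        s = new_s >> 1
        c = new_c

    return product


if __name__ == "__main__":
    print(csas_multiply(13, 11, 4))
```

In this sketch no carry ever propagates across bit positions within a step, which is the property that makes the carry-save form attractive when many full adders work in parallel. The model itself takes 2n sequential steps and says nothing about how MultPIM realizes or schedules these steps inside a memristive crossbar.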


