FPRaker: A Processing Element For Accelerating Neural Network Training

10/15/2020
by Omar Mohamed Awad, et al.

We present FPRaker, a processing element for composing training accelerators. FPRaker processes several floating-point multiply-accumulate operations concurrently and accumulates their results into a higher-precision accumulator. FPRaker boosts performance and energy efficiency during training by taking advantage of the values that naturally appear during training. Specifically, it processes the significand of the operands of each multiply-accumulate as a series of signed powers of two, converting to this form on-the-fly. This exposes ineffectual work that can be skipped: many values, once encoded, have few terms, and some of those terms can be discarded because they would fall outside the range of the accumulator given the limited precision of floating-point. We demonstrate that FPRaker can be used to compose an accelerator for training and that it improves performance and energy efficiency compared to conventional floating-point units under iso-compute-area constraints. We also demonstrate that FPRaker delivers additional benefits when training incorporates pruning and quantization. Finally, we show that FPRaker naturally amplifies performance with training methods that use a different precision per layer.
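To make the mechanism concrete, the sketch below models in Python how a significand might be recoded on-the-fly into a few signed power-of-two terms and how terms whose contribution would fall below the accumulator's precision window can be skipped. This is a simplified software illustration of the idea described in the abstract, not the paper's hardware design; the function and parameter names (naf_terms, float_terms, term_serial_mac, sig_bits, acc_bits), the choice of non-adjacent-form recoding, and the exact skipping rule are assumptions made for illustration.

```python
# Illustrative sketch only: a software model of a term-serial MAC with
# signed power-of-two recoding and out-of-range term skipping. Names and
# rules here are assumptions, not the paper's hardware design.
import math


def naf_terms(value: int):
    """Recode a positive integer as signed power-of-two terms (non-adjacent form),
    e.g. 7 -> [(-1, 0), (+1, 3)], meaning -2**0 + 2**3."""
    terms = []
    exp = 0
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)        # +1 if value % 4 == 1, -1 if value % 4 == 3
            terms.append((digit, exp))
            value -= digit
        value >>= 1
        exp += 1
    return terms


def float_terms(x: float, sig_bits: int = 8):
    """Split x into a few signed power-of-two terms of its significand,
    mimicking the on-the-fly recoding described in the abstract."""
    if x == 0.0:
        return []
    mant, exp = math.frexp(abs(x))         # |x| = mant * 2**exp, with mant in [0.5, 1)
    sig = round(mant * (1 << sig_bits))    # quantized integer significand
    scale = exp - sig_bits
    sign = 1 if x > 0 else -1
    return [(sign * d, p + scale) for d, p in naf_terms(sig)]


def term_serial_mac(pairs, acc_bits: int = 24):
    """Compute sum(a * b) with each 'a' processed one term per step; terms whose
    product would land below the accumulator window are skipped as ineffectual."""
    products = [a * b for a, b in pairs if a * b != 0.0]
    if not products:
        return 0.0, 0, 0
    top = max(math.frexp(p)[1] for p in products)
    acc_floor = top - acc_bits             # smallest exponent the accumulator keeps
    acc, skipped, total = 0.0, 0, 0
    for a, b in pairs:
        _, eb = math.frexp(b)
        for digit, power in float_terms(a):
            total += 1
            if power + eb < acc_floor:     # contribution outside accumulator range
                skipped += 1
                continue
            acc += digit * (2.0 ** power) * b   # one term per "cycle"
    return acc, skipped, total


if __name__ == "__main__":
    import random
    random.seed(0)
    pairs = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(16)]
    approx, skipped, total = term_serial_mac(pairs)
    exact = sum(a * b for a, b in pairs)
    print(f"exact={exact:.6f} approx={approx:.6f} skipped {skipped}/{total} terms")
```

Running the example prints how many encoded terms were skipped, which is a rough stand-in for the ineffectual work the abstract says FPRaker exploits.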
