Voltage Scaling for Partitioned Systolic Array in A Reconfigurable Platform

by Rourab Paul, et al.

The rapid emergence of Field Programmable Gate Arrays (FPGAs) has accelerated research on hardware implementations of Deep Neural Networks (DNNs). Among DNN processors, domain-specific architectures such as Google's Tensor Processing Unit (TPU) have outperformed conventional GPUs. However, TPU implementations in reconfigurable hardware should emphasize energy savings to meet green computing requirements. Voltage scaling, a popular approach to saving energy, is delicate in FPGAs because it can cause timing failures if applied improperly. In this work, we present an ultra-low-power FPGA implementation of a TPU for edge applications. We divide the systolic array of a TPU into different FPGA partitions, where each partition uses a different near-threshold computation (NTC) biasing voltage to run its FPGA cores. The biasing voltage for each partition is first roughly calculated by the proposed offline schemes and then further calibrated by the proposed online scheme. We study the partitioning of the FPGA using four clustering algorithms based on the slack values of different design paths. To overcome the timing failures caused by NTC, higher-slack paths are placed in lower-voltage partitions and lower-slack paths are placed in higher-voltage partitions. The proposed architecture is simulated on an Artix-7 FPGA using the Vivado design suite and a Python tool. The simulation results substantiate the implementation of a voltage-scaled TPU in FPGAs and also justify its power efficiency.
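The slack-based placement rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the four clustering algorithms, so this example assumes a plain one-dimensional k-means as a stand-in; the function names `kmeans_1d` and `assign_voltages`, the slack values, and the voltage levels are all hypothetical and are not taken from the paper.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with deterministic, evenly spaced initial centroids.

    Assumption: used here only as a stand-in for the (unnamed) clustering
    algorithms mentioned in the abstract.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def assign_voltages(slacks_ns, voltages):
    """Give each path a partition voltage: higher slack -> lower voltage."""
    k = len(voltages)
    labels, centroids = kmeans_1d(slacks_ns, k)
    # Rank clusters from largest to smallest mean slack.
    by_slack_desc = sorted(range(k), key=lambda c: centroids[c], reverse=True)
    # The cluster with the most slack receives the lowest voltage.
    cluster_voltage = dict(zip(by_slack_desc, sorted(voltages)))
    return [cluster_voltage[lab] for lab in labels]


# Hypothetical path slacks (ns) and hypothetical partition voltages (V).
slacks = [0.1, 0.2, 0.15, 2.0, 2.1, 4.0, 4.2]
partition_voltages = [0.6, 0.8, 1.0]
print(assign_voltages(slacks, partition_voltages))
```

In this toy input, the paths with the least slack end up in the highest-voltage partition, which matches the rule stated above. A real flow would take slack values from a timing report and would still need the offline estimate and online calibration of each partition's voltage that the abstract describes.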


Near Threshold Computation of Partitioned Ring Learning With Error (RLWE) Post Quantum Cryptography on Reconfigurable Architecture

Ring Learning With Error (RLWE) algorithm is used in Post Quantum Crypto...

Reconfigurable Hardware Implementation of the Successive Overrelaxation Method

In this chapter, we study the feasibility of implementing SOR in reconfi...

Amorphous Dynamic Partial Reconfiguration with Flexible Boundaries to Remove Fragmentation

Dynamic partial reconfiguration (DPR) allows one region of an field-prog...

Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices

We present a novel framework for designing multiplierless kernel machine...

Scaling Binarized Neural Networks on Reconfigurable Logic

Binarized neural networks (BNNs) are gaining interest in the deep learni...

Power and Execution Time Measurement Methodology for SDF Applications on FPGA-based MPSoCs

Timing and power consumption play an important role in the design of emb...

FPGA-based Mining of Lyra2REv2 Cryptocurrencies

Lyra2REv2 is a hashing algorithm that consists of a chain of individual ...