A Soft SIMD Based Energy Efficient Computing Microarchitecture

12/19/2022
by Pengbo Yu, et al.

The ever-increasing size and computational complexity of today's machine-learning algorithms place a growing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy efficiency by leveraging the opportunities that these algorithms expose, such as intrinsic parallelism and robustness to quantization errors. We herein address this challenge by introducing a flexible two-stage computing pipeline. The pipeline supports fine-grained operand quantization through software-supported Single Instruction Multiple Data (SIMD) operations. Moreover, it can efficiently execute sequential multiplications over SIMD sub-words thanks to zero-skipping and Canonical Signed Digit (CSD) coding. Finally, a lightweight repacking unit allows the bitwidth of sub-words to be changed dynamically at run time. These features are implemented within a tight energy and area budget. Indeed, experimental results show that our approach greatly outperforms traditional hardware SIMD designs in terms of both area and energy requirements. In particular, our pipeline occupies up to 53.1 smaller than a hardware SIMD one supporting the same sub-word widths, while performing multiplication up to 88.8
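To make the role of CSD coding and zero-skipping in sequential multiplication more concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a plain shift-and-add multiplier operating on a single scalar operand rather than on packed SIMD sub-words, and the function names `csd_digits` and `csd_multiply` are hypothetical, introduced only for illustration. The multiplier is recoded into digits in {-1, 0, +1} with no two adjacent nonzero digits, and every zero digit is skipped so that it costs no add or subtract step.

```python
def csd_digits(n: int) -> list[int]:
    """Return the CSD digits of n, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, multiplier: int) -> tuple[int, int]:
    """Multiply x by multiplier using shift-and-add over CSD digits.

    Returns (product, number of add/subtract steps actually performed).
    """
    acc = 0
    steps = 0
    for shift, d in enumerate(csd_digits(multiplier)):
        if d == 0:
            continue  # zero-skipping: no work for a zero digit
        acc += d * (x << shift)  # add or subtract the shifted operand
        steps += 1
    return acc, steps


if __name__ == "__main__":
    product, steps = csd_multiply(5, 7)
    plain_binary_steps = bin(7).count("1")
    print(product, steps, plain_binary_steps)
```

In this sketch, multiplying by 7 takes two steps (8x minus x) instead of the three additions implied by its plain binary form 111, which illustrates why pairing CSD recoding with zero-skipping can reduce the number of cycles spent in a sequential multiplier.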
