Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

02/24/2020
by Florian Zaruba, et al.

Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust to algorithmic changes. We propose an architectural concept that tackles the issues of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10 kGE control core, called Snitch, with a double-precision FPU to adjust the compute-to-control ratio. While minimizing non-FPU area and achieving high floating-point utilization have traditionally been a trade-off, with Snitch we achieve both by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipelines by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two extensions make Snitch more flexible than a contemporary vector processor lane, achieving a 2× energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22 nm technology. We achieve more than 5× multi-core speed-up and a 3.5× gain in energy efficiency on several parallel microkernels.
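
To make the effect of the two extensions concrete, the following is a rough Python sketch of a toy issue-slot model for a dot-product loop on a single-issue core. It is not taken from the paper: the per-iteration instruction counts (two loads, one fused multiply-add, two pointer increments, one branch), the SSR setup cost, and the function name `fpu_utilization` are all illustrative assumptions, and the numbers it prints are not measured results.

```python
def fpu_utilization(setup_instrs, core_instrs_per_iter, n_iters):
    """Fraction of cycles in which the FPU issues a fused multiply-add.

    Toy model: one instruction issues per cycle and each iteration needs
    exactly one FP operation. If core_instrs_per_iter is 0, a hardware
    sequencer (as with FREP) is assumed to replay the FP operation once
    per cycle without occupying the integer core.
    """
    cycles_per_iter = max(core_instrs_per_iter, 1)
    total_cycles = setup_instrs + cycles_per_iter * n_iters
    return n_iters / total_cycles


N = 1000          # vector length (assumption)
SSR_SETUP = 8     # hypothetical cost of configuring the stream registers

# Baseline: 2 loads + 1 fmadd + 2 pointer increments + 1 branch per element.
baseline = fpu_utilization(0, 6, N)
# SSR: loads and address updates become implicit register reads,
# leaving fmadd + loop counter update + branch.
with_ssr = fpu_utilization(SSR_SETUP, 3, N)
# SSR + FREP: one extra instruction starts the repetition; the loop body
# is then replayed from the micro-loop buffer.
with_ssr_frep = fpu_utilization(SSR_SETUP + 1, 0, N)

for name, value in [("baseline", baseline),
                    ("SSR", with_ssr),
                    ("SSR+FREP", with_ssr_frep)]:
    print(f"{name:>9}: {value:.1%} FPU utilization")
```

Under these assumptions, removing explicit memory instructions raises utilization, and moving loop sequencing out of the integer core raises it further, which is the sense in which the abstract calls the combination effectively dual-issue.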

research
08/14/2020

Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

Data-parallel problems demand ever growing floating-point (FP) operation...
research
11/19/2019

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

Single-issue processor cores are very energy efficient but suffer from t...
research
06/02/2019

Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI

In this paper, we present Ara, a 64-bit vector processor based on the ve...
research
09/18/2023

Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency

The ever-increasing computational and storage requirements of modern app...
research
12/19/2022

PEZY-SC3: A MIMD Many-core Processor for Energy-efficient Computing

PEZY-SC3 is a highly energy- and area-efficient processor for supercompu...
research
12/13/2021

Slowing Down for Performance and Energy: An OS-Centric Study in Network Driven Workloads

This paper studies three fundamental aspects of an OS that impact the pe...
research
11/07/2017

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Deep Neural Networks (DNNs) have emerged as the method of choice for sol...
