Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

11/19/2019
by   Fabian Schuiki, et al.
0

Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the loads/storse necessary to feed their ALU/FPU. Each instruction spent on moving data is a cycle not spent on computation, limiting ALU/FPU utilization to 33 on reductions. We propose "Stream Semantic Registers" to boost utilization and increase energy efficiency. SSR is a lightweight, non-invasive RISC-V ISA extension which implicitly encodes memory accesses as register reads/writes, eliminating a large number of loads/stores. We implement the proposed extension in the RTL of an existing multi-core cluster and synthesize the design for a modern 22nm technology. Our extension provides a significant, 2x to 5x, architectural speedup across different kernels at a small 11 area. Sequential code runs 3x faster on a single core, and 3x fewer cores are needed in a cluster to achieve the same performance. The utilization increase to almost 100 cluster. The extension reduces instruction fetches by up to 3.5x and instruction cache power consumption by up to 5.6x. Compilers can automatically map loop nests to SSRs, making the changes transparent to the programmer.

READ FULL TEXT

page 8

page 9

page 10

page 11

page 15

research
02/24/2020

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

Data-parallel applications, such as data analytics, machine learning, an...
research
08/14/2020

Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

Data-parallel problems demand ever growing floating-point (FP) operation...
research
08/09/2010

Scaling Turbo Boost to a 1000 cores

The Intel Core i7 processor code named Nehalem provides a feature named ...
research
03/16/2021

An ASIC Implementation and Evaluation of a Profiled Low-Energy Instruction Set Architecture Extension

This paper presents an extension to an existing instruction set architec...
research
01/21/2022

Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Computationally intensive algorithms such as Deep Neural Networks (DNNs)...
research
02/17/2020

A Lightweight ISA Extension for AES and SM4

We describe a lightweight RISC-V ISA extension for AES and SM4 block cip...
research
09/03/2021

End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

In-Memory Acceleration (IMA) promises major efficiency improvements in d...

Please sign up or login with your details

Forgot password? Click here to reset