Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

08/14/2020
by   Florian Zaruba, et al.
0

Data-parallel problems demand ever growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in Globalfoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters containing eight small integer cores, each controlling a large FPU. The core supports two custom ISA extensions: The SSR extension elides explicit load and store instructions by encoding them as register reads and writes. The FREP extension decouples the integer core from the FPU allowing floating-point instructions to be issued independently. These two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90 to the FPU.

READ FULL TEXT

page 1

page 2

research
02/24/2020

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

Data-parallel applications, such as data analytics, machine learning, an...
research
12/01/2018

NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22nm FD-SOI

Specialized coprocessors for Multiply-Accumulate (MAC) intensive workloa...
research
11/19/2019

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

Single-issue processor cores are very energy efficient but suffer from t...
research
10/15/2020

FPRaker: A Processing Element For Accelerating Neural Network Training

We present FPRaker, a processing element for composing training accelera...
research
02/24/2017

An analysis of core- and chip-level architectural features in four generations of Intel server processors

This paper presents a survey of architectural features among four genera...
research
01/28/2018

BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing

The past decades witness FLOPS (Floating-point Operations per Second), a...
research
11/17/2017

Decanting the Contribution of Instruction Types and Loop Structures in the Reuse of Traces

Reuse has been proposed as a microarchitecture-level mechanism to reduce...

Please sign up or login with your details

Forgot password? Click here to reset