Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI

06/02/2019
by Matheus Cavalcante, et al.

In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX FD-SOI technology. Ara's microarchitecture is scalable, as it is composed of a set of identical lanes, each containing part of the processor's vector register file and functional units. It achieves up to 97% FPU utilization when running a 256 x 256 double-precision matrix multiplication on sixteen lanes. Ara runs at 1.2 GHz in the typical corner (TT/0.80 V/25 °C), achieving a performance of up to 34 DP-GFLOPS. In terms of energy efficiency, Ara achieves up to 67 DP-GFLOPS/W under the same conditions, which is 56% higher than similar vector processors found in the literature. An analysis of several vectorizable linear algebra computation kernels, covering a range of matrix and vector sizes, gives insight into the performance limitations and bottlenecks of vector processors. It also outlines directions for maintaining high energy efficiency even for small matrix sizes, where the vector architecture achieves suboptimal utilization of the available FPUs.
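
The relation between lane count, clock frequency, and floating-point throughput can be illustrated with a short back-of-the-envelope sketch. It is not taken from the paper: the function names are hypothetical, and it assumes that each lane retires one double-precision fused multiply-add (counted as 2 FLOP) per cycle. The abstract does not say which lane count the 1.2 GHz clock applies to, so the peak computed here need not match the reported 34 DP-GFLOPS.

```python
def peak_gflops(lanes: int, freq_ghz: float, flops_per_lane_per_cycle: int = 2) -> float:
    """Peak throughput in GFLOPS.

    Assumption (not from the abstract): one fused multiply-add,
    i.e. 2 FLOP, per lane per cycle.
    """
    return lanes * flops_per_lane_per_cycle * freq_ghz


def matmul_flops(n: int) -> int:
    """FLOP count of a dense n x n matrix multiplication: n^3 multiply-adds."""
    return 2 * n ** 3


def ideal_cycles(n: int, lanes: int, flops_per_lane_per_cycle: int = 2) -> float:
    """Cycles needed for an n x n matmul if every FPU is busy every cycle."""
    return matmul_flops(n) / (lanes * flops_per_lane_per_cycle)


def cycles_at_utilization(n: int, lanes: int, utilization: float) -> float:
    """Cycles needed when the FPUs are busy only a fraction of the time."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return ideal_cycles(n, lanes) / utilization


if __name__ == "__main__":
    n, lanes = 256, 16
    print(f"FLOPs for {n} x {n} matmul: {matmul_flops(n)}")
    print(f"Ideal cycles on {lanes} lanes: {ideal_cycles(n, lanes):.0f}")
    # 0.97 is an illustrative utilization value passed as a parameter.
    print(f"Cycles at 0.97 utilization: {cycles_at_utilization(n, lanes, 0.97):.0f}")
    print(f"Peak at 1.2 GHz (assumed FMA/lane/cycle): {peak_gflops(lanes, 1.2):.1f} GFLOPS")
```

Under these assumptions, a 256 x 256 multiplication needs 2 · 256³ FLOP, so sixteen fully busy lanes would take 256³ / 16 cycles. Any utilization below 100% stretches that cycle count proportionally, which is why FPU utilization is the natural figure of merit for such a kernel.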

