PEZY-SC3: A MIMD Many-core Processor for Energy-efficient Computing

12/19/2022
by Naoya Hatta, et al.

PEZY-SC3 is a highly energy- and area-efficient processor for supercomputers, developed using TSMC's 7nm process technology. It is the third generation of the PEZY-SCx series developed by PEZY Computing, K.K. Supercomputers equipped with PEZY-SCx processors have been deployed at several research centers and are used for large-scale scientific calculations. PEZY-SC3 outperforms the previous PEZY-SCx generations and other processors in both energy and area efficiency. To achieve this efficiency, PEZY-SC3 combines a MIMD many-core architecture, fine-grained multithreading, and a non-coherent cache, targeting applications with high thread-level parallelism. Our MIMD many-core architecture achieves high efficiency while offering higher programmability than existing architectures built around wide SIMD units or specialized tensor units with limited functionality. Another key point of this architecture is that it achieves both high efficiency and high throughput without complex, expensive units such as out-of-order schedulers. Moreover, our novel non-coherent, hierarchical cache system scales well to many cores without compromising programmability.

The energy efficiency of a system equipped with PEZY-SC3 is approximately 24.6 GFlops/W, which ranked 12th on the Green500 list (November 2021), the ranking of supercomputers by energy efficiency. In terms of processor architecture, every system ranked above the PEZY-SC3 system is built on either the NVIDIA A100 or the Preferred Networks MN-Core, so PEZY-SC3 is the third-ranked processor after them. While the A100 and MN-Core achieve high energy efficiency with tensor units specialized for specific functions, PEZY-SC3 has no such specialized tensor units and therefore offers higher programmability.
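To illustrate why fine-grained multithreading lets simple in-order cores stay busy without an out-of-order scheduler, the following is a toy model, not PEZY-SC3's actual pipeline or scheduler. It assumes (hypothetically) that every `mem_every`-th instruction of a thread is a load that stalls that thread for `mem_latency` cycles, that all other instructions take one cycle, and that the core issues one instruction per cycle from the next ready thread in round-robin order. The function name `utilization` and all parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    ready_at: int = 0  # earliest cycle at which this thread may issue again
    issued: int = 0    # instructions issued so far


def utilization(num_threads: int, mem_every: int, mem_latency: int, cycles: int) -> float:
    """Fraction of cycles in which a single in-order core issues an instruction.

    Hypothetical toy model: each cycle the core scans threads in round-robin
    order starting after the last thread that issued, and issues one
    instruction from the first ready thread. Every `mem_every`-th instruction
    of a thread is a load that makes that thread unready for `mem_latency`
    cycles; other threads may keep issuing meanwhile (latency hiding).
    """
    threads = [Thread() for _ in range(num_threads)]
    busy = 0
    nxt = 0  # round-robin pointer: first thread to consider this cycle
    for cycle in range(cycles):
        for i in range(num_threads):
            idx = (nxt + i) % num_threads
            t = threads[idx]
            if t.ready_at <= cycle:
                t.issued += 1
                busy += 1
                is_load = t.issued % mem_every == 0
                t.ready_at = cycle + (mem_latency if is_load else 1)
                nxt = (idx + 1) % num_threads
                break  # at most one instruction issued per cycle
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        u = utilization(n, mem_every=4, mem_latency=20, cycles=10_000)
        print(f"{n:2d} threads -> utilization {u:.3f}")
```

Under these assumed parameters, a single thread keeps the core busy only about 17% of the time because it sits idle during each load, while 32 threads fully hide the load latency and reach a utilization of 1.0. The point of the sketch is the trend: adding hardware threads trades register-file area for latency tolerance, rather than spending area and power on out-of-order issue logic.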

research
12/03/2022

THOR – A Neuromorphic Processor with 7.29G TSOP^2/mm^2Js Energy-Throughput Efficiency

Neuromorphic computing using biologically inspired Spiking Neural Networ...
research
06/16/2016

A 0.3-2.6 TOPS/W Precision-Scalable Processor for Real-Time Large-Scale ConvNets

A low-power precision-scalable processor for ConvNets or convolutional n...
research
02/24/2020

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

Data-parallel applications, such as data analytics, machine learning, an...
research
03/26/2023

A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics

This paper proposes a special-purpose system to achieve high-accuracy an...
research
11/12/2019

Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints

An effective way to improve energy efficiency is to throttle hardware re...
research
04/29/2021

Automated Design Space Exploration of CGRA Processing Element Architectures using Frequent Subgraph Analysis

The architecture of a coarse-grained reconfigurable array (CGRA) process...
research
06/06/2016

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general purpose pro...