The Ultimate DataFlow for Ultimate SuperComputers-on-a-Chips

09/20/2020
by Veljko Milutinovic, et al.

This article starts from the assumption that near-future 100B-transistor SuperComputers-on-a-Chip will include: N big multi-core processors; 1000N small many-core processors; a TPU-like fixed-structure systolic array accelerator for the most frequently used Machine Learning algorithms, needed in bandwidth-bound applications; and a flexible-structure reprogrammable accelerator for less frequently used Machine Learning algorithms, needed in latency-critical applications.
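
For illustration only, here is a minimal Python sketch of the chip composition assumed above, together with a toy heuristic that routes Machine Learning kernels either to the fixed-structure systolic array (frequent, bandwidth-bound work) or to the flexible reprogrammable dataflow fabric (infrequent, latency-critical work). The names (ChipConfig, Kernel, dispatch) and the classification fields are hypothetical assumptions for this sketch, not constructs from the article.

```python
# Hypothetical sketch (not from the article): a toy model of the assumed
# SuperComputer-on-a-Chip composition and a simple kernel-placement rule.
from dataclasses import dataclass
from enum import Enum, auto


class Target(Enum):
    BIG_MULTI_CORE = auto()    # N big multi-core processors
    SMALL_MANY_CORE = auto()   # 1000N small many-core processors
    SYSTOLIC_ARRAY = auto()    # TPU-like fixed-structure ML accelerator
    DATAFLOW_FABRIC = auto()   # flexible-structure reprogrammable accelerator


@dataclass
class ChipConfig:
    """Composition assumed in the abstract, parameterized by N."""
    n_big_cores: int

    @property
    def n_small_cores(self) -> int:
        # The abstract assumes 1000 small many-core processors per big core.
        return 1000 * self.n_big_cores


@dataclass
class Kernel:
    """Illustrative workload descriptor; field names are assumptions."""
    name: str
    is_ml: bool
    frequently_used: bool
    bandwidth_bound: bool
    latency_critical: bool


def dispatch(k: Kernel) -> Target:
    """Toy placement rule mirroring the article's split of ML workloads."""
    if k.is_ml and k.frequently_used and k.bandwidth_bound:
        return Target.SYSTOLIC_ARRAY
    if k.is_ml and k.latency_critical:
        return Target.DATAFLOW_FABRIC
    # Non-ML (or unclassified) work stays on the conventional cores.
    return Target.BIG_MULTI_CORE if k.latency_critical else Target.SMALL_MANY_CORE


if __name__ == "__main__":
    chip = ChipConfig(n_big_cores=8)
    print(f"big cores: {chip.n_big_cores}, small cores: {chip.n_small_cores}")
    print(dispatch(Kernel("dense-matmul", True, True, True, False)))
    print(dispatch(Kernel("graph-inference", True, False, False, True)))
```

The design choice the sketch mirrors is the abstract's split between a fixed accelerator tuned for the common, bandwidth-bound ML case and a reprogrammable one reserved for the rarer, latency-critical case.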
