Dissecting the Graphcore IPU Architecture via Microbenchmarking

12/07/2019
by Zhe Jia, et al.

This report focuses on the architecture and performance of the Intelligence Processing Unit (IPU), a novel, massively parallel platform recently introduced by Graphcore and aimed at Artificial Intelligence/Machine Learning (AI/ML) workloads. We dissect the IPU's performance behavior using microbenchmarks that we crafted for the purpose. We study the IPU's memory organization and performance, as well as the latency and bandwidth that the on-chip and off-chip interconnects offer, both in point-to-point transfers and in a spectrum of collective operations, under diverse loads. We evaluate the IPU's compute power on matrix multiplication, convolution, and AI/ML primitives, and we compare the measured performance with the IPU's theoretical limits. Our findings reveal how the IPU's architectural design affects its performance. Moreover, they offer simple mental models to predict an application's performance on the IPU on the basis of the computation and communication steps it involves. This report extends to a novel architecture our continuing effort on the microbenchmark-based discovery of massively parallel architectures.

research
09/28/2020

Breaking the Memory Wall for AI Chip with a New Dimension

Recent advancements in deep learning have led to the widespread adoption...
research
12/08/2020

The Why, What and How of Artificial General Intelligence Chip Development

The AI chips increasingly focus on implementing neural computing at low ...
research
01/12/2023

Accordion: A Communication-Aware Machine Learning Framework for Next Generation Networks

In this article, we advocate for the design of ad hoc artificial intelli...
research
10/14/2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI

Modern deep learning models have been exploited in various domains, incl...
research
10/14/2021

Bandwidth Utilization Side-Channel on ML Inference Accelerators

Accelerators used for machine learning (ML) inference provide great perf...
research
03/06/2020

Bundle Adjustment on a Graph Processor

Graph processors such as Graphcore's Intelligence Processing Unit (IPU) ...
research
10/19/2022

Scalable Coherent Optical Crossbar Architecture using PCM for AI Acceleration

Optical computing has been recently proposed as a new compute paradigm t...
