In-Datacenter Performance Analysis of a Tensor Processing Unit

04/16/2017
∙
by   Norman P. Jouppi, et al.
∙
0
∙

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95 NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.

READ FULL TEXT

page 3

page 4

research
∙ 08/16/2017

Improving Multi-Application Concurrency Support Within the GPU Memory System

GPUs exploit a high degree of thread-level parallelism to hide long-late...
research
∙ 05/21/2018

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for ...
research
∙ 09/19/2022

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Neural networks (NNs) are growing in importance and complexity. A neural...
research
∙ 03/29/2021

Large-Scale Approximate k-NN Graph Construction on GPU

k-nearest neighbor graph is a key data structure in many disciplines suc...
research
∙ 02/13/2020

CSM-NN: Current Source Model Based Logic Circuit Simulation – A Neural Network Approach

The miniaturization of transistors down to 5nm and beyond, plus the incr...
research
∙ 03/18/2019

Dissecting the NVidia Turing T4 GPU via Microbenchmarking

In 2019, the rapid rate at which GPU manufacturers refresh their designs...
research
∙ 11/20/2016

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels a...

Please sign up or login with your details

Forgot password? Click here to reset