A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

05/26/2021
by Dan Zhang, et al.

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware accelerator search framework that defines a broad optimization environment covering key design decisions within the hardware-software stack, including hardware datapath, software scheduling, and compiler passes such as operation fusion and tensor padding. In this paper, we analyze bottlenecks in state-of-the-art vision and natural language processing (NLP) models, including EfficientNet and BERT, and use FAST to design accelerators capable of addressing these bottlenecks. FAST-generated accelerators optimized for single workloads improve Perf/TDP by 3.7x on average across all benchmarks compared to TPU-v3. A FAST-generated accelerator optimized for serving a suite of workloads improves Perf/TDP by 2.4x on average compared to TPU-v3. Our return on investment analysis shows that FAST-generated accelerators can potentially be practical for moderate-sized datacenter deployments.
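The abstract describes a search over a joint hardware-software design space driven by a Perf/TDP objective. The sketch below is a minimal illustration of that general pattern only, not the paper's method: the knob names, value ranges, random-search strategy, and analytical cost model are all assumptions invented for illustration. FAST's actual search space, optimizer, and performance model are described in the full paper.

```python
import random

# Hypothetical design space illustrating the kinds of knobs such a framework
# searches over: hardware datapath parameters plus software/compiler options.
# The specific names and value ranges below are illustrative assumptions.
DESIGN_SPACE = {
    "pe_rows":      [16, 32, 64, 128],       # systolic-array height
    "pe_cols":      [16, 32, 64, 128],       # systolic-array width
    "sram_kib":     [256, 512, 1024, 2048],  # on-chip buffer size
    "fuse_ops":     [False, True],           # compiler operation-fusion pass
    "pad_multiple": [1, 8, 16],              # tensor padding granularity
}

def sample_design():
    """Draw one random candidate configuration from the design space."""
    return {knob: random.choice(values) for knob, values in DESIGN_SPACE.items()}

def evaluate(design, workload):
    """Stand-in for a performance/power model or simulator.

    A real framework would estimate throughput (Perf) and thermal design
    power (TDP) for `workload` on `design`; the formulas here are made up
    purely so the example runs end to end.
    """
    perf = design["pe_rows"] * design["pe_cols"] * (1.2 if design["fuse_ops"] else 1.0)
    tdp = 50 + 0.01 * design["pe_rows"] * design["pe_cols"] + 0.02 * design["sram_kib"]
    return perf / tdp  # the objective: Perf/TDP

def search(workload, trials=1000):
    """Random search over the design space, maximizing Perf/TDP."""
    best, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = sample_design()
        score = evaluate(candidate, workload)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    design, score = search(workload="bert_inference")
    print(f"best design: {design}  Perf/TDP score: {score:.3f}")
```

In practice the evaluation step dominates the cost of such a search, which is why the choice of performance model and search strategy (random, evolutionary, learned) matters far more than the loop structure shown here.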
