
DRAGON (Differentiable Graph Execution) : A suite of Hardware Simulation and Optimization tools for Modern AI/Non-AI Workloads

by Khushal Sethi, et al.
Stanford University

We introduce DRAGON, an open-source, fast, and explainable hardware simulation and optimization toolchain that enables hardware architects to simulate hardware designs and to optimize them to execute workloads efficiently. The DRAGON toolchain provides three tools: a Hardware Model Generator (DGen), a Hardware Simulator (DSim), and a Hardware Optimizer (DOpt). DGen describes the hardware in detail from user-supplied architecture and technology inputs, which are expressed in a custom description language. DSim simulates algorithms, represented as data-flow graphs, running on the described hardware. DOpt applies a novel methodology of gradient descent through the simulation to optimize the hardware model, giving directions for improvement in both technology parameters and design parameters. DSim is much faster than previously available simulators, which is made possible by performance-first coding practices, mathematical formulas for common computing operations that avoid cycle-accurate simulation steps, efficient algorithms for mapping, and data-structure representations for hardware state. DOpt generates performance-optimized architectures for both AI and non-AI workloads, and provides technology improvement directions for 100x-1000x better future computing systems.
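To make the idea of gradient-guided hardware optimization concrete, the following is a minimal sketch and not DRAGON's actual code or API. It assumes a hypothetical roofline-style latency formula in place of cycle-accurate simulation, a made-up three-node data-flow graph, two illustrative hardware parameters (`peak_flops`, `mem_bw`), and finite differences standing in for whatever differentiation method the toolchain really uses.

```python
import math

# Hypothetical data-flow graph: (node name, floating-point ops, bytes moved).
# The numbers are illustrative only and do not come from the paper.
GRAPH = [
    ("matmul", 2.0e9, 1.2e7),
    ("softmax", 5.0e6, 8.0e6),
    ("layernorm", 1.0e7, 1.6e7),
]


def latency(params, graph=GRAPH):
    """Analytical latency in seconds: a per-node roofline bound, summed over nodes."""
    total = 0.0
    for _name, flops, nbytes in graph:
        compute_time = flops / params["peak_flops"]
        memory_time = nbytes / params["mem_bw"]
        total += max(compute_time, memory_time)
    return total


def log_gradient(params, eps=1e-4):
    """Central finite-difference estimate of d log(latency) / d log(param)."""
    grads = {}
    for key in params:
        up = dict(params)
        up[key] = params[key] * math.exp(eps)
        down = dict(params)
        down[key] = params[key] * math.exp(-eps)
        grads[key] = (math.log(latency(up)) - math.log(latency(down))) / (2 * eps)
    return grads


def optimize(params, steps=50, lr=0.1):
    """Plain gradient descent on log(latency) in log-parameter space."""
    params = dict(params)
    for _ in range(steps):
        grads = log_gradient(params)
        for key in params:
            params[key] *= math.exp(-lr * grads[key])
    return params


if __name__ == "__main__":
    start = {"peak_flops": 1e12, "mem_bw": 1e11}
    print("latency:", latency(start))
    print("sensitivities:", log_gradient(start))
    print("latency after descent:", latency(optimize(start)))
```

For the starting point shown, the sensitivities come out to about -0.89 for `peak_flops` and about -0.11 for `mem_bw`, so in this toy model compute throughput is the more valuable parameter to improve. The sketch has no area, power, or cost constraints, so the descent loop simply scales both parameters up; it is meant only to show how gradients of a formula-based simulator can point to the parameter that matters most.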



