
Exploring Multidimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models
Deep Neural Networks have gained significant attraction due to their wid...
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
To meet the extreme compute demands for deep learning across commercial ...
AIRCHITECT: Learning Custom Architecture Design and Mapping Space
Design space exploration is an important but costly step involved in the...
An Optimized Dataflow for Mitigating Attention Performance Bottlenecks
Attention mechanisms form the backbone of state-of-the-art machine learn...
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
There is a growing interest in custom spatial accelerators for machine l...
Domain-specific Genetic Algorithm for Multi-tenant DNN Accelerator Scheduling
As Deep Learning continues to drive a variety of applications in datacen...
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Sparsity, which occurs in both scientific applications and Deep Learning...
A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators
Recently, Graph Neural Networks (GNNs) have received a lot of interest b...
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration
With increasing diversity in Deep Neural Network (DNN) models in terms of...
Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators
The everlasting demand for higher computing power for deep neural networ...
Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package
Deep neural network (DNN) models continue to grow in size and complexity...
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
DNN accelerators provide efficiency by leveraging reuse of activations/w...
CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices
Recent advancements in machine learning algorithms, especially the devel...
Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference
Using multiple nodes and parallel computing algorithms has become a prin...
Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics
Compute in-memory (CIM) is a promising technique that minimizes data tra...
The gem5 Simulator: Version 20.0+
The open-source and community-supported gem5 simulator is one of the mos...
Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms
Deep Learning (DL) training platforms are built by interconnecting multi...
STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators
The design of specialized architectures for accelerating the inference p...
Conditional Neural Architecture Search
Designing resource-efficient Deep Neural Networks (DNNs) is critical to ...
Generative Design of Hardware-aware DNNs
To efficiently run DNNs on the edge/cloud, many new DNN inference accele...
Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators
The efficiency of a spatial DNN accelerator depends heavily on the compi...
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks
Neural Architecture Search (NAS) has demonstrated its power on various A...
Understanding the Impact of On-chip Communication on DNN Accelerator Performance
Deep Neural Networks have flourished at an unprecedented pace in recent ...
HERALD: Optimizing Heterogeneous DNN Accelerators for Edge Devices
Recent advances in deep neural networks (DNNs) have made DNNs the backbo...
Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization
Applying Machine Learning (ML) techniques to design and optimize compute...
SCALE-Sim: Systolic CNN Accelerator Simulator
Systolic Arrays are one of the most popular compute substrates within De...
GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
Modern deep learning systems rely on (a) a hand-tuned neural network top...
MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators
We present MAESTRO, a framework to describe and analyze CNN dataflows, a...
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
Memories that exploit three-dimensional (3D)-stacking technology, which ...
FASHION: Fault-Aware Self-Healing Intelligent On-chip Network
To avoid packet loss and deadlock scenarios that arise due to faults or ...
VESPA: VIPT Enhancements for Superpage Accesses
L1 caches are critical to the performance of modern computer systems. Th...
Tushar Krishna
