
Exploring Multidimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models
Deep Neural Networks have gained significant traction due to their wid...

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
To meet the extreme compute demands for deep learning across commercial ...

AIRCHITECT: Learning Custom Architecture Design and Mapping Space
Design space exploration is an important but costly step involved in the...

An Optimized Dataflow for Mitigating Attention Performance Bottlenecks
Attention mechanisms form the backbone of state-of-the-art machine learn...

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
There is a growing interest in custom spatial accelerators for machine l...

Domain-specific Genetic Algorithm for Multi-tenant DNN-Accelerator Scheduling
As Deep Learning continues to drive a variety of applications in datacen...

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Sparsity, which occurs in both scientific applications and Deep Learning...

A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators
Recently, Graph Neural Networks (GNNs) have received a lot of interest b...

Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration
With increasing diversity in Deep Neural Network (DNN) models in terms of...

Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators
The everlasting demand for higher computing power for deep neural networ...

Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package
Deep neural network (DNN) models continue to grow in size and complexity...

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
DNN accelerators provide efficiency by leveraging reuse of activations/w...

CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices
Recent advancements in machine learning algorithms, especially the devel...

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference
Using multiple nodes and parallel computing algorithms has become a prin...

Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics
Compute in-memory (CIM) is a promising technique that minimizes data tra...

The gem5 Simulator: Version 20.0+
The open-source and community-supported gem5 simulator is one of the mos...

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms
Deep Learning (DL) training platforms are built by interconnecting multi...

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators
The design of specialized architectures for accelerating the inference p...

Conditional Neural Architecture Search
Designing resource-efficient Deep Neural Networks (DNNs) is critical to ...

Generative Design of Hardware-aware DNNs
To efficiently run DNNs on the edge/cloud, many new DNN inference accele...

Marvel: A Data-centric Compiler for DNN Operators on Spatial Accelerators
The efficiency of a spatial DNN accelerator depends heavily on the compi...

MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators
The efficiency of a spatial DNN accelerator depends heavily on the compi...

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks
Neural Architecture Search (NAS) has demonstrated its power on various A...

Understanding the Impact of On-chip Communication on DNN Accelerator Performance
Deep Neural Networks have flourished at an unprecedented pace in recent ...

HERALD: Optimizing Heterogeneous DNN Accelerators for Edge Devices
Recent advances in deep neural networks (DNNs) have made DNNs the backbo...

Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization
Applying Machine Learning (ML) techniques to design and optimize compute...

SCALE-Sim: Systolic CNN Accelerator Simulator
Systolic Arrays are one of the most popular compute substrates within De...

SCALE-Sim: Systolic CNN Accelerator
Systolic Arrays are one of the most popular compute substrates within De...

GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
Modern deep learning systems rely on (a) a hand-tuned neural network top...

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators
We present MAESTRO, a framework to describe and analyze CNN dataflows, a...

Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
Memories that exploit three-dimensional (3D)-stacking technology, which ...

FASHION: Fault-Aware Self-Healing Intelligent On-chip Network
To avoid packet loss and deadlock scenarios that arise due to faults or ...

VESPA: VIPT Enhancements for Superpage Accesses
L1 caches are critical to the performance of modern computer systems. Th...
Tushar Krishna