
Learning Combinatorial Node Labeling Algorithms
We present a graph neural network to learn graph coloring heuristics usi...

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs
Determining I/O lower bounds is a crucial step in obtaining communicatio...

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven th...

Clairvoyant Prefetching for Distributed Machine Learning I/O
I/O is emerging as a major bottleneck for machine learning training, esp...

Deep Data Flow Analysis
Compiler architects increasingly look to machine learning when building ...

Parametric Graph Templates: Properties and Algorithms
Hierarchical structure and repetition are prevalent in graphs originatin...

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Spatial computing devices have been shown to significantly accelerate st...

Substream-Centric Maximum Matchings on FPGA
Developing high-performance and energy-efficient algorithms for maximum ...

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Dense linear algebra kernels, such as linear solvers or tensor contracti...

Data Movement Is All You Need: A Case Study on Optimizing Transformers
Transformers have become widely used for language modeling and sequence ...

Deep Learning for Post-Processing Ensemble Weather Forecasts
Quantifying uncertainty in weather forecasts typically employs ensemble ...

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing ...

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
The increasing complexity of computing systems places a tremendous burde...

Optimizing the Data Movement in Quantum Transport Simulations via Data-Centric Parallel Programming
Designing efficient cooling systems for integrated circuits (ICs) relies...

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
The computational efficiency of a state-of-the-art ab initio quantum tra...

Predicting Weather Uncertainty with Deep Convnets
Modern weather forecast models perform uncertainty quantification using ...

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Load imbalance pervasively exists in distributed deep learning training ...

Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures
The ubiquity of accelerators in high-performance computing has driven pr...

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs
With the ubiquity of accelerators, such as FPGAs and GPUs, the complexit...

Graph Processing on FPGAs: Taxonomy, Survey, Challenges
Graph processing has become an important part of various areas, such as ...

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
We introduce Deep500: the first customizable benchmarking infrastructure...

Augment your batch: better training with larger batches
Large-batch SGD is important for scaling training of deep neural network...

Neural Code Comprehension: A Learnable Representation of Code Semantics
With the recent success of embeddings in natural language processing, re...

μcuDNN: Accelerating Deep Learning Frameworks with Micro-Batching
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently...

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern com...

Tal Ben-Nun