
10 Years Later: Cloud Computing is Closing the Performance Gap
Large scale modeling and simulation problems, from nanoscale materials t...

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction
Understanding protein structure-function relationships is a key challeng...

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
One of the most computationally intensive tasks in computational biology...

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...

Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices
Identifying similar protein sequences is a core step in many computation...

Reducing Communication in Graph Neural Network Training
Graph Neural Networks (GNNs) are powerful and flexible neural networks t...

Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
HipMCL is a high-performance distributed memory implementation of the po...

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
Pairwise sequence alignment is one of the most computationally intensive...

diBELLA: Distributed Long Read to Long Read Alignment
We present a parallel algorithm and scalable implementation for genome a...

The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing con...

A High-Throughput Solver for Marginalized Graph Kernels on GPU
We present the design of a solver for the efficient and high-throughput ...

RDMA vs. RPC for Implementing Distributed Data Structures
Distributed data structures are key to implementing scalable application...

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to ...

BCL: A Cross-Platform Distributed Container Library
One-sided communication is a useful paradigm for irregular parallel appl...

Extreme Scale De Novo Metagenome Assembly
Metagenome assembly is the process of transforming a set of short, overl...

Implementing Push-Pull Efficiently in GraphBLAS
We factor Beamer's push-pull, also known as direction-optimized breadth...

High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...

Design Principles for Sparse Matrix Multiplication on the GPU
We implement two novel algorithms for sparse-matrix dense-matrix multipl...

A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
We design and implement an efficient parallel approximation algorithm fo...

Integrated Model, Batch and Domain Parallelism in Training Neural Networks
We propose a new integrated method of exploiting model, batch and domain...

Integrated Model and Data Parallelism in Training Neural Networks
We propose a new integrated method of exploiting both model and data par...

Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning
Undirected graphical models compactly represent the structure of large, ...

The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimize fill-in and data structu...

Mathematical Foundations of the GraphBLAS
The GraphBLAS standard (GraphBlas.org) is being developed to bring the p...
Aydin Buluc