
10 Years Later: Cloud Computing is Closing the Performance Gap
Large scale modeling and simulation problems, from nanoscale materials t...
PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction
Understanding protein structure-function relationships is a key challeng...
Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
One of the most computationally intensive tasks in computational biology...
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices
Identifying similar protein sequences is a core step in many computation...
Reducing Communication in Graph Neural Network Training
Graph Neural Networks (GNNs) are powerful and flexible neural networks t...
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
HipMCL is a high-performance distributed memory implementation of the po...
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
Pairwise sequence alignment is one of the most computationally intensive...
diBELLA: Distributed Long Read to Long Read Alignment
We present a parallel algorithm and scalable implementation for genome a...
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing con...
A High-Throughput Solver for Marginalized Graph Kernels on GPU
We present the design of a solver for the efficient and high-throughput ...
RDMA vs. RPC for Implementing Distributed Data Structures
Distributed data structures are key to implementing scalable application...
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to ...
BCL: A Cross-Platform Distributed Container Library
One-sided communication is a useful paradigm for irregular parallel appl...
Extreme Scale De Novo Metagenome Assembly
Metagenome assembly is the process of transforming a set of short, overl...
Implementing Push-Pull Efficiently in GraphBLAS
We factor Beamer's push-pull, also known as direction-optimized breadth...
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...
Design Principles for Sparse Matrix Multiplication on the GPU
We implement two novel algorithms for sparse-matrix dense-matrix multipl...
A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
We design and implement an efficient parallel approximation algorithm fo...
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
We propose a new integrated method of exploiting model, batch and domain...
Integrated Model and Data Parallelism in Training Neural Networks
We propose a new integrated method of exploiting both model and data par...
Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning
Undirected graphical models compactly represent the structure of large, ...
The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimize fill-in and data structu...
Mathematical Foundations of the GraphBLAS
The GraphBLAS standard (GraphBlas.org) is being developed to bring the p...
Aydin Buluc
