
-
10 Years Later: Cloud Computing is Closing the Performance Gap
Large scale modeling and simulation problems, from nanoscale materials t...
-
PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction
Understanding protein structure-function relationships is a key challeng...
-
Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
One of the most computationally intensive tasks in computational biology...
-
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in ...
-
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices
Identifying similar protein sequences is a core step in many computation...
-
Reducing Communication in Graph Neural Network Training
Graph Neural Networks (GNNs) are powerful and flexible neural networks t...
-
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
HipMCL is a high-performance distributed memory implementation of the po...
-
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
Pairwise sequence alignment is one of the most computationally intensive...
-
diBELLA: Distributed Long Read to Long Read Alignment
We present a parallel algorithm and scalable implementation for genome a...
-
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing con...
-
A High-Throughput Solver for Marginalized Graph Kernels on GPU
We present the design of a solver for the efficient and high-throughput ...
-
RDMA vs. RPC for Implementing Distributed Data Structures
Distributed data structures are key to implementing scalable application...
-
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to ...
-
BCL: A Cross-Platform Distributed Container Library
One-sided communication is a useful paradigm for irregular parallel appl...
-
Extreme Scale De Novo Metagenome Assembly
Metagenome assembly is the process of transforming a set of short, overl...
-
Implementing Push-Pull Efficiently in GraphBLAS
We factor Beamer's push-pull, also known as direction-optimized breadth-...
-
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...
-
Design Principles for Sparse Matrix Multiplication on the GPU
We implement two novel algorithms for sparse-matrix dense-matrix multipl...
-
A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
We design and implement an efficient parallel approximation algorithm fo...
-
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
We propose a new integrated method of exploiting model, batch and domain...
-
Integrated Model and Data Parallelism in Training Neural Networks
We propose a new integrated method of exploiting both model and data par...
-
Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning
Undirected graphical models compactly represent the structure of large, ...
-
The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ordering vertices of a graph is key to minimize fill-in and data structu...
-
Mathematical Foundations of the GraphBLAS
The GraphBLAS standard (GraphBlas.org) is being developed to bring the p...