
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Simple graph algorithms such as PageRank have recently been the target o...
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph ...
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
Function-as-a-Service (FaaS) is one of the most promising directions for...
The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage interconnec...
To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations
We reduce the cost of communication and synchronization in graph process...
Log(Graph): A Near-Optimal High-Performance Graph Representation
Today's graphs used in domains such as machine learning or social networ...
Substream-Centric Maximum Matchings on FPGA
Developing high-performance and energy-efficient algorithms for maximum ...
Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability
Emerging chips with hundreds and thousands of cores require networks wit...
SlimSell: A Vectorizable Graph Representation for Breadth-First Search
Vectorization and GPUs will profoundly change graph processing. Traditio...
High-Performance Distributed RMA Locks
We propose a topology-aware distributed Reader-Writer lock that accelera...
Evaluating the Cost of Atomic Operations on Modern Architectures
Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-...
Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages
We propose Atomic Active Messages (AAM), a mechanism that accelerates ir...
Fault Tolerance for Remote Memory Access Programming Models
Remote Memory Access (RMA) is an emerging mechanism for programming high...
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Dense linear algebra kernels, such as linear solvers or tensor contracti...
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
We develop the first parallel graph coloring heuristics with strong theo...
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
The recent line of research into topology design focuses on lowering net...
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Modern interconnects offer remote direct memory access (RDMA) features. ...
Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism
Graph processing has become an important part of various areas of comput...
Slim Fly: A Cost Effective Low-Diameter Network Topology
We introduce a high-performance cost-effective network topology called S...
Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics
We propose Slim Graph: the first programming model and framework for pra...
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Jaccard Similarity index is an important measure of the overlap of two s...
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations
Remote memory access (RMA) is an emerging high-performance programming m...
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Graph processing has become an important part of multiple areas of compu...
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
We propose COSMA: a parallel matrix-matrix multiplication algorithm that...
Network-Accelerated Non-Contiguous Memory Transfers
Applications often communicate data that is non-contiguous in the send- ...
FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short
We introduce FatPaths: a simple, generic, and robust routing architectur...
Graph Processing on FPGAs: Taxonomy, Survey, Challenges
Graph processing has become an important part of various areas, such as ...
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
We introduce Deep500: the first customizable benchmarking infrastructure...
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations
Various graphs such as web or social networks may contain up to trillion...
Scaling betweenness centrality using communication-efficient sparse matrix multiplication
Betweenness centrality (BC) is a crucial graph problem that measures the...
