
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Simple graph algorithms such as PageRank have recently been the target o...

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph ...

SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
Function-as-a-Service (FaaS) is one of the most promising directions for...

The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage interconnec...

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations
We reduce the cost of communication and synchronization in graph process...

Log(Graph): A Near-Optimal High-Performance Graph Representation
Today's graphs used in domains such as machine learning or social networ...

Substream-Centric Maximum Matchings on FPGA
Developing high-performance and energy-efficient algorithms for maximum ...

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability
Emerging chips with hundreds and thousands of cores require networks wit...

SlimSell: A Vectorizable Graph Representation for Breadth-First Search
Vectorization and GPUs will profoundly change graph processing. Traditio...

High-Performance Distributed RMA Locks
We propose a topology-aware distributed Reader-Writer lock that accelera...

Evaluating the Cost of Atomic Operations on Modern Architectures
Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and...

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages
We propose Atomic Active Messages (AAM), a mechanism that accelerates ir...

Fault Tolerance for Remote Memory Access Programming Models
Remote Memory Access (RMA) is an emerging mechanism for programming high...

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Dense linear algebra kernels, such as linear solvers or tensor contracti...

High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
We develop the first parallel graph coloring heuristics with strong theo...

High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
The recent line of research into topology design focuses on lowering net...

Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Modern interconnects offer remote direct memory access (RDMA) features. ...

Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism
Graph processing has become an important part of various areas of comput...

Slim Fly: A Cost Effective Low-Diameter Network Topology
We introduce a high-performance cost-effective network topology called S...

Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics
We propose Slim Graph: the first programming model and framework for pra...

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Jaccard Similarity index is an important measure of the overlap of two s...

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations
Remote memory access (RMA) is an emerging high-performance programming m...

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Graph processing has become an important part of multiple areas of compu...

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
We propose COSMA: a parallel matrix-matrix multiplication algorithm that...

Network-Accelerated Non-Contiguous Memory Transfers
Applications often communicate data that is non-contiguous in the send ...

FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short
We introduce FatPaths: a simple, generic, and robust routing architectur...

Graph Processing on FPGAs: Taxonomy, Survey, Challenges
Graph processing has become an important part of various areas, such as ...

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
We introduce Deep500: the first customizable benchmarking infrastructure...

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations
Various graphs such as web or social networks may contain up to trillion...

Scaling betweenness centrality using communication-efficient sparse matrix multiplication
Betweenness centrality (BC) is a crucial graph problem that measures the...
Maciej Besta