Cimple: Instruction and Memory Level Parallelism

07/04/2018
by Vladimir Kiriansky, et al.

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for in-flight memory requests. These resources, however, are often poorly utilized on workloads with large working sets, e.g., in-memory databases, key-value stores, and graph analytics, because compilers and hardware struggle to expose ILP and MLP from the instruction stream automatically. In this paper, we introduce the IMLP (Instruction and Memory Level Parallelism) task programming model. IMLP tasks execute as coroutines that yield execution at annotated long-latency operations, e.g., memory accesses, divisions, or unpredictable branches. IMLP tasks are interleaved on a single thread and integrate well with thread parallelism and vectorization. Cimple, our DSL embedded in C++, allows exploration of task scheduling and transformations such as buffering, vectorization, pipelining, and prefetching. We demonstrate state-of-the-art performance on core algorithms used in in-memory databases that operate on arrays, hash tables, trees, and skip lists. Cimple applications achieve 2.5x throughput gains over hardware multithreading on a multi-core processor and a 6.4x single-thread speedup.
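To illustrate the core idea of interleaving tasks that yield at long-latency memory accesses, here is a minimal hand-written C++ sketch. It is not Cimple's DSL or the code Cimple generates. A group of independent lower_bound searches over a sorted array is kept as small per-task state records (stackless coroutines) and advanced round-robin on one thread. After each step, a task issues a prefetch for its next probe and "yields" to the next task, so the cache misses of several searches can be in flight at once. The names `SearchTask`, `interleaved_lower_bound`, and the `group` parameter are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin rather than standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-task state: one in-flight lower_bound search.
struct SearchTask {
    std::size_t lo;   // start of the remaining candidate range
    std::size_t len;  // length of the remaining candidate range
    int key;          // value being searched for
};

// Issue a prefetch for the element this task will compare against next
// (GCC/Clang builtin; it is only a hint and never faults).
static inline void prefetch_probe(const std::vector<int>& arr, const SearchTask& t) {
    if (t.len > 0) __builtin_prefetch(&arr[t.lo + t.len / 2]);
}

// Returns, for each key, the index of the first element >= key,
// processing up to `group` searches interleaved on a single thread.
std::vector<std::size_t> interleaved_lower_bound(const std::vector<int>& arr,
                                                 const std::vector<int>& keys,
                                                 std::size_t group) {
    assert(std::is_sorted(arr.begin(), arr.end()));  // precondition
    if (group == 0) group = 1;
    std::vector<std::size_t> out(keys.size());
    for (std::size_t base = 0; base < keys.size(); base += group) {
        const std::size_t n = std::min(group, keys.size() - base);
        std::vector<SearchTask> tasks(n);
        for (std::size_t i = 0; i < n; ++i) {
            tasks[i] = SearchTask{0, arr.size(), keys[base + i]};
            prefetch_probe(arr, tasks[i]);  // start each task's first load
        }
        std::size_t active = n;
        while (active > 0) {
            active = 0;
            for (SearchTask& t : tasks) {        // round-robin scheduler
                if (t.len == 0) continue;         // task already finished
                const std::size_t half = t.len / 2;
                if (arr[t.lo + half] < t.key) {   // resume: line was prefetched
                    t.lo += half + 1;
                    t.len -= half + 1;
                } else {
                    t.len = half;
                }
                prefetch_probe(arr, t);           // "yield" at the next load
                if (t.len > 0) ++active;
            }
        }
        for (std::size_t i = 0; i < n; ++i) out[base + i] = tasks[i].lo;
    }
    return out;
}
```

With `group == 1` this degenerates to an ordinary binary search. Larger groups overlap more misses but keep more task state live. The best group size depends on the hardware and is worth tuning empirically, which is the kind of scheduling choice the abstract says Cimple lets programmers explore.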
