The Internals of the Data Calculator

by Stratos Idreos, et al.

Data structures are critical in any data-driven scenario, but they are notoriously hard to design because the design space is massive and performance depends on workload and hardware, both of which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is the computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions in a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is affected by variations in 1) the design, 2) the hardware, 3) the data, and 4) the query workload. This makes it possible to test numerous designs and ideas cheaply before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
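To make the cost-synthesis idea concrete, the following minimal Python sketch fits a small model per access primitive from benchmark samples and then sums the model predictions over the steps of an operation. It is an illustration under stated assumptions, not the Data Calculator's implementation: the function names (`fit_model`, `synthesize_cost`), the primitive names, the linear-in-a-feature model form, and all latency numbers are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# A cost model maps the number of entries an access primitive touches
# to a predicted latency in nanoseconds.
CostModel = Callable[[int], float]


def fit_model(samples: List[Tuple[int, float]],
              feature: Callable[[int], float]) -> CostModel:
    """Least-squares fit of latency = slope * feature(n) + intercept.

    `samples` holds (n, measured_latency_ns) pairs; in a real setting they
    would come from micro-benchmarks on the target hardware profile.
    """
    xs = [feature(n) for n, _ in samples]
    ys = [latency for _, latency in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = 0.0 if var_x == 0 else cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda n: slope * feature(n) + intercept


def synthesize_cost(steps: List[Tuple[str, int]],
                    models: Dict[str, CostModel]) -> float:
    """Predict an operation's latency by summing per-primitive predictions."""
    return sum(models[name](size) for name, size in steps)


# Hypothetical benchmark samples (made-up numbers, not measured data).
MODELS: Dict[str, CostModel] = {
    # Binary search is modeled as linear in log2(n).
    "binary_search": fit_model([(2, 15.0), (4, 20.0), (8, 25.0)], math.log2),
    # A sequential scan is modeled as linear in n.
    "scan": fit_model([(10, 10.0), (20, 20.0), (40, 40.0)], float),
}

# A point lookup in an assumed two-level design: binary search over
# 16 fence pointers, then a scan of a 30-entry leaf.
lookup_steps = [("binary_search", 16), ("scan", 30)]
predicted_ns = synthesize_cost(lookup_steps, MODELS)
```

Changing the step list (a different design) or refitting the models on samples from another machine (different hardware) changes the prediction without implementing or running the structure itself, which is the kind of what-if question described above.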


