Evolutionary Data Systems

06/18/2017
by Stratos Idreos, et al.

Anyone in need of a data system today is confronted with numerous complex options in terms of system architectures, such as traditional relational databases, NoSQL and NewSQL solutions, as well as several sub-categories like column-stores and row-stores. This overwhelming array of choices makes bootstrapping data-driven applications difficult and time-consuming, requiring expertise that is often inaccessible due to cost (e.g., to scientific labs or small businesses). In this paper, we present the vision of evolutionary data systems that free systems architects and application designers from the complex, cumbersome and expensive process of designing and tuning specialized data system architectures that fit only a single, static application scenario. Setting up an evolutionary system is as simple as identifying the data. As new data and queries come in, the system automatically evolves so that its architecture matches the properties of the incoming workload at all times. Inspired by the theory of evolution, at any given point in time, an evolutionary system may employ multiple competing solutions at the low level of database architectures -- characterized as combinations of data layouts, access methods and execution strategies. Over time, "the fittest wins" and becomes the dominant architecture until the environment (workload) changes. In our initial prototype, we demonstrate solutions that can seamlessly evolve (back and forth) between a key-value store and a column-store architecture in order to adapt to changing workloads.
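
The "fittest wins" idea can be illustrated with a minimal Python sketch. It is not the authors' prototype: the class names (`KeyValueLayout`, `ColumnLayout`, `EvolvingStore`), the cost model (number of cells touched per query), and the sliding-window selection rule are all assumptions made for illustration. Two candidate layouts answer every query, and the one with the lowest total cost over the recent workload becomes dominant.

```python
from collections import deque


class KeyValueLayout:
    """Row-wise layout: key -> full tuple (hypothetical toy model)."""
    name = "key-value"

    def __init__(self, rows):
        self.rows = {r[0]: r for r in rows}
        self.ncols = len(rows[0])

    def get(self, key):
        # Point lookup touches one row.
        return self.rows[key], self.ncols

    def column_sum(self, col):
        # Assumed cost: every full row is read to extract one attribute.
        total = sum(r[col] for r in self.rows.values())
        return total, len(self.rows) * self.ncols


class ColumnLayout:
    """Column-wise layout: one list per attribute (hypothetical toy model)."""
    name = "column-store"

    def __init__(self, rows):
        self.cols = [list(c) for c in zip(*rows)]

    def get(self, key):
        # Assumed cost: scan the unindexed key column, then stitch the tuple.
        pos = self.cols[0].index(key)
        row = tuple(c[pos] for c in self.cols)
        return row, len(self.cols[0]) + len(self.cols)

    def column_sum(self, col):
        # A scan touches only the requested column.
        return sum(self.cols[col]), len(self.cols[col])


class EvolvingStore:
    """Keeps competing layouts; the cheapest over a recent window dominates."""

    def __init__(self, rows, window=4):
        self.candidates = [KeyValueLayout(rows), ColumnLayout(rows)]
        self.history = deque(maxlen=window)  # per-query cost of each candidate
        self.dominant = self.candidates[0]

    def run(self, op, arg):
        # Toy simplification: every candidate executes every query.
        results = [getattr(c, op)(arg) for c in self.candidates]
        self.history.append([cost for _, cost in results])
        totals = [sum(costs) for costs in zip(*self.history)]
        self.dominant = self.candidates[totals.index(min(totals))]
        return results[0][0]


if __name__ == "__main__":
    store = EvolvingStore([(i, i * 10, i * 100) for i in range(100)])
    for _ in range(4):
        store.run("column_sum", 1)      # analytical phase
    print(store.dominant.name)
    for k in range(4):
        store.run("get", k)             # transactional phase
    print(store.dominant.name)
```

Running every query on every candidate is wasteful and is done here only to keep the sketch short; the window size controls how quickly the dominant layout reacts when the workload shifts.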
