Modularis: Modular Data Analytics for Hardware, Software, and Platform Heterogeneity

04/07/2020
by Dimitrios Koutsoukos, et al.

Today's data analytics displays an overwhelming diversity along many dimensions: data types, platforms, hardware acceleration, etc. As a result, system design often has to choose between depth and breadth: high efficiency for a narrow set of use cases, or generality at lower performance. In this paper, we pave the way to get the best of both worlds: we present Modularis, an execution layer for data analytics based on fine-grained, composable building blocks that are as generic and simple as possible. These building blocks are similar to traditional database operators, but at a finer granularity, so we call them sub-operators. Sub-operators can be freely and easily combined. As we demonstrate with concrete examples in the context of RDMA-based databases, Modularis' sub-operators can be combined to perform the same task as a complex, monolithic operator. Unlike a monolithic operator, however, sub-operators can be reused, offloaded to different layers or accelerators, and customized for specialized hardware. In the use cases we have tested so far, sub-operators reduce the amount of code significantly (for example, by a factor of four for a distributed, RDMA-based join) while adding minimal performance overhead. Modularis is an order of magnitude faster on SQL-style analytics than a commonly used framework for generic data processing (Presto) and on par with a commercial cluster database (MemSQL).
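To make the idea of composing sub-operators more concrete, the following is a minimal, single-process sketch in Python. It is not Modularis' actual API: the names `scan`, `partition`, `build`, `probe`, and `partitioned_hash_join` are hypothetical, and the sketch ignores RDMA, distribution, and hardware offloading entirely. It only illustrates how a join that would otherwise be one monolithic operator might be assembled from small, reusable pieces.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # a row is a plain tuple; the join key is assumed to be field 0


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical sub-operator: stream rows from an in-memory source."""
    yield from rows


def partition(rows: Iterable[Row], key: Callable[[Row], int], n: int) -> List[List[Row]]:
    """Hypothetical sub-operator: split rows into n partitions by key hash."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def build(rows: Iterable[Row], key: Callable[[Row], int]) -> Dict[int, List[Row]]:
    """Hypothetical sub-operator: build a hash table from one partition."""
    table: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        table[key(row)].append(row)
    return table


def probe(rows: Iterable[Row], table: Dict[int, List[Row]],
          key: Callable[[Row], int]) -> Iterator[Row]:
    """Hypothetical sub-operator: emit one joined row per key match."""
    for row in rows:
        for match in table.get(key(row), []):
            yield match + row


def partitioned_hash_join(left: Iterable[Row], right: Iterable[Row], n: int = 4) -> Iterator[Row]:
    """A join expressed purely as a composition of the sub-operators above."""
    key = lambda row: row[0]
    left_parts = partition(scan(left), key, n)
    right_parts = partition(scan(right), key, n)
    # Matching keys land in the same partition index on both sides.
    for lp, rp in zip(left_parts, right_parts):
        yield from probe(scan(rp), build(scan(lp), key), key)


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (3, "c")]
    right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    for joined in sorted(partitioned_hash_join(left, right)):
        print(joined)
```

In this sketch, each piece could in principle be swapped independently, for instance by replacing `partition` with a variant that ships partitions over a network, without touching `build` or `probe`. That is the kind of reuse and customization the abstract attributes to sub-operators, although the real system's interfaces may look quite different.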


