DeepAI AI Chat
Log In Sign Up

A Synopses Data Engine for Interactive Extreme-Scale Analytics

03/21/2020
by   Antonis Kontaxakis, et al.
0

In this work, we detail the design and structure of a Synopses Data Engine (SDE) which combines the virtues of parallel processing and stream summarization towards delivering interactive analytics at extreme scale. Our SDE is built on top of Apache Flink and implements a synopsis-as-a-service paradigm. In that it achieves (a) concurrently maintaining thousands of synopses of various types for thousands of streams on demand, (b) reusing maintained synopses among various concurrent workflows, (c) providing data summarization facilities even for cross-(Big Data) platform workflows, (d) pluggability of new synopses on-the-fly, (e) increased potential for workflow execution optimization. The proposed SDE is useful for interactive analytics at extreme scales because it enables (i) enhanced horizontal scalability, i.e., not only scaling out the computation to a number of processing units available in a computer cluster, but also harnessing the processing load assigned to each by operating on carefully-crafted data summaries, (ii) vertical scalability, i.e., scaling the computation to very high numbers of processed streams and (iii) federated scalability i.e., scaling the computation beyond single clusters and clouds by controlling the communication required to answer global queries posed over a number of potentially geo-dispersed clusters.

READ FULL TEXT

page 2

page 5

page 7

page 8

page 9

page 10

page 11

page 12

12/14/2022

Towards Interactive, Adaptive and Result-aware Big Data Analytics

As data volumes grow across applications, analytics of large amounts of ...
07/06/2020

Multi-tenant Pub/Sub Processing for Real-time Data Streams

Devices and sensors generate streams of data across a diversity of locat...
12/27/2016

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

Big data trend has enforced the data-centric systems to have continuous ...
05/16/2017

Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine

Real-time processing of data streams emanating from sensors is becoming ...
07/09/2018

Scaling-Up Reasoning and Advanced Analytics on BigData

BigDatalog is an extension of Datalog that achieves performance and scal...
12/10/2018

Scaling-Up In-Memory Datalog Processing: Observations and Techniques

Recursive query processing has experienced a recent resurgence, as a res...
02/04/2020

Providing Insights for Queries affected by Failures and Stragglers

Interactive time responses are a crucial requirement for users analyzing...