A Synopses Data Engine for Interactive Extreme-Scale Analytics

03/21/2020
by Antonis Kontaxakis, et al.

In this work, we detail the design and structure of a Synopses Data Engine (SDE) which combines the virtues of parallel processing and stream summarization to deliver interactive analytics at extreme scale. Our SDE is built on top of Apache Flink and implements a synopsis-as-a-service paradigm. In doing so, it achieves (a) concurrent maintenance of thousands of synopses of various types for thousands of streams on demand, (b) reuse of maintained synopses among various concurrent workflows, (c) data summarization facilities even for cross-(Big Data) platform workflows, (d) pluggability of new synopses on-the-fly, and (e) increased potential for workflow execution optimization. The proposed SDE is useful for interactive analytics at extreme scales because it enables (i) enhanced horizontal scalability, i.e., not only scaling out the computation to a number of processing units available in a computer cluster, but also reducing the processing load assigned to each unit by operating on carefully crafted data summaries; (ii) vertical scalability, i.e., scaling the computation to very high numbers of processed streams; and (iii) federated scalability, i.e., scaling the computation beyond single clusters and clouds by controlling the communication required to answer global queries posed over a number of potentially geo-dispersed clusters.
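
To make the synopsis-as-a-service idea more concrete, the following is a minimal Python sketch and not the SDE's actual Flink implementation or API. It assumes a Count-Min sketch as the synopsis type (the abstract does not name specific synopses), and the names `CountMinSketch` and `SynopsisRegistry` are hypothetical. It illustrates three points from the abstract: synopses created on demand per stream, reuse of one maintained synopsis by several workflows, and merging of compact summaries rather than raw data when answering a query across clusters.

```python
import hashlib
from typing import Dict, Tuple


class CountMinSketch:
    """Fixed-size frequency synopsis (illustrative choice, an assumption)."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows never undercounts the true frequency.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with identical dimensions and hashes merge by cell-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("cannot merge sketches of different dimensions")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


class SynopsisRegistry:
    """Hypothetical registry holding one synopsis per (stream_id, kind)."""

    def __init__(self) -> None:
        self._synopses: Dict[Tuple[str, str], CountMinSketch] = {}

    def get_or_create(self, stream_id: str, kind: str = "countmin") -> CountMinSketch:
        # Created on first request; later workflows reuse the same object.
        key = (stream_id, kind)
        if key not in self._synopses:
            self._synopses[key] = CountMinSketch()
        return self._synopses[key]

    def update(self, stream_id: str, item: str) -> None:
        self.get_or_create(stream_id).add(item)


# Two registries stand in for two geo-dispersed clusters.
cluster_a, cluster_b = SynopsisRegistry(), SynopsisRegistry()
for _ in range(3):
    cluster_a.update("trades", "AAPL")
for _ in range(2):
    cluster_b.update("trades", "AAPL")

# A global query ships only the small tables, not the raw tuples.
global_view = CountMinSketch()
global_view.merge(cluster_a.get_or_create("trades"))
global_view.merge(cluster_b.get_or_create("trades"))
print(global_view.estimate("AAPL"))
```

The memory used per stream is fixed by `width` and `depth` regardless of how many tuples arrive, which is the property that lets a single processing unit carry many streams at once.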


