BAD to the Bone: Big Active Data at its Core

02/22/2020
by Steven Jacobs, et al.

Virtually all of today's Big Data systems are passive in nature, responding to queries posted by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To address this limitation, we have created a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., Publish/Subscribe, Streaming Engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. Our platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semistructured data, share execution pipelines among users, manage user data subscriptions at scale, and actively monitor the state of the data to produce individualized information for each user. This paper describes the features and design of our current BAD platform and demonstrates its ability to scale without sacrificing query capabilities or result individualization.


research
09/10/2020

Subscribing to Big Data at Scale

Today, data is being actively generated by a variety of devices, service...
research
01/06/2021

Bridging BAD Islands: Declarative Data Sharing at Scale

In many Big Data applications today, information needs to be actively sh...
research
02/11/2019

Scaling Big Data Platform for Big Data Pipeline

Monitoring and Managing High Performance Computing (HPC) systems and env...
research
10/22/2018

biggy: An Implementation of Unified Framework for Big Data Management System

Various tools, software and systems are proposed and implemented to tac...
research
12/19/2017

Passive and Active Observation: Experimental Design Issues in Big Data

Data can be collected in scientific studies via a controlled experiment ...
research
04/26/2021

Cloud computing as a platform for monetizing data services: A two-sided game business model

With the unprecedented reliance on cloud computing as the backbone for s...
research
07/30/2019

A performance comparison of Dask and Apache Spark for data-intensive neuroimaging pipelines

In the past few years, neuroimaging has entered the Big Data era due to ...
