An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB

02/21/2019
by Xikui Wang, et al.

Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other existing information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models of varying complexity. For enrichments that will be used frequently, it can be advantageous to push their computation into the ingestion pipeline so that the enrichment results can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results. In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.
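The abstract describes enriching records inside the ingestion pipeline while adapting to changes in the reference data. The Python sketch below illustrates only that general idea. It is not AsterixDB's API and not the paper's implementation. All names in it (`ReferenceTable`, `batches`, `ingest`, `add_rating`, the `rating` field) are hypothetical. Re-reading the reference data once per batch is simply one easy way to pick up updates, chosen here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Record = dict
Enricher = Callable[[Record, Dict[str, str]], Record]


@dataclass
class ReferenceTable:
    """Hypothetical reference data that may be updated while ingestion runs."""
    rows: Dict[str, str] = field(default_factory=dict)

    def snapshot(self) -> Dict[str, str]:
        # Copy the current contents so one batch is enriched against a consistent view.
        return dict(self.rows)


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group an incoming (possibly unbounded) stream into fixed-size batches."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def ingest(records: Iterable[Record], reference: ReferenceTable,
           enrich: Enricher, batch_size: int, store: List[Record]) -> None:
    """Enrich records during ingestion and store them together with the enrichment."""
    for batch in batches(records, batch_size):
        ref = reference.snapshot()  # re-read reference data at every batch boundary
        store.extend(enrich(record, ref) for record in batch)


def add_rating(record: Record, ref: Dict[str, str]) -> Record:
    # Hypothetical enrichment: attach a rating looked up by the record's "country" field.
    return {**record, "rating": ref.get(record["country"], "unknown")}


# Usage: the reference data changes after the first batch has been pulled.
reference = ReferenceTable({"US": "low", "FR": "medium"})


def stream() -> Iterator[Record]:
    yield {"id": 1, "country": "US"}
    yield {"id": 2, "country": "FR"}
    reference.rows["FR"] = "high"  # reference data changes mid-stream
    yield {"id": 3, "country": "FR"}


stored: List[Record] = []
ingest(stream(), reference, add_rating, batch_size=2, store=stored)
print([r["rating"] for r in stored])  # ['low', 'medium', 'high']
```

Taking one snapshot per batch trades freshness for consistency. Every record in a batch sees the same reference data, and an update becomes visible starting with the next batch. Smaller batches pick up changes sooner, but they refresh the reference data more often.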


