Towards a Generic Multimodal Architecture for Batch and Streaming Big Data Integration

08/09/2021
by   Siham Yousfi, et al.

Big Data are produced rapidly by a variety of heterogeneous sources. They come in different types (text, image, video, or audio) and with different levels of reliability and completeness. One of the most interesting architectures for handling large volumes of data arriving at high velocity is the lambda architecture. It combines two processing layers, the batch layer and the speed layer, each providing specific views of the data while ensuring robust, fast, and scalable processing. However, most papers on the lambda architecture focus on a single type of data, generally produced by a single data source. Moreover, the layers of the architecture are implemented independently or, at best, combined to perform basic processing without assessing either the reliability or the completeness of the data. Therefore, inspired by the lambda architecture, we propose in this paper a generic multimodal architecture that combines batch and streaming processing to build a complete, global, and accurate insight in near real time, based on the knowledge extracted from multiple heterogeneous Big Data sources. Our architecture uses batch processing to analyze the data structures and contents, build the learning models, and calculate the reliability index of the sources involved, while the streaming processing uses the models built by the batch layer to process incoming data immediately and provide results rapidly. We validate our architecture in the context of urban traffic management systems, where it is used to detect congestion.
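
The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Reading` type, the source names, and the definition of the reliability index (the fraction of a source's past readings that matched a known ground truth) are all assumptions made for illustration, since the abstract does not specify how the index or the learning models are computed. The batch function derives a per-source index from historical data, and the speed function reuses that index to fuse incoming readings into a congestion decision.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    source: str      # hypothetical source id, e.g. "camera" or "tweets"
    congested: bool  # what this source claims about a road segment


def batch_reliability(history):
    """Batch layer (sketch): estimate a reliability index per source.

    `history` is a list of (Reading, ground_truth) pairs. The index is
    assumed to be the fraction of past readings that matched the truth.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for reading, truth in history:
        totals[reading.source] += 1
        hits[reading.source] += int(reading.congested == truth)
    return {src: hits[src] / totals[src] for src in totals}


def speed_detect(readings, reliability, threshold=0.5):
    """Speed layer (sketch): fuse incoming readings with the batch index.

    Returns True when the reliability-weighted share of "congested"
    votes exceeds `threshold`. Sources unknown to the batch layer get
    weight 0, and no usable evidence yields False.
    """
    total = sum(reliability.get(r.source, 0.0) for r in readings)
    if total == 0:
        return False
    congested = sum(
        reliability.get(r.source, 0.0) for r in readings if r.congested
    )
    return congested / total > threshold


# Hypothetical historical data: the camera was always right, tweets half the time.
history = [
    (Reading("camera", True), True),
    (Reading("camera", False), False),
    (Reading("tweets", True), False),
    (Reading("tweets", True), True),
]
reliability = batch_reliability(history)

# The more reliable camera outweighs the less reliable tweets.
clear = speed_detect([Reading("camera", False), Reading("tweets", True)], reliability)
jammed = speed_detect([Reading("camera", True), Reading("tweets", False)], reliability)
```

In this sketch the speed layer does no learning of its own. It only applies what the batch layer has precomputed, which is the split the abstract describes: slow, thorough analysis offline and fast, lightweight decisions on incoming data.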

