A Scalable and Robust Framework for Data Stream Ingestion

12/11/2018
by   Haruna Isah, et al.

An essential part of building a data-driven organization is the ability to handle and process continuous streams of data to discover actionable insights. The explosive growth of interconnected devices and the social Web has led to a large volume of data being generated on a continuous basis. Streaming data sources such as stock quotes, credit card transactions, trending news, traffic conditions, and time-sensitive patient data are not only very common but can also rapidly depreciate in value if not processed quickly. The ever-increasing volume and highly irregular nature of data rates pose new challenges to data stream processing systems. One such challenging but important task is how to accurately ingest and integrate data streams from various sources and locations into an analytics platform. These challenges demand new strategies and systems that can offer the desired degree of scalability and robustness in handling failures. This paper investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrates the utility of the framework in a real-world data stream processing case study that integrates Apache NiFi and Kafka for processing high-velocity news articles from across the globe. The study also identifies best practices and gaps for future research in developing large-scale data stream processing infrastructure.

research
09/25/2020

A Big Data Lake for Multilevel Streaming Analytics

Large organizations are seeking to create new architectures and scalable...
research
04/02/2021

ESTemd: A Distributed Processing Framework for Environmental Monitoring based on Apache Kafka Streaming Engine

Distributed networks and real-time systems are becoming the most importa...
research
01/28/2019

A Comprehensive Survey on Parallelization and Elasticity in Stream Processing

Stream Processing (SP) has evolved as the leading paradigm to process an...
research
11/15/2019

Scalable and Reliable Multi-Dimensional Aggregation of Sensor Data Streams

Ever-increasing amounts of data and requirements to process them in real...
research
09/24/2019

An Exploratory Study of How Specialists Deal with Testing in Data Stream Processing Applications

[Background] Nowadays, there is a massive growth of data volume and spee...
research
04/30/2022

A Framework for Simulating Real-world Stream Data of the Internet of Things

With the rapid growth in the number of devices of the Internet of Things...
research
02/04/2022

Twitter Referral Behaviours on News Consumption with Ensemble Clustering of Click-Stream Data in Turkish Media

Click-stream data, which comes with a massive volume generated by the hu...
