A Big Data Lake for Multilevel Streaming Analytics

09/25/2020
by Ruoran Liu, et al.

Large organizations are seeking new architectures and scalable platforms to handle the data management challenges created by data growth on a scale rarely seen in the past. These challenges stem largely from streaming data that arrives at high velocity from many sources and in multiple formats. This change in the data paradigm has led to the emergence of new data analytics and management architectures. This paper focuses on storing high-volume, high-velocity, and high-variety data in its raw formats in a data storage architecture called a data lake. First, we present our study of the limitations of traditional data warehouses in handling these changes. We then discuss and compare open source and commercial platforms that can be used to develop a data lake, and describe our end-to-end data lake design and implementation approach using the Hadoop Distributed File System (HDFS) on the Hadoop Data Platform (HDP). Finally, we present a real-world data lake development use case covering data stream ingestion, staging, and multilevel streaming analytics that combines structured and unstructured data. This study can serve as a guide for individuals or organizations planning to implement a data lake solution for their own use cases.
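
To make the idea of storing data "in the raw formats" concrete, the following is a minimal sketch of a raw landing zone, not the authors' implementation. It assumes a local directory as a stand-in for HDFS, and the function names (`raw_zone_path`, `ingest`, `read_raw`), the `raw/<source>/<YYYY>/<MM>/<DD>` layout, and the `sensors` source are all hypothetical choices made for illustration. Records are written without parsing, and a schema is applied only when the data is read back.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def raw_zone_path(root: Path, source: str, event_time: datetime, fmt: str) -> Path:
    """Build a partitioned path: <root>/raw/<source>/<YYYY>/<MM>/<DD>/batch.<fmt>.

    The layout is an assumption for this sketch, not one taken from the paper.
    """
    return (root / "raw" / source / f"{event_time:%Y}" / f"{event_time:%m}"
            / f"{event_time:%d}" / f"batch.{fmt}")


def ingest(root: Path, source: str, payload: bytes, fmt: str,
           event_time: datetime) -> Path:
    """Append one record as a single line, without parsing or imposing a schema."""
    path = raw_zone_path(root, source, event_time, fmt)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("ab") as f:
        f.write(payload.rstrip(b"\n") + b"\n")
    return path


def read_raw(root: Path, source: str) -> List[bytes]:
    """Return every stored line for a source; the caller decides how to parse it."""
    lines: List[bytes] = []
    for p in sorted((root / "raw" / source).rglob("batch.*")):
        lines.extend(p.read_bytes().splitlines())
    return lines


if __name__ == "__main__":
    # A temporary local directory stands in for the HDFS root (assumption).
    lake_root = Path(tempfile.mkdtemp())
    now = datetime.now(timezone.utc)
    ingest(lake_root, "sensors", b'{"id": 1, "temp": 21.5}', "jsonl", now)
    ingest(lake_root, "sensors", b'{"id": 2, "temp": 22.0}', "jsonl", now)
    # Schema-on-read: JSON parsing happens only at analysis time.
    records = [json.loads(line) for line in read_raw(lake_root, "sensors")]
    print(records)
```

Deferring parsing to read time is what separates this layout from a warehouse-style load, where a schema is enforced before the data is stored.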


07/15/2019

A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning

The rapid growth of data in velocity, volume, value, variety, and veraci...
12/11/2018

A Scalable and Robust Framework for Data Stream Ingestion

An essential part of building a data-driven organization is the ability ...
08/07/2017

Real Time Analytics: Algorithms and Systems

Velocity is one of the 4 Vs commonly used to characterize Big Data. In t...
05/14/2017

A Proposed Architecture for Big Data Driven Supply Chain Analytics

Advancement in information and communication technology (ICT) has given ...
03/23/2018

GreyCat: Efficient What-If Analytics for Data in Motion at Scale

Over the last few years, data analytics shifted from a descriptive era, ...
05/02/2018

Online and Offline Analysis of Streaming Data

Online and offline analytics have been traditionally treated separately ...
12/10/2016

Data Curation APIs

Understanding and analyzing big data is firmly recognized as a powerful ...