On data lake architectures and metadata management

07/23/2021
by Pegdwendé Sawadogo, et al.

Over the past two decades, we have witnessed an exponential increase in data production worldwide. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues, which strongly challenge traditional data management and analysis systems. The concept of data lake was introduced to address these issues. A data lake is a large, raw data repository that stores and manages all of a company's data, regardless of format. However, the data lake concept remains fuzzy for many researchers and practitioners, who often confuse it with the Hadoop technology. In this paper, we therefore provide a comprehensive state of the art of the different approaches to data lake design. We focus particularly on data lake architectures and metadata management, which are key issues in successful data lakes. We also discuss the pros and cons of data lakes and their design alternatives.

