Coining goldMEDAL: A New Contribution to Data Lake Generic Metadata Modeling

03/24/2021
by Etienne Scholly, et al.

The rise of big data has revolutionized data exploitation practices and led to the emergence of new concepts. Among them, data lakes have emerged as large heterogeneous data repositories that can be analyzed by various methods. An efficient data lake requires a metadata system that addresses the many problems arising when dealing with big data. Consequently, the study of data lake metadata models is currently an active research topic, and many proposals have been made in this regard. However, existing metadata models, including our previous model MEDAL, are either tailored to a specific use case or insufficiently generic to manage different types of data lakes. In this paper, we generalize MEDAL's concepts in a new metadata model called goldMEDAL. Moreover, we compare goldMEDAL with the most recent state-of-the-art metadata models aiming at genericity and show that we can reproduce these metadata models with goldMEDAL's concepts. As a proof of concept, we also illustrate that goldMEDAL allows the design of various data lakes by presenting three different use cases.

research
09/20/2019

Metadata Systems for Data Lakes: Models and Features

Over the past decade, the data lake concept has emerged as an alternativ...
research
07/11/2018

Modeling Data Lake Metadata with a Data Vault

With the rise of big data, business intelligence had to find solutions f...
research
05/10/2019

Metadata Management for Textual Documents in Data Lakes

Data lakes have emerged as an alternative to data warehouses for the sto...
research
07/23/2021

ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics

With new emerging technologies, such as satellites and drones, archaeolo...
research
10/09/2020

Paying down metadata debt: learning the representation of concepts using topic models

We introduce a data management problem called metadata debt, to identify...
research
09/03/2021

Joint Management and Analysis of Textual Documents and Tabular Data within the AUDAL Data Lake

In 2010, the concept of data lake emerged as an alternative to data ware...
research
03/01/2019

Automatic Techniques to Systematically Discover New Heap Exploitation Primitives

Heap exploitation techniques to abuse the metadata of allocators have be...