Metadata Management for Textual Documents in Data Lakes

05/10/2019
by Pegdwendé Sawadogo, et al.

Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Hence, an efficient metadata system is essential to prevent the data lake from turning into a so-called data swamp. Existing work on managing data lake metadata mostly focuses on structured and semi-structured data, with little research on unstructured data. Thus, we propose in this paper a methodological approach to build and manage a metadata system that is specific to textual documents in data lakes. First, we make an inventory of usual and meaningful metadata to extract. Then, to validate our proposals, we apply techniques from the text mining and information retrieval domains to extract, store and reuse these metadata within the COREL research project.

