Comprehensive and Comprehensible Data Catalogs: The What, Who, Where, When, Why, and How of Metadata Management

03/12/2021
by Pranav Subramaniam, et al.

Scalable data science requires access to metadata, which is increasingly managed by databases called data catalogs. With today's data catalogs, users must choose between designs that make it easy to store metadata and designs that make it easy to retrieve it, but cannot have both. We find that this problem arises because catalogs lack an easy-to-understand mental model. In this paper, we present a new catalog mental model called 5W1H+R. The new mental model is comprehensive in the metadata it represents, and comprehensible in that it permits users to locate metadata easily. We demonstrate these properties via a user study. We then study different schema designs for implementing the new mental model and evaluate them on different backends to understand their relative merits. We conclude that mental models are important for making data catalogs more useful and for boosting the metadata management efforts that are crucial for data science tasks.
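
To make the idea of organizing metadata around a mental model concrete, here is a minimal Python sketch of a catalog entry with one field per question named in the title (what, who, where, when, why, how). It is an illustration under stated assumptions, not the schema studied in the paper: the class name `CatalogEntry`, the helper `find_question`, the meaning attached to each field, and all example values are hypothetical. The abstract does not expand the "+R" component, so the sketch leaves it out.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical metadata record: one field per question in the title.

    The per-field meanings below are assumptions made for illustration.
    """
    what: str    # assumed: what the dataset contains
    who: str     # assumed: who owns or produced it
    where: str   # assumed: where it is stored
    when: str    # assumed: when it was created or updated
    why: str     # assumed: why it was collected
    how: str     # assumed: how it was produced


def find_question(entry: CatalogEntry, value: str) -> List[str]:
    """Return the names of the questions whose answer contains `value`."""
    needle = value.lower()
    return [
        f.name
        for f in fields(entry)
        if needle in getattr(entry, f.name).lower()
    ]


# Made-up example values, used only to exercise the lookup.
entry = CatalogEntry(
    what="daily sales totals",
    who="analytics team",
    where="s3://example-bucket/sales/",
    when="2021-03-12",
    why="quarterly revenue reporting",
    how="nightly batch aggregation",
)

print(find_question(entry, "analytics"))  # → ['who']
```

Keeping each question as a named field means a user who knows which question they are asking can go straight to the matching field, which is one plausible reading of "comprehensible" here.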


