A Plaque Test for Redundancies in Relational Data

06/05/2023
by Christoph Köhnen, et al.

Inspired by the visualization of dental plaque at the dentist's office, this article proposes a novel visualization technique for identifying redundancies in relational data. Our approach builds upon an established information-theoretic framework that, despite being well-principled, remains unexplored in practical applications. In this framework, we calculate the information content (or entropy) of each cell in a relation instance, given a set of functional dependencies. The entropy value indicates how hard it is to infer the cell's value from the dependencies and the remaining tuples: the lower the entropy, the more easily the value can be inferred. By highlighting cells with lower entropy, we visualize redundancies in the data. We present an initial prototype implementation and demonstrate that a straightforward approach is insufficient for handling practical problem sizes. To address this limitation, we propose several optimizations, which we prove to be correct. Additionally, we present a Monte Carlo approximation technique with a known error, which makes the computation tractable. Using a real-world dataset of modest size, we illustrate the potential of our visualization technique. Our vision is to support domain experts with data profiling and data cleaning tasks, akin to the functionality of a plaque test at the dentist's.
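
To make the notion of a redundant cell concrete, the following is a minimal sketch of a much simpler, binary proxy, not the entropy measure described above. It flags a cell when a functional dependency together with another tuple already determines its value. The function name `redundant_cells`, the representation of rows as dictionaries, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def redundant_cells(rows, fds):
    """Flag cells whose value is implied by a functional dependency
    and at least one other row (a binary proxy, not an entropy value).

    rows: list of dicts mapping attribute name -> value
    fds:  list of (lhs_attributes, rhs_attribute) pairs
    """
    flagged = set()
    for lhs, rhs in fds:
        # Group row indices by their values on the left-hand side.
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[tuple(row[a] for a in lhs)].append(i)
        for indices in groups.values():
            if len(indices) < 2:
                continue  # no other row to infer the value from
            # Only flag when the dependency actually holds in the group.
            if len({rows[i][rhs] for i in indices}) == 1:
                flagged.update((i, rhs) for i in indices)
    return flagged


# Hypothetical sample relation with the dependency zip -> city.
rows = [
    {"zip": "94032", "city": "Passau", "name": "Ann"},
    {"zip": "94032", "city": "Passau", "name": "Bob"},
    {"zip": "80331", "city": "Munich", "name": "Cy"},
]
fds = [(("zip",), "city")]
flagged = redundant_cells(rows, fds)
```

In this sample, the two `city` cells that share a `zip` value are flagged, because either could be restored from the other row and the dependency. The entropy-based framework refines this yes/no answer into a graded value per cell.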
