Tell Me What I Need to Know: Succinctly Summarizing Data with Itemsets

04/25/2019
by   Michael Mampaey, et al.

Data analysis is an inherently iterative process. That is, what we already know about the data greatly determines our expectations, and hence which results we would find the most interesting. With this in mind, we introduce a well-founded approach for succinctly summarizing data with a collection of itemsets: using a probabilistic maximum entropy model, we iteratively find the most interesting itemset, and in turn update our model of the data accordingly. As we only include itemsets that are surprising with regard to the current model, the summary is guaranteed to be both descriptive and non-redundant. The algorithm that we present can either mine the top-k most interesting itemsets, or use the Bayesian Information Criterion to automatically identify the model containing only the itemsets most important for describing the data. In other words, it will 'tell you what you need to know'. Experiments on synthetic and benchmark data show that the discovered summaries are succinct and correctly identify the key patterns in the data. The models they form attain high likelihoods, and inspection shows that they summarize the data well with increasingly specific, yet non-redundant itemsets.
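The iterative loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a tiny item universe so that the maximum entropy model can be fitted by brute-force iterative scaling over all 2^n transactions, it starts from a uniform model, it scores surprise with a binary KL divergence between observed and predicted frequency, and it uses a BIC of the form negative log-likelihood plus (k/2) log |D|. All function names and the toy data are hypothetical.

```python
import itertools
import math


def frequency(itemset, rows):
    """Fraction of transactions (frozensets) containing every item of itemset."""
    return sum(1 for row in rows if itemset <= row) / len(rows)


def model_frequency(itemset, model):
    """Probability the model assigns to a transaction containing itemset."""
    return sum(p for t, p in model.items() if itemset <= t)


def fit_maxent(n_items, constraints, sweeps=200):
    """Brute-force iterative scaling over all 2^n transactions (toy sizes only)."""
    space = [frozenset(i for i in range(n_items) if (mask >> i) & 1)
             for mask in range(2 ** n_items)]
    model = {t: 1.0 / len(space) for t in space}
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            current = model_frequency(itemset, model)
            if not 0.0 < current < 1.0:
                continue  # guard against division by zero
            up, down = target / current, (1.0 - target) / (1.0 - current)
            for t in model:
                model[t] *= up if itemset <= t else down
    return model


def binary_kl(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y); assumes 0 < x < 1."""
    y = min(max(y, 1e-12), 1.0 - 1e-12)
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


def bic(model, rows, k):
    """Negative log-likelihood plus a penalty of (k / 2) * log(number of rows)."""
    log_likelihood = sum(math.log(model[row]) for row in rows)
    return -log_likelihood + 0.5 * k * math.log(len(rows))


def summarize(rows, n_items, max_size=3):
    """Greedily add the most surprising itemset while the BIC score improves."""
    candidates = [frozenset(c) for size in range(1, max_size + 1)
                  for c in itertools.combinations(range(n_items), size)]
    freqs = {c: frequency(c, rows) for c in candidates}
    # Assumption: skip itemsets with frequency 0 or 1 to keep the model positive.
    candidates = [c for c in candidates if 0.0 < freqs[c] < 1.0]
    chosen = {}
    model = fit_maxent(n_items, chosen)
    best_bic = bic(model, rows, 0)
    while True:
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: binary_kl(freqs[c], model_frequency(c, model)))
        trial = dict(chosen)
        trial[best] = freqs[best]
        trial_model = fit_maxent(n_items, trial)
        trial_bic = bic(trial_model, rows, len(trial))
        if trial_bic >= best_bic:
            break  # the extra itemset does not pay for its penalty
        chosen, model, best_bic = trial, trial_model, trial_bic
    return list(chosen), model


# Toy data: items 0 and 1 tend to co-occur, item 2 is independent of both.
rows = ([frozenset({0, 1})] * 30 + [frozenset({0, 1, 2})] * 30
        + [frozenset()] * 30 + [frozenset({2})] * 30
        + [frozenset({0})] * 10 + [frozenset({1})] * 10
        + [frozenset({0, 2})] * 10 + [frozenset({1, 2})] * 10)
chosen, model = summarize(rows, n_items=3)
print([sorted(itemset) for itemset in chosen])
```

On this toy data the first itemset selected is {0, 1}, since every single item has frequency 0.5 and is therefore already explained by the uniform starting model. Enumerating all transactions is exponential in the number of items, so the sketch is only meant to show the select, refit, and stop cycle on very small examples.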


