Learning Interesting Categorical Attributes for Refined Data Exploration

11/29/2017
by Koninika Pal, et al.

This work proposes and evaluates a novel approach to determine interesting categorical attributes for lists of entities. Once identified, such attributes are valuable because they let a user constrain (filter) the current view to subsets of entities. We show how to train a classifier that predicts whether or not a categorical attribute can act as such a constraint, in the sense of human-perceived interestingness. The training data is harnessed from Web tables, treating the presence or absence of a table as an indication of whether the attribute used as a filter constraint is reasonable. For learning the classification model, we review four well-known statistical measures (features) for categorical attributes: entropy, unalikeability, peculiarity, and coverage. We additionally propose three new statistical measures that capture the distribution of data and are tailored to our main objective. The learned model is evaluated using relevance assessments obtained through a user study. The results reflect the applicability of the approach as a whole and further demonstrate the superiority of the proposed diversity measures over existing statistical measures such as information entropy.
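To make two of the reviewed measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard textbook definitions: Shannon entropy in bits, and the unalikeability coefficient taken as one minus the sum of squared category proportions. The function names and the example column are hypothetical, and the abstract does not state which exact variants the paper uses. Peculiarity, coverage, and the three proposed measures are left out because the abstract does not define them.

```python
from collections import Counter
from math import log2


def category_proportions(values):
    """Return the relative frequency of each distinct category in values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def shannon_entropy(values):
    """Shannon entropy (in bits) of a categorical attribute.

    Assumed definition: H = -sum(p_i * log2(p_i)) over the categories.
    """
    return -sum(p * log2(p) for p in category_proportions(values))


def unalikeability(values):
    """Coefficient of unalikeability of a categorical attribute.

    Assumed definition: u = 1 - sum(p_i ** 2), i.e. the chance that two
    values drawn independently from the column differ.
    """
    return 1.0 - sum(p * p for p in category_proportions(values))


# Hypothetical attribute column for a list of four entities.
genres = ["drama", "comedy", "drama", "comedy"]
entropy_bits = shannon_entropy(genres)
unalike = unalikeability(genres)
```

Both measures are zero when every entity shares one value and grow as the values spread more evenly across categories, which is why they are natural candidate features for judging whether an attribute splits a list into useful subsets.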


research
07/01/2023

Visualizing departures from marginal homogeneity for square contingency tables with ordered categories

Square contingency tables are a special case commonly used in various fi...
research
10/15/2018

Assessing and Remedying Coverage for a Given Dataset

Data analysis impacts virtually every aspect of our society today. Often...
research
08/24/2017

GALILEO: A Generalized Low-Entropy Mixture Model

We present a new method of generating mixture models for data with categ...
research
05/21/2019

Similarity Measure Development for Case-Based Reasoning- A Data-driven Approach

In this paper, we demonstrate a data-driven methodology for modelling th...
research
12/20/2018

Relevant Attributes in Formal Contexts

Computing conceptual structures, like formal concept lattices, is in the...
research
07/21/2020

Unsupervised Heterogeneous Coupling Learning for Categorical Representation

Complex categorical data is often hierarchically coupled with heterogene...
research
11/09/2019

Estimation of entropy measures for categorical variables with spatial correlation

Entropy is a measure of heterogeneity widely used in applied sciences, o...
