Subjectively Interesting Subgroup Discovery on Real-valued Targets

10/12/2017
by Jefrey Lijffijt, et al.

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty stems mainly from the fact that there are exponentially many variable combinations to consider, and infinitely many once weighted combinations are allowed, even if only linear ones. An obvious question is therefore whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes: for example, to understand how crime rates are distributed across geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe those areas. We introduce a method to find subgroups in the data that are maximally informative, in the formal information-theoretic sense, with respect to a single real-valued target attribute or a set of them. Subgroups are described using a succinct set of the other attributes, which may be of arbitrary type. The approach is based on the subjective interestingness framework FORSIED, which allows prior knowledge to be taken into account when finding the most informative non-redundant patterns; hence, the method also supports iterative data mining.
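The abstract does not spell out the scoring function. As a rough illustration of the general FORSIED idea (a pattern's subjective interestingness is its information content, the negative log probability of the pattern under the user's background model, divided by its description length), here is a minimal sketch. It is not the authors' implementation and rests on several assumptions: the background model treats each target value as an independent Gaussian with its own mean and a shared standard deviation `sigma`, a pattern states only the subgroup's mean target value, and the description length is a hypothetical linear cost in the number of conditions. All function names and the example descriptions (`"region = north"`, etc.) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def information_content(
    targets: Sequence[float],
    background_means: Sequence[float],
    sigma: float,
    subgroup: Sequence[int],
) -> float:
    """Negative log density of the subgroup's observed mean target value.

    Assumption: under the background model each target is an independent
    Gaussian with its own mean and shared std sigma, so the subgroup mean is
    Gaussian with the average background mean and variance sigma**2 / n.
    """
    n = len(subgroup)
    observed = sum(targets[i] for i in subgroup) / n
    expected = sum(background_means[i] for i in subgroup) / n
    var = sigma ** 2 / n
    return 0.5 * math.log(2 * math.pi * var) + (observed - expected) ** 2 / (2 * var)


def description_length(num_conditions: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical cost: a fixed overhead alpha plus beta per condition."""
    return alpha + beta * num_conditions


def subjective_interestingness(
    targets: Sequence[float],
    background_means: Sequence[float],
    sigma: float,
    subgroup: Sequence[int],
    num_conditions: int,
) -> float:
    """FORSIED-style ratio: information content per unit of description length."""
    ic = information_content(targets, background_means, sigma, subgroup)
    return ic / description_length(num_conditions)


def rank_subgroups(
    targets: Sequence[float],
    background_means: Sequence[float],
    sigma: float,
    candidates: Dict[str, Tuple[List[int], int]],
) -> List[Tuple[str, float]]:
    """Score each description (name -> (row indices, #conditions)); best first."""
    scored = [
        (name, subjective_interestingness(targets, background_means, sigma, rows, k))
        for name, (rows, k) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: the prior expects every target to be 0, but rows 4-7 are 5.
    targets = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
    prior_means = [0.0] * len(targets)
    candidates = {
        "region = north": ([4, 5, 6, 7], 1),  # hypothetical description
        "region = south": ([0, 1, 2, 3], 1),  # hypothetical description
    }
    print(rank_subgroups(targets, prior_means, 1.0, candidates)[0][0])  # region = north
```

Dividing by description length trades informativeness against succinctness: a subgroup whose target mean deviates strongly from what the background model expects scores high, but each extra condition in its description must earn its keep. Iterative mining would then update the background model after each reported pattern, so that already-known subgroups stop looking surprising; that update step is not sketched here.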
