Discovering outstanding subgroup lists for numeric targets using MDL

06/16/2020
by Hugo Manuel Proença et al.

The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have limitations, however, as they typically rely heavily on heuristics and/or user-chosen hyperparameters. We propose a dispersion-aware problem formulation for subgroup set discovery that is based on the minimum description length (MDL) principle and subgroup lists. We argue that the best subgroup list is the one that best summarizes the data given the overall distribution of the target. We restrict our focus to a single numeric target variable and show that our formalization coincides with an existing quality measure when finding a single subgroup but, in addition, allows one to trade off subgroup quality against subgroup complexity. We then propose SSD++, a heuristic algorithm that we empirically show returns outstanding subgroup lists: non-redundant sets of compact subgroups that stand out by having strongly deviating means and small spread.
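To make the idea of "summarizing the data given the overall distribution of the target" concrete, here is a minimal sketch, not the paper's actual encoding. It assumes the numeric target is modelled by a Gaussian, ignores the cost of encoding the Gaussian parameters, and uses a hypothetical flat `model_cost_bits` as a stand-in for subgroup complexity. It measures how many bits are saved by describing the subgroup's target values with their own fitted Gaussian instead of the dataset-wide one. The function names are illustrative only.

```python
import math
from statistics import fmean, pstdev


def gaussian_code_length_bits(values, mean, std):
    """Code length in bits of `values` under N(mean, std^2).

    Uses the negative log-density only; the per-point discretization constant a
    full MDL code would add is identical for both models and cancels in the gain.
    """
    var = std * std
    nats = sum(0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var)
               for v in values)
    return nats / math.log(2)


def subgroup_gain_bits(target, in_subgroup, model_cost_bits=0.0):
    """Hypothetical illustration: bits saved by encoding the subgroup's targets
    with their own Gaussian rather than the dataset Gaussian, minus an assumed
    flat cost for describing the subgroup (a stand-in for its complexity)."""
    sub = [y for y, member in zip(target, in_subgroup) if member]
    mu_d, sd_d = fmean(target), pstdev(target)
    if len(sub) < 2 or pstdev(sub) == 0.0:
        raise ValueError("subgroup needs at least two distinct target values")
    mu_s, sd_s = fmean(sub), pstdev(sub)
    baseline = gaussian_code_length_bits(sub, mu_d, sd_d)  # dataset model
    local = gaussian_code_length_bits(sub, mu_s, sd_s)     # subgroup model
    return baseline - local - model_cost_bits


def gaussian_kl_bits(mu1, sd1, mu2, sd2):
    """KL(N(mu1, sd1^2) || N(mu2, sd2^2)) converted to bits."""
    nats = (math.log(sd2 / sd1)
            + (sd1 ** 2 + (mu1 - mu2) ** 2) / (2 * sd2 ** 2) - 0.5)
    return nats / math.log(2)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0, 21.0, 22.0]
    mask = [False] * 8 + [True] * 3  # subgroup: the three high values
    print(subgroup_gain_bits(y, mask))
    print(subgroup_gain_bits(y, mask, model_cost_bits=5.0))
```

Under these assumptions (maximum-likelihood Gaussians, parameter costs ignored), the saving before the complexity penalty equals the subgroup size times the Kullback-Leibler divergence between the subgroup's and the dataset's Gaussians. It therefore grows with a strongly deviating mean and, when the subgroup's spread is below the dataset's, with a smaller spread. Subtracting a complexity cost is one simple way to express the quality/complexity trade-off the abstract mentions.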

research
03/25/2021

Robust subgroup discovery

We introduce the problem of robust subgroup discovery, i.e., finding a s...
research
05/01/2019

Interpretable multiclass classification by MDL-based rule lists

Interpretable classifiers have recently witnessed an increase in attenti...
research
06/13/2016

A framework for redescription set construction

Redescription mining is a field of knowledge discovery that aims at find...
research
08/19/2022

Merging Sorted Lists of Similar Strings

Merging T sorted, non-redundant lists containing M elements into a singl...
research
04/24/2022

Computing the Collection of Good Models for Rule Lists

Since the seminal paper by Breiman in 2001, who pointed out a potential ...
research
05/29/2018

In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery

Existing methods for active topology discovery within the IPv6 Internet ...
research
09/22/2017

Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

Subgroup discovery is a local pattern mining technique to find interpret...