A new algorithm for Subgroup Set Discovery based on Information Gain

07/26/2023
by   Daniel Gómez-Bravo, et al.

Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that occur in a dataset more frequently than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, enabling insight and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new Subgroup Discovery (SD) algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) as multiple criteria for pattern selection. The algorithm addresses several limitations of state-of-the-art SD algorithms: the need to fine-tune key parameters for each dataset, the use of a single, manually set pattern search criterion, the use of non-overlapping data structures for subgroup space exploration, and the inability to search for patterns while fixing some relevant dataset variables. We compare the performance of IGSD with two state-of-the-art SD algorithms, FSSD and SSD++, on eleven datasets. For the performance evaluation, we also propose complementing standard SD measures with IG, OR, and p-value. The results show that, for all datasets considered, FSSD and SSD++ provide less reliable patterns and smaller sets of patterns than IGSD. Additionally, IGSD provides better OR values than FSSD and SSD++, indicating a higher dependence between patterns and targets. Moreover, the patterns obtained for one of the datasets have been validated by a group of domain experts, and the patterns provided by IGSD show better agreement with the experts than those obtained by FSSD and SSD++. These results demonstrate the suitability of IGSD as a method for pattern discovery and suggest that including non-standard SD metrics allows a better evaluation of discovered patterns.
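To make the two selection criteria concrete, the sketch below computes textbook Information Gain and Odds Ratio for a single candidate pattern from its 2x2 contingency table against a binary target. This is only an illustration under stated assumptions, not the IGSD implementation: the abstract does not say how IGSD combines or thresholds the two measures, and the function names, the `tp`/`fp`/`fn`/`tn` argument layout, and the 0.5 continuity correction for empty cells are all choices made here for the example.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a binary class distribution."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(tp, fp, fn, tn):
    """Reduction in target entropy obtained by splitting on the pattern.

    tp/fp: covered rows with positive/negative target.
    fn/tn: uncovered rows with positive/negative target.
    (Hypothetical argument layout, chosen for this sketch.)
    """
    n = tp + fp + fn + tn
    covered = tp + fp
    uncovered = fn + tn
    h_before = entropy(tp + fn, fp + tn)
    h_after = (covered / n) * entropy(tp, fp) + (uncovered / n) * entropy(fn, tn)
    return h_before - h_after


def odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Odds of a positive target inside the pattern versus outside it.

    Assumption: when any cell is zero, add `correction` to every cell
    (a common continuity correction) so the ratio stays finite.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


# A pattern covering 10 rows (8 positive) in a dataset of 20 rows (10 positive).
ig = information_gain(8, 2, 2, 8)
oratio = odds_ratio(8, 2, 2, 8)
```

An OR above 1 indicates that the target is more likely inside the pattern than outside it, while IG measures how much knowing pattern membership reduces uncertainty about the target. Using both guards against patterns that score well on one measure only.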

research
10/23/2020

A Computational Evaluation of Musical Pattern Discovery Algorithms

Pattern discovery algorithms in the music domain aim to find meaningful ...
research
01/11/2021

The Semantic Adjacency Criterion in Time Intervals Mining

Frequent temporal patterns discovered in time-interval-based multivariat...
research
06/26/2015

Skopus: Exact discovery of the most interesting sequential patterns under Leverage

This paper presents a framework for exact discovery of the most interest...
research
08/28/2023

Interactive Multi Interest Process Pattern Discovery

Process pattern discovery methods (PPDMs) aim at identifying patterns of...
research
07/11/2021

Pattern Discovery and Validation Using Scientific Research Methods

Pattern discovery, the process of discovering previously unrecognized pa...
research
09/08/2022

Towards a Likelihood Ratio Approach for Bloodstain Pattern Analysis

In this work, we explore the application of likelihood ratio as a forens...
research
11/22/2019

Performance Effectiveness of Multimedia Information Search Using the Epsilon-Greedy Algorithm

In the search and retrieval of multimedia objects, it is impractical to ...
