EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets

This paper presents EvoSplit, a new evolutionary approach for distributing a multi-label data set into disjoint subsets for supervised machine learning. Currently, data set providers split a data set either randomly or with iterative stratification, a method that aims to preserve the label (or label pair) distribution of the original data set in each subset. With the same aim, this paper first introduces a single-objective evolutionary approach that seeks a split maximizing the similarity to the original data set for each of those distributions independently. Second, it presents a new multi-objective evolutionary algorithm that maximizes the similarity for both distributions (label and label pair) simultaneously. Both approaches are validated on well-known multi-label data sets as well as on large image data sets currently used in computer vision and machine learning applications. Compared with iterative stratification, EvoSplit improves the split according to several measures: Label Distribution, Label Pair Distribution, Examples Distribution, and the number of folds and fold-label pairs with zero positive examples.
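To make the idea of comparing label distributions concrete, the following is a minimal sketch of how a candidate split could be scored. It is not the paper's exact Label Distribution measure or EvoSplit's fitness function: the function name `split_quality`, the input layout, and the use of a simple mean absolute gap in positive-example proportions are all assumptions made for illustration. It also counts fold-label pairs with zero positive examples, one of the measures named above.

```python
from typing import List, Sequence, Tuple


def split_quality(
    labels: Sequence[Sequence[int]],
    folds: Sequence[Sequence[int]],
) -> Tuple[float, int]:
    """Score a split of a multi-label data set (illustrative, hypothetical helper).

    labels: one row per example, each row a 0/1 indicator per label.
    folds:  one list of example indices per subset (assumed non-empty and disjoint).

    Returns (mean_gap, zero_pairs):
      mean_gap   - mean absolute difference between the proportion of positive
                   examples of each label in each fold and in the whole data set
                   (a simplified stand-in for a label distribution measure).
      zero_pairs - number of fold-label pairs with no positive example.
    """
    n_total = len(labels)
    n_labels = len(labels[0])
    gap_sum = 0.0
    zero_pairs = 0
    for j in range(n_labels):
        # Proportion of positive examples of label j in the whole data set.
        overall = sum(row[j] for row in labels) / n_total
        for fold in folds:
            positives = sum(labels[i][j] for i in fold)
            if positives == 0:
                zero_pairs += 1
            gap_sum += abs(positives / len(fold) - overall)
    mean_gap = gap_sum / (n_labels * len(folds))
    return mean_gap, zero_pairs


# Toy data set: four examples, two labels.
toy_labels: List[List[int]] = [[1, 0], [1, 0], [0, 1], [0, 1]]
balanced = split_quality(toy_labels, [[0, 2], [1, 3]])  # each fold sees both labels
skewed = split_quality(toy_labels, [[0, 1], [2, 3]])    # each fold sees one label only
```

On this toy data the balanced split has a lower gap and no empty fold-label pairs, while the skewed split leaves each fold without any positive example of one label. A search procedure, evolutionary or otherwise, could use a score of this kind to rank candidate splits.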

