On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

02/28/2019
by Mikel Elkano, et al.

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) the original distribution is transformed into a standard uniform distribution by means of the probability integral transform (since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set); 2) a Ruspini strong fuzzy partition is constructed in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
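The two-step process described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' distributed implementation: the function names (`fit_cut_points`, `to_uniform`, `to_original`, `memberships`), the choice of q = 100, the five fuzzy sets, and the linear interpolation between quantiles are all assumptions made for illustration.

```python
import bisect
import statistics


def fit_cut_points(values, q=100):
    """Approximate the CDF by the q-quantiles of the training values.

    Returns q + 1 cut points: the minimum, the q - 1 inner quantiles,
    and the maximum. (Hypothetical helper, not from the paper.)
    """
    inner = statistics.quantiles(values, n=q, method="inclusive")
    return [min(values)] + inner + [max(values)]


def to_uniform(x, cuts):
    """Probability integral transform: map x to [0, 1] by linear
    interpolation of the empirical CDF defined by the cut points."""
    q = len(cuts) - 1
    if x <= cuts[0]:
        return 0.0
    if x >= cuts[-1]:
        return 1.0
    i = bisect.bisect_right(cuts, x) - 1  # cuts[i] <= x < cuts[i + 1]
    frac = (x - cuts[i]) / (cuts[i + 1] - cuts[i])
    return (i + frac) / q


def to_original(u, cuts):
    """Approximate quantile function: map u in [0, 1] back to the
    original attribute space."""
    q = len(cuts) - 1
    pos = min(max(u, 0.0), 1.0) * q
    i = min(int(pos), q - 1)
    return cuts[i] + (pos - i) * (cuts[i + 1] - cuts[i])


def memberships(u, n_sets=5):
    """Membership degrees of u in n_sets equally spaced triangular
    fuzzy sets on [0, 1]; the degrees sum to 1 (Ruspini strong partition)."""
    width = 1.0 / (n_sets - 1)
    return [max(0.0, 1.0 - abs(u - k * width) / width) for k in range(n_sets)]
```

Under these assumptions, the fuzzy sets are defined once on [0, 1] and shared by every variable. Their position in the original space follows from the quantile function: for example, `to_original(0.5, cuts)` gives the peak of the middle of five fuzzy sets, expressed in the attribute's original units.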

research
02/25/2019

CFM-BD: a distributed rule induction algorithm for building Compact Fuzzy Models in Big Data classification problems

Interpretability has always been a major concern for fuzzy rule-based cl...
research
08/01/2019

Optimize TSK Fuzzy Systems for Big Data Classification Problems: Bag of Tricks

Takagi-Sugeno-Kang (TSK) fuzzy systems are flexible and interpretable ma...
research
03/27/2023

Lifting uniform learners via distributional decomposition

We show how any PAC learning algorithm that works under the uniform dist...
research
07/27/2020

Continuous Fuzzy Transform as Integral Operator

The Fuzzy transform is ubiquitous in different research fields and appli...
research
04/14/2015

HHCART: An Oblique Decision Tree

Decision trees are a popular technique in statistical data classificatio...
research
05/04/2019

Optimal Resampling for Learning Small Models

Models often need to be constrained to a certain size for them to be con...