Explainable Global Error Weighted on Feature Importance: The xGEWFI metric to evaluate the error of data imputation and data augmentation

Evaluating the performance of an algorithm is crucial. Evaluating data imputation and data augmentation can be done in a similar way, since in both cases the generated data can be compared with an original distribution. However, typical evaluation metrics share the same flaw: they compute the per-feature error and the global error on the generated data without weighting those errors by feature importance. The result is acceptable when all features are of similar importance, but in most cases feature importance is imbalanced, which can introduce a significant bias into the per-feature and global errors. This paper proposes a novel metric named "Explainable Global Error Weighted on Feature Importance" (xGEWFI). The new metric is tested in a complete preprocessing pipeline that (1) detects outliers and replaces them with null values, (2) imputes the missing data, and (3) augments the data. At the end of the process, the xGEWFI error is calculated. The distribution error between the original and generated data is computed with a Kolmogorov-Smirnov test (KS test) for each feature. These results are then multiplied by the importance of the respective features, calculated with a Random Forest (RF) algorithm. The metric result is expressed in an explainable format, aiming for an ethical AI.
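The core computation described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the function name `xgewfi` and its exact aggregation (a dot product of KS statistics and RF importances) are assumptions based on the abstract.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestRegressor

def xgewfi(X_orig, X_gen, y):
    """Sketch of the xGEWFI idea: per-feature KS distances between the
    original and generated data, weighted by Random Forest feature
    importances. Lower values indicate better-preserved distributions."""
    # Feature importances estimated with a Random Forest fit on the
    # original data (importances sum to 1).
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X_orig, y)
    importances = rf.feature_importances_

    # Per-feature distribution error via the two-sample KS statistic.
    ks = np.array([ks_2samp(X_orig[:, j], X_gen[:, j]).statistic
                   for j in range(X_orig.shape[1])])

    # Global error: KS distances weighted by feature importance.
    return float(np.dot(importances, ks)), ks, importances
```

Because the importances sum to one and each KS statistic lies in [0, 1], the weighted global error also lies in [0, 1], and the per-feature terms can be reported individually to make the metric explainable.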


