Machine Learning Based Missing Values Imputation in Categorical Datasets

06/10/2023
by   Muhammad Ishaq, et al.
0

This study explored the use of machine learning algorithms for predicting and imputing missing values in categorical datasets. We focused on ensemble models that use the error correction output codes (ECOC) framework, including SVM-based and KNN-based ensemble models, as well as an ensemble classifier that combines SVM, KNN, and MLP models. We applied these algorithms to three datasets: the CPU dataset, the hypothyroid dataset, and the Breast Cancer dataset. Our experiments showed that the machine learning algorithms were able to achieve good performance in predicting and imputing the missing values, with some variations depending on the specific dataset and missing value pattern. The ensemble models using the error correction output codes (ECOC) framework were particularly effective in improving the accuracy and robustness of the predictions, compared to individual models. However, there are also challenges and limitations to using deep learning for missing value imputation, including the need for large amounts of labeled data and the potential for overfitting. Further research is needed to evaluate the effectiveness and efficiency of deep learning algorithms for missing value imputation and to develop strategies for addressing the challenges and limitations that may arise.

READ FULL TEXT

page 15

page 19

page 20

page 23

page 25

research
11/16/2020

Imputation techniques on missing values in breast cancer treatment and fertility data

Clinical decision support using data mining techniques offers more intel...
research
02/28/2022

Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework

Missing values in tabular data restrict the use and performance of machi...
research
10/31/2022

Diffusion models for missing value imputation in tabular data

Missing value imputation in machine learning is the task of estimating t...
research
06/28/2022

No imputation without representation

By filling in missing values in datasets, imputation allows these datase...
research
01/02/2023

Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets

Missing values are a common problem in data science and machine learning...
research
09/20/2018

Challenges for Toxic Comment Classification: An In-Depth Error Analysis

Toxic comment classification has become an active research field with ma...

Please sign up or login with your details

Forgot password? Click here to reset