Explainable Multi-class Classification of Medical Data

12/26/2020
by YuanZheng Hu, et al.

Machine Learning applications have brought new insights into the secondary analysis of medical data. Machine Learning helps develop new drugs, define populations susceptible to certain illnesses, and identify predictors of many common diseases. At the same time, Machine Learning results depend on the interplay of many factors, including feature selection, class (im)balance, algorithm preference, and performance metrics. In this paper, we present an explainable multi-class classification of a large medical data set. We discuss in detail knowledge-based feature engineering, data set balancing, best-model selection, and parameter tuning. Six algorithms are used in this study: Support Vector Machine (SVM), Naïve Bayes, Gradient Boosting, Decision Trees, Random Forest, and Logistic Regression. Our empirical evaluation uses the UCI Diabetes 130-US hospitals for years 1999-2008 dataset, where the task is to classify patient hospital readmission into three classes: 0 days (no readmission), <30 days, or >30 days. Our results show that including the 23 medication features in the learning experiments improves the Recall of five of the six learning algorithms. This new result extends previous studies conducted on the same data. Gradient Boosting and Random Forest outperformed the other algorithms in three-class classification Accuracy.
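
To make the balancing and evaluation terms concrete, here is a minimal Python sketch. It assumes the three classes are encoded as in the UCI `readmitted` column (`NO`, `<30`, `>30`). It uses plain random oversampling as one possible balancing scheme, since the text does not say which scheme the authors used. It then computes per-class Recall and three-class Accuracy. The function names are hypothetical and the labels in the example are made up for illustration. This is not the authors' pipeline.

```python
import random
from collections import Counter

# Assumed label encoding, following the UCI "readmitted" column:
# "NO" (no readmission), "<30" (within 30 days), ">30" (after 30 days).
CLASSES = ["NO", "<30", ">30"]


def random_oversample(X, y, seed=0):
    """Hypothetical helper: balance classes by duplicating randomly chosen
    rows of each smaller class until every class matches the largest one.
    Plain random oversampling is only one of several balancing schemes."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)  # pick an existing row of this class
            X_out.append(X[j])
            y_out.append(y[j])
    return X_out, y_out


def per_class_recall(y_true, y_pred, classes=CLASSES):
    """Recall for class c = rows correctly predicted as c / rows truly in c."""
    recall = {}
    for c in classes:
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        hits = sum(p == c for p in preds_for_c)
        recall[c] = hits / len(preds_for_c) if preds_for_c else 0.0
    return recall


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy labels for illustration only; not data from the paper.
    y_true = ["NO", "NO", "NO", "NO", "<30", "<30", ">30", ">30", ">30", ">30"]
    y_pred = ["NO", "NO", ">30", "NO", "<30", "NO", ">30", ">30", "NO", ">30"]
    rec = per_class_recall(y_true, y_pred)
    # per-class recall: NO 0.75, <30 0.5, >30 0.75; accuracy 0.7
    print(rec, accuracy(y_true, y_pred))

    # Balancing the toy set: "<30" (2 rows) is oversampled to 4 rows.
    X = [[i] for i in range(len(y_true))]
    X_bal, y_bal = random_oversample(X, y_true)
    print(Counter(y_bal))
```

Per-class Recall shows where a model misses a specific readmission class. The single Accuracy figure can hide such misses when one class dominates. In practice, oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.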
