Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making

06/30/2023
by Isha Thombre, et al.

The human gut microbiota is known to contribute to numerous physiological functions of the body through its interplay with multiple organs and is also implicated in a myriad of pathological conditions. Prolific research over the past few decades has yielded valuable information on the relative taxonomic distribution of the gut microbiota that could enable personalized medicine. Unfortunately, microbiome data suffer from class imbalance and high dimensionality, both of which must be addressed. In this study, we implemented data engineering algorithms to address these issues. Four standard machine learning classifiers (logistic regression (LR), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) decision trees) were applied to a previously published dataset of infants with cystic fibrosis exhibiting normal vs. abnormal growth patterns. Class imbalance and high dimensionality were addressed through the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA), respectively. Classification of host phenotype was performed at multiple levels of the taxonomic hierarchy. Our results indicate that the ensemble classifiers (RF and XGB decision trees) exhibit superior classification accuracy in predicting the host phenotype. The application of PCA significantly reduced the testing time while maintaining high classification accuracy. For most classifiers, the highest classification accuracy was obtained at the species level. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.
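To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. It is an illustration only and not the authors' implementation; the function name `smote`, its parameters, and the toy abundance values are all assumptions made for this example. In practice a maintained library implementation would normally be used instead.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample,
        # by Euclidean distance, excluding the base sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy data: relative abundances of three taxa per sample.
majority = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10], [0.62, 0.28, 0.10],
            [0.58, 0.32, 0.10], [0.65, 0.25, 0.10], [0.57, 0.33, 0.10]]
minority = [[0.20, 0.50, 0.30], [0.25, 0.45, 0.30], [0.22, 0.48, 0.30]]

# Generate just enough synthetic samples to match the majority class size.
new_samples = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_samples
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because every synthetic sample is a convex combination of two real minority samples, it stays within the range of observed values for each feature, and rows of relative abundances that sum to one continue to do so. Oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the test set.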


