Unsupervised Feature Selection to Identify Important ICD-10 Codes for Machine Learning: A Case Study on a Coronary Artery Disease Patient Cohort

03/25/2023
by Peyman Ghasemi, et al.

Because the International Classification of Diseases (ICD) system contains a very large number of codes, selecting which codes to use as features in healthcare machine learning models is challenging. In this study, we compared several unsupervised feature selection methods on an ICD code database of 49,075 patients with coronary artery disease in Alberta, Canada. Specifically, we used Laplacian Score, Unsupervised Feature Selection for Multi-Cluster Data, Autoencoder Inspired Unsupervised Feature Selection, Principal Feature Analysis, and Concrete Autoencoders with and without ICD tree weight adjustment to select the 100 best features from more than 9,000 codes. We assessed the selected features by their ability to reconstruct the original feature space and to predict 90-day mortality after discharge. The Concrete Autoencoder methods outperformed all other methods on both tasks. Furthermore, adding ICD tree weight adjustment to the Concrete Autoencoder reduced the complexity of the selected features.
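The abstract names the Laplacian Score as one of the compared methods but gives no implementation details. As an illustration only, here is a minimal NumPy sketch of the textbook Laplacian Score with top-k selection. It is not the authors' code. The k-nearest-neighbour graph, the heat-kernel width `t`, and the helper names `laplacian_score` and `select_top_k` are assumptions made for this example. Lower scores mean a feature better preserves the local neighbourhood structure of the patients.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Return one Laplacian Score per column of X (assumed k-NN heat-kernel graph).
    Lower means the feature better preserves local sample structure."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape

    # Pairwise squared Euclidean distances between samples (rows).
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)

    # k-nearest-neighbour graph; the diagonal is masked so a sample is never
    # its own neighbour, even when duplicate rows exist (common with binary codes).
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    neighbours = np.argsort(masked, axis=1)[:, :k]

    # Heat-kernel edge weights, then symmetrise the graph.
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-dist2[rows, cols] / t)
    S = np.maximum(S, S.T)

    d = S.sum(axis=1)   # degree vector, i.e. the diagonal of D
    L = np.diag(d) - S  # graph Laplacian L = D - S

    scores = np.full(m, np.inf)  # constant columns keep the worst possible score
    for r in range(m):
        f = X[:, r]
        if f.max() == f.min():
            continue
        # Remove the degree-weighted mean to exclude the trivial constant solution.
        f_tilde = f - (f @ d) / d.sum()
        # Score = f~^T L f~ / f~^T D f~
        scores[r] = (f_tilde @ L @ f_tilde) / (f_tilde @ (d * f_tilde))
    return scores


def select_top_k(scores, n_features):
    """Indices of the n_features columns with the smallest (best) scores."""
    return np.argsort(scores, kind="stable")[:n_features]
```

With a 49,075-patient cohort, each dense n×n matrix here would need roughly 19 GB of memory, so a practical version would build a sparse neighbour graph instead. The dense form is used only to keep the sketch short and readable.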
