An empirical comparison of machine learning models for student's mental health illness assessment
Students' mental health problems have been explored previously in the higher education literature in various contexts, including empirical work using quantitative and qualitative methods. Nevertheless, comparatively few studies have applied computational methods that learn information directly from data rather than relying on fixed parameters of a predetermined equation as the analytical method. This study investigates the performance of machine learning (ML) models used in higher education. The ML models considered are Naive Bayes, Support Vector Machine, K-Nearest Neighbor, Logistic Regression, Stochastic Gradient Descent, Decision Tree, Random Forest, XGBoost (Extreme Gradient Boosting Decision Tree), and NGBoost (Natural Gradient Boosting). Considering the factors of mental health illness among students, we follow three phases of data processing: segmentation, feature extraction, and classification. We evaluate these ML models against classification performance metrics such as accuracy, precision, recall, F1 score, and prediction run time. The empirical analysis makes two contributions: 1. It examines the performance of various ML models on a survey-based educational dataset, finding notable classification performance from the tree-based XGBoost algorithm; 2. It explores the feature importance of the dataset's variables, indicating that social support, learning environment, and childhood adversities are important to a student's mental health illness.
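The evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's pipeline: the labels, the `threshold_rule` classifier, and the helper names `binary_metrics` and `timed_predict` are hypothetical, chosen only to show how accuracy, precision, recall, F1 score, and prediction run time are typically computed for a binary classifier.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def timed_predict(predict: Callable[[float], int],
                  rows: Sequence[float]) -> Tuple[List[int], float]:
    """Run a prediction function over all rows and report elapsed seconds."""
    start = time.perf_counter()
    preds = [predict(row) for row in rows]
    elapsed = time.perf_counter() - start
    return preds, elapsed


def threshold_rule(score: float) -> int:
    """Hypothetical stand-in classifier: flag scores of 0.5 or above."""
    return 1 if score >= 0.5 else 0


# Hypothetical toy data, not taken from the study's survey dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]

y_pred, run_time = timed_predict(threshold_rule, scores)
results = binary_metrics(y_true, y_pred)
results["run_time_s"] = run_time
print(results)
```

In a comparison like the one described, the same two helpers would be applied to each fitted model's predictions on a held-out split, so that all models are scored on identical data.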