Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques

by Marmar Orooji, et al.

This study is motivated by the scale of high school dropout in Louisiana and its negative impact on individual and public well-being. Our goal is to identify students who are at risk of dropping out of high school by examining a Louisiana administrative dataset. Because the dataset is imbalanced, we apply imbalanced learning techniques, including resampling, case weighting, and cost-sensitive learning, to improve prediction performance on the rare class. The performance metrics used in this study are the F-measure, recall, and precision of the rare class. We compare several machine learning algorithms, such as neural networks, decision trees, and bagging trees, in combination with the imbalanced learning approaches, using an administrative dataset of size 366k+ from the Louisiana Department of Education. Experiments show that applying imbalanced learning methods produces good recall but decreases precision, whereas base classifiers without any imbalanced data handling give better precision but poor recall. Overall, applying imbalanced learning techniques is beneficial, yet further study is needed to improve precision.
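To make the metrics and the resampling idea above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `rare_class_metrics` and `random_oversample`, the toy labels, and the two hypothetical prediction vectors are all assumptions made for illustration, and random oversampling is only one simple form of resampling. The toy example mirrors the trade-off described in the abstract, where a cautious classifier has high precision but low recall on the rare class and a more aggressive one recovers recall at the cost of precision.

```python
import random
from collections import Counter


def rare_class_metrics(y_true, y_pred, rare=1):
    """Return (precision, recall, F-measure) for the rare class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare and p == rare)
    fp = sum(1 for t, p in pairs if t != rare and p == rare)
    fn = sum(1 for t, p in pairs if t == rare and p != rare)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def random_oversample(rows, labels, rare=1, seed=0):
    """Duplicate randomly chosen rare-class rows until the rare class
    is as large as the largest class (illustrative resampling only)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    rare_rows = [r for r, y in zip(rows, labels) if y == rare]
    deficit = max(counts.values()) - counts[rare]
    if not rare_rows or deficit <= 0:
        return list(rows), list(labels)
    extra = [rng.choice(rare_rows) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [rare] * deficit


# Toy data (hypothetical): 1 = dropout (rare class), 0 = non-dropout.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # rarely predicts dropout
aggressive = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # predicts dropout more often

print("cautious  ", rare_class_metrics(y_true, cautious))
print("aggressive", rare_class_metrics(y_true, aggressive))

rows = [[i] for i in range(len(y_true))]  # placeholder feature rows
new_rows, new_labels = random_oversample(rows, y_true)
print("class counts after oversampling:", Counter(new_labels))
```

On the toy labels, the cautious predictions reach precision 1.0 with recall 0.25, while the aggressive ones reach precision 0.6 with recall 0.75, which is the pattern the abstract reports when imbalanced learning techniques are applied.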







Hellinger Distance Trees for Imbalanced Streams

Classifiers trained on data sets possessing an imbalanced class distribu...

Precision-Recall Curve (PRC) Classification Trees

The classification of imbalanced data has presented a significant challe...

Protein Classification using Machine Learning and Statistical Techniques: A Comparative Analysis

In recent era prediction of enzyme class from an unknown protein is one ...

Limitations of ROC on Imbalanced Data: Evaluation of LVAD Mortality Risk Scores

Objective: This study illustrates the ambiguity of ROC in evaluating two...

Superensemble Classifier for Improving Predictions in Imbalanced Datasets

Learning from an imbalanced dataset is a tricky proposition. Because the...

Predicting Electricity Outages Caused by Convective Storms

We consider the problem of predicting power outages in an electrical pow...

Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques

Compact stellar systems such as Ultra-compact dwarfs (UCDs) and Globular...