Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques

10/29/2019
by   Marmar Orooji, et al.

This study is motivated by the magnitude of the high school dropout problem in Louisiana and its negative impact on individual and public well-being. Our goal is to predict which students are at risk of dropping out of high school by examining a Louisiana administrative dataset. Because the dataset is imbalanced, we apply imbalanced learning techniques, including resampling, case weighting, and cost-sensitive learning, to improve prediction performance on the rare class. The performance metrics used in this study are the F-measure, recall, and precision of the rare class. We compare the performance of several machine learning algorithms, such as neural networks, decision trees, and bagging trees, in combination with the imbalanced learning approaches, using an administrative dataset of size 366k+ from the Louisiana Department of Education. Experiments show that applying imbalanced learning methods produces good recall but decreases precision, whereas base classifiers used without any imbalanced-data handling give better precision but poor recall. Overall, applying imbalanced learning techniques is beneficial, yet more studies are needed to improve precision.
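The rare-class metrics and the case-weighting idea mentioned above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy labels, the two hypothetical prediction vectors, the function names, and the inverse-frequency weighting rule are not taken from the study or its data, and the paper may define its weights differently.

```python
from collections import Counter


def rare_class_metrics(y_true, y_pred, rare=1):
    """Return (precision, recall, F-measure) for the rare class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare and p == rare)
    fp = sum(1 for t, p in pairs if t != rare and p == rare)
    fn = sum(1 for t, p in pairs if t == rare and p != rare)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def inverse_frequency_weights(y):
    """One common case-weighting heuristic (an assumption here):
    weight = n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy data (hypothetical): 2 dropouts (label 1) among 10 students.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A cautious classifier flags few students; an aggressive one flags many.
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
aggressive = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cautious_scores = rare_class_metrics(y_true, cautious)
aggressive_scores = rare_class_metrics(y_true, aggressive)
weights = inverse_frequency_weights(y_true)

print("cautious   (P, R, F):", cautious_scores)
print("aggressive (P, R, F):", aggressive_scores)
print("class weights:", weights)
```

On this toy data the cautious predictions give higher precision and lower recall on the rare class, while the aggressive predictions give the reverse, which is the same kind of trade-off the abstract reports between base classifiers and imbalanced learning methods. The weights show how a rare class can be made to count more per example during training.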


research
05/09/2014

Hellinger Distance Trees for Imbalanced Streams

Classifiers trained on data sets possessing an imbalanced class distribu...
research
11/14/2022

Machine Learning Performance Analysis to Predict Stroke Based on Imbalanced Medical Dataset

Cerebral stroke, the second most substantial cause of death universally,...
research
07/06/2022

A Hybrid Approach for Binary Classification of Imbalanced Data

Binary classification with an imbalanced dataset is challenging. Models ...
research
11/15/2020

Precision-Recall Curve (PRC) Classification Trees

The classification of imbalanced data has presented a significant challe...
research
01/18/2019

Protein Classification using Machine Learning and Statistical Techniques: A Comparative Analysis

In recent era prediction of enzyme class from an unknown protein is one ...
research
04/12/2022

Prediction of motor insurance claims occurrence as an imbalanced machine learning problem

The insurance industry, with its large datasets, is a natural place to u...
research
05/20/2022

Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark

The paper introduces a new dataset to assess the performance of machine ...
