Performance Evaluation of Regression Models in Predicting the Cost of Medical Insurance

04/25/2023
by Jonelle Angelo S. Cenita, et al.

The study aimed to evaluate the performance of regression models in predicting the cost of medical insurance. Three machine learning regression models were used: Linear Regression, Gradient Boosting, and Support Vector Machine. Performance was evaluated using RMSE (Root Mean Square Error), R² (R-squared), and K-fold cross-validation. The study also sought to identify the feature that is most important in predicting the cost of medical insurance. The study is anchored on the knowledge discovery in databases (KDD) process, which refers to the overall process of discovering useful knowledge from data. The performance evaluation results reveal that, among the three regression models, Gradient Boosting achieved the highest R² (0.892) and the lowest RMSE (1336.594). Furthermore, the weighted means from 10-fold cross-validation are not significantly different from the R² results of the three regression models. In addition, Exploratory Data Analysis (EDA) using box plots of descriptive statistics showed that, for the charges and smoker features, the median of one group lies outside the box of the other group, indicating a difference between the two groups. The study concludes that Gradient Boosting appears to perform best among the three regression models, and the K-fold cross-validation results indicate that all three models perform well. Moreover, the box-plot EDA shows that the highest charges are due to the smoker feature.
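
The evaluation quantities named in the abstract (RMSE, R², a 10-fold cross-validated weighted mean, and the box-plot median check) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: a one-feature least-squares line (the hypothetical fit_simple_ols) stands in for Linear Regression, Gradient Boosting, and Support Vector Machine; the data are synthetic; and the "weighted mean" of the fold scores is assumed to weight each fold's R² by its test-fold size, since the abstract does not say how the weights were chosen. The quartiles come from the standard library's default method, which may differ from the plotting tool the authors used. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, deal them into k folds, and return
    (train_indices, test_indices) pairs; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def fit_simple_ols(x: Sequence[float], y: Sequence[float]):
    """Hypothetical stand-in model: least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda xs: [intercept + slope * xi for xi in xs]


def cross_val_r2(x: Sequence[float], y: Sequence[float], k: int = 10, seed: int = 0) -> float:
    """Mean of per-fold test R^2, weighted by test-fold size (an assumption)."""
    total, weight = 0.0, 0
    for train, test in k_fold_splits(len(x), k, seed):
        predict = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        score = r2([y[i] for i in test], predict([x[i] for i in test]))
        total += score * len(test)
        weight += len(test)
    return total / weight


def median_outside_box(a: Sequence[float], b: Sequence[float]) -> bool:
    """Box-plot heuristic: True if either group's median lies outside the
    other group's interquartile box (Q1..Q3)."""
    q1_a, _, q3_a = statistics.quantiles(a, n=4)
    q1_b, _, q3_b = statistics.quantiles(b, n=4)
    med_a, med_b = statistics.median(a), statistics.median(b)
    return not (q1_b <= med_a <= q3_b) or not (q1_a <= med_b <= q3_a)


# Usage on synthetic data (not the paper's dataset): a noisy line.
xs = list(range(1, 41))
ys = [3 * x + 5 + 0.5 * ((2 * x) % 5 - 2) for x in xs]
print(f"10-fold weighted mean R^2: {cross_val_r2(xs, ys, k=10):.3f}")
```

Replacing fit_simple_ols with another model's fit-and-predict step leaves rmse, r2, and the fold logic unchanged, which is what allows several models to be compared on identical splits.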
