Comparison of 14 different families of classification algorithms on 115 binary datasets

06/02/2016
by   Jacques Wainer, et al.

We tested 14 very different classification algorithms (random forest, gradient boosting machines, SVMs with linear, polynomial, and RBF kernels, 1-hidden-layer neural nets, extreme learning machines, k-nearest neighbors and a bagging of knn, naive Bayes, learning vector quantization, elastic net logistic regression, sparse linear discriminant analysis, and a boosting of linear classifiers) on 115 real-life binary datasets. We followed the Demsar analysis and found that the three best classifiers (random forest, gbm, and RBF SVM) are not significantly different from each other. We also argue that a change of less than 0.0112 in the error rate should be considered irrelevant, and we used a Bayesian ANOVA analysis to conclude that, with high probability, the differences between these three classifiers are not of practical consequence. We also measured the execution time of "standard implementations" of these algorithms and concluded that RBF SVM is the fastest (significantly so) both in training time and in training plus testing time.
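As a rough illustration of the kind of comparison described above, the following Python sketch ranks classifiers within each dataset, applies the Friedman test that opens a Demsar-style analysis, and then checks whether two classifiers differ by less than the 0.0112 error-rate threshold. It is a minimal sketch under stated assumptions, not the authors' code: the error rates are synthetic, the function names are hypothetical, the post-hoc pairwise tests are omitted, and the mean-difference check is a simple stand-in for the Bayesian ANOVA used in the paper.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata


def friedman_with_mean_ranks(errors):
    """Friedman test over an (n_datasets, n_classifiers) array of error rates.

    Hypothetical helper: returns the mean rank of each classifier
    (1 = lowest error) together with the test statistic and p-value.
    """
    errors = np.asarray(errors, dtype=float)
    # Rank classifiers within each dataset; ties receive the average rank.
    ranks = np.apply_along_axis(rankdata, 1, errors)
    mean_ranks = ranks.mean(axis=0)
    # friedmanchisquare expects one sequence of measurements per classifier.
    stat, p_value = friedmanchisquare(*errors.T)
    return mean_ranks, stat, p_value


def practically_equivalent(errors, i, j, threshold=0.0112):
    """Simplified check (NOT the paper's Bayesian ANOVA): is the absolute
    difference in mean error rate between classifiers i and j below the
    threshold of irrelevance?"""
    errors = np.asarray(errors, dtype=float)
    return abs(errors[:, i].mean() - errors[:, j].mean()) < threshold


# Synthetic error rates for illustration only; these are not results
# from the paper. Three similar classifiers and one clearly worse one.
rng = np.random.default_rng(0)
base = rng.uniform(0.05, 0.30, size=(115, 1))       # per-dataset difficulty
offsets = np.array([0.0, 0.002, 0.004, 0.08])       # per-classifier offset
errors = base + offsets + rng.normal(0.0, 0.01, size=(115, 4))

mean_ranks, stat, p_value = friedman_with_mean_ranks(errors)
print("mean ranks:", mean_ranks)
print("Friedman statistic:", stat, "p-value:", p_value)
print("classifiers 0 and 1 practically equivalent:",
      practically_equivalent(errors, 0, 1))
```

In a full analysis, a significant Friedman result would be followed by a post-hoc pairwise procedure to decide which classifiers actually differ; that step is left out here to keep the sketch short.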
