Is rotation forest the best classifier for problems with continuous features?

09/18/2018
by A. Bagnall, et al.

Rotation forest is a tree-based ensemble that performs transforms on subsets of attributes prior to constructing each tree. We present an empirical comparison of classifiers for problems with only real-valued features. We evaluate classifiers from three families of algorithms: support vector machines; tree-based ensembles; and neural networks. We compare classifiers on unseen data based on the quality of the decision rule (using classification error), the ability to rank cases (area under the receiver operating characteristic curve), and the probability estimates (using negative log likelihood). We conclude that, in answer to the question posed in the title, yes: rotation forest is significantly more accurate on average than competing techniques when compared on three distinct sets of datasets. The same pattern of results is observed when tuning classifiers on the train data using a grid search. We investigate why rotation forest does so well by testing whether the characteristics of the data can be used to differentiate classifier performance. We assess the impact of the design features of rotation forest through an ablative study that transforms random forest into rotation forest. We identify the major limitation of rotation forest as its scalability, particularly in the number of attributes. To overcome this problem we develop a model to predict the train time of the algorithm and hence propose a contract version of rotation forest, where a run time cap is set a priori. We demonstrate that on large problems rotation forest can be made an order of magnitude faster without significant loss of accuracy, and that there is no real benefit (on average) from tuning the ensemble. We conclude that, without any domain knowledge to indicate an algorithm preference, rotation forest should be the default algorithm of choice for problems with continuous attributes.
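Below is a minimal sketch of the core rotation forest idea described in the abstract, assuming scikit-learn and NumPy are available. The function names (fit_rotation_forest, predict_rotation_forest) and parameters (n_trees, n_subsets) are illustrative, and the sketch omits details of the published algorithm such as per-subset class and instance sampling; it is not the authors' implementation.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_forest(X, y, n_trees=10, n_subsets=3, seed=None):
    # Illustrative sketch of rotation forest, not the authors' code.
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    ensemble = []
    for _ in range(n_trees):
        # Randomly partition the attributes into roughly equal subsets.
        subsets = np.array_split(rng.permutation(n_features), n_subsets)
        # Fit a PCA on each attribute subset and assemble a block rotation
        # matrix over all attributes (assumes more cases than attributes
        # in each subset, so each PCA yields a square rotation block).
        rotation = np.zeros((n_features, n_features))
        for subset in subsets:
            pca = PCA().fit(X[:, subset])
            rotation[np.ix_(subset, subset)] = pca.components_.T
        # Train a decision tree on the rotated copy of the data.
        tree = DecisionTreeClassifier().fit(X @ rotation, y)
        ensemble.append((rotation, tree))
    return ensemble

def predict_rotation_forest(ensemble, X):
    # Majority vote over the trees, each applied to its own rotation of X
    # (assumes integer-coded class labels).
    votes = np.stack([tree.predict(X @ rotation) for rotation, tree in ensemble])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])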
