The Heterogeneous Ensembles of Standard Classification Algorithms (HESCA): the Whole is Greater than the Sum of its Parts

10/25/2017
by   James Large, et al.

Building classification models is an intrinsically practical exercise that requires many design decisions prior to deployment. We aim to provide some guidance in this decision-making process. Specifically, given a classification problem with real-valued attributes, we consider which classifier or family of classifiers one should use. Strong contenders are tree-based homogeneous ensembles, support vector machines and deep neural networks. All three families of model could claim to be state-of-the-art, and yet it is not clear when one is preferable to the others. Our extensive experiments with over 200 data sets from two distinct archives demonstrate that, rather than choosing a single family and expending computing resources on optimising that model, it is significantly better to build simpler versions of classifiers from each family and ensemble them. We show that the Heterogeneous Ensembles of Standard Classification Algorithms (HESCA), which ensembles using error estimates formed on the train data, is significantly better (in terms of error, balanced error, negative log likelihood and area under the ROC curve) than its individual components, than picking the component that is best on the train data, and than a support vector machine tuned over 1089 different parameter configurations. We demonstrate that HESCA+, which contains a deep neural network, a support vector machine and two decision tree forests, is significantly better than its components, than picking the best component, and than HESCA. We analyse the results further and find that HESCA and HESCA+ are of particular value when the train set size is relatively small and the problem has multiple classes. HESCA is a fast approach that is, on average, as good as state-of-the-art classifiers, whereas HESCA+ is significantly better than average and represents a strong benchmark for future research.
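The core idea of the abstract can be sketched in a few lines: build several simple classifiers from different families, weight each one by an error estimate formed on the train data only (here, cross-validated accuracy), and combine their probability estimates. This is a minimal illustrative sketch using scikit-learn, not the exact component set or weighting scheme from the paper; the function name `hesca_fit_predict` and the choice of components are assumptions for demonstration.

```python
# Hypothetical sketch of a HESCA-style heterogeneous ensemble: each component
# classifier is weighted by its cross-validated accuracy on the training data,
# and test predictions are the weighted average of component probabilities.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def hesca_fit_predict(components, X_train, y_train, X_test, cv=5):
    weights, probas = [], []
    for clf in components:
        # Error estimate formed on the train data only, via cross-validation.
        acc = cross_val_score(clf, X_train, y_train, cv=cv).mean()
        weights.append(acc)
        clf.fit(X_train, y_train)
        probas.append(clf.predict_proba(X_test))
    weights = np.array(weights) / np.sum(weights)
    # Weighted average of the component probability estimates: shape (n, classes).
    avg = np.tensordot(weights, np.stack(probas), axes=1)
    return avg.argmax(axis=1)

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
components = [LogisticRegression(max_iter=5000),
              DecisionTreeClassifier(random_state=0),
              KNeighborsClassifier()]
pred = hesca_fit_predict(components, X_tr, y_tr, X_te)
print("ensemble accuracy:", (pred == y_te).mean())
```

In the paper the weighting is more refined than raw accuracy, and HESCA+ swaps in stronger components (a deep network, an SVM and two decision tree forests), but the structure is the same: estimate quality on the train data, then weight the probability outputs.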


research
09/18/2018

Is rotation forest the best classifier for problems with continuous features?

Rotation forest is a tree based ensemble that performs transforms on sub...
research
02/12/2019

Learning Theory and Support Vector Machines - a primer

The main goal of statistical learning theory is to provide a fundamental...
research
03/28/2017

Simulated Data Experiments for Time Series Classification Part 1: Accuracy Comparison with Default Settings

There are now a broad range of time series classification (TSC) algorith...
research
08/20/2018

Faster Support Vector Machines

The time complexity of support vector machines (SVMs) prohibits training...
research
02/21/2018

Pooling homogeneous ensembles to build heterogeneous ensembles

In ensemble methods, the outputs of a collection of diverse classifiers ...
research
03/30/2018

Learning to generate classifiers

We train a network to generate mappings between training sets and classi...
research
02/21/2018

Determining the best classifier for predicting the value of a boolean field on a blood donor database

Motivation: Thanks to digitization, we often have access to large databa...
