An Optimal k Nearest Neighbours Ensemble for Classification Based on Extended Neighbourhood Rule with Features subspace

by Amjad Ali, et al.

kNN ensembles minimise the effect of outliers by identifying a set of observations closest to a new sample point and estimating its unknown class via majority voting among the labels of the training instances in that neighbourhood. Ordinary kNN-based procedures determine the k closest training observations in a neighbourhood region (enclosed by a sphere) using a distance formula. This procedure may fail when test sample points follow the pattern of nearest observations that lie along a path not contained in the given sphere of nearest neighbours. Furthermore, these methods combine hundreds of base kNN learners, many of which may have high classification errors, resulting in poor ensembles. To overcome these problems, an optimal extended neighbourhood rule based ensemble is proposed, in which the neighbours are determined in k steps: the rule starts from the sample point nearest to the unseen observation; the second data point selected is the one closest to the previously selected point; and this process continues until k observations are obtained. Each base model in the ensemble is constructed on a bootstrap sample together with a random subset of features. After building a sufficiently large number of base models, the optimal models are selected based on their performance on out-of-bag (OOB) data.
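The k-step neighbour selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`extended_neighbourhood`, `exnrule_predict`) and the use of Euclidean distance are assumptions for the sketch, and the bootstrap sampling, feature subsetting, and OOB-based model selection that wrap around this rule in the full ensemble are omitted.

```python
import numpy as np

def extended_neighbourhood(X_train, x_test, k):
    """Select k neighbours in a chain (extended neighbourhood rule sketch):
    the first neighbour is the training point nearest to x_test; each
    subsequent neighbour is the not-yet-selected training point nearest
    to the previously selected one."""
    selected = []
    remaining = list(range(len(X_train)))
    query = x_test
    for _ in range(k):
        # Euclidean distance from the current query point (an assumption;
        # any distance formula could be used here).
        dists = [np.linalg.norm(X_train[i] - query) for i in remaining]
        j = remaining[int(np.argmin(dists))]
        selected.append(j)
        remaining.remove(j)
        query = X_train[j]  # the next step starts from the point just chosen
    return selected

def exnrule_predict(X_train, y_train, x_test, k):
    """Predict the class of x_test by majority vote among the labels
    of the chained neighbourhood."""
    idx = extended_neighbourhood(X_train, x_test, k)
    labels, counts = np.unique(np.asarray(y_train)[idx], return_counts=True)
    return labels[int(np.argmax(counts))]
```

For example, with one-dimensional training points [0], [1], [2], [10] and a test point at 0.4, the chain with k = 2 first picks [0] (nearest to the test point) and then [1] (nearest to [0]), following the local path of the data rather than a fixed sphere around the query.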




