
Extreme Learning Tree

by Anton Akusok, et al.

The paper proposes a new variant of a decision tree, called an Extreme Learning Tree. It consists of an extremely random tree with non-linear data transformation, and a linear observer that provides predictions based on the leaf index where the data samples fall. The proposed method outperforms linear models on a benchmark dataset, and may be a building block for a future variant of Random Forest.


1 Introduction

Randomized methods are a recent trend in practical machine learning. They deliver the high performance of complex non-linear methods without the high computational cost of optimizing them. The most prominent current examples are randomized neural networks, in both feed-forward [8] and recurrent [11] forms. For the latter, the randomized approach provided the first efficient training method and enabled state-of-the-art performance in multiple areas [9].

Random Forest [13] is one of the best methods for Big Data processing due to its adaptive nearest neighbour behavior [10]. The forest predicts an output based only on local data samples. Such an approach works better the more training data is available, which makes it well suited as a supervised method for Big Data. The k-nearest neighbors algorithm also benefits from more data, as the data itself is the model, but Random Forest avoids the quadratic scaling of k-nearest neighbors in the number of data samples, which makes the latter prohibitively slow for large-scale problems.

A decision tree [1] is a building block of Random Forest. A deep decision tree has high variance but low bias. An ensemble of multiple such trees reduces variance and improves prediction performance. Additional measures are taken to make the trees in an ensemble as different as possible, including random subsets of features and boosting.


The paper proposes a combination of random methods and a decision tree, called an Extreme Learning Tree (ELT). The method builds a tree using expanded data features from an Extreme Learning Machine [6], splitting each node on a random feature at a random point. The result is an Extremely Randomized Tree [5]. A linear observer is then added to the leaves of the tree; it learns a linear projection from the leaves to the target outputs. Each tree leaf is represented by its index in one-hot encoding.

2 Methodology

An Extreme Learning Tree consists of three parts. First, it generates random data features using an Extreme Learning Machine (ELM) [7]. Second, it builds a random tree from these features, similar to Extremely Randomized Trees [5]. Each data sample is then represented by the index of its leaf in the tree, in one-hot encoding. Third, a linear regression model is learned from the dataset in that one-hot encoding to the target outputs.
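The first step, the random feature expansion, can be sketched as follows. This is a minimal illustration under assumptions: the function name `elm_features`, the `tanh` activation, the Gaussian weight distribution, and the number of hidden units are choices made for this example and are not specified in the text. The input rows are toy values.

```python
import numpy as np

def elm_features(X, n_hidden=20, seed=0):
    """Project X through a fixed random hidden layer: H = tanh(X W + b)."""
    rng = np.random.default_rng(seed)
    # Weights and biases are drawn once at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    return np.tanh(X @ W + b)

# Three toy samples with four input features each (illustrative values).
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [6.7, 3.1, 4.7, 1.5],
              [6.3, 3.3, 6.0, 2.5]])
H = elm_features(X, n_hidden=8)
print(H.shape)  # (3, 8)
```

Fixing the seed makes the random projection reproducible, so the same mapping can be applied to training and test samples.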

ELT follows the random methods paradigm as it has an untrained random part (the tree), and a learned linear observer (a linear regression model from leaves of the tree to the target outputs).
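To illustrate the learned part, the sketch below fits a linear observer on one-hot leaf codes with ordinary least squares. The leaf indices and class labels are made-up toy values, and the helper `one_hot` is hypothetical; the text does not say which solver or regularization the observer uses.

```python
import numpy as np

def one_hot(idx, n):
    """Encode integer indices as rows of an n-column indicator matrix."""
    out = np.zeros((len(idx), n))
    out[np.arange(len(idx)), idx] = 1.0
    return out

# Toy example: leaf index and class label of seven training samples.
leaf = np.array([0, 0, 1, 1, 2, 2, 2])
y = np.array([0, 0, 1, 1, 2, 2, 1])

L = one_hot(leaf, 3)   # samples represented only by their leaf
T = one_hot(y, 3)      # one-hot class targets
# Linear observer: least-squares projection from leaf codes to targets.
B, *_ = np.linalg.lstsq(L, T, rcond=None)
pred = (L @ B).argmax(axis=1)
print(pred)
```

With one-hot inputs and no intercept, the least-squares weights of a leaf reduce to the mean target vector of the training samples that fall into it, so the third leaf above assigns weight 2/3 to class 2 and 1/3 to class 1.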

An ELT tree has two hyperparameters: the minimum node size and the maximum tree depth. The data in a node is split on a random feature at a random split point. Split points that generate nodes under the minimum size are rejected. Nodes that reach the maximum depth, or that contain fewer than twice the minimum number of samples, become leaves. Node splitting continues until no non-leaf terminal nodes remain.
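The splitting rules above can be sketched as a short recursive routine. This is one possible reading, not the authors' implementation: the names `grow` and `leaf_index`, the nested-dict tree layout, the uniform sampling of the split point between the node's minimum and maximum feature value, and the `max_tries` cap on rejected splits are all assumptions introduced here. The input matrix is random data standing in for ELM features.

```python
import numpy as np

def grow(H, rows, depth, rng, min_size, max_depth, leaf_sizes, max_tries=50):
    """Grow a random tree over the samples H[rows]; returns a nested dict."""
    # A node may be split only below the maximum depth and when it holds
    # at least twice the minimum node size.
    if depth < max_depth and len(rows) >= 2 * min_size:
        for _ in range(max_tries):  # retry cap: an assumption, not in the text
            f = int(rng.integers(H.shape[1]))       # random feature
            col = H[rows, f]
            t = rng.uniform(col.min(), col.max())   # random split point
            left = col <= t
            n_left = int(left.sum())
            # Reject splits that create a child under the minimum size.
            if min_size <= n_left <= len(rows) - min_size:
                return {
                    "feature": f,
                    "threshold": t,
                    "left": grow(H, rows[left], depth + 1, rng, min_size,
                                 max_depth, leaf_sizes, max_tries),
                    "right": grow(H, rows[~left], depth + 1, rng, min_size,
                                  max_depth, leaf_sizes, max_tries),
                }
    # Otherwise the node becomes a leaf and receives the next free index.
    leaf_sizes.append(len(rows))
    return {"leaf": len(leaf_sizes) - 1}

def leaf_index(node, h):
    """Route one feature vector h down the tree to its leaf index."""
    while "leaf" not in node:
        branch = "left" if h[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

rng = np.random.default_rng(0)
H = rng.standard_normal((150, 6))  # stand-in for ELM features of 150 samples
leaf_sizes = []
tree = grow(H, np.arange(len(H)), 0, rng, min_size=5, max_depth=4,
            leaf_sizes=leaf_sizes)
idx = np.array([leaf_index(tree, h) for h in H])
print(len(leaf_sizes), "leaves; smallest holds", min(leaf_sizes), "samples")
```

Storing the feature and threshold of every split lets unseen samples be routed to a leaf with `leaf_index`, which is what the linear observer needs at prediction time.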

3 Experimental results

The Extreme Learning Tree is tested on the well-known Iris flower dataset [3], in comparison with a Decision Tree, an L2-regularized ELM [12], and Ridge regression. The Decision Tree implementation is from the Scikit-Learn library.


The random tree in the ELT method splits data samples into groups of similar ones. The resulting structure in the original data space is shown in Figure 1. The tree works as an adaptive nearest neighbour method, grouping similar samples together. The target variable information from these samples is then used by a linear observer to make predictions.

Figure 1: Leaf structure of an ELT, each color represents a different leaf. The random tree works as an approximated nearest neighbour method, joining together similar data samples.

A formal performance comparison is done on the Iris dataset. The data is randomly split into 70% training and 30% test sets, and the test accuracy is calculated for all the methods. The whole experiment is repeated 100 times. Mean accuracy and its standard deviation are presented in Table 1.


Method Accuracy std, %
Ridge regression
Extreme Learning Tree
Decision Tree
Table 1: Average accuracy and its standard deviation on a test subset of Iris dataset.
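The evaluation protocol described above (100 random 70%/30% splits, with mean and standard deviation of test accuracy) can be sketched for the two baselines that Scikit-Learn ships. Several details are assumptions: `RidgeClassifier` stands in for Ridge regression used as a classifier, both models use default hyperparameters, the splits are seeded by the repeat number, and no stratification is applied. ELM and ELT are left out because they are not part of Scikit-Learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Factories so that every repeat trains a fresh, unfitted model.
models = {
    "Ridge regression": lambda: RidgeClassifier(),
    "Decision Tree": lambda: DecisionTreeClassifier(),
}
scores = {name: [] for name in models}

for repeat in range(100):
    # Random 70% / 30% split; the seed choice is an assumption.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=repeat)
    for name, make_model in models.items():
        model = make_model().fit(X_tr, y_tr)
        scores[name].append(model.score(X_te, y_te))  # test accuracy

for name, acc in scores.items():
    print(f"{name}: {100 * np.mean(acc):.1f} +- {100 * np.std(acc):.1f} %")
```

The numbers printed by this sketch depend on the assumed settings and are not expected to match Table 1 exactly.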

In this experiment, an Extreme Learning Tree performs worse than the ELM and Decision Tree methods. However, it outperforms a linear model (in the form of Ridge regression) by a significant margin. Outperforming a linear model is an achievement for a single ELT, as it represents each data sample by a single number: the index of its leaf in the tree.

The decision surface of the ELT is visualized in Figure 2. The boundaries between classes have a complex shape, but the class regions are unbroken. In contrast, the class regions of the original Decision Tree (shown in Figure 3) intrude into each other, creating false predictions. Its boundaries are always parallel to an axis, while the ELT learns class boundaries of arbitrary shape.

Figure 2: Decision surface of an ELT on Iris dataset, using different pairs of features. Different colors correspond to the three different classes of Iris flowers.
Figure 3: Decision surface of a Decision Tree on Iris dataset, using different pairs of features. Note that all decision boundaries are parallel to axes.

4 Conclusions

The paper proposes a new version of a decision tree that follows the random methods paradigm. It consists of an untrained random non-linear tree and a learned linear observer. The method provides decision boundaries of a complex shape and with less noise than an original decision tree. It outperforms a purely linear model in accuracy despite representing each data sample only by its corresponding tree leaf index.

Future work will examine applying the Extreme Learning Tree in an ensemble method similar to Random Forest.


  • [1] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen (1984) Classification and regression trees. CRC press. External Links: ISBN 0-412-04841-8 Cited by: §1.
  • [2] L. Breiman (2001) Random Forests. Machine Learning 45 (1), pp. 5–32. External Links: ISSN 1573-0565, Document Cited by: §1.
  • [3] R. A. Fisher (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), pp. 179–188. External Links: ISSN 2050-1439, Document Cited by: §3.
  • [4] C. Gallicchio, J. D. Martin-Guerrero, A. Micheli, and E. Soria-Olivas (26-28 April 2017) Randomized machine learning approaches: Recent developments and challenges. In ESANN 2017 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 77–86. Cited by: §1.
  • [5] P. Geurts, D. Ernst, and L. Wehenkel (2006) Extremely randomized trees. Machine Learning 63 (1), pp. 3–42. External Links: ISSN 1573-0565, Document Cited by: §1, §2.
  • [6] G. Huang, H. Zhou, X. Ding, and R. Zhang (2012-04) Extreme learning machine for regression and multiclass classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 42 (2), pp. 513–529. External Links: ISSN 1941-0492, Document Cited by: §1.
  • [7] G. Huang, Q. Zhu, and C. Siew (2006-12) Extreme learning machine: Theory and applications. Neurocomputing 70 (1–3), pp. 489–501. External Links: ISSN 0925-2312, Document Cited by: §2.
  • [8] G. Huang (2015) What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle. Cognitive Computation 7 (3), pp. 263–278. External Links: ISSN 1866-9964, Document Cited by: §1.
  • [9] H. Jaeger and H. Haas (2004-04) Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304 (5667), pp. 78. External Links: Document Cited by: §1.
  • [10] Y. Lin and Y. Jeon (2006-06) Random Forests and Adaptive Nearest Neighbors. Journal of the American Statistical Association 101 (474), pp. 578–590. External Links: ISSN 0162-1459, Document Cited by: §1.
  • [11] M. Lukoševičius and H. Jaeger (2009-08) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127–149. External Links: ISSN 1574-0137, Document Cited by: §1.
  • [12] Y. Miche, M. van Heeswijk, P. Bas, O. Simula, and A. Lendasse (2011-09) TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization. Neurocomputing 74 (16), pp. 2413–2421. External Links: ISSN 0925-2312, Document Cited by: §3.
  • [13] T. K. Ho (1998-08) The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8), pp. 832–844. External Links: ISSN 0162-8828, Document Cited by: §1.