1 Introduction
Randomized methods are a recent trend in practical machine learning
[4]. They enable the high performance of complex nonlinear methods without the high computational cost of optimizing them. The most prominent current examples are randomized neural networks, in both feedforward [8] and recurrent [11] forms. For the latter, the randomized approach provided an efficient training method for the first time and enabled state-of-the-art performance in multiple areas [9].
Random Forest [13] is one of the best methods for Big Data processing due to its adaptive nearest neighbor behavior [10]. The forest predicts an output based only on local data samples. Such an approach works better the more training data is available, making it well suited to supervised learning on Big Data. The k-nearest neighbors algorithm also benefits from more data, as the data itself is the model, but Random Forest avoids the quadratic scaling of k-nearest neighbors in the number of data samples, which makes the latter prohibitively slow for large-scale problems.
A decision tree [1] is the building block of Random Forest. A deep decision tree has high variance but low bias. An ensemble of multiple such trees reduces the variance and improves prediction performance. Additional measures are taken to make the trees in an ensemble as different as possible, including random subsets of features and boosting [2].
This paper proposes a merge of randomized methods and decision trees, called the Extreme Learning Tree (ELT). The method builds a tree using expanded data features from an Extreme Learning Machine [6], splitting nodes on a random feature at a random point. The result is an Extremely Randomized Tree [5]. A linear observer is then added to the leaves of the tree; it learns a linear projection from the leaves to the target outputs. Each tree leaf is represented by its index, in one-hot encoding format.
2 Methodology
The Extreme Learning Tree consists of three parts. First, it generates random data features using an Extreme Learning Machine (ELM) [7]. Second, it builds a random tree from these features, similar to Extremely Randomized Trees [5]. Each data sample is then represented by the index of its leaf in the tree, in one-hot encoding. Third, a linear regression model is learned from the dataset in that one-hot encoding to the target outputs.
ELT follows the randomized methods paradigm, as it has an untrained random part (the tree) and a learned linear observer (a linear regression model from the leaves of the tree to the target outputs).
An ELT tree has two hyperparameters: the minimum node size and the maximum tree depth. Node data is split by a random feature at a random split point. Split points that generate nodes under the minimum size are rejected. Nodes that reach the maximum depth or contain fewer than twice the minimum number of samples become leaves. Node splitting continues while there are non-leaf terminal nodes.
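The three parts above can be sketched in code. This is a minimal illustration, not the paper's reference implementation: the class name, the number of random features, the split-retry limit, and the ridge penalty of the linear observer are all illustrative assumptions.

```python
import numpy as np

class ExtremeLearningTree:
    """Illustrative ELT sketch: random ELM features, a random tree,
    one-hot leaf encoding, and a ridge-regression linear observer."""

    def __init__(self, n_hidden=20, min_size=3, max_depth=10, alpha=1e-3, rng=None):
        self.n_hidden = n_hidden    # number of random ELM features (assumed)
        self.min_size = min_size    # minimum node size hyperparameter
        self.max_depth = max_depth  # maximum tree depth hyperparameter
        self.alpha = alpha          # ridge penalty of the observer (assumed)
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def _elm_features(self, X):
        # Untrained random projection followed by a nonlinearity (ELM-style).
        return np.tanh(X @ self.W + self.b)

    def _build(self, H, idx, depth):
        # Leaf: maximum depth reached, or node under twice the minimum size.
        if depth >= self.max_depth or len(idx) < 2 * self.min_size:
            self.n_leaves += 1
            return ('leaf', self.n_leaves - 1)
        # Try random (feature, threshold) pairs; reject undersized children.
        for _ in range(20):
            f = self.rng.integers(H.shape[1])
            t = self.rng.uniform(H[idx, f].min(), H[idx, f].max())
            left, right = idx[H[idx, f] <= t], idx[H[idx, f] > t]
            if len(left) >= self.min_size and len(right) >= self.min_size:
                return ('split', f, t,
                        self._build(H, left, depth + 1),
                        self._build(H, right, depth + 1))
        self.n_leaves += 1
        return ('leaf', self.n_leaves - 1)

    def _leaf_index(self, node, h):
        if node[0] == 'leaf':
            return node[1]
        _, f, t, left, right = node
        return self._leaf_index(left if h[f] <= t else right, h)

    def _one_hot_leaves(self, H):
        # Represent each sample by the one-hot index of its leaf.
        L = np.zeros((len(H), self.n_leaves))
        for i, h in enumerate(H):
            L[i, self._leaf_index(self.root, h)] = 1.0
        return L

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._elm_features(X)
        self.n_leaves = 0
        self.root = self._build(H, np.arange(len(X)), 0)
        L = self._one_hot_leaves(H)
        # Linear observer: ridge regression from leaves to target outputs.
        A = L.T @ L + self.alpha * np.eye(L.shape[1])
        self.beta = np.linalg.solve(A, L.T @ Y)
        return self

    def predict(self, X):
        return self._one_hot_leaves(self._elm_features(X)) @ self.beta
```

For classification, the targets `Y` are one-hot class labels and the predicted class is the argmax of the observer's output, as in the ELM literature.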
3 Experimental results
The Extreme Learning Tree is tested on the well-known Iris flower dataset [3], in comparison with a Decision Tree, an L2-regularized ELM [12], and Ridge regression. The Decision Tree implementation is from the Scikit-Learn library (http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html).
The random tree in the ELT method splits data samples into groups of similar ones. The resulting structure in the original data space is shown in Figure 1. The tree works as an adaptive nearest neighbor, grouping similar samples together. The target variable information from these samples is then used by a linear observer to make predictions.
A formal performance comparison is done on the Iris dataset. The data is randomly split into 70% training and 30% test sets, and the test accuracy is calculated for all methods. The whole experiment is repeated 100 times. Mean accuracy and its standard deviation are presented in Table 1.

Table 1: Mean test accuracy on the Iris dataset.

Method                   Accuracy ± std, %
Ridge regression
Extreme Learning Tree
ELM
Decision Tree
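The evaluation protocol above can be sketched with Scikit-Learn for the two off-the-shelf baselines. This is an illustrative reproduction of the protocol only, not of the paper's numbers; the ELT and the L2-regularized ELM are not included, and all model settings shown are defaults rather than the paper's choices.

```python
# Sketch of the evaluation protocol: repeated random 70%/30% splits on Iris,
# mean and standard deviation of test accuracy over 100 repetitions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
results = {'Decision Tree': [], 'Ridge regression': []}

for seed in range(100):
    # A fresh random 70/30 split on every repetition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    models = [('Decision Tree', DecisionTreeClassifier(random_state=seed)),
              ('Ridge regression', RidgeClassifier())]
    for name, model in models:
        results[name].append(model.fit(X_tr, y_tr).score(X_te, y_te))

for name, accs in results.items():
    print(f'{name}: {100 * np.mean(accs):.1f} +- {100 * np.std(accs):.1f} %')
```

The same loop would accommodate the ELT and ELM once their implementations expose a `fit`/`score`-style interface.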
In this experiment, the Extreme Learning Tree performs below the ELM and Decision Tree methods. However, it outperforms a linear model (in the form of Ridge regression) by a significant margin. Outperforming a linear model is an achievement for a single ELT, as it represents each data sample by a single number: the index of its leaf in the tree.
The decision surface of the ELT is visualized in Figure 3. The boundaries between classes have a complex shape, but the classes remain unbroken. The class boundaries of the original Decision Tree (shown in Figure 3) break into each other, creating false predictions. They are always parallel to an axis, while the ELT learns class boundaries of arbitrary shape.
4 Conclusions
The paper proposes a new version of the decision tree that follows the randomized methods paradigm. It consists of an untrained random nonlinear tree and a learned linear observer. The method provides decision boundaries of a complex shape, with less noise than the original decision tree. It outperforms a purely linear model in accuracy despite representing each data sample only by its tree leaf index.
Future work will examine the application of the Extreme Learning Tree in an ensemble method similar to Random Forest.
References
[1] (1984) Classification and regression trees. CRC Press. External Links: ISBN 0-412-04841-8. Cited by: §1.
[2] (2001) Random Forests. Machine Learning 45 (1), pp. 5–32. External Links: ISSN 1573-0565, Document. Cited by: §1.
[3] (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2), pp. 179–188. External Links: ISSN 2050-1439, Document. Cited by: §3.
[4] (26–28 April 2017) Randomized machine learning approaches: Recent developments and challenges. In ESANN 2017 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 77–86. Cited by: §1.
[5] (2006) Extremely randomized trees. Machine Learning 63 (1), pp. 3–42. External Links: ISSN 1573-0565, Document. Cited by: §1, §2.
[6] (2012-04) Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42 (2), pp. 513–529. External Links: ISSN 1941-0492, Document. Cited by: §1.
[7] (2006-12) Extreme learning machine: Theory and applications. Neurocomputing 70 (1–3), pp. 489–501. External Links: ISSN 0925-2312, Document. Cited by: §2.
[8] (2015) What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle. Cognitive Computation 7 (3), pp. 263–278. External Links: ISSN 1866-9964, Document. Cited by: §1.
[9] (2004-04) Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304 (5667), pp. 78. External Links: Document. Cited by: §1.
[10] (2006-06) Random Forests and Adaptive Nearest Neighbors. Journal of the American Statistical Association 101 (474), pp. 578–590. External Links: ISSN 0162-1459, Document. Cited by: §1.
[11] (2009-08) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127–149. External Links: ISSN 1574-0137, Document. Cited by: §1.
[12] (2011-09) TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization. Neurocomputing 74 (16), pp. 2413–2421. External Links: ISSN 0925-2312, Document. Cited by: §3.
[13] (1998-08) The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8), pp. 832–844. External Links: ISSN 0162-8828, Document. Cited by: §1.