1 Introduction
Many data science problems must be addressed with unbalanced datasets. Indeed, it may be quite affordable to gather data on the representatives of a given pathology in medicine, or on the nominal operating scenarios of machines in industry [Khan2009]. The complementary occurrences are, by contrast, scarce and/or expensive to collect. The practice of One-Class Classification (OCC) has been developed with this consideration in mind [Moya1993, Khan2009]. One-class classifiers are trained on a sample of a single class, possibly in the presence of a few counterexamples. The task consists in understanding and isolating a given class from the rest of the universe. The resulting model allows predicting target (or positive) patterns and rejecting outlier (or negative) ones.
The One-Class Support Vector Machine (OCSVM) is a popular OCC method [Scholkopf2001, Chang2011]. Statistics-based techniques such as Gaussian models and Kernel Density Estimation (KDE) [Silverman1986] are also commonly considered, as respectively parametric and non-parametric approaches to estimate a sample distribution. Thresholded at a given level of confidence, this estimation is used to reject any instance located beyond the decision boundary thus established [Tarassenko1995]. However, these OCC methodologies tend to lose performance and readability on high-dimensional samples [Desir2013]. Yet in a number of applications like clinical decision support, it is crucial to target readable (and thus interpretable) predictions, beyond accuracy.
Basically devoted to supervised classification, decision trees [Quinlan1986] give satisfaction on both objectives of performance and interpretability. Under a sequential reasoning scheme, the related inference mechanism is indeed very close to the human way of thinking [Duch2004]. However, this quality is lost in tree-based one-class methodologies, generally focused on ensemble strategies like random forests to boost performance [Desir2013, Goix2016]. In this case, a one-class problem is converted into a binary one by generating artificial outliers, based on which a supervised decision tree is finally trained.
Our work puts forward a hybrid one-class classifier, called the One-Class decision Tree (OC-Tree). The construction of OC-Trees relies on an innovative splitting mechanism supported by Kernel Density Estimation (KDE): a parent node is divided into one or several interval(s) of interest, based on indications provided by the density estimate. The contributions of our work are exposed below.

(1) As a single and readable tree-based classifier, the OC-Tree has proved successful in comparison with an ensemble technique like the performant One-Class Random Forest (OCRF), which offers poor interpretability [Desir2013].

(2) The OC-Tree has proved robust against high-dimensional data in comparison with reference methods of the literature, including the multidimensional KDE.

(3) As a result of (1) and (2), the OC-Tree achieves the integration of a multidimensional KDE within an intuitive and structured decision scheme, based on a subset of significant attributes, with increased performance in comparison with the original method.

(4) By making KDE the key element of learning, the OC-Tree is also suitable for clustering. The method is indeed able to recover clusters as hyper-rectangles while preserving their structure.
2 Our proposal
Our One-Class Tree (OC-Tree) is implemented in a divide-and-conquer spirit, in order to find target groupings, i.e. parts of the space where target samples are concentrated, and to describe these groupings in a simple and readable way. Let us note:

X, the initial set of training instances;

Ω, a space of d dimensions including X;

A, the set of continuous training attributes;

X_t, the set of training instances available at a given node t;

Ω_t, the subspace of dimensions related to node t (Ω_t ⊆ Ω).
At each node t, the algorithm searches for the attribute a ∈ A which best raises from X_t a number m of (not necessarily adjacent) target subspace(s), such that the target instances satisfy:

x_i(a) ∈ [l_1, r_1] ∪ … ∪ [l_m, r_m]   (1)

x_i(a) is the value of instance x_i for attribute a; l_j and r_j are respectively the left and right bounds of the closed subintervals raised to split the current node into target nodes t_1, …, t_m, based on attribute a. As the divisions are made parallel to the axes, the target subspaces may be seen as hyper-rectangles of interest. To achieve this at a given node t, the training algorithm processes each training attribute according to the following steps.

Compute a Kernel Density Estimation (KDE), i.e. an estimation f̂ of the probability density function, based on the available training instances (see section 2.1).

Divide the node based on the modes of f̂ (see section 2.2).

Assess the quality of the division by computing the resulting impurity decrease (see section 2.3).
The attribute that achieves the best impurity decrease is selected to split the current node into child nodes. If necessary, some branches are pre-pruned, in order to preserve the interpretability of the tree (see section 2.4). The algorithm is run recursively; termination occurs when some stopping condition is reached (see section 2.5).
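The first of these steps — the one-dimensional density estimate of section 2.1 — can be sketched in plain NumPy. This is a minimal illustration assuming a Gaussian kernel and the standard form of Silverman's rule with a fallback when the interquartile range vanishes; the helper names are ours, not the paper's:

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth: h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5),
    falling back on sigma alone when the interquartile range is zero."""
    n = len(x)
    sigma = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # 75th minus 25th percentile
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-1 / 5)

def gaussian_kde_1d(x, grid, h):
    """One-dimensional Gaussian KDE of sample x evaluated on the given grid."""
    u = (grid[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(x) * h)
```

Each node of the tree would run this estimate once per attribute before dividing.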
2.1 Density estimation
In order to identify concentrations of target instances, we have to estimate their distribution over the space, which is provided by a Kernel Density Estimation (KDE). In particular, our proposal is based on the popular Gaussian kernel [Silverman1986]:

f̂(x) = (1 / (n_t · h)) Σ_{x_i ∈ X_t} K((x − x_i) / h),   with K(u) = (1/√(2π)) · exp(−u²/2)

where X_t is the set of n_t instances available at node t, K the kernel function and h a parameter called the bandwidth.
The parameter h influences the shape of the resulting function [Silverman1986]. As h tends towards zero, f̂ appears undersmoothed, while high values of h induce a less detailed density estimation. Adaptive methods, such as least-squares cross-validation, may help set the bandwidth value [Jones1996, Li2007]. However, such iterative techniques are computationally expensive; their use can hardly be considered in this context of recursive divisions. Hence, we compute h based on the following formula [Silverman1986]:
h = 0.9 · min(σ, IQR/1.34) · n_t^(−1/5)   if IQR > 0;   h = 0.9 · σ · n_t^(−1/5)   otherwise   (2)
where σ is the standard deviation of the sample X_t and IQR the associated interquartile range. The first relation corresponds to Silverman's rule of thumb [Silverman1986]. We consider the second relation to address samples having a zero IQR. Indeed, a zero IQR may reveal very concentrated data, with the potential presence of some singularities that should be eliminated.
2.2 Division
At node t, division is executed based on f̂, in four steps.

(a) Clipping
The KDE f̂ is thresholded at a level ρ.
This allows raising a set S of target subintervals.

(b) Revision
If f̂ is k-modal (k > 1) and S contains fewer than k subintervals, revision occurs since some modes were not identified. Each subinterval of S is thus analyzed: if its image by f̂ includes at least one significant local minimum, intermediate apertures are created around this (these) local minimum (minima).

(c) Assessment
The subintervals of S covering a number of training instances inferior to a proportion ν of the training set are dropped. This ensures keeping the most significant target nodes.

(d) Shrinking
The detected subintervals are shrunk into closed intervals, so as to fit the domain strictly covered by the related target training instances, as defined by Eq. (1).
Actually, the set of subintervals is potentially updated at the end of steps (b), (c) and (d).
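Steps (a), (c) and (d) can be sketched as follows. This is a hedged reading in which the clipping level is taken relative to the density maximum and the assessment threshold is a fraction of the node's instances — both interpretations, and all names, are ours:

```python
import numpy as np

def target_subintervals(x, grid, density, clip_level, min_frac=0.0):
    """(a) clip the KDE, (c) drop weakly supported subintervals,
    (d) shrink each kept subinterval onto the instances it covers."""
    above = density >= clip_level * density.max()   # (a) thresholding
    intervals, start = [], None
    for i, flag in enumerate(above):                # contiguous runs of the mask
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((grid[start], grid[i - 1]))
            start = None
    if start is not None:
        intervals.append((grid[start], grid[-1]))
    kept = []
    for lo, hi in intervals:
        inside = x[(x >= lo) & (x <= hi)]
        if len(inside) < max(1, min_frac * len(x)):  # (c) assessment
            continue
        kept.append((float(inside.min()), float(inside.max())))  # (d) shrinking
    return kept
```

On a bimodal sample, this returns one closed interval per detected concentration, tightly fitted to the covered instances.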
If we consider the KDE presented in Fig. 1, step (a) results in two subintervals. As the density estimation is 3-modal in this case, a revision of the interval partitioning (b) is launched. There is no need to split the first subinterval, since the corresponding piece of f̂ includes a single maximum. By contrast, a local minimum is detected in the second subinterval [A,B]; [A,B] is thus split into three parts around this minimum. Concretely, such a split occurs if the local minimum is significant, i.e. sufficiently deep in comparison with both nearby local maxima. In mathematical terms, the minimum located at x_min is significant if

f̂(x_min) ≤ τ · min(f̂(x_left), f̂(x_right)),

where x_left and x_right locate the nearby local maxima and τ ∈ [0, 1] is a revision parameter. Steps (c) and (d) are then launched: the subintervals are shrunk around the target training instances (represented by crosses in Fig. 1).
The complement of the target subintervals represents the set of outlier subspaces; it may be represented by a single branch labelled "else".
Except for prior knowledge that would help choosing its value more specifically, there should be no reason to set a high clipping threshold ρ, since the training set is supposed to include target instances only; this would be penalizing, with the exclusion of real target nodes as a consequence. The influence of parameters τ and ν will be discussed further (see section 4). Basically, the value of the clipping threshold should be low (e.g. 0.05), because it merely aims at rejecting outliers. Parameter ν is related to the intermediate apertures created between a couple of close target groupings: if such an aperture includes very few instances, it will be dropped. Thus, we may set ν = 0 per default and set a nonzero value in case of noisy datasets. Finally, a nonzero value for the revision parameter τ (e.g. 0.5) will lead to revision, which appears interesting if we want to detect target groupings precisely.
2.3 Impurity decrease computation
The adaptation of the classical supervised decision tree to OCC is generally achieved through the physical or virtual generation of outliers in each node, to enable the emergence of target concentrations [Hempstalk2008, Desir2013, Goix2016]. As a result of the division, each child node t_j includes a number n'_{t_j} of outliers which has to be estimated. The work of [Goix2016] assumes uniformly distributed outliers, so that

n'_{t_j} = n'_t · λ(Ω_{t_j}) / λ(Ω_t)

where λ(Ω_t) denotes the measure of the hyper-rectangle to which node t relates, and n_t (resp. n'_t) the number of target (resp. virtual outlier) instances it holds. Based on this predictive calculation, [Goix2016] gives a proxy for the Gini impurity decrease for the purpose of OCC. We adapt this result to our proposal, where more than two child nodes may result from division:

ΔG = 2·n_t·n'_t / (n_t + n'_t)² − Σ_{j=1..m'} [(n_{t_j} + n'_{t_j}) / (n_t + n'_t)] · 2·n_{t_j}·n'_{t_j} / (n_{t_j} + n'_{t_j})²

where m' is the total number of target and outlier subintervals included in Ω_t. Thus, the impurity decrease is computed with no need for the physical generation of outliers.
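A sketch of this computation under the uniform-outlier assumption of [Goix2016]: the virtual outlier counts are proportional to volume as described above, while the m-way weighting and the choice of as many virtual outliers as targets in the parent are our own simplifications, not the paper's exact formula:

```python
def gini(n_target, n_outlier):
    """Two-class Gini impurity."""
    n = n_target + n_outlier
    if n == 0:
        return 0.0
    p = n_target / n
    return 2.0 * p * (1.0 - p)

def impurity_decrease(n_parent, vol_parent, children, n_virtual_outliers=None):
    """children: (n_target, volume) per subinterval (target and outlier ones).
    Virtual outliers are spread proportionally to volume, so none need to
    be physically generated."""
    if n_virtual_outliers is None:
        n_virtual_outliers = n_parent  # simplifying assumption
    parent_impurity = gini(n_parent, n_virtual_outliers)
    total = 0.0
    for n_t, vol in children:
        out_t = n_virtual_outliers * vol / vol_parent
        weight = (n_t + out_t) / (n_parent + n_virtual_outliers)
        total += weight * gini(n_t, out_t)
    return parent_impurity - total
```

A split that isolates the targets in a small hyper-rectangle scores high; a split that mirrors the parent's target/volume ratio scores zero.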
2.4 Pre-pruning mechanism
A branch of an OC-Tree is pre-pruned when no eligible attributes remain for division. An attribute is not eligible if:

for this attribute, all the instances have the same value;

the computed bandwidth h is strictly smaller than the minimum difference between two (distinct) successive values in the set of available instances, i.e. the data granularity.
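These two eligibility conditions can be checked as follows (a minimal sketch; the function name is ours):

```python
import numpy as np

def attribute_eligible(x, h):
    """Eligible unless (i) all instances share one value, or (ii) the
    bandwidth h is strictly smaller than the data granularity, i.e. the
    minimum gap between two distinct successive values."""
    values = np.unique(x)          # sorted distinct values
    if len(values) < 2:
        return False               # condition (i)
    granularity = float(np.diff(values).min())
    return h >= granularity        # condition (ii): h < granularity => not eligible
```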
At a given node t, a division based on a non-eligible attribute no longer makes sense. Besides, the whole set of training attributes loses its eligibility if the algorithm successively selects the same attribute, or the same sequence of attributes, to cut the same target node, i.e. merely tightening the domain covered by the available training instances in the associated hyper-rectangle. Such successive refinement splits only adjust the bounds of the hyper-rectangle in question; they are useless and contribute to target-space erosion. Fig. 2 shows a tree learned on two training attributes. The nodes in dotted lines are developed in the absence of a pre-pruning mechanism; the latter allows for a shorter and more readable decision tree. Note that the branches related to outliers are omitted for the sake of clarity.
It should be noted the user may choose to keep either the tree as a full predictive model, which describes the development that led to the space division, or the description of the final target hyper-rectangles as a set of subintervals of interest regarding the attributes used for division.
2.5 Stopping conditions
The algorithm terminates under some global and local conditions.

Global condition (rounding precision, maxit)
At each iteration, we compute the training accuracy, i.e. the ratio of training instances included in the target nodes. The algorithm is stopped if the accuracy, rounded to a given precision (e.g. two decimals), remains stable over maxit iterations in which no additional target node was raised. Indeed, in this case, the training process reaches a stage where the target subspaces are simply delimited more precisely on the basis of additional attributes, with no further multiplication. As a result, the higher maxit, the more complex the predictive model; this parameter therefore tunes the length of the model. We can set a small maxit by default, so that the resulting model focuses only on the attributes with the most significant separative power.
Local conditions
Divisions may be stopped locally if there are compelling reasons to convert a node into a leaf, i.e. when pre-pruning is necessary (see section 2.4).
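The global condition can be sketched as a small stateful test; the sliding-window reading, and the default values, are illustrative assumptions:

```python
from collections import deque

def make_global_stopper(decimals=2, maxit=3):
    """Stop when the training accuracy, rounded to `decimals`, stays
    constant over `maxit` successive iterations in which no additional
    target node was raised (a sketch of the global condition above)."""
    history = deque(maxlen=maxit)

    def should_stop(accuracy, new_target_nodes):
        history.append((round(accuracy, decimals), new_target_nodes))
        if len(history) < maxit:
            return False
        same_accuracy = len({acc for acc, _ in history}) == 1
        no_new_nodes = all(n == 0 for _, n in history)
        return same_accuracy and no_new_nodes

    return should_stop
```

The tree-growing loop would call `should_stop` once per iteration and halt on the first `True`.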
3 Experimental procedure
3.1 Single evaluation
First, we propose a targeted qualitative evaluation of our method. We thus use synthetic data in order to assess the advocated methodology in ideal conditions, with respect to the expected objective of delineating target hyper-rectangles. These datasets are composed of two-dimensional Gaussian blobs (see Figure 3): by means of different blob dispositions, sizes and spans, we can study the influence of the algorithm parameters related to the tree construction. These Gaussian blobs play the role of different groupings as representatives of the same target class.
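Such synthetic data can be generated, for instance, as follows (a sketch; the centers, counts and spread are arbitrary choices, not the paper's settings):

```python
import numpy as np

def gaussian_blobs(centers, n_per_blob=150, scale=0.5, seed=0):
    """Stack isotropic 2-D Gaussian blobs, all representing one target class."""
    rng = np.random.default_rng(seed)
    parts = [rng.normal(loc=c, scale=scale, size=(n_per_blob, 2)) for c in centers]
    return np.vstack(parts)

X = gaussian_blobs([(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)])
```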
3.2 Comparison with reference methods
In the absence of a universal experimental protocol and benchmark data for OCC, it is standard practice to convert multi-class problems into One-Class (OC) ones for evaluation purposes. In that regard, the one-vs-rest approach [Desir2013] consists in considering one class as the target and the others as outliers [Wang2006, Hempstalk2008, Desir2013, Nguyen2015, Fragoso2016, Wang20162]. Following the appropriate conversion of reference datasets, the evaluation of an OC classifier is generally conducted under a Cross-Validation (CV) strategy, with a range of possible variants depending on the options envisaged, i.e. with/without stratification, number of folds, repetition(s). In the context of a one-class problem, a CV strategy is led in such a way that, once the folds are created, the folds on which the classifier is trained are devoid of outliers [Ratle2007, Hempstalk2008, Desir2013, Nguyen2015, Fragoso2016, Wang20162].
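The fold construction just described can be sketched as follows — a simplified version without stratification or repetition, for brevity (the function name is ours):

```python
import numpy as np

def one_class_cv_splits(y, target_label, k=10, seed=0):
    """k-fold splits for one-class evaluation: the training part keeps
    only target-class instances, while the held-out fold keeps both."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        train_idx = train_idx[y[train_idx] == target_label]  # drop outliers
        yield train_idx, test_idx
```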
Let us denote by TT (resp. TO) the number of True Targets (resp. True Outliers), i.e. the number of instances correctly detected as targets (resp. outliers); FT (resp. FO) denote the number of False Targets (resp. False Outliers) [Nguyen2015]. In the context of OCC, it is convenient to resort to the Matthews Correlation Coefficient (MCC). In particular, the work of [Desir2013] shows the MCC is well suited for the assessment of OCC classifiers [Maldonado2014, Fragoso2016, Wang20162]. Derived from the Pearson correlation coefficient for binary configurations [Baldi2000], the MCC, given by Eq. (3), measures the correlation between the predictions and the real instance labels.
MCC = (TT · TO − FT · FO) / √((TT + FT)(TT + FO)(TO + FT)(TO + FO))   (3)
A zero MCC indicates the classifier makes arbitrary decisions or fails in predicting both outputs simultaneously [Zhang2012, Desir2013]. One can therefore understand that the accuracy, given by Eq. (4), complemented by the MCC, forms an interesting way of measuring the real performance of an OC classifier.
Accuracy = (TT + TO) / (TT + TO + FT + FO)   (4)
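In code, these two measures read as a direct transcription, with TT, TO, FT, FO as defined above:

```python
import math

def mcc(tt, to, ft, fo):
    """Matthews Correlation Coefficient from the four counts (cf. Eq. (3))."""
    denom = math.sqrt((tt + ft) * (tt + fo) * (to + ft) * (to + fo))
    return (tt * to - ft * fo) / denom if denom else 0.0

def accuracy(tt, to, ft, fo):
    """Plain accuracy (cf. Eq. (4))."""
    return (tt + to) / (tt + to + ft + fo)
```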
Under a one-vs-rest approach, we adopt a repeated stratified cross-validation strategy. Indeed, stratification and repetition in cross-validation procedures help reduce the variability of the performance measures, averaged over the iterations [Witten2005]. The tests are based on data including continuous attributes, extracted from the UCI repository [UCIRepository]. We compare our results with the recent ones of [Desir2013], which proposes a learning methodology for One-Class Random Forests (OCRF). Moreover, the work of [Desir2013] provides a comparison to reference OCC methods, which allows us to extend our assessment scope based on the performances of the OCSVM, KDE, Gaussian Mixture Model (GMM) and Gaussian estimator. To ensure a fair comparison, we assessed our OC-Tree in the same conditions as in [Desir2013], i.e. a stratified 10-fold CV strategy, repeated five times.
4 Results & Discussion
Preliminary remark: unless otherwise specified, the results related to our OC-Tree are achieved with the default parameter values discussed in section 2, with a minimum of 10 instances per node.
4.1 Comparison of oneclass and multiclass tasks
We start by comparing the results of the training process exerted on the dataset of Fig. 4:

with algorithm C4.5, through the resolution of a multi-class problem, supposing that each Gaussian blob is associated with a distinct class. The associated space division is represented in dashed lines in Fig. 4.

with the OC-Tree. In this case, the Gaussian blobs are all representatives of the same single class. The limits of the corresponding hyper-rectangles are represented in continuous lines in Fig. 4.
The same observation is made for the dataset represented in Fig. 5. The C4.5 decision tree (see Fig. 6) rests on a single division based on one attribute, while the OC-Tree (see Fig. 7) uses both dimensions, in order to isolate the subspaces covered by both blobs.
As expected, multi- and one-class learning processes lead to different predictive models. Indeed, in the context of a multi-class problem, the class representatives are supposed to share the whole domain in which the attributes take their values. Hence, a decision tree learned with an algorithm like C4.5 proposes a decomposition of the whole space into hyper-rectangles. By contrast, aiming at solving a one-class classification problem, we propose a learning process looking for target hyper-rectangles that do not necessarily cover the whole domain in which the attributes take their values, since there may exist outliers to discard.
4.2 Parameters influence
The second part of our experimental procedure is dedicated to the study of the influence of the parameters. In particular, the clipping threshold ρ conditions the level at which the estimation of the probability density function is clipped in order to get target zones of interest (see Fig. 1). To understand its influence, we conducted the learning process on the synthetic datasets, making ρ evolve over a regular range of values. We notice that as ρ increases, the training accuracy decreases (see Fig. 8). This is expected, since a high value of ρ leads to rejecting a high proportion of training instances as outliers.
The couple of parameters τ (revision) and ν (aperture significance) allows identifying more precisely concentrations of instances which are close along some dimension. Indeed, such a proximity may impact the corresponding density estimate through shallow local minima located above the clipping level ρ. In the absence of any revision of the splitting (see section 2.2), such concentrations would be roughly included in the same hyper-rectangle.
We experiment the marginal effect of the revision parameter τ by setting ν = 0. Fig. 9 shows the related influence. As expected, a zero value of τ leads to the identification of a single hyper-rectangle including both Gaussian blobs. A nonzero value of τ gives a more pertinent result, identifying each blob separately. For a given value of τ, a nonzero value of ν allows isolating buffer zones between close concentrations of instances. As long as they are significant, i.e. include a number of training instances greater than the proportion ν of the training set, such intermediate zones may be raised for the purpose of a more nuanced interpretation, e.g. to localize regions of uncertainty or transition between two different sub-concepts of the target class. Should these intermediate zones not be significant, their elimination anyway reinforces noise rejection in the neighborhood of close sub-concepts of the class. This fact is illustrated in Fig. 10, which shows the result of the training process with a nonzero ν. Actually, it turns out that, during training, an aperture was created between two close regions of concentration. As it included too few instances with respect to the training set size, it was dropped with no further processing, which actually emphasizes the individuality of each zone. In particular, the borders of both hyper-rectangles are better defined in comparison with the result exposed in Fig. 4, achieved with ν = 0. Let us also note the negative impact of high values of ν, not on the training accuracy, but on the quality of the localization of the target groups. Obviously, increasing the value of ν encourages the creation of large and sparse apertures, which extend over nearby consistent concentrations. Fig. 11 illustrates this effect: a third hyper-rectangle appears since it includes enough instances.
The marginal effects of the revision parameter τ and the significance proportion ν are combined in a global effect. The interaction of these parameters is exposed in Fig. 12. The tests are achieved with nonzero values of τ and a zero value of ν. Depending on the value of τ, the results appear qualitatively different. In particular, a value of τ close to one involves a systematic revision of the subdivisions, which may cause the emergence of additional small hyper-rectangles and a less precise localization of the instance concentrations (see Fig. 12). Conversely, a low value of τ may lead to no revision of the subdivisions at all, because the related constraint to satisfy for splitting is severe. A good compromise is achieved with an intermediate value (e.g. τ = 0.5), through a quite perfect detection of the blobs (see Fig. 12). That being said, a nonzero value of ν allows creating intermediate apertures which counter the problem of unstructured divisions observed with high values of τ, as illustrated by Fig. 12: the result gives a more structured division in comparison with the one obtained with the same value of τ and ν = 0.
In view of the foregoing, the revision parameter τ conditions the granularity of the subdivisions; a value of 0.5 appears reasonable in this regard. As far as the significance proportion ν is concerned, it allows dealing with noisy training sets, reinforcing the rejection of small groupings of outliers. It may also be used to highlight intermediate regions between close significant groupings, for the sake of a deeper interpretation of the data distribution in the space. One can thus set ν = 0 by default and adapt the value if necessary.
Table 1: Accuracy (%) and MCC of the OC-Tree and the OCRF [Desir2013] on the benchmark datasets.

Accuracy (%)  MCC

Dataset  Class  OC-Tree  OCRF [Desir2013]  OC-Tree  OCRF [Desir2013]
Diabetes  Positive  50.3  46.4  0.131  0.139 
Negative  69.1  68.7  0.323  0.241  
Ionosphere  Bad  77.6  56.7  0.511  0.169 
Good  62.8  83.3  0.439  0.683  
Glass  Build wind float  69.7  66.2  0.324  0.403 
Build wind nonfloat  61.0  56.5  0.179  0.229  
Vehic wind float  66.4  69.0  0.135  0.064  
Containers  81.8  90.0  0.433  0.498  
Headlamps  90.2  95.0  0.401  0.813  
Iris  Versicolor  87.2  81.5  0.748  0.579 
Virginica  92.0  82.7  0.835  0.614  
Setosa  87.5  87.1  0.772  0.722  
Sonar  Mines  45.5  53.3  0.087  0.048 
Rocks  59.0  59.0  0.193  0.179  
Pendigits  0  97.2  99.6  0.842  0.976 
1  83.3  85.8  0.550  0.585  
2  96.2  96.3  0.791  0.835  
3  97.2  98.5  0.832  0.918  
4  96.9  99.3  0.822  0.961  
5  94.7  94.1  0.764  0.756  
6  97.4  99.7  0.845  0.985  
7  93.7  97.6  0.689  0.887  
8  94.3  89.3  0.697  0.634  
9  86.1  85.9  0.454  0.577  
Mfeat factors  0  63.5  97.2  0.326  0.844 
1  60.5  97.8  0.247  0.873  
2  90.5  97.9  0.590  0.879  
3  76.7  98.0  0.332  0.887  
4  95.3  98.0  0.739  0.884  
5  51.4  97.3  0.223  0.843  
6  68.3  98.5  0.353  0.910  
7  88.4  97.9  0.614  0.879  
8  59.8  90.6  0.282  0.613  
9  81.6  97.6  0.486  0.866  
Mfeat morphology  0  94.3  91.6  0.738  0.698 
1  76.6  56.5  0.429  0.304  
2  70.7  54.0  0.381  0.291  
3  62.4  63.5  0.334  0.335  
4  79.6  56.8  0.471  0.294  
5  74.3  67.4  0.427  0.378  
6  77.6  88.7  0.438  0.637  
7  84.4  70.0  0.541  0.398  
8  90.8  98.9  0.647  0.943  
9  75.9  76.7  0.424  0.456 
4.3 Performance comparison
Interestingly, our OC-Tree compares favorably with another tree-based method, the One-Class Random Forest (OCRF) [Desir2013]. The performances of both classifiers on different benchmark datasets are exposed in Table 1, in terms of averaged accuracy and MCC. Let us note the high-dimensional datasets pendigits and multiple features (mfeat) relate to the recognition of numerals, from 0 to 9. We selected from mfeat the subsets related to profile correlations (mfeat-fac) and morphological features (mfeat-morph). Indeed, in terms of MCC, OCRF dominates the other reference methods on the mfeat-fac set, while it performs less well on mfeat-morph.
We can notice the OC-Tree stands out advantageously on some datasets, which can generally be observed through a joint increase in accuracy and MCC (see bold values). There are also some improved accuracy rates for positive correlations (see italic values). In particular, the OC-Tree tackles well the problem of numeral recognition on the pendigits and mfeat-morph datasets. By contrast, on the mfeat-fac dataset, our proposal appears less pertinent. Actually, the related training instances overlap in the space in a rather confusing distribution that ensemble techniques like OCRF may naturally better address.
In Table 2, we extend the comparison of our OC-Tree, in terms of MCC, with other non-greedy methods which treat the training attributes all at once. These methods are the One-Class Support Vector Machine (OCSVM), the Gaussian estimator (Gauss), the Kernel Density Estimator (KDE) and the Gaussian Mixture Model (GMM); the related results are extracted from [Desir2013]. The latter classification techniques, except for Gauss, fail in tackling a certain number of problems, which is noticeable through the numerous MCC values close or equal to 0. In that regard, greedy methods like the OC-Tree and OCRF present a clear advantage.
Let us recall the OC-Tree is based on successive one-dimensional kernel density estimations, followed by the detection of target intervals of interest, in the frame of a recursive process. Somehow, the OC-Tree may be perceived as an adaptation of the KDE, making way for a certain transparency in data processing, and selecting only the most meaningful attributes in the sense of the purity criterion. We thus focus on the comparison of the OC-Tree with the (multidimensional) KDE. The MCC values achieved with the KDE and improved upon by the OC-Tree are marked with an asterisk in Table 2. It appears the OC-Tree stands out in 80% of the cases, and is able to deal with high-dimensional data like pendigits and mfeat-factors, where the multidimensional KDE fails completely.
Table 2: MCC comparison of the OC-Tree with reference OCC methods (results of the latter from [Desir2013]).

Dataset  Class  OC-Tree  OCRF [Desir2013]  OCSVM [Desir2013]  Gauss [Desir2013]  KDE [Desir2013]  GMM [Desir2013]

Diabetes  Positive  0.131  0.139  0  0.147  0.188  0.219 
Negative  0.323  0.241  0  0.046  0.064*  0.020  
Ionosphere  Bad  0.511  0.169  0.348  0.410  0.106*  0.346 
Good  0.439  0.683  0.785  0.781  0.180*  0.584  
Glass  Build wind float  0.324  0.403  0.896  0.465  0.484  0.509 
Build wind nonfloat  0.179  0.229  0.880  0.212  0.322  0.365  
Vehic wind float  0.135  0.064  0.908  0.179  0.145  0.091  
Containers  0.433  0.498  0.465  0.964  0.307*  0.823  
Headlamps  0.401  0.813  0.703  0.308  0.877  0.749  
Iris  Versicolor  0.748  0.579  0.897  0.903  0.685*  0.607 
Virginica  0.835  0.614  0.900  0.813  0.716*  0.604  
Setosa  0.772  0.722  0.903  0.921  0.799  0.643  
Sonar  Mines  0.087  0.048  0.882  0.342  0*  0.222 
Rocks  0.193  0.179  0.889  0.120  0*  0.274  
Pendigits  0  0.842  0.976  0  0.970  0.100*  0.961 
1  0.550  0.585  0  0.652  0.212*  0.835  
2  0.791  0.835  0  0.957  0*  0.956  
3  0.832  0.918  0  0.969  0.092*  0.949  
4  0.822  0.961  0  0.969  0*  0.953  
5  0.764  0.756  0  0.880  0.092*  0.942  
6  0.845  0.985  0  0.970  0*  0.954  
7  0.689  0.887  0  0.887  0*  0.937  
8  0.697  0.634  0  0.716  0*  0.951  
9  0.454  0.577  0  0.577  0.093*  0.936  
Mfeat factors  0  0.326  0.844  0  0.737  0*  0 
1  0.247  0.873  0  0.712  0*  0  
2  0.590  0.879  0  0.740  0*  0  
3  0.332  0.887  0.017  0.695  0*  0  
4  0.739  0.884  0  0.743  0*  0  
5  0.223  0.843  0.013  0.738  0*  0  
6  0.353  0.910  0.068  0.770  0*  0  
7  0.614  0.879  0.017  0.841  0*  0  
8  0.282  0.613  0  0.647  0*  0  
9  0.486  0.866  0.026  0.751  0*  0  
Mfeat morphology  0  0.738  0.698  0.136  0.682  0.765  0.764 
1  0.429  0.304  0  0.345  0.375*  0.395  
2  0.381  0.291  0  0.400  0.457  0.407  
3  0.334  0.335  0.030  0.326  0.298*  0.328  
4  0.471  0.294  0  0.432  0.443*  0.430  
5  0.427  0.378  0.013  0.468  0.388*  0.468  
6  0.438  0.637  0.057  0.397  0.398*  0.416  
7  0.541  0.398  0.026  0.524  0.505*  0.540  
8  0.647  0.943  0.013  0.682  0.666  0.645  
9  0.424  0.456  0.013  0.389  0.395*  0.398 
5 Conclusion & Future work
The reality of poor data availability, notably in medical and industrial applications, has led to the search for alternatives to traditional supervised techniques. The practice of one-class classification has been proposed with this consideration in mind. This recent discipline of machine learning has generated considerable interest, with the development of new classification techniques, some of which were adapted from supervised classification.
In this work, we proposed a one-class decision tree by completely rethinking the splitting mechanism used to build such models. Our One-Class Tree (OC-Tree) may actually be seen as an adaptation of the KDE for the sake of readability and interpretability, based on a subset of attributes significant for prediction. In that respect, our method has proved successful in comparison with the multidimensional KDE, as well as with the One-Class Random Forest (OCRF).
This work leaves some interesting perspectives. In particular, our proposal deals with continuous attributes; it would thus be judicious to consider the treatment of nominal and ordinal variables. Theoretically, our work supports such variables by means of an adaptation of the density estimation technique: the use of discrete histograms is an avenue worth exploring in that regard. Furthermore, the parametrization of the KDE remains an open question as regards the computation of the bandwidth h and the use of other kernels. Indeed, on the one hand, our proposal is based on a Gaussian kernel, attractive for its mathematical properties, but the pertinence of other configurations may be studied on a comparative basis. On the other hand, deduced from Silverman's rule of thumb, h is quite sensitive to the training set content. In our proposal, this sensitivity is controlled by the pre-pruning mechanism. In the future, we would like to rise to the challenge of establishing a rule able to address this issue of sensitivity.
6 Acknowledgments
This work is funded by the Fonds de la Recherche Scientifique  FNRS (F.R.S. FNRS), Brussels (Belgium).