GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings

06/05/2018
by   Marek Sikora, et al.

This article presents GuideR, a user-guided rule induction algorithm, which overcomes the largest limitation of the existing methods: the lack of the possibility to introduce user's preferences or domain knowledge to the rule learning process. Automatic selection of attributes and attribute ranges often leads to the situation in which resulting rules do not contain interesting information. We propose an induction algorithm which takes into account user's requirements. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool.



1 Introduction

Sequential covering rule induction algorithms can be used for both predictive and descriptive purposes blaszczynski2011 ; furnkranz1999 ; grzymala2003 ; kaufman1991 . In spite of the development of increasingly sophisticated versions of those algorithms liu2018induction ; valmarska2017 , the main principle remains unchanged and involves two phases: rule growing and rule pruning. In the former, elementary conditions are determined and added to the rule premise. In the latter, some of these conditions are removed.

In comparison to other machine learning methods, rule sets obtained by the sequential covering algorithm, also known as the separate-and-conquer (SnC) strategy, are characterized by good predictive as well as descriptive capabilities. Considering only the former, superior results can often be obtained using other methods, e.g., neuro-fuzzy networks, support vector machines, or ensembles of classifiers boser1992 ; czogala2000 ; rokach2010 ; siminski , especially ensembles of rules dembczynski2010 . However, data models obtained this way are much less comprehensible than rule sets.

In the case of rule learning for descriptive purposes, the algorithms of association rule induction agrawal1994 ; kavsek2006 ; stefanowski2001 or subgroup discovery lavravc2004 ; valmarska2017 are applied. The former leads to a very large number of rules, which must then be limited by filtering according to rule interestingness measures geng2006 ; greco2016 ; bayardo . The latter, in turn, produces rule sets characterized by worse predictive abilities than those generated by the standard sequential covering approach.

Therefore, if creating a prediction system with comprehensible data model is the main objective, the application of sequential covering rule induction algorithms provides the most sensible solution.

In the works wrobel2017 ; wrobel2016 ; sikora2012 ; sikora2013data , we have presented and confirmed on dozens of benchmark datasets the effectiveness of our version of the sequential algorithm for generating classification, regression, and survival rules. This article presents a semi-interactive version of that algorithm, which overcomes the largest limitation of the existing rule induction methods—the lack of the possibility to introduce user's knowledge (or expert's knowledge) to the learning process. Automatic selection of attributes and attribute ranges often leads to the situation in which induced rules do not contain the most important information from the user's point of view. We propose a rule induction algorithm which takes into account the user's requirements. The possibility to specify the initial set of rules, preferred and forbidden conditions/attributes, etc., together with the multiplicity of options and modes, makes our algorithm the most flexible solution for user-guided rule induction. It allows testing various hypotheses concerning data dependencies which are expected or of interest. In particular, the algorithm enables making such hypotheses more specific or more general.

The effectiveness of the guided (semi-automatic) rule induction has been investigated on three test cases concerning various data analysis tasks. Classification was illustrated by the problem of predicting seismic hazards in coal mines (seismic-bumps dataset Sikora2010 ); regression—the problem of methane forecasting (methane dataset githubMethane ); survival analysis—the problem of analysing factors which impact patients’ survival following bone marrow transplants (BMT-Ch dataset kalwak2010 ; sikora2013 ).

The paper is organized as follows: Section 2 provides an overview of work in the area of user-guided rule induction. Section 3 presents the algorithm for induction of classification, regression, and survival rules, with a special stress put on its semi-automatic capabilities. Section 4 is devoted to the analysis of the test cases, together with a discussion of the obtained results. Section 5 contains a summary and conclusions.

GuideR software as well as the datasets used in this article are available at https://github.com/adaa-polsl/GuideR or http://www.adaa.polsl.pl. All the datasets had been proposed by the authors of this article. The seismic-bumps dataset is also available in the UCI repository.

2 Related work

The induction of classification rules with the sequential covering approach has been known for many years furnkranz1999 ; grzymala2003 ; kaufman1991 ; clark1989 . As it proved its effectiveness in terms of both classification accuracy and the descriptive abilities of the induced rules (e.g. sikora2011pattern ; moshkov2008 ; tsumoto2004 ), a number of interesting extensions of this approach have been presented napierala2012 ; huhn2009 ; mozina2007 ; riza2014 ; sikora2013redef ; liu2018induction ; valmarska2017 . In contrast, rule induction algorithms have rarely been applied to regression and survival analysis, although the comprehensibility of the resulting data models is often a key issue in these problems.

Regression rules can be straightforwardly derived from regression trees such as CART breiman1984 and M5 quinlan1992 by generating one rule for each path from the root of the tree to its leaf. These algorithms use the divide-and-conquer strategy. The other approach to regression rule induction is to use a generalization of sequential covering (e.g., PCR vzenko2005 , rule lists janssen2011 ). The work of Janssen and Fürnkranz janssen2011 , describing the dynamic reduction of the regression problem to classification, is of particular importance in the context of the results presented in this paper. The most advanced methods of regression rule induction are based on ensemble techniques (e.g., RuleFit friedman2008 , RegENDER dembczynski2008 ). To supervise the induction of subsequent rules, these algorithms apply gradient-based optimization methods. The resulting rule sets are characterized by good prediction quality, though they are usually composed of a large number of rules.

Equally few attempts have been made to apply rules to survival analysis. Pattaraintakorn and Cercone pattaraintakorn2008 described a rough set-based intelligent system for analyzing survival data. Another approach employing rough sets was presented by Bazan et al. bazan2002 . The idea was to divide examples into three decision classes on the basis of a prognostic index (PI) calculated with the use of the Cox's proportional hazard model. The division of a survival dataset into three classes was also made by Sikora et al. sikora2013survival , who applied the rule induction algorithm to the analysis of patients who underwent a bone marrow transplantation. The dataset was divided into the following groups: patients who underwent transplantation at least five years before, patients who died within five years after transplantation, and patients who are still alive but whose survival time is less than five years. The two former classes were used for rule generation, while the latter for model post-processing. Kronek and Reddy kronek2008 proposed an extension of the Logical Analysis of Data (LAD) crama1988 for survival analysis. The LAD algorithm is a combinatorial approach to rule induction. It was originally developed for the analysis of data containing binary attributes; therefore, a discretization and binarization step is usually required. Liu et al. liu2004 adapted the patient rule induction method to the analysis of survival data. The method uses a bump hunting heuristic which creates rules by searching for regions in the attribute space with a high average value of the target variable. To deal with censoring, the authors use deviance residuals as the outcome variable. The idea of the residual-based approach to censored outcomes is derived from survival trees leblanc1992 .

In comparison to rule-based techniques, tree-based methods have received much more attention in survival analysis. The key idea behind the application of tree-based techniques to survival data lies in the splitting criterion. The most popular approaches are based on residuals leblanc1992 ; therneau1990 or use log-rank statistics leblanc1993 to maximize the difference between survival distributions of child nodes. We employed the latter idea in our latest separate-and-conquer rule induction algorithm, which uses log-rank statistics as a rule search heuristic (rule quality measure) wrobel2017 . We showed that, in spite of some similarities between rules and trees, our approach renders different models than the divide-and-conquer strategy of tree building.

To date, few studies have concerned rule induction algorithms which take into account the user's preferences. Stefanowski and Vanderpooten stefanowski2001 presented the Explore algorithm. Based on the idea of the Apriori method, it allows the user to specify requirements for attributes and/or their values appearing in the rule premises. Other studies on the induction of association rules describe examples of interactive construction of rules rafea2004 ; kliegr and the generation of unexpected rules padmanabhan1998 . The latter are created on the basis of user-defined templates indicating the attributes included in the so-called typical rules. Gamberger and Lavrac gamberger2002 made a similar proposal for a decision rule induction algorithm intended for descriptive purposes.

Adomavicius and Tuzhilin adomavicius2001 presented expert-driven methods of validating rule-based data models obtained via the association rule induction algorithm. The approach limits the number of rules by applying rule grouping and filtering techniques which are based on the interaction with the user instead of the traditional calculation of rule attractiveness. Blanchard et al. blanchard proposed an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets of rules. Neither of the aforementioned methods interferes with the induction process itself.

Algorithms using the paradigm of argument-based learning mozina2007 allow the user to provide, for each example, an explanation of why it has been assigned a particular decision class. Examples of medical applications show that this approach can significantly reduce the set of generated rules. However, the argument-based learning approach does not verify hypotheses that represent the dependencies which, in the user's opinion, might occur in the data. Partially, this possibility was introduced by Chen and Liu chen2001 , where the user defines a set of rules expected to be found in the dataset. Then, the rule-based version of the C4.5 algorithm is executed and three types of rules are generated: consistent, inconsistent, and not related to the user's rules. A rule $r$ is considered to be consistent with the knowledge if the set of defined rules contains at least one rule $r'$ such that $r$ and $r'$ indicate the same decision class and the set of examples covered by $r$ is a subset of the examples covered by $r'$.

The IBM SPSS Modeler analytical package ibmspss contains a module of interactive decision trees in which the user can determine the attribute and split value to be included in a given tree node. Moreover, the algorithm allows stopping the automatic induction of a given tree branch at a specific level, or starting it from a certain level when the nodes above have been defined by the user.

Even though trees can be straightforwardly translated into rules, the induction of the latter directly from data has an important advantage—the rules can be treated independently. The user or domain expert can alter existing rules or add new ones without affecting the rest of the model. The tree, in contrast, must be treated as a whole—a change of a condition in a node involves the need to modify conditions in all its child nodes. Another feature is that the divide-and-conquer tree generation strategy forbids examples to be covered by multiple rules, while the separate-and-conquer approach to rule induction lacks this limitation. This often leads to discovering stronger or completely new dependencies in the data. Finally, generation of rules from a tree by following the paths from the root to the leaves always leads to condition redundancy, which is often undesirable.

3 Methods

3.1 Basic notion

Let $D$ be a dataset of examples (observations, instances), each being characterized by a set of attributes $A = \{a_1, a_2, \ldots, a_{|A|}\}$ and a label $y$. The meaning of the label depends on the problem. For classification tasks it corresponds to a discrete class identifier, i.e., $y \in \{c_1, c_2, \ldots\}$. In regression, it is a continuous value: $y \in \mathbb{R}$. In survival analysis, it represents the binary censoring status: $y \in \{0, 1\}$. In particular, the value of 0 indicates censored observations, also referred to as event-free (e.g., patients without disease recurrence), while 1 indicates non-censored examples, that is, those subject to an event (e.g., patients with recurrence). In survival datasets, an additional variable $t$ representing the survival time, i.e. the time of the observation for event-free examples or the time before the occurrence of an event, must be specified.

The $i$th example of a classification/regression dataset can be represented as a vector $(x_1^i, x_2^i, \ldots, x_{|A|}^i, y^i)$; in survival problems it must be extended by the survival time: $(x_1^i, x_2^i, \ldots, x_{|A|}^i, t^i, y^i)$. For simplicity, however, all types of datasets will be denoted as $D$—the dependence of survival datasets on $t$ does not affect the idea of the presented algorithm.

Let $R$ be a set of rules generated by the induction algorithm, referred to later as a rule-based data model or, simply, a model. Each rule $r \in R$ has the form:

IF $w_1 \wedge w_2 \wedge \ldots \wedge w_k$ THEN conclusion.

The premise of a rule is a conjunction of conditions $w_j$ of the form $a \odot v$, with $v$ being an element of the domain of attribute $a$ and $\odot$ representing a relation ($=$ for nominal attributes; $<$ or $\geq$ for numerical ones). The conclusion of the rule can be a nominal value (classification), a numerical value (regression), or a Kaplan-Meier estimator kaplan1958 of the survival function (survival analysis). Corresponding rules will be referred to as classification, regression, and survival rules, respectively. An example satisfying the conditions specified in the rule premise is stated to be covered by the rule.

Rule sets induced by our separate-and-conquer heuristic are unordered. Therefore, applying a model to an observation $x$ (e.g. a test example) requires evaluating the set $R_x \subseteq R$ of rules covering the example and aggregating the results. This differs from ordered rule sets (decision lists), where the first rule covering the investigated observation determines the model response. The method of aggregation depends on the problem. In classification, the output class label is obtained as a result of voting—each rule from $R_x$ votes with the value of the quality measure used during the induction wrobel2016 . In regression, the model response is an average of the conclusions of the rules in $R_x$ sikora2012 . The situation is similar for survival rules, but the averaging concerns not numbers but survival estimator functions wrobel2017 .
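As an illustration, the following sketch shows how such aggregation could be implemented for classification and regression; the Rule structure and function names are illustrative assumptions, not the actual GuideR API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Rule:
    covers: Callable[[dict], bool]  # premise test: does the rule cover example x?
    conclusion: Any                 # class label (classification) or number (regression)
    quality: float                  # quality measure value assigned during induction

def predict_class(rules: List[Rule], x: dict):
    # Each covering rule votes for its class with its quality value.
    votes: Dict[Any, float] = {}
    for r in rules:
        if r.covers(x):
            votes[r.conclusion] = votes.get(r.conclusion, 0.0) + r.quality
    return max(votes, key=votes.get) if votes else None

def predict_regression(rules: List[Rule], x: dict):
    # The response is the average of the conclusions of covering rules.
    matched = [r.conclusion for r in rules if r.covers(x)]
    return sum(matched) / len(matched) if matched else None
```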

3.2 Separate-and-conquer

The presented algorithm induces rules according to the separate-and-conquer principle furnkranz1999 ; michalski1973discovering . Here we describe the fully automatic procedure—the user-guided variant is presented in the next subsection. An important factor determining the performance and comprehensibility of the resulting model is the selection of a rule quality measure bruha1997 ; an2001rule ; yao1999 ; wrobel2016 (rule learning heuristic furnkranz2005 ; janssen2010quest ; minnaert ) that supervises the rule induction process. In the case of classification problems, our software provides the user with a number of state-of-the-art measures calculated on the basis of the rule confusion matrix. Let $r$ be the considered classification rule. The examples whose labels are the same as the conclusion of $r$ will be referred to as positive, while the others will be called negative. The confusion matrix consists of the numbers of positive and negative examples in the entire training set ($P$ and $N$), and the numbers of positive and negative examples covered by the rule ($p$ and $n$). The idea can be straightforwardly generalized for weighted examples by replacing the numbers of examples in the confusion matrix by the sums of their weights. The measures built into the algorithm, e.g., C2 bruha1997 , Correlation furnkranz2005 , Lift bayardo , RSS sikora2013data , or s-Bayesian confirmation greco , evaluate rules using various criteria, resulting in very different models. For instance, RSS (also known as WRA furnkranz2005 ) considers equally the sensitivity ($p/P$) and the specificity ($1 - n/N$) of the rule according to the formula $\mathrm{RSS} = p/P - n/N$. Another common measure is conditional entropy, which describes the entropy of an outcome variable $Y$ given a random variable $X$ as:

$H(Y \mid X) = -\sum_{x \in X} P(x) \sum_{y \in Y} P(y \mid x) \log_2 P(y \mid x)$ (1)

In our case, $Y$ indicates the class (positive/negative) and $X$ denotes whether the rule covers the example (covered/uncovered). Therefore,

$P(\textrm{covered}) = \frac{p + n}{P + N}$ (2)

$P(\textrm{positive} \mid \textrm{covered}) = \frac{p}{p + n}$ (3)

$P(\textrm{positive} \mid \textrm{uncovered}) = \frac{P - p}{P + N - p - n}$ (4)

The opposite probabilities, i.e., $P(\textrm{uncovered})$, $P(\textrm{negative} \mid \textrm{covered})$, and $P(\textrm{negative} \mid \textrm{uncovered})$, can be calculated straightforwardly by subtracting the appropriate value from 1.
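For concreteness, here is a small sketch of how two of these measures could be computed from the confusion matrix; a minimal illustration assuming the counts p, n, P, N defined above (note that lower conditional entropy indicates a better rule).

```python
from math import log2

def rss(p: int, n: int, P: int, N: int) -> float:
    # RSS = sensitivity + specificity - 1 = p/P - n/N
    return p / P - n / N

def conditional_entropy(p: int, n: int, P: int, N: int) -> float:
    # H(Y|X): Y is the class, X tells whether the rule covers the example;
    # see Eqs. (1)-(4). Assumes p + n > 0 and the rule does not cover all examples.
    def h(q: float) -> float:  # entropy of a binary distribution (q, 1 - q)
        return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)
    p_cov = (p + n) / (P + N)                  # Eq. (2)
    pos_given_cov = p / (p + n)                # Eq. (3)
    pos_given_unc = (P - p) / (P + N - p - n)  # Eq. (4)
    return p_cov * h(pos_given_cov) + (1 - p_cov) * h(pos_given_unc)
```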

The aforementioned measures are also used for evaluating regression rules, as regression is transformed by the algorithm into a binary classification problem. The transformation is done similarly as in janssen2011 . Namely, the median $m$ and the standard deviation $\sigma$ of the labels of instances covered by the rule are established. Observations from the entire set with labels in the interval $[m - \sigma, m + \sigma]$ are assigned a positive class. This allows determining the elements of the confusion matrix and calculating all the aforementioned quality measures. Note, however, that in contrast to classification problems, the $P$ and $N$ values may change as the rule coverage is modified.
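A minimal sketch of this transformation, assuming plain Python lists of labels (the function name is our illustration):

```python
import statistics

def binarize_regression(labels_all, labels_covered):
    # Labels within one standard deviation of the median of the covered
    # examples are treated as positive; P and N must be recomputed every
    # time the rule coverage changes.
    m = statistics.median(labels_covered)
    s = statistics.pstdev(labels_covered)
    positive = [m - s <= y <= m + s for y in labels_all]
    P = sum(positive)
    N = len(labels_all) - P
    return positive, P, N
```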

The situation is different in the case of survival analysis, where rule outcomes are survival function estimates rather than numerical values. Thus, it is desirable for a rule to cover examples whose survival distributions differ significantly from those of the other instances. For this purpose, we use the log-rank statistic harrington1982class as a measure of survival rule quality. It is calculated as $(O - E)^2 / V$, where:

$O - E = \sum_{t \in T_c \cup T_u} \left( d_t^c - \frac{n_t^c \,(d_t^c + d_t^u)}{n_t^c + n_t^u} \right)$ (5)

$V = \sum_{t \in T_c \cup T_u} \frac{n_t^c \, n_t^u \, (d_t^c + d_t^u)(n_t^c + n_t^u - d_t^c - d_t^u)}{(n_t^c + n_t^u)^2 \,(n_t^c + n_t^u - 1)}$ (6)

$T_c$ ($T_u$) is the set of event times of observations covered (uncovered) by the rule, $d_t^c$ ($d_t^u$) is the number of covered (uncovered) observations which experienced an event at time $t$, and $n_t^c$ ($n_t^u$) is the number of covered (uncovered) instances at hazard, i.e., still observable at time $t$.
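The following sketch computes this statistic from raw (time, event) pairs; a simple quadratic-time illustration of the definitions above, not an optimized implementation.

```python
def log_rank(covered, uncovered):
    # covered / uncovered: lists of (time, event) pairs, event = 1 for an
    # observed event and 0 for censoring. Returns (O - E)^2 / V.
    event_times = sorted({t for t, e in covered + uncovered if e == 1})
    o_minus_e = 0.0
    v = 0.0
    for t in event_times:
        n1 = sum(1 for ti, _ in covered if ti >= t)    # covered at hazard
        n2 = sum(1 for ti, _ in uncovered if ti >= t)  # uncovered at hazard
        d1 = sum(1 for ti, e in covered if ti == t and e == 1)
        d2 = sum(1 for ti, e in uncovered if ti == t and e == 1)
        n, d = n1 + n2, d1 + d2
        if n < 2:
            continue
        o_minus_e += d1 - n1 * d / n                    # Eq. (5)
        v += n1 * n2 * d * (n - d) / (n * n * (n - 1))  # Eq. (6)
    return (o_minus_e ** 2) / v if v > 0 else 0.0
```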

1: Input: $D$—training dataset, mincov—minimum number of yet uncovered examples that a new rule has to cover.
2: Output: $R$—rule set.
3: $U \gets D$ ▷ set of uncovered examples
4: $R \gets \emptyset$ ▷ start from an empty rule set
5: repeat
6:     $r \gets \emptyset$ ▷ start from an empty premise
7:     $r \gets$ Grow($r$, $D$, $U$, mincov) ▷ grow conditions
8:     $r \gets$ Prune($r$, $D$) ▷ prune conditions
9:     $R \gets R \cup \{r\}$
10:     $U \gets U \setminus$ Cov($r$, $D$) ▷ remove from $U$ examples covered by $r$
11: until $U = \emptyset$
Algorithm 1 Separate-and-conquer rule induction.
1: Input: $r$—input rule, $D$—training dataset, $U$—set of uncovered examples, mincov—minimum number of previously uncovered examples that a new rule has to cover.
2: Output: $r$—grown rule.
3: function Grow($r$, $D$, $U$, mincov)
4:     repeat ▷ iteratively add conditions
5:         $w_{best} \gets \emptyset$ ▷ current best condition
6:         $q_{best} \gets -\infty$, $c_{best} \gets 0$ ▷ best quality and coverage
7:         $D_r \gets$ Cov($r$, $D$) ▷ examples from $D$ satisfying premise of $r$
8:         for $w \in$ GetPossibleConditions($D_r$) do
9:             $r' \gets r \wedge w$ ▷ rule extended with condition $w$
10:             $c \gets |$Cov($r'$, $U$)$|$
11:             if $c \geq$ mincov then ▷ verify coverage requirement
12:                 $q \gets$ Quality($r'$, $D$) ▷ rule quality measure
13:                 if $q > q_{best}$ or ($q = q_{best}$ and $c > c_{best}$) then
14:                     $w_{best} \gets w$, $q_{best} \gets q$, $c_{best} \gets c$
15:         $r \gets r \wedge w_{best}$
16:     until $w_{best} = \emptyset$
17:     return $r$
Algorithm 2 Growing a rule.

The separate-and-conquer heuristic adds rules iteratively to the initially empty set until the entire dataset becomes covered (Algorithm 1). To ensure convergence, every rule must cover at least mincov previously uncovered examples. The induction of a single rule consists of two stages: growing and pruning. In the former (presented in Algorithm 2), elementary conditions are added to the initially empty premise. When extending the premise, the algorithm considers all possible conditions built upon all attributes (line 8: the GetPossibleConditions function call) and selects the one leading to the rule of highest quality (lines 12–14). In the case of nominal attributes, conditions of the form $a = v$ for all values $v$ from the attribute domain are considered. For continuous attributes, the values that appear in the observations covered by the rule are sorted. Then, the possible split points $s$ are determined as arithmetic means of subsequent values, and the conditions $a < s$ and $a \geq s$ are evaluated. If several conditions render the same quality, the one covering more examples is chosen. Pruning can be considered as the opposite of growing. It iteratively removes conditions from the premise, each time making the elimination leading to the largest improvement in the rule quality. The procedure stops when no condition can be deleted without decreasing the quality of the rule or when the rule contains only one condition. Finally, for comprehensibility, the rule is post-processed by merging conditions based on the same numerical attributes, e.g., the conjunction $a \geq 1 \wedge a < 5$ will be presented as $a \in [1, 5)$.
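The condition generation step can be sketched as follows; a minimal illustration assuming examples are dicts and attribute types are known (the names are ours, not the GuideR API):

```python
def get_possible_conditions(covered_examples, attribute_types):
    # attribute_types maps attribute name -> 'nominal' or 'numerical'.
    # Returns candidate conditions as (attribute, operator, value) triples.
    conditions = []
    for a, kind in attribute_types.items():
        values = sorted({x[a] for x in covered_examples})
        if kind == 'nominal':
            conditions += [(a, '=', v) for v in values]
        else:
            # Split points are arithmetic means of subsequent observed values.
            for lo, hi in zip(values, values[1:]):
                s = (lo + hi) / 2.0
                conditions += [(a, '<', s), (a, '>=', s)]
    return conditions
```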

In the regression and survival problems, the algorithm is performed once on the entire dataset $D$. For classification tasks, rules are induced independently for all classes. Particularly, when class $c$ is investigated, the set is binarized with respect to it: examples with labels equal to $c$ are positive, while the others are negative. Detailed information about our algorithm for classification, regression, and survival rule induction using the separate-and-conquer strategy can be found in wrobel2016 ; wrobel2017 . The most important limitation of the presented approach is that the induction is fully automatic—the user may control how the model looks only by selecting the quality measure and adjusting the mincov parameter.

3.3 Guided rule induction

In order to allow user-guided rule induction, the separate-and-conquer heuristic explained in the previous subsection was extended. The preliminary step of the procedure is specifying the user's requirements. They consist of several elements ordered by priority (highest first):

  1. $R_0$—set of initial (user-specified) rules which have to appear in the model. Depending on the parameters, initial rules are immutable or can be extended by other conditions (existing conditions cannot be altered, though).

  2. $W_p$/$A_p$—multisets of preferred conditions/attributes. When deriving a rule, they are used before automatically induced conditions. The user may specify the multiplicity of each preferred element, allowing it to be used in a given number of rules.

  3. $W_f$/$A_f$—sets of forbidden conditions/attributes which cannot appear in the automatically generated rules.

In the classification problems, the requirements can be defined for each class separately. Additional parameters controlling guided rule induction are:

  • extendPref/extendAuto—boolean indicating whether initial rules should be extended with the use of preferred/automatic conditions and attributes.

  • inducePref/induceAuto—boolean indicating whether new rules should be induced with the use of preferred/automatic conditions and attributes.

  • $K_W$/$K_A$—maximum number of preferred conditions/attributes per rule.

  • considerOtherClasses—boolean indicating whether automatic induction should be performed for classes for which no user’s requirements have been defined (classification mode only).

1: Input: $D$—training dataset, $R_0$—set of initial rules,
2: $W_p$/$A_p$—multiset of preferred conditions/attributes,
3: $W_f$/$A_f$—set of forbidden conditions/attributes,
4: $K_W$/$K_A$—maximum number of preferred conditions/attributes per rule,
5: extendPref/extendAuto—extend initial rules with preferred/automatic conditions (bool),
6: inducePref/induceAuto—induce new rules with preferred/automatic conditions (bool),
7: mincov—minimum number of yet uncovered examples that a new rule has to cover.
8: Output: $R$—rule set.
9: $R \gets \emptyset$ ▷ start from empty rule set
10: $U \gets D$ ▷ set of uncovered examples
11: for $r \in R_0$ do ▷ iterate over initial rules
12:     if extendPref then ▷ extend with preferred conditions/attributes
13:         $r \gets$ GuidedGrow($r$, $D$, $U$, $W_p$, $A_p$, $K_W$, $K_A$, mincov)
14:     if extendAuto then ▷ extend with automatic conditions
15:         $r \gets$ Grow($r$, $D$, $U$, mincov) ▷ exclude forbidden knowledge
16:         $r \gets$ Prune($r$, $D$) ▷ prune the rule
17:     $R \gets R \cup \{r\}$ ▷ add rule to rule set
18:     $U \gets U \setminus$ Cov($r$, $D$)
19: if inducePref or induceAuto then ▷ induce non-user rules
20:     while $U \neq \emptyset$ do
21:         $r \gets \emptyset$ ▷ start from empty rule
22:         if inducePref then
23:             $r \gets$ GuidedGrow($r$, $D$, $U$, $W_p$, $A_p$, $K_W$, $K_A$, mincov)
24:         if induceAuto then
25:             $r \gets$ Grow($r$, $D$, $U$, mincov)
26:             $r \gets$ Prune($r$, $D$) ▷ prune the rule
27:         $R \gets R \cup \{r\}$ ▷ add rule to rule set
28:         $U \gets U \setminus$ Cov($r$, $D$)
29: return $R$
Algorithm 3 Guided rule induction. The function Grow operates as in the fully automatic algorithm, but it excludes attributes already present in $r$, forbidden attributes, and conditions intersecting with forbidden conditions.

The guided separate-and-conquer heuristic is presented in Algorithm 3. It starts from processing the initial rules in the order specified by the user. If the extendPref flag is enabled, an attempt is made to extend an initial rule by at most $K_W$ preferred conditions and $K_A$ preferred attributes (lines 12–13). After that, if the extendAuto flag is enabled, the algorithm adds automatically induced conditions using the standard separate-and-conquer strategy (lines 14–16). When all initial rules have been processed, new ones are generated analogously; the corresponding boolean parameters are called inducePref and induceAuto (lines 19–28).

For regression and survival problems, the described procedure is performed once, similarly as in the fully automatic mode. For classification tasks, the algorithm is executed for each class the knowledge has been specified for. If the considerOtherClasses parameter is set, this is followed by the fully automatic induction of rules for the classes without user's preferences.

An important assumption concerning the semi-automatic induction is that knowledge elements are prioritized, i.e.:

  • Initial rules and preferred conditions/attributes are more important than forbidden conditions/attributes. Therefore, if an initial rule contains a condition $w$ built on attribute $a$, it will appear regardless of $a$ being marked as forbidden ($a \in A_f$) or $w$ intersecting one of the forbidden conditions from $W_f$. The same holds for preferred conditions and attributes—forbidden knowledge applies to automatic induction only.

  • A requirement of higher priority cannot be altered by one of lower priority. For instance, if an initial rule contains a condition $w$ built on attribute $a$, $w$ cannot be modified, i.e., no other condition concerning $a$ can be added to this rule, neither preferred ($W_p$- or $A_p$-based) nor automatically induced. Similarly, preferred conditions cannot be overridden by $A_p$-based/automatic conditions, etc.

  • Requirements of the same category are prioritized by order in which they are specified by the user.

  • User-defined knowledge cannot be subject to pruning.

1: Input: $r$—input rule, $D$—training dataset, $U$—set of uncovered examples,
2: $W_p$/$A_p$—multiset of preferred conditions/attributes,
3: $K_W$/$K_A$—maximum number of preferred conditions/attributes per rule,
4: mincov—minimum number of yet uncovered examples that a new rule has to cover.
5: Output: $r$—grown rule.
6: function GuidedGrow($r$, $D$, $U$, $W_p$, $A_p$, $K_W$, $K_A$, mincov)
7:     $A' \gets A \setminus$ Attr($r$) ▷ exclude attributes already present in the rule
8:     $k_W \gets 0$, $k_A \gets 0$ ▷ initialize counters with 0
9:     repeat ▷ analyze preferred conditions
10:         $w_{best} \gets \emptyset$ ▷ current best condition
11:         for $w \in W_p$ do ▷ analyze all preferred conditions
12:             if Attr($w$) $\in A'$ and $|$Cov($r \wedge w$, $U$)$| \geq$ mincov and $r \wedge w$ is better than $r \wedge w_{best}$ then
13:                 $w_{best} \gets w$
14:         $r \gets r \wedge w_{best}$ ▷ add condition to the rule premise
15:         $W_p \gets W_p \setminus \{w_{best}\}$ ▷ remove preferred condition
16:         $A' \gets A' \setminus$ Attr($w_{best}$) ▷ remove used attribute
17:         $k_W \gets k_W + 1$
18:     until $w_{best} = \emptyset$ or $k_W = K_W$
19:     repeat ▷ analyze preferred attributes
20:         $w_{best} \gets \emptyset$ ▷ current best condition
21:         for $a \in A_p$ do
22:             $w \gets$ InduceBestCondition($a$, $r$, $D$, $U$)
23:             if $|$Cov($r \wedge w$, $U$)$| \geq$ mincov and $r \wedge w$ is better than $r \wedge w_{best}$ then
24:                 $w_{best} \gets w$
25:         $r \gets r \wedge w_{best}$ ▷ add condition to the rule premise
26:         $A_p \gets A_p \setminus \{$Attr($w_{best}$)$\}$ ▷ remove preferred attribute
27:         $A' \gets A' \setminus$ Attr($w_{best}$) ▷ remove used attribute
28:         $k_A \gets k_A + 1$
29:     until $w_{best} = \emptyset$ or $k_A = K_A$
30:     return $r$
Algorithm 4 Growing a rule using preferred conditions and attributes.

The prioritization determines how a single rule is grown taking into account the user's preferences (Algorithm 4). At the beginning, attributes already present in the rule are excluded (line 7). Then, at most $K_W$ preferred conditions fulfilling the coverage requirement are added to the rule (lines 9–18). At each step, the condition rendering the rule of highest quality is selected (lines 11–13). After that, preferred attributes are processed similarly (lines 19–29). For each preferred attribute, the condition leading to the rule of highest quality is considered (line 22: the InduceBestCondition function call). When a preferred condition/attribute is used, its multiplicity in the $W_p$/$A_p$ multiset is decreased (lines 15, 26). Moreover, already employed attributes cannot be used again in the rule (lines 16, 27).
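A compact sketch of the preferred-conditions phase, assuming conditions are (attribute, operator, value) triples and quality/coverage are supplied as functions (all names are illustrative, not the GuideR API):

```python
from collections import Counter

def guided_grow_preferred(rule, pref_conds: Counter, k_max: int,
                          quality, coverage, mincov: int):
    # rule: list of (attribute, operator, value) conditions;
    # pref_conds: Counter mapping preferred conditions to their multiplicities.
    used_attrs = {attr for attr, _, _ in rule}
    for _ in range(k_max):
        candidates = [w for w in pref_conds
                      if w[0] not in used_attrs
                      and coverage(rule + [w]) >= mincov]
        if not candidates:
            break
        best = max(candidates, key=lambda w: quality(rule + [w]))
        rule.append(best)
        used_attrs.add(best[0])
        pref_conds[best] -= 1            # consume one multiplicity
        if pref_conds[best] == 0:
            del pref_conds[best]
    # Preferred attributes would be processed analogously, inducing the best
    # condition for each attribute instead of picking it from a fixed set.
    return rule
```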

4 Results

The algorithm was evaluated on three test cases representing classification, regression, and survival problems. The analysis of each dataset concerned:

  1. the validation of models rendered by automatic and user-guided rule induction; depending on the problem, this was done by 10-fold cross validation or train/test split,

  2. the analysis of rule sets induced on the entire datasets in the context of domain knowledge.

Table 1 presents problem-specific details of experimental procedures, e.g., model validation methods, quality criteria, statistical tests used for determining rules significance, etc.

The rule set descriptive statistics were common for all investigated datasets and consisted of: the number of rules (#rules), the average number of conditions per rule (#conditions), the average rule precision ($p/(p+n)$), and support ($(p+n)/(P+N)$). Note that the interpretation of indicators based on the confusion matrix varies for different problems. Particularly, for classification, $P$ and $N$ are fixed for each analyzed class; for regression, $P$ and $N$ are determined for each rule on the basis of the covered examples; for survival analysis, all examples are considered positive, thus $N$ and $n$ equal 0.

For all investigated models we report the fraction of statistically significant rules at the assumed significance level (%significant). To control the false discovery rate (FDR) in multiple testing, the Benjamini-Hochberg correction was applied benjamini1995 .
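For reference, the Benjamini-Hochberg step-up procedure can be sketched as follows (the significance level alpha = 0.05 is our assumption for illustration):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    # Returns a boolean list marking which tests remain significant after
    # FDR correction: find the largest rank k with p_(k) <= k * alpha / m
    # and accept all hypotheses of rank <= k.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            significant[i] = True
    return significant
```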

Problem: classification / regression / survival
Dataset: seismic-bumps / methane / BMT-Ch
Model validation method: 10-fold CV / train-test split / 10-fold CV
Quality criteria: sensitivity, specificity, and their geometric mean ($\mathrm{SE} = p/P$, $\mathrm{SP} = 1 - n/N$, $\mathrm{Gm} = \sqrt{\mathrm{SE} \cdot \mathrm{SP}}$) / root relative squared error $\mathrm{RRSE} = \sqrt{\sum_i (\hat{y}_i - y_i)^2 / \sum_i (y_i - \bar{y})^2}$ ($y_i$—observed label, $\hat{y}_i$—expected, $\bar{y}$—average) / integrated Brier score (see wrobel2017 for details)
Quality difference significance test: Student's t-test / — / Student's t-test
Rule significance test: Fisher's exact test for comparing confusion matrices / $\chi^2$ test for comparing label variance of covered vs. uncovered examples / log-rank test for comparing survival functions of covered vs. uncovered examples

Table 1: Experimental setting for the investigated problems; entries are given in the order classification / regression / survival.

Another analysis step was the comparison of similarity between the guided and automatic rule sets. The similarity between two rule sets $R_1$ and $R_2$ on the dataset $D$ is expressed as:

$S(R_1, R_2) = \frac{a + b}{c}$ (7)

  • $a$ is the number of pairs of examples in $D$ for which there exists some rule in $R_1$ and some rule in $R_2$ covering both examples of the pair.

  • $b$ is the number of pairs of examples in $D$ for which there exists neither a rule in $R_1$ nor a rule in $R_2$ covering both examples of the pair.

  • $c$ is the number of all pairs of examples in $D$.

The measure might be interpreted as the probability of agreement between two rule sets for a randomly chosen pair of examples. The agreement between rule sets $R_1$ and $R_2$ for a pair of examples means that:

  • if both examples satisfy the premise of some rule in $R_1$, then they also both satisfy the premise of some rule in $R_2$,

  • if the examples are not covered by a common rule in $R_1$, then they are also not covered by a common rule in $R_2$,

  • if both examples are not covered by any of the rules in $R_1$, then they are also not covered by any of the rules in $R_2$,

  • if one of the examples is covered by some rule in $R_1$ and the other one is not covered by any of the rules in $R_1$, then the same applies to $R_2$.

The rule set similarity measure takes values between 0 and 1. The value 0 indicates that the two rule sets do not agree on any pair of examples from the given dataset $D$. The value 1 means perfect agreement, i.e. that there exists no pair of examples which is covered by a common rule in one of the rule sets and not covered by a common rule in the other one. Since the proposed measure evaluates the similarity between the subsets of examples covered by rule sets, it is not influenced by the rule overlap within a set. In particular, if rule sets $R_1$ and $R_2$ have a similarity score equal to 1, then extending these rule sets by additional rules does not change the value of the score.

The proposed similarity score can also be considered a variant of the Rand measure rand1971objective used for the evaluation of clustering performance. However, our proposal takes into account that a single example can satisfy the premises of several rules, as well as that it may not be covered by any of the rules.
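A direct, if naive, implementation of Eq. (7) over all example pairs could look as follows (quadratic in the number of examples; the covers predicate is an assumed helper):

```python
from itertools import combinations

def rule_set_similarity(rules1, rules2, examples, covers):
    # covers(rule, example) -> bool tests whether the rule premise is satisfied.
    def common_rule(rules, e1, e2):
        return any(covers(r, e1) and covers(r, e2) for r in rules)
    a = b = c = 0
    for e1, e2 in combinations(examples, 2):
        c += 1
        in1 = common_rule(rules1, e1, e2)
        in2 = common_rule(rules2, e1, e2)
        if in1 and in2:
            a += 1      # both rule sets cover the pair with a common rule
        elif not in1 and not in2:
            b += 1      # neither rule set covers the pair with a common rule
    return (a + b) / c if c else 1.0
```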

The following subsections contain detailed analysis of classification, regression, and survival experiments.

4.1 Classification

Classification experiments were performed on the seismic-bumps dataset from the UCI Machine Learning Repository uci . The dataset had been prepared and made available by the authors of the paper and concerns the problem of forecasting high energy seismic bumps in coal mines kabiesz . It contains 2 584 instances (170 positives and 2 414 negatives) and 19 attributes characterizing seismic activity in the rock mass within one 8-hour shift (see Table 2 for a description of the crucial features). The value 1 of the class attribute indicates the presence of a seismic bump with energy higher than $10^4$ J in the next shift.

attribute description
seismic
(seismoacoustic)
result of shift seismic (seismoacoustic) hazard assessment in the mine working obtained by the seismic (seismoacoustic) method developed by mine experts (a—lack of hazard, b—low hazard, c—high hazard, d—danger state)
genergy seismic energy recorded within the previous shift by the most active geophone (GMax) out of geophones monitoring the longwall
gimpuls a number of pulses recorded within the previous shift by GMax
goenergy a deviation of energy recorded within the previous shift by GMax from the average energy recorded during eight previous shifts
goimpuls a deviation of a number of GMax pulses within the previous shift from the average number of pulses within eight previous shifts
ghazard result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method based on registration coming from GMax only
nbumpsX a number of seismic bumps registered within the previous shift in the Xth energy range
senergy total energy of seismic bumps registered within the previous shift
maxenergy the maximum energy of the seismic bumps registered within the previous shift
Table 2: Description of selected attributes of seismic-bumps dataset.

The model validation was carried out according to the stratified 10-fold cross validation. To establish the algorithm parameters, automatic rule induction was performed as an initial step. Due to the strong imbalance of the problem, the geometric mean (Gm) of sensitivity and specificity was used for assessment. Among the examined quality measures (C2, Correlation, Conditional entropy, Lift, RSS, SBayesian), Conditional entropy was selected for further investigation.

To demonstrate the flexibility of our algorithm, guided rule induction was done in several variants, with different algorithm parameters. The variants marked as guided-c1, guided-c2, guided-c3, and guided-c4 are attempts to build the classifier upon attributes that, according to the domain knowledge, should have the greatest significance for bump forecasting kabiesz . The guided-c5 and guided-c6 variants are an attempt to define the classifier only on the basis of data coming from one measurement system: in the former it is forbidden to use attributes containing data from geophones (i.e., a seismoacoustic system), in the latter it is forbidden to use attributes containing data from seismometers (i.e., a seismic system).

The variants together with the corresponding algorithm parameters are listed below. Class-specific requirements are defined with superscripts, e.g., $A_p^0$ contains preferred attributes for class 0 (lack of a superscript indicates that the knowledge applies to both classes). Only the important parameters are specified.

guided-c1:

The model consists of two initial rules.

guided-c2:

Attribute gimpuls is used in the rules for both classes at least once.

guided-c3:

At least a part of the rules must contain the gimpuls, genergy, and senergy attributes together.

guided-c4:

At least one of the seismic, seismoacoustic, and ghazard attributes is used in each rule, with an additional requirement on the value sets—class 0 may use values a, b; class 1 may use values b, c, d.

guided-c5:

Attributes gimpuls, goimpuls, ghazard, and seismoacoustic are forbidden.

guided-c6:

Attributes from nbumps family as well as senergy, maxenergy, and seismic are forbidden: analogous to guided-c5.

variant     SE    SP     Gm     sd     Gm-p   #rules  #conditions  support  precision  %significant  similarity
auto        0.67  0.76   0.708  0.071  —        67    7.2          0.14     0.51        94           —
guided-c1   0.49  0.813  0.627  0.064  0.01      2    1.0          0.50     0.55       100           0.74
guided-c2   0.62  0.82   0.711  0.062  0.90     39    5.0          0.30     0.57        95           0.92
guided-c3   0.58  0.82   0.690  0.046  0.42    155    5.0          0.38     0.78        97           0.88
guided-c4   0.42  0.81   0.580  0.057  —       121    5.9          0.31     0.80        99           0.51
guided-c5   0.64  0.78   0.701  0.082  0.65     43    4.1          0.30     0.48        93           0.88
guided-c6   0.56  0.73   0.622  0.070  0.02     55    4.9          0.20     0.49        96           0.88

Table 3: The analysis of the classification rule sets in terms of model quality (SE—sensitivity, SP—specificity, Gm—their geometric mean with its standard deviation (sd), Gm-p—Student's t-test p-value comparing the Gm of the user's variants w.r.t. auto; all from 10-fold CV) and descriptive statistics (computed on the full dataset).

The summary of results for the automatic and guided classification variants is given in Table 3. Below, there is also an analysis of the rule sets obtained by means of the automatic, guided-c1, guided-c2, guided-c4, and guided-c6 methods on the entire dataset.

The rule set induced automatically consisted of 67 rules with average support and precision equal to 0.14 and 0.51, respectively (taking into account the dataset imbalance, this is an acceptable result). The attributes goimpuls, gimpuls, ghazard, and seismoacoustic occurred in 49, 47, 18, and 13 rules, respectively.

Below we present, for each decision class, the strongest rule generated automatically. In brackets there are the confusion matrix elements which allow calculating support and precision. The rules indicating decision class 1 were more specific, less precise, and covered fewer examples.

($p = \ldots$, $n = \ldots$, $P = \ldots$, $N = \ldots$)

($p = \ldots$, $n = \ldots$, $P = \ldots$, $N = \ldots$)

The characteristics of the rules obtained in the guided-c1 experiment are as follows:

($p = \ldots$, $n = \ldots$, $P = \ldots$, $N = \ldots$)

($p = \ldots$, $n = \ldots$, $P = \ldots$, $N = \ldots$)

The classifier based on these two rules had a significantly worse classification quality. However, it is worth noticing that the first rule was less precise by only 3.3% than the best rule generated automatically for this class, while its support was 3.6 times larger. The rule pointing at class 1 had a precision of 0.15, which was over twice the 0.065 a priori precision of this class.

The guided-c2 experiment aimed at forcing the occurrence of the gimpuls attribute in each rule as well as adding other elementary conditions to the rule premises. As can be seen in Table 3, this led to the model with the best classification ability. In addition, the number of rules decreased compared to the automatically generated rule set, while the average support more than doubled (214% of the automatic value) and the average precision increased by 11%.

The results achieved in the guided-c6 experiment show that it is impossible to obtain a good-quality classifier only on the basis of data coming from the seismic system; thus, it is indispensable to use geophones (sensors which register seismoacoustic emission).

In all cases, a majority of the induced rules were statistically significant. Rule sets generated in the guided mode (particularly the guided-c2 and guided-c3 variants) were less numerous than those generated automatically. They also contained fewer elementary conditions. According to the value of the similarity measure, the rule sets induced in the auto, guided-c2, and guided-c3 modes were very similar. However, rules generated in the guided mode represented knowledge which is more in compliance with the user's requirements and intuition. Additionally, the analysis of standard deviations shows that rule sets generated in the guided mode were more stable in their classification abilities.

The experiments we carried out show that the guided (interactive) model definition allows verifying certain research hypotheses and, in particular, obtaining classifiers superior to those generated automatically. The induction of successive rule sets may contribute to further analyses. For example, one could attempt to develop a classifier made of the first rule from guided-c1 model supplemented with automatic rules. Our software enables performing many variants of such analyses.

4.2 Regression

The usability of the presented algorithm for regression problems was verified on the methane dataset, which concerns the problem of predicting methane concentration in a coal mine. The set contains 13 368 train and 5 728 test instances characterized by 7 attributes. The features indicate methane concentrations (MM116, MM31 [%]), air velocity (AS038 [m/s]), airflow (PG072 [m/s]), atmospheric pressure (BA13 [hPa]), and whether the coal production process is carried out (PD). The location of the sensors is depicted in Fig. 1. The attributes represent measurements averaged over 30-second periods. The task is to predict the maximal value of methane concentration registered by MM116 within the next 3 minutes.

Figure 1: Sensor location in the longwall area.

As in the previous case, automatic induction was done first in order to adjust the parameters. Eventually, the RSS quality measure was selected as providing the best trade-off between the root relative squared error (RRSE) and the model complexity expressed by the number of rules and conditions. The following variants of user's knowledge were investigated:

guided-r1:

The model contains the PD = 0 and PD = 1 conditions, both appearing in three rules.

guided-r2:

A given conjunction of two conditions appears in five rules.

guided-r3:

Another conjunction appears in five rules: analogous to guided-r2.

guided-r4:

Attributes DMM116, MM116, and PD appear in every rule.

variant     RRSE   #rules  #conditions  support  precision  %significant  similarity
auto        0.918    9     3.5          0.26     0.64        88           —
guided-r1   0.811   19     4.4          0.17     0.66        95           0.70
guided-r2   0.793   11     3.3          0.18     0.69       100           0.93
guided-r3   0.863    8     2.9          0.18     0.78        87           0.93
guided-r4   1.174   41     5.5          0.10     0.70       100           0.60

Table 4: The analysis of the regression rule sets in terms of model quality (RRSE—root relative squared error, evaluated on the train/test split) and descriptive statistics (computed on the entire dataset).

The automatic induction produced 9 rules, which allowed achieving an RRSE of 0.918, i.e. smaller than that of the naive prognosis based on the average value of the dependent variable (see Table 4 for all the results). The MM116 and MM31 attributes dominated in the rule premises. This means that the currently registered concentration of methane has the largest impact on the future concentration. This is illustrated, for example, by one of the induced rules, which shows that if the concentration of methane in the middle of the longwall is low, the predicted concentration at the longwall exit will be about twice as high (it will remain in the range [0.39, 0.41]).

Another rule presents an interesting dependence. If the methane concentration is at an average level (about 1%), too high an air velocity can lead to eddies of the gas mixture at the longwall exit and, at the same time, can increase the methane concentration (methane in the range [0.92, 1.28]).

In the automatically generated rules, the PD attribute occurred only twice. Within the guided-r1 experiment, the use of the elementary conditions PD = 1 or PD = 0 (the cutter-loader is not working) was obligatory. This reflects the hypothesis, supported by domain knowledge, that the emission of methane is larger while the cutter-loader is working. In this way, a significant error reduction was achieved at the cost of increasing the number of rules and the conditions occurring in their premises.

Within the guided-r3 experiment, the simultaneous occurrence of the two required conditions was obligatory. A rule containing only those conditions covered 14% of all the examples. The induction of this rule allowed better identification of rules indicating higher methane concentration and, as a result, caused a further decrease of the RRSE.

Apart from the last case (guided-r4), the rule sets induced in the guided mode produced smaller RRSE values than the automatically generated set. The guided-r1 settings enforce the use of the PD = 0 and PD = 1 conditions in three rules. These settings reflect an attempt to make the methane level dependent on the coal production process. The regression errors of guided-r1 and of the automatically generated set were close to each other, while the value of the similarity measure was relatively low. This means that these two rule sets generated different coverages of the example space.

Generally, in the case of regression rule induction, the definition of the user's requirements and the analysis of the rules can be difficult because there are no explicitly defined decision classes. However, as we can see, an interactive analysis allows reducing the estimation error. In addition, it is possible to identify interesting regularities in the data, such as the negative effects of too high an air velocity.

4.3 Survival analysis

Another area where our algorithm can be applied is survival analysis. The corresponding experiments were performed on the BMT-Ch dataset, which describes 187 pediatric patients (75 females and 112 males) with several hematologic diseases: 155 malignant disorders (i.a. 67 patients with acute lymphoblastic leukemia, 33 with acute myelogenous leukemia, 25 with chronic myelogenous leukemia, 18 with myelodysplastic syndrome) and 32 nonmalignant cases (i.a. 13 patients with severe aplastic anemia, 5 with Fanconi anemia, 4 with X-linked adrenoleukodystrophy). All patients were subject to unmanipulated allogeneic unrelated donor hematopoietic stem cell transplantation. Instances are described by 37 conditional attributes; the meaning of the selected ones is as follows: relapse—reoccurrence of the disease, PLTRecovery—time to platelet recovery (defined as the platelet count exceeding a fixed threshold), ANCRecovery—time to neutrophils recovery (defined as the neutrophils count exceeding a fixed threshold), aGvHD_III_IV—development of acute graft versus host disease stage III or IV, extcGvHD—development of extensive chronic graft versus host disease, CD34—CD34+ cell dose per kg of recipient body weight, CD3—CD3+ cell dose per kg of recipient body weight. A patient's death is considered as an event in the survival analysis.

The remaining attributes concern coexisting diseases/infections (e.g. cytomegalic inclusion disease) and describe matching between the bone marrow donor and recipient.

The experiments were performed with different initial knowledge variants (note that in survival analysis, class labels for initial rules cannot be specified):

guided-s1:

Every rule contains the CD34 attribute and does not contain the ANCRecovery and PLTRecovery attributes.

guided-s2:

The model consists of four expert rules.

guided-s3:

Similarly as in the previous case, but the CD34 ranges may be altered and the rules can be extended with automatic conditions.

guided-s4:

The model consists of two initial rules.

variant     IBS    sd     IBS-p  #rules  #conditions  support  precision  %significant  similarity
auto        0.212  0.048  —        4     3.0          0.49     1.00       100           —
guided-s1   0.235  0.069  0.31    14     4.1          0.14     1.00        71           0.30
guided-s2   0.221  0.033  0.48     4     2.0          0.21     1.00        50           0.27
guided-s3   0.225  0.036  0.43     4     3.0          0.36     1.00       100           0.49
guided-s4   0.223  0.026  0.38     2     2.0          0.50     1.00       100           0.48

Table 5: The analysis of the survival rule sets in terms of model quality (IBS—integrated Brier score with its standard deviation (sd), both from 10-fold CV; IBS-p—Student's t-test p-value comparing the IBS of user variants w.r.t. auto) and descriptive statistics (computed on the entire dataset).

Detailed results can be found in Table 5. The automatic method generated four rules in which the survival function depended on such factors as the patient age, donor age, gender match, disease relapse, and the number of days to platelet recovery (PLTRecovery). The variants of the guided rule induction refer to the verification of the research hypothesis that an increased dosage of CD34+ cells/kg extends the overall survival time without the simultaneous occurrence of undesirable events affecting the patients' quality of life.

The guided-s2 experiment was based on an arbitrary definition of rules. These rules try to make the survival function dependent on the CD34 dosage and the occurrence of extensive chronic graft versus host disease. The average p-value of the rules with FDR correction was 0.229, which shows that the rules considered separately did not contain statistically useful information. As can be observed, the IBS value was also worse. Better results were achieved for the rules containing only the CD34 attribute (guided-s4). They were characterized by an average p-value of 0.019 after correction.

In the guided-s3 experiment, the use of the CD34 and extcGvHD attributes was obligatory, but in the former case, the division point on the attribute domain was not defined. It was also admissible to add other attributes to the rule premises. The algorithm generated four rules with an average p-value after correction equal to 0.12.

Figure 2 shows the survival curves corresponding to the three rules presented below. The survival curve for the entire set of examples is also given.

Figure 2: Survival curves of observations covered by three selected rules and the entire set of examples (default).
s-R1

s-R2

s-R3

There was practically no difference between the survival curves corresponding to the second and the third rule. One can see that the second rule was more specific than the third one. The CD34 dosage does not have any impact on the survival function if the patient has not developed extensive chronic graft versus host disease.

According to the medical knowledge, chronic graft versus host disease remains a dangerous complication of allogeneic stem cell transplantation. However, mild forms of this disease are often manageable and, if the disease is under control, extend the overall survival time, as it causes the elimination of cancer cells (blasts) remaining in the blood. On the other hand, the first rule shows that patients with small doses of CD34 who developed extensive chronic graft versus host disease have a significantly shorter survival time in spite of fast neutrophils recovery.

As mentioned before, in the case of survival rule induction there are no negative examples; therefore, the precision of all the rules was equal to 1. Similarly to the classification rules, the majority of the survival rule sets generated by the guided induction were more stable (smaller standard deviation) than the rule sets generated in the automatic mode. Statistically, there were no differences between the IBS values of the guided and automatically generated sets of rules. The similarity measure values of the rule sets were very low. This demonstrates that it is possible to define a rule set compliant with the user's needs, which is different from the automatically generated model but preserves its prediction abilities.

The next step in the doctor's analysis could be a deeper investigation of the induced rules. For example, our algorithm could be used for further analysis of the s-R1 rule. One can remove the conditions that may be too specific according to the medical knowledge and analyze the quality of the modified rule—separately or together with other rules induced in an automatic way.

The presented example shows that the visualization of rule conclusions is very helpful in survival analysis. Furthermore, similarly to the previous cases, an interactive analysis of the data and the induced rules rendered interesting results. The models showed better compliance with the user's (e.g. doctor's) requirements than those achieved by means of the automatic method.

5 Conclusions

The article presents a rule induction algorithm in which the learning process can be guided by the user (domain expert). GuideR can be used in classification, regression, and survival settings in an interactive way, enabling the user to adjust the final rule set to his or her own preferences. Rule induction algorithms are known to be unstable, as a small change in the set of training examples may cause significant changes in the resulting rule set. The underlying cause is often related to the boundary areas of elementary conditions covering only a small number of examples. A user-guided definition of those ranges usually results in preserving the predictive abilities of the final rule set, making it more stable, clearer, and closer to the user's intuition at the same time. For example, in the analysed case studies, the survival rule s-R1 contained a condition with the range boundary 13.055; limiting this boundary to 13 makes the rule more intuitive with an insignificant decrease in quality.

GuideR can impact the attributes, elementary conditions, or even the rules of which the rule sets are composed, directing the induction towards the models most interesting to the user. Thus, the algorithm can be considered a tool for knowledge discovery and for testing certain hypotheses concerning dependencies which are expected to occur in the data. In particular, the algorithm is able to find modifications of user-defined hypotheses, provided in the form of rules, to improve their quality. Certainly, automatic rule induction can be the starting point of a thorough dependency analysis. A set of automatically induced rules—or selected rules from this set—can be the basis for further, interactive experiments. Moreover, the guided induction can be an iterative process, i.e., the successive rule sets may be built on the basis of the insights from previous iterations.

The efficiency of our algorithms for automatic rule induction has been confirmed on dozens of benchmark datasets wrobel2017 ; wrobel2016 ; sikora2012 ; sikora2013data . In the experimental part of this article, we focused on showing the efficiency and benefits coming from the use of the guided version of the algorithm. For this purpose, the analysis of three real-life datasets was presented. It shows that the guided rule induction may produce data models of similar generalization abilities (e.g., classification accuracy) as the automatic induction, containing attributes, elementary conditions, and rules complying with the user's requirements.

Further work will concern two directions. The first is extending the algorithm with the possibility to induce so-called action rules and interventions, which specify recommendations to be taken in order to transfer objects from an undesirable concept to a desirable one (e.g., moving a client from the churn group to the group of regular customers). The second direction will focus on the development of a graphical user interface for GuideR to make it easier to apply in real-life analyses.

References

  • (1) J. Błaszczyński, R. Słowiński, M. Szelag, Sequential covering rule induction algorithm for variable consistency rough set approaches, Inform. Sciences 181 (5) (2011) 987–1002.
  • (2) J. Fürnkranz, Separate-and-conquer rule learning, Artif. Intell. Rev. 13 (1) (1999) 3–54.
  • (3) J. W. Grzymala-Busse, W. Ziarko, Data Mining Based on Rough Sets, in: J. Wang (Ed.), Data Mining: Opportunities and Challenges, IGI Global, Hershey, USA, 2003, pp. 142–173.
  • (4) K. A. Kaufman, R. S. Michalski, Learning in an Inconsistent World: Rule Selection in STAR/AQ18, Machine Learning and Inference Laboratory, Fairfax, USA, 1999.
  • (5) H. Liu, M. Cocea, Induction of Classification Rules by Gini-Index Based Rule Generation, Inform. Sciences 436 (2018) 227–246.
  • (6) A. Valmarska, N. Lavrac, J. Fürnkranz, M. Robnik-Sikonja, Refinement and selection heuristics in subgroup discovery and classification rule learning, Expert Syst. Appl. 81 (2017) 147–162.
  • (7) B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: COLT 1992, ACM, New York, 1992, pp. 144–152.
  • (8) E. Czogala, J. Leski, Fuzzy and neuro-fuzzy intelligent systems, Vol. 47 of Studies in Fuzziness and Soft Computing, Physica-Verlag, Heidelberg, 2000.
  • (9) L. Rokach, Ensemble-based classifiers, Artif. Intell. Rev. 33 (1) (2010) 1–39.
  • (10) K. Simiński, Rough subspace neuro-fuzzy system, Fuzzy Sets Syst. 269 (2015) 30–46.
  • (11) K. Dembczyński, W. Kotłowski, R. Słowiński, ENDER: a statistical framework for boosting decision rules, Data Min. Knowl. Discov. 21 (1) (2010) 52–90.
  • (12) R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: VLDB 1994, Vol. 1215, Morgan Kaufmann Publishers, San Francisco, 1994, pp. 487–499.
  • (13) B. Kavšek, N. Lavrač, APRIORI-SD: Adapting association rule learning to subgroup discovery, Appl. Artif. Intell. 20 (7) (2006) 543–583.
  • (14) J. Stefanowski, D. Vanderpooten, Induction of decision rules in classification and discovery-oriented perspectives, Int. J. Intell. Syst. 16 (1) (2001) 13–27.
  • (15) N. Lavrač, B. Kavšek, P. Flach, L. Todorovski, Subgroup discovery with CN2-SD, J. Mach. Learn. Res. 5 (Feb) (2004) 153–188.
  • (16) L. Geng, H. J. Hamilton, Interestingness measures for data mining: A survey, ACM Comput. Surv. 38 (3) (2006) 9.
  • (17) S. Greco, R. Słowiński, I. Szczech, Measures of rule interestingness in various perspectives of confirmation, Inform. Sciences 346 (2016) 216–235.
  • (18) R. J. Bayardo Jr, R. Agrawal, Mining the most interesting rules, in: KDD 1999, ACM, New York, 1999, pp. 145–154.
  • (19) Ł. Wróbel, A. Gudyś, M. Sikora, Learning rule sets from survival data, BMC Bioinformatics 18 (1) (2017) 285.
  • (20) Ł. Wróbel, M. Sikora, M. Michalak, Rule Quality Measures Settings in Classification, Regression and Survival Rule Induction—an Empirical Approach, Fundam. Inform. 149 (4) (2016) 419–449.
  • (21) M. Sikora, A. Skowron, Ł. Wróbel, Rule quality measure-based induction of unordered sets of regression rules, in: A. Ramsay, G. Agre (Eds.), Artificial Intelligence: Methodology, Systems, and Applications, Vol. 7557 of LNAI, Springer, Berlin Heidelberg, 2012, pp. 162–171.
  • (22) M. Sikora, Ł. Wróbel, Data-driven adaptive selection of rule quality measures for improving rule induction and filtration algorithms, Int. J. Gen. Syst. 42 (6) (2013) 594–613.
  • (23) M. Sikora, Ł. Wróbel, Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines, Arch. Min. Sci. 55 (1) (2010) 91–114.
  • (24) M. Sikora, Ł. Wróbel, A. Gudyś, Methane dataset, https://github.com/adaa-polsl/GuideR (2018).
  • (25) K. Kałwak, J. Porwolik, M. Mielcarek, E. Gorczyńska, J. Owoc-Lempach, M. Ussowicz, A. Dyla, et al., Higher CD34+ and CD3+ Cell Doses in the Graft Promote Long-Term Survival, and Have No Impact on the Incidence of Severe Acute or Chronic Graft-versus-Host Disease after In Vivo T Cell-Depleted Unrelated Donor Hematopoietic Stem Cell Transplantation in Children, Biol. Blood Marrow Transplant. 16 (10) (2010) 1388–1401.
  • (26) M. Sikora, M. Mielcarek, K. Kałwak, Application of rule induction to discover survival factors of patients after bone marrow transplantation, Journal of Medical Informatics & Technologies 22 (2013) 35–53.
  • (27) P. Clark, T. Niblett, The CN2 induction algorithm, Mach. Learn. 3 (4) (1989) 261–283.
  • (28) M. Sikora, A. Gruca, Induction and selection of the most interesting gene ontology based multiattribute rules for descriptions of gene groups, Pattern Recognit. Lett. 32 (2) (2011) 258–269.
  • (29) M. J. Moshkov, M. Piliszczuk, B. Zielosko, Partial covers, reducts and decision rules in rough sets: theory and applications, Vol. 145, Springer, 2008.
  • (30) S. Tsumoto, Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model, Inform. Sciences 162 (2) (2004) 65–80.
  • (31) K. Napierala, J. Stefanowski, BRACID: a comprehensive approach to learning rules from imbalanced data, J. Intell. Inf. Syst. 39 (2) (2012) 335–373.
  • (32) J. Hühn, E. Hüllermeier, FURIA: an algorithm for unordered fuzzy rule induction, Data Min. Knowl. Discov. 19 (3) (2009) 293–319.
  • (33) M. Možina, J. Žabkar, I. Bratko, Argument based machine learning, Artif. Intell. 171 (10–15) (2007) 922–937.
  • (34) L. S. Riza, A. Janusz, C. Bergmeir, C. Cornelis, F. Herrera, D. Ślezak, J. M. Benítez, Implementing algorithms of rough set theory and fuzzy rough set theory in the R package "RoughSets", Inform. Sciences 287 (2014) 68–89.
  • (35) M. Sikora, Redefinition of decision rules based on the importance of elementary conditions evaluation, Fund. Inform. 123 (2) (2013) 171–197.
  • (36) L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and regression trees, Chapman & Hall/CRC, Boca Raton, USA, 1984.
  • (37) J. R. Quinlan, Learning with continuous classes, in: AI'92, World Scientific, Singapore, 1992, pp. 343–348.
  • (38) B. Ženko, S. Džeroski, J. Struyf, Learning predictive clustering rules, in: F. Bonchi, J.-F. Boulicaut (Eds.), Knowledge Discovery in Inductive Databases, Vol. 3933 of LNCS, Springer, Berlin Heidelberg, 2005, pp. 234–250.
  • (39) F. Janssen, J. Fürnkranz, Heuristic rule-based regression via dynamic reduction to classification, in: IJCAI'11, AAAI Press, 2011, pp. 1330–1335.
  • (40) J. H. Friedman, B. E. Popescu, Predictive learning via rule ensembles, Ann. Appl. Stat. (2008) 916–954.
  • (41) K. Dembczyński, W. Kotłowski, R. Słowiński, Solving Regression by Learning an Ensemble of Decision Rules, in: L. Rutkowski, et al. (Eds.), Artificial Intelligence and Soft Computing—ICAISC 2008, Vol. 5097 of LNCS, Springer, Berlin Heidelberg, 2008, pp. 533–544.
  • (42) P. Pattaraintakorn, N. Cercone, A foundation of rough sets theoretical and computational hybrid intelligent system for survival analysis, Comput. Math. Appl. 56 (7) (2008) 1699–1708.
  • (43) J. Bazan, A. Osmólski, A. Skowron, D. Ślezak, M. Szczuka, J. Wróblewski, Rough Set Approach to the Survival Analysis, in: J. J. Alpigini, et al. (Eds.), Rough Sets and Current Trends in Computing, Vol. 2475 of LNCS, Springer, Berlin Heidelberg, 2002, pp. 522–529.
  • (44) M. Sikora, Ł. Wróbel, M. Mielcarek, K. Kałwak, Application of rule induction to discover survival factors of patients after bone marrow transplantation, Journal of Medical Informatics & Technologies 22 (2013) 35–53.
  • (45) L.-P. Kronek, A. Reddy, Logical analysis of survival data: prognostic survival models by detecting high-degree interactions in right-censored data, Bioinformatics 24 (16) (2008) i248–i253.
  • (46) Y. Crama, P. Hammer, T. Ibaraki, Cause-effect relationships and partially defined Boolean functions, Ann. Oper. Res. 16 (1) (1988) 299–325.
  • (47) X. Liu, V. Minin, Y. Huang, D. B. Seligson, S. Horvath, Statistical methods for analyzing tissue microarray data, J. Biopharm. Stat. 14 (3) (2004) 671–685.
  • (48) M. LeBlanc, J. Crowley, Relative risk trees for censored survival data, Biometrics 48 (2) (1992) 411–425.
  • (49) T. M. Therneau, P. M. Grambsch, T. R. Fleming, Martingale-based residuals for survival models, Biometrika 77 (1) (1990) 147–160.
  • (50) M. LeBlanc, J. Crowley, Survival trees by goodness of split, J. Amer. Statist. Assoc. 88 (422) (1993) 457–467.
  • (51) A. Rafea, S. Shafik, S. Khaled, An Interactive System for Association Rule Discovery for Life Assurance, in: ICCCT 2004, Texas, USA, 2004, pp. 32–27.
  • (52) T. Kliegr, J. Kuchař, S. Vojíř, V. Zeman, EasyMiner-Short History of Research and Current Development, in: ITAT 2017, 2017, pp. 235–239.
  • (53) B. Padmanabhan, A. Tuzhilin, A Belief-Driven Method for Discovering Unexpected Patterns, in: KDD'98, AAAI Press, 1998, pp. 94–100.
  • (54) D. Gamberger, N. Lavrac, Expert-guided Subgroup Discovery: Methodology and Application, J. Artif. Intell. Res. 17 (1) (2002) 501–527.
  • (55) G. Adomavicius, A. Tuzhilin, Expert-driven validation of rule-based user models in personalization applications, Data Min. Knowl. Discov. 5 (1–2) (2001) 33–58.
  • (56) J. Blanchard, F. Guillet, H. Briand, Interactive visual exploration of association rules with rule-focusing methodology, Knowl. Inf. Syst. 13 (1) (2007) 43–75.
  • (57) S. Chen, B. Liu, Generating Classification Rules According to User's Existing Knowledge, in: SIAM SDM'01, 2001, pp. 1–15.
  • (58) IBM, IBM SPSS Modeler 18.0 Modeling Nodes, ftp://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/18.0/en/ModelerUsersGuide.pdf, Accessed: May 2018.
  • (59) E. L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53 (282) (1958) 457–481.
  • (60) R. S. Michalski, Discovering classification rules using variable-valued logic system VL1, in: IJCAI'73, Morgan Kaufmann Publishers, San Francisco, 1973.
  • (61) I. Bruha, Quality of decision rules: Definitions and classification schemes for multiple rules, in: Machine Learning and Statistics, The Interface, John Wiley and Sons, 1997, pp. 107–131.
  • (62) A. An, N. Cercone, Rule quality measures for rule induction systems: Description and evaluation, Comput. Intell. 17 (3) (2001) 409–424.
  • (63) Y. Yao, N. Zhong, An analysis of quantitative measures associated with rules, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 1999, pp. 479–488.
  • (64) J. Fürnkranz, P. A. Flach, ROC 'n' Rule Learning—Towards a Better Understanding of Covering Algorithms, Mach. Learn. 58 (1) (2005) 39–77.
  • (65) F. Janssen, J. Fürnkranz, On the quest for optimal rule learning heuristics, Mach. Learn. 78 (3) (2010) 343–379.
  • (66) B. Minnaert, D. Martens, M. De Backer, B. Baesens, To tune or not to tune: rule evaluation for metaheuristic-based sequential covering algorithms, Data Min. Knowl. Discov. 29 (1) (2015) 237–272.
  • (67) S. Greco, Z. Pawlak, R. Słowiński, Can Bayesian confirmation measures be useful for rough set decision rules?, Eng. Appl. Artif. Intell. 17 (4) (2004) 345–361.
  • (68) D. P. Harrington, T. R. Fleming, A class of rank test procedures for censored survival data, Biometrika 69 (3) (1982) 553–566.
  • (69) Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. Roy. Stat. Soc. B Met. (1995) 289–300.
  • (70) W. M. Rand, Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (336) (1971) 846–850.
  • (71) M. Sikora, L. Wróbel, UCI Machine Learning Repository: seismic-bumps Data Set, https://archive.ics.uci.edu/ml/datasets/seismic-bumps (2010).
  • (72) J. Kabiesz, B. Sikora, M. Sikora, Ł. Wróbel, Application of rule-based models for seismic hazard prediction in coal mines, Acta Montanistica Slovaca 18 (4).