1 Introduction
Association rule mining (ARM) has emerged as a powerful and specialized tool to identify patterns in large datasets. It can be used in applications or business operations where instances of some spatiotemporal occurrence are represented in tabular format across a set of common attributes. An ARM study typically results in rules of the form $A \rightarrow B$, which means that, based on evidence from the data, the presence of attribute A is likely to indicate the presence of attribute B. There are two major challenges to an ARM implementation. The first is (i) Candidate Generation: the process of filtering, from all possible combinations of items, those that satisfy a given condition for selection. Given the exponentially large number of possible rules, this condition focuses on the use of frequency-based thresholds to remove potentially uninteresting rules [1]. The second major challenge is (ii) Candidate Evaluation: the use of an appropriate metric (interestingness measure) to evaluate all the different rules that can be defined from the selected item sets [11].
This research concerns itself with the latter challenge. Candidate evaluation can be challenging because there are different ways of describing interestingness of rules. A recent study [14] showed that even among objective measures, there exist more than 61 that are defined in literature. Also, the information derived from these different interestingness measures (IM) may not always be consistent [11].
The properties of interestingness measures are typically defined using a contingency table (see Table 1), a simplified adaptation from [11]. Here, two states, present and absent, are defined for two variables, A (rows) and B (columns). The frequency counts $f_{11}$ and $f_{00}$ define the copresence and coabsence of A and B, respectively, while $f_{10}$ represents the presence of A and absence of B, and $f_{01}$ the opposite.
In this research, we posit that the popularly used set of 8 properties covered in [12] does not fully capture some important aspects of interestingness measures, and this motivates us to define a new and more relevant property-based analysis of IMs. Specifically, our motivation is built on the observations of [14], who state that the empirical classification of measures based on how they rank rules has little to do with the property-based classification. A deeper study of this mismatch leads us to believe that preexisting mathematical properties are only useful in specific environmental contexts. These observations lead us to devise simpler, more generic property definitions which can be applied to different environmental contexts and bear a stronger affiliation to the rule ranking patterns exhibited by the measures on empirical datasets.
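As a minimal sketch of how the four cells of such a contingency table can be tallied from binary transaction data, the snippet below counts them for a pair of items. The names `f11`, `f10`, `f01` and `f00` follow the notation of [11] and are an assumption about the symbols used in this paper; the function name and the toy rows are hypothetical and not taken from the paper's data.

```python
from collections import Counter


def contingency_counts(rows, a, b):
    """Tally (f11, f10, f01, f00) for items a and b over binary rows."""
    tally = Counter((bool(row[a]), bool(row[b])) for row in rows)
    return (
        tally[(True, True)],    # f11: A present, B present (copresence)
        tally[(True, False)],   # f10: A present, B absent
        tally[(False, True)],   # f01: A absent, B present
        tally[(False, False)],  # f00: A absent, B absent (coabsence)
    )


# Hypothetical toy transactions, used for illustration only.
rows = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 0},
    {"A": 0, "B": 1},
    {"A": 0, "B": 0},
    {"A": 1, "B": 1},
]
counts = contingency_counts(rows, "A", "B")
print(counts)
```

The four counts always sum to the number of rows, which is a convenient sanity check when building tables for many item pairs.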
To this end, we create a property definition framework that defines properties based on the change in the IM per unit change in a frequency count $f_{ij}$. We broadly refer to this as Rate of Change Analysis (RCA). (While this term is used in stock market analysis, our use of it in the data mining context is novel.) Specifically, we define two properties which look at the partial derivative of the measure at two different preexisting states of the frequency count. The first studies the rate-of-change behavior of the IM when the frequency count is very large (the asymptotic effect as the frequency count tends to $\infty$). We refer to this as Unit-Null Asymptotic Invariance (UNAI). The second property is defined at the point where the frequency count is currently $0$ or is tending to $0$. We refer to this as Unit-Null Zero Rate (UNZR). This looks at the effect of increasing the frequency count on the IM when it is currently nonexistent in the data set. By defining properties based on how measures actually change at different contingency table configurations, we explicitly link the rule ranking behavior with the mathematical property.
1.1 Intuition for the properties UNAI and UNZR
When UNAI is satisfied, the measure will not keep increasing or decreasing with the addition of one of the $f_{ij}$s while the others are kept constant; that is, the metric will asymptotically converge to a fixed value. A metric that fails this property will not converge to a constant value with continued addition of $f_{ij}$s. An example is Lift, which keeps increasing with the addition of $f_{00}$s and does not converge to a value.
UNZR is satisfied when we can say that the measure will increase when shown evidence of copresence or coabsence, if such evidence did not previously exist. Also, it should decrease when shown evidence that one item occurs when the other does not (case of counterexamples). Such a relationship could be weak, but at the very least, such metrics will not behave counter to expectation (like decreasing when shown evidence of copresence or coabsence) and will not stay completely invariant.
The major contributions of this research are listed as follows:

Introduction of a novel approach to classify interestingness measures and the development of two specific properties, namely UNAI and UNZR, using this approach

An analysis of the performance of these properties through the classification of various interestingness measures, as well as a comparison with other properties presented in [11]

Presentation of empirical case studies that validate the findings and demonstrate the usefulness of the properties using real-world and synthetic data sets.
2 Related Work
A large number of objective IMs have emerged as a result of the application of ARM across different domains. It is also documented that not all measures are capable of capturing the strength of associations and in some cases provide conflicting information of the strength of patterns [11]. Given the abundance of measures and difficulty in choosing the appropriate IM, researchers have suggested various classification schemes (of the IMs) to help identify the appropriate measure for a given application [10], [11], [12], [4], [14], [8]. There are two different types of classification that exist in literature: classification based on the properties of IMs (e.g. [10], [11], [12], [4]) and classification based on empirical results of IMs on different datasets (e.g. [14]).
Research conducted by [10] formalized a framework consisting of three properties that an IM should satisfy, namely: the measure should take the value 0 if the occurrences of the itemsets are independent (P1); the measure should be monotonically increasing with the copresence of the itemsets (P2); and the measure should be monotonically decreasing with the occurrences of either itemset (P3).
[11] proposed the following 5 properties in addition to the 3 proposed by [10]: symmetry under variable permutation (O1), row/column scaling invariance (O2), antisymmetry under row/column permutation (O3), inversion invariance (O3’) and null invariance (O4). They conducted a comparative study, testing 21 different IMs against the resulting 8 properties. The authors further proposed that the optimal way of finding a suitable IM would be to let the user define a property vector indicating the properties that would ideally be required for the given application. This property vector would then be compared to the property vectors of the different objective measures to pick out the ideal interestingness measure for that particular case. For instance, the null-invariance property is considered to be important for interestingness measures used in the context of small probability events in a large dataset [15]. While there has been further work in introducing new properties (e.g., [6], [2], [3], [4], [5]), these have not been as commonly used or cited as the work of [10] and [12].
There has been limited work on the classification of IMs based on empirical results on different datasets. Research by [7] proposed the classification of 35 different interestingness measures based on their empirical performance on 2 different datasets by studying the correlation of the interestingness measures. These measures were classified using a graph-based clustering approach to create high-correlation and low-correlation graphs. The work of [14] performed a comprehensive classification of 61 different objective IMs based on empirical results on 110 different datasets. It suggested that there exist 21 distinct clusters of measures, each of which was studied in detail.
3 Mathematical definitions for properties UNAI and UNZR
An interestingness measure (IM) can be represented as a function of the frequency counts (see Equation 1). RCA seeks to assess the relative change in the interestingness measure per unit change of the frequency counts. This is essentially the first partial derivative of the interestingness measure with respect to the variables representing the counts, as shown in Equation 2. The set of formulas representing the first partial derivative of the interestingness measure with respect to each of the four state variables $f_{11}$, $f_{10}$, $f_{01}$ and $f_{00}$ represents the RCA, as shown in Equation 3.
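(1) $IM = g(f_{11}, f_{10}, f_{01}, f_{00})$
(2) $\frac{\partial IM}{\partial f_{ij}}, \quad f_{ij} \in \{f_{11}, f_{10}, f_{01}, f_{00}\}$
(3) $RCA(IM) = \left\{ \frac{\partial IM}{\partial f_{11}}, \frac{\partial IM}{\partial f_{10}}, \frac{\partial IM}{\partial f_{01}}, \frac{\partial IM}{\partial f_{00}} \right\}$
(4) $\lim_{f_{11} \to \infty} \frac{\partial IM}{\partial f_{11}} = \lim_{f_{11} \to \infty} \lim_{h \to 0} \frac{g(f_{11} + h, f_{10}, f_{01}, f_{00}) - g(f_{11}, f_{10}, f_{01}, f_{00})}{h}$
(5) $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11} = 0} = \lim_{h \to 0} \frac{g(h, f_{10}, f_{01}, f_{00}) - g(0, f_{10}, f_{01}, f_{00})}{h}$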
We use RCA to define two novel properties: Unit-Null Asymptotic Invariance (UNAI) and Unit-Null Zero Rate (UNZR). Mathematically, both properties are the derivative at a point, or the instantaneous rate of change, evaluated at two specific points. We define UNAI as the derivative of the interestingness measure (IM) with respect to $f_{11}$ as $f_{11} \to \infty$; this instantaneous rate of change can be written as shown in Equation 4. UNAI can be defined for each of the four frequency count variables by substituting $f_{11}$ with the count of interest. Similar to UNAI, UNZR can be captured by looking at the instantaneous rate of change at $f_{11} = 0$. Formally, this is the derivative of the IM with respect to $f_{11}$ as $f_{11} \to 0$, and it can be written as shown in Equation 5. To compute UNAIs and UNZRs, in some cases we can simply take the first partial derivative and directly substitute the point of interest; in other scenarios we use the limit notation for the derivative at a point (also shown in Equations 4 and 5). Having defined the framework for computing UNAIs and UNZRs, in the subsequent sections we define the conditions under which an interestingness measure can be said to satisfy these properties. These sections present a classification scheme for the properties UNAI and UNZR at the level of the individual $f_{ij}$ as well as for the metric as a whole.
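The rate-of-change vector described above can be approximated numerically. The following is a minimal sketch, not the paper's procedure (which works with analytical derivatives): it estimates each partial derivative by a central difference, treating the counts as continuous. The argument order `(f11, f10, f01, f00)`, the helper name `rca`, the step size `h` and the sample counts are all assumptions made for illustration; Lift is written in its usual count form.

```python
def lift(f11, f10, f01, f00):
    """Lift expressed through the four contingency counts."""
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))


def rca(measure, counts, h=1e-4):
    """Central-difference estimate of d(measure)/d(count) for each count."""
    names = ("f11", "f10", "f01", "f00")
    rates = {}
    for index, name in enumerate(names):
        up = list(counts)
        down = list(counts)
        up[index] += h
        down[index] -= h
        rates[name] = (measure(*up) - measure(*down)) / (2 * h)
    return rates


# Hypothetical contingency table (f11, f10, f01, f00), for illustration only.
rates = rca(lift, (20.0, 30.0, 40.0, 910.0))
for name, rate in rates.items():
    print(name, round(rate, 6))
```

For this table the estimate for `f00` should agree with the closed form $f_{11} / ((f_{11}+f_{10})(f_{11}+f_{01}))$, which does not depend on $f_{00}$ at all.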
3.1 UNAI property definition
We create a two-pronged classification scheme for UNAI. We define $UNAI_{f_{ij}}$, which is defined for each frequency count $f_{ij}$. We do this explicitly for $f_{11}$, which can then be extended to the other frequency counts. We also consolidate the results across all $f_{ij}$s to present the property for the metric as a whole:

$UNAI_{f_{11}}$ is satisfied when $\lim_{f_{11} \to \infty} \frac{\partial IM}{\partial f_{11}} = 0$ for all feasible combinations of values of $f_{10}$, $f_{01}$ and $f_{00}$. We define a feasible combination of values as one which enables the calculation of the metric in deterministic form for a database with a nonzero number of rows.
By extension, the condition $UNAI_{f_{11}}$ is not met when $\lim_{f_{11} \to \infty} \frac{\partial IM}{\partial f_{11}} \neq 0$ for any feasible combination of values of $f_{10}$, $f_{01}$ and $f_{00}$.
Similarly, we can define $UNAI_{f_{ij}}$ for the other three frequency counts by swapping the variables accordingly.
UNAI is satisfied when $UNAI_{f_{ij}}$ is satisfied $\forall f_{ij}$. This is essentially an extension of the classification from the $f_{ij}$ level to a general property for the metric as a whole.
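The UNAI condition can be explored numerically before attempting the algebra. The sketch below is an illustration under stated assumptions, not the paper's method: it evaluates a central-difference rate of change while one count is pushed to increasingly large values and the others are held fixed. A finite check of this kind cannot prove a limit; it only suggests whether the rate decays towards zero. The argument order `(f11, f10, f01, f00)`, the helper names, the fixed counts and the chosen sizes are hypothetical.

```python
def lift(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))


def jaccard(f11, f10, f01, f00):
    return f11 / (f11 + f10 + f01)


def rate(measure, counts, index, h=1e-3):
    """Central-difference rate of change w.r.t. the count at `index`."""
    up, down = list(counts), list(counts)
    up[index] += h
    down[index] -= h
    return (measure(*up) - measure(*down)) / (2 * h)


def unai_trend(measure, index, fixed=(20.0, 30.0, 40.0, 50.0),
               sizes=(1e2, 1e4, 1e6)):
    """Rates of change as the count at `index` grows; others stay fixed."""
    trend = []
    for size in sizes:
        counts = list(fixed)
        counts[index] = size
        trend.append(rate(measure, counts, index))
    return trend


lift_f11 = unai_trend(lift, 0)        # decays towards 0
lift_f00 = unai_trend(lift, 3)        # stays at a nonzero constant
jaccard_f00 = unai_trend(jaccard, 3)  # identically 0 (no f00 dependence)
print(lift_f11, lift_f00, jaccard_f00)
```

In this sketch the rate of Lift with respect to `f00` stays at $f_{11} / ((f_{11}+f_{10})(f_{11}+f_{01}))$ however large `f00` becomes, which is the behaviour the intuition section attributes to Lift.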
3.2 UNZR property definition
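The classification scheme we adopt for UNZR is more complex than that for UNAI. Similar to UNAI, we adopt a two-pronged approach, defining UNZR at the $f_{ij}$ level as well as for the metric as a whole. However, we differ from UNAI in that the states are not binary; there are three states that correspond to the property being satisfied, partially satisfied, and not satisfied. Another difference is that the definitions at the $f_{ij}$ level are different for {$f_{11}$, $f_{00}$} and {$f_{10}$, $f_{01}$}. They are exactly opposite in terms of the inequality conditions that need to be met, as shown below. We formally define the property for $f_{11}$ and $f_{10}$ below and extend it to the other frequency counts $f_{00}$ and $f_{01}$, respectively:

$UNZR_{f_{11}}$ is satisfied when $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11}=0} > 0$ for all feasible combinations of $f_{10}$, $f_{01}$ and $f_{00}$. Again, a feasible combination is one that enables the computation of the metric in deterministic form. This formulation can be extended to $f_{00}$ by swapping the variables accordingly.
$UNZR_{f_{10}}$ is satisfied when $\left. \frac{\partial IM}{\partial f_{10}} \right|_{f_{10}=0} < 0$ for all feasible combinations of $f_{11}$, $f_{01}$ and $f_{00}$. This formulation can be extended to $f_{01}$ by swapping the variables accordingly.
$UNZR_{f_{11}}$ is partially satisfied when two conditions are met. These are: (i) $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11}=0} \geq 0$ for all feasible combinations of $f_{10}$, $f_{01}$ and $f_{00}$, and (ii) $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11}=0} = 0$ for at least one feasible combination of $f_{10}$, $f_{01}$ and $f_{00}$. This formulation can be extended to $f_{00}$ by swapping the variables accordingly.
Similarly, $UNZR_{f_{10}}$ is partially satisfied when two conditions are met. These are: (i) $\left. \frac{\partial IM}{\partial f_{10}} \right|_{f_{10}=0} \leq 0$ for all feasible combinations of $f_{11}$, $f_{01}$ and $f_{00}$, and (ii) $\left. \frac{\partial IM}{\partial f_{10}} \right|_{f_{10}=0} = 0$ for at least one feasible combination of $f_{11}$, $f_{01}$ and $f_{00}$. This formulation can be extended to $f_{01}$ by swapping the variables accordingly.
Finally, by extension, $UNZR_{f_{11}}$ is not satisfied when either of these two conditions is met: (i) $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11}=0} < 0$ for any feasible combination of $f_{10}$, $f_{01}$ and $f_{00}$, or (ii) $\left. \frac{\partial IM}{\partial f_{11}} \right|_{f_{11}=0} = 0$ for all feasible combinations of $f_{10}$, $f_{01}$ and $f_{00}$. This formulation can be extended to $f_{00}$ by swapping the variables accordingly.
Similarly, $UNZR_{f_{10}}$ is not satisfied when either of these two conditions is met: (i) $\left. \frac{\partial IM}{\partial f_{10}} \right|_{f_{10}=0} > 0$ for any feasible combination of $f_{11}$, $f_{01}$ and $f_{00}$, or (ii) $\left. \frac{\partial IM}{\partial f_{10}} \right|_{f_{10}=0} = 0$ for all feasible combinations of $f_{11}$, $f_{01}$ and $f_{00}$. This formulation can be extended to $f_{01}$ by swapping the variables accordingly.
At the overall metric level, we say that the UNZR property is satisfied for a metric if $UNZR_{f_{ij}}$ is satisfied $\forall f_{ij}$. We say that the UNZR property is partially satisfied for a metric if $UNZR_{f_{ij}}$ is at least partially satisfied for all $f_{ij}$s. Finally, a metric fails to satisfy the UNZR property if one or more $UNZR_{f_{ij}}$s do not satisfy the property.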
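The three-state scheme can be sketched as a small numerical classifier. This is an illustration under assumptions rather than the paper's analytical procedure: the derivative at zero is approximated by a forward difference, "all feasible combinations" is replaced by a small hypothetical grid, combinations where the measure cannot be computed (division by zero) are treated as infeasible, and the expected direction is taken as increasing for `f11` and `f00` and decreasing for `f10` and `f01`. The names `unzr_state`, `grid`, `h` and `tol` are hypothetical.

```python
from itertools import product

NAMES = ("f11", "f10", "f01", "f00")
# Expected direction of change: up for f11 and f00, down for f10 and f01.
EXPECTED_SIGN = {"f11": 1, "f10": -1, "f01": -1, "f00": 1}


def lift(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))


def confidence(f11, f10, f01, f00):
    return f11 / (f11 + f10)


def unzr_state(measure, name, grid=(0.0, 1.0, 5.0), h=1e-6, tol=1e-4):
    """Return 'Y', 'P' or 'N' for one count, or None if nothing is feasible."""
    index = NAMES.index(name)
    signed_rates = []
    for others in product(grid, repeat=3):
        counts = list(others)
        counts.insert(index, 0.0)   # the count of interest starts at zero
        bumped = list(counts)
        bumped[index] = h
        try:
            slope = (measure(*bumped) - measure(*counts)) / h
        except ZeroDivisionError:
            continue                # measure undefined: infeasible combination
        signed_rates.append(EXPECTED_SIGN[name] * slope)
    if not signed_rates:
        return None
    wrong_way = any(r < -tol for r in signed_rates)
    invariant = all(abs(r) <= tol for r in signed_rates)
    if wrong_way or invariant:
        return "N"
    if all(r > tol for r in signed_rates):
        return "Y"
    return "P"


for measure in (lift, confidence):
    print(measure.__name__, [unzr_state(measure, name) for name in NAMES])
```

On this grid the sketch classifies Lift as satisfied for `f11` and partially satisfied for the other three counts, and Confidence as satisfied for `f11` and `f10` and not satisfied for `f01` and `f00`. A coarse grid can miss the single combination that turns a 'Y' into a 'P', so the result is only indicative.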
4 Illustrative example of the UNAI and UNZR framework using Lift
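In this section, we consider the behaviour of the popular interestingness measure Lift under the UNAI and UNZR properties defined in the previous section. Lift is defined as follows:
(6) $Lift = \frac{f_{11} N}{(f_{11} + f_{10})(f_{11} + f_{01})}$, where $N = f_{11} + f_{10} + f_{01} + f_{00}$
Differentiating w.r.t. $f_{11}$ and simplifying, we get
(7) $\frac{\partial Lift}{\partial f_{11}} = \frac{f_{10} f_{01} (2 f_{11} + f_{10} + f_{01} + f_{00}) - f_{11}^2 f_{00}}{(f_{11} + f_{10})^2 (f_{11} + f_{01})^2}$
We check the UNAI property for Lift by considering the derivative as $f_{11} \to \infty$:
(8) $\lim_{f_{11} \to \infty} \frac{\partial Lift}{\partial f_{11}} = \lim_{f_{11} \to \infty} \frac{f_{10} f_{01} (2 f_{11} + f_{10} + f_{01} + f_{00}) - f_{11}^2 f_{00}}{(f_{11} + f_{10})^2 (f_{11} + f_{01})^2} = 0$
After algebraic simplification, we can say that the above function is equal to zero for all feasible combinations of $f_{10}$, $f_{01}$ and $f_{00}$. Hence, we can say that Lift satisfies UNAI with respect to $f_{11}$. Similarly, we check the UNAI property with respect to $f_{00}$, $f_{10}$ and $f_{01}$:
(9) $\lim_{f_{00} \to \infty} \frac{\partial Lift}{\partial f_{00}} = \frac{f_{11}}{(f_{11} + f_{10})(f_{11} + f_{01})}$
(10) $\lim_{f_{10} \to \infty} \frac{\partial Lift}{\partial f_{10}} = \lim_{f_{10} \to \infty} \frac{-f_{11} (f_{01} + f_{00})}{(f_{11} + f_{10})^2 (f_{11} + f_{01})} = 0$
(11) $\lim_{f_{01} \to \infty} \frac{\partial Lift}{\partial f_{01}} = \lim_{f_{01} \to \infty} \frac{-f_{11} (f_{10} + f_{00})}{(f_{11} + f_{10})(f_{11} + f_{01})^2} = 0$
Here it is evident that the function in Equation 9 is not equal to 0 for all possible values of $f_{11}$, $f_{10}$ and $f_{01}$. Hence, we say that $UNAI_{f_{00}}$ is not satisfied, but UNAI w.r.t. $f_{11}$, $f_{10}$ and $f_{01}$ is satisfied.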
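We check the UNZR property for $f_{11}$ by taking the partial derivative of Lift at $f_{11} = 0$, which gives
(12) $\left. \frac{\partial Lift}{\partial f_{11}} \right|_{f_{11}=0} = \frac{f_{10} + f_{01} + f_{00}}{f_{10} f_{01}}$
Similarly, taking the derivative with respect to $f_{00}$, $f_{10}$ and $f_{01}$ at 0, we get
(13) $\left. \frac{\partial Lift}{\partial f_{00}} \right|_{f_{00}=0} = \frac{f_{11}}{(f_{11} + f_{10})(f_{11} + f_{01})}$
(14) $\left. \frac{\partial Lift}{\partial f_{10}} \right|_{f_{10}=0} = -\frac{f_{01} + f_{00}}{f_{11} (f_{11} + f_{01})}$
(15) $\left. \frac{\partial Lift}{\partial f_{01}} \right|_{f_{01}=0} = -\frac{f_{10} + f_{00}}{f_{11} (f_{11} + f_{10})}$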
We see that for all feasible combinations , and are satisfied. However, is only partially satisfied. From equation 13 we can see that the following conditions are met: (i) For all feasible combinations of , . This passes the definition of partial satisfaction for UNZR as defined in the paper. At the same time this does not fully satisfy the property since there are values where it can be 0^{1}^{1}1substitute = 0, while giving the others positive values. Figure 1
5 Mapping UNAI and UNZR to commonly used measures and other properties
This section is divided in two parts. The first part performs a detailed analysis that uses the proposed properties to classify commonly used measures. The second part then compares these classifications to the classification done by other popular properties in literature [12]. This twofold approach is used because it is important to show that a property can actually differentiate between measures (Subsection 5.1), and that it classifies measures in a way that is different from other properties (Subsection 5.2).
5.1 Classification of existing measures using UNAI and UNZR
In this section we classify 50 common measures across the two properties UNAI and UNZR, at both the $f_{ij}$ level and the metric level. We use all 21 metrics from [12] and also borrow popular metrics from [14]. We consciously avoid metrics which are mathematically identical, as suggested by [14], but choose to retain metrics which could still be rankwise indistinguishable. We do this because practitioners might make sense of an absolute score and the rate at which it increases or decreases. We also avoid metrics which need us to make any a priori assumptions on probability distributions or cannot be abstracted as a function of $f_{ij}$s. The analysis is carried out in accordance with the definitions in Section 3, and the findings are summarized in Table 2.
Measure  UNAI $f_{11}$  UNAI $f_{00}$  UNAI $f_{10}$  UNAI $f_{01}$  UNAI  UNZR $f_{11}$  UNZR $f_{00}$  UNZR $f_{10}$  UNZR $f_{01}$  UNZR 
Lift  Y  N  Y  Y  N  Y  P  P  P  P 
Jaccard  Y  Y  Y  Y  Y  Y  N  P  P  N 
Confidence  Y  Y  Y  Y  Y  Y  N  Y  N  N 
Recall  Y  Y  Y  Y  Y  Y  N  N  Y  N 
Specificity  Y  Y  Y  Y  Y  N  Y  N  Y  N 
Precision  Y  Y  Y  Y  Y  Y  N  Y  N  N 
Ganascia  Y  Y  Y  Y  Y  Y  N  Y  N  N 
Kulczynski1  N  Y  Y  Y  N  Y  N  P  P  N 
FMeasure  Y  Y  Y  Y  Y  Y  N  P  P  N 
Causal Confidence  Y  Y  Y  Y  Y  Y  Y  Y  N  N 
Odd’s Ratio  N  N  Y  Y  N  P  P  P  P  P 
Negative Reliability  Y  Y  Y  Y  Y  N  Y  N  Y  N 
Sebag-Schoenauer  N  Y  Y  Y  N  Y  N  P  N  N 
Accuracy  Y  Y  Y  Y  Y  P  P  P  P  P 
Support  Y  Y  Y  Y  Y  Y  N  P  P  N 
Coverage  Y  Y  Y  Y  Y  P  N  N  P  N 
Prevalence  Y  Y  Y  Y  Y  P  N  P  N  N 
Relative Risk  Y  N  Y  Y  N  Y  P  Y  P  P 
Novelty  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y 
Yule’s Q  Y  Y  Y  Y  Y  P  P  P  P  P 
Yule’s Y  Y  Y  Y  Y  Y  P  P  P  P  P 
Cosine  Y  Y  Y  Y  Y  Y  N  Y  Y  N 
Least Contradiction  Y  Y  N  Y  N  Y  N  Y  N  N 
Odd Multiplier  Y  N  Y  Y  N  Y  P  P  Y  P 
Descriptive Confirm  Y  Y  Y  Y  Y  Y  N  Y  N  N 
Causal Confirm  Y  Y  Y  Y  Y  Y  Y  Y  N  N 
Certainty Factor  Y  Y  Y  N  N  P  P  N  Y  N 
Conviction  Y  Y  Y  Y  Y  P  P  P  Y  P 
Informational Gain  Y  Y  Y  Y  Y  Y  Y  P  P  P 
Laplace  Y  Y  Y  Y  Y  Y  N  Y  N  N 
Klosgen  Y  Y  Y  Y  Y  P  N  N  N  N 
Piatetsky-Shapiro  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y 
Zhang  Y  N  Y  N  N  Y  P  Y  P  P 
Y and L’s 1way support*  Y  Y  Y  Y  Y  N  P  N  P  N 
Y and L’s 2way support*  Y  Y  Y  Y  Y  N  P  Y  Y  N 
Implication Index  Y  Y  Y  Y  Y  N  N  N  N  N 
Leverage  Y  Y  Y  Y  Y  Y  P  Y  N  N 
Kappa  Y  Y  Y  Y  Y  P  P  Y  Y  P 
Causal Confirm Confidence  Y  Y  Y  Y  Y  Y  Y  Y  N  N 
Examples and Counter Examples  Y  Y  N  Y  N  P  N  Y  N  N 
Putative Causal Dependency  Y  Y  Y  Y  Y  P  P  Y  Y  P 
Dependency  Y  Y  Y  Y  Y  P  P  P  P  P 
Jmeasure  Y  Y  Y  Y  Y  N  N  Y  N  N 
Collective Strength  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y 
Gini Index  Y  Y  Y  Y  Y  N  N  P  P  N 
GoodmanKruskal  N  N  N  N  N  N  N  N  N  N 
Mutual Information  Y  Y  Y  Y  Y  N  N  Y  Y  N 
Normalized Mutual Information  Y  Y  Y  Y  Y  N  N  N  N  N 
Loevinger  Y  Y  Y  N  N  P  P  N  Y  N 
Added value  N  Y  N  N  N  P  P  P  P  P 

Where, Y: Indicates that the Property is Satisfied, P: Indicates that the property is partially satisfied, and N: Indicates that the property is not satisfied
* These metric names are shortened to fit into the table: Y and L’s stand for Yao and Liu’s for both the shortened names
The results of the classification of these measures provide two important insights. First, the UNAI property for the metric as a whole is satisfied by a majority of the measures (37 of the 50). These numbers are even higher for the individual $UNAI_{f_{ij}}$ (45 for $f_{11}$, 44 for $f_{00}$, 46 for $f_{10}$ and 45 for $f_{01}$, out of the 50 measures). This suggests that UNAI would be less useful as a tool to eliminate measures that nullify the unstable effect of one frequency count being particularly large. Instead, this property can be useful when due importance needs to be given to a frequency count that is expected to be high and to continue growing. A classic scenario would be Lift. In certain contexts, an increase in coabsence in a sparse database should continue to increase the metric value, since it makes copresence even less probable through random chance.
The second insight, from the case of UNZR, is of a different nature. At the overall metric level, there are only 3 measures that fully satisfy the UNZR property: Novelty, Piatetsky-Shapiro and Collective Strength. Of the remaining measures, 14 partially satisfy the property and 33 fail to satisfy it. For each $f_{ij}$, the UNZR classifications are more discerning: 25 measures satisfy the property for $f_{11}$, 9 for $f_{00}$, 22 for $f_{10}$ and 15 for $f_{01}$. These suggest that UNZR at the $f_{ij}$ level could be more meaningfully used to pick metrics, especially for the case of $f_{00}$, which is satisfied by only nine measures. A particular case could be when the practitioner expects an $f_{ij}$ to be low or close to zero and would like to see the metric impacted when presented with evidence of it. The use of UNZR at the overall metric level could also be useful if the practitioner suspects that any of the frequency values can be close to zero but would like its presence or absence to have a meaningful impact on the metric.
5.2 Comparing the UNAI and UNZR mapping with other properties
In this section we compare the classification of measures done through UNAI and UNZR with the classification done through other properties in the literature [12]. This is important because, in addition to fulfilling other criteria, it is necessary that a property classifies measures differently from other preexisting properties. Otherwise, there is a redundancy and one could question the need for the new property in question. We conduct our comparison on the properties proposed by [12]. This includes five new properties proposed in that study, as well as three previous properties from [10]. In order to perform the analysis, we take all the 50 measures analyzed in Table 2, which include the 21 measures analyzed by [12]. We compare the classification of these measures across the two states of UNAI and the three states of UNZR with the two states (satisfied or not satisfied) across the 8 properties presented in [12]. This leads us to create the contingency table shown in Table 3.
Property  State  UNAI: Satisfied  UNAI: Not Satisfied  UNZR: Satisfied  UNZR: Partially Satisfied  UNZR: Not Satisfied  
P1: Statistical independence  Satisfied  15  4  2  8  9 
Not Satisfied  22  9  1  6  24  
P2:(Refer [10])  Satisfied  34  13  3  14  30 
Not Satisfied  3  0  0  0  3  
P3:(Refer [10])  Satisfied  27  11  3  14  21 
Not Satisfied  10  2  0  0  12  
O1: Symmetry under variable permutation  Satisfied  13  4  3  7  7 
Not Satisfied  24  9  0  7  26  
O2: Row and Column Scaling Invariance  Satisfied  2  1  0  3  0 
Not Satisfied  35  12  3  11  33  
O3: Antisymmetry row or column permutation  Satisfied  4  0  2  2  0 
Not Satisfied  33  13  1  12  33  
O3’: Inversion Invariance  Satisfied  10  1  3  5  3 
Not Satisfied  27  12  0  9  30  
O4: Null Invariance  Satisfied  8  4  0  0  12 
Not Satisfied  29  9  3  14  21 
The findings from Table 3 suggest that the classification of measures through UNAI and UNZR is more or less independent of the classification done through all of the eight preexisting properties. The few cases where we see low overlaps are also easily explained by the low membership of a certain class rather than by a relationship between properties (for instance, observe that only 3 of the 50 measures satisfy ’Row and Column Scaling Invariance’, and only 3 fully satisfy UNZR). We do not, however, carry out a Chi-Square test to establish independence, because some properties are explicitly related. For instance, all null-invariant measures have to fail UNZR by definition. It is therefore not entirely meaningful to perform such an analysis to look at statistical independence. The overarching conclusion from Table 3 is that while some of these properties could be weakly related to each other, there is sufficient independence from preexisting properties to justify UNAI and UNZR as two new properties in terms of the classification of measures.
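The kind of cross-tabulation behind Table 3 can be sketched as follows. This is only an illustration on six measures: the metric-level UNZR states are read from the last column of Table 2, while the null-invariance labels are an assumption based on whether the standard formula of each measure involves the coabsence count. The helper name `cross_tabulate` is hypothetical.

```python
from collections import Counter

# Metric-level UNZR state, read from the last column of Table 2.
unzr_state = {
    "Jaccard": "N",
    "Cosine": "N",
    "Confidence": "N",
    "Lift": "P",
    "Odds Ratio": "P",
    "Piatetsky-Shapiro": "Y",
}

# Assumed labels: True when the usual formula does not involve the
# coabsence count (Jaccard, Cosine and Confidence), False otherwise.
null_invariant = {
    "Jaccard": True,
    "Cosine": True,
    "Confidence": True,
    "Lift": False,
    "Odds Ratio": False,
    "Piatetsky-Shapiro": False,
}


def cross_tabulate(first, second):
    """Count measures for each pair of (first label, second label)."""
    return Counter((first[name], second[name]) for name in first)


table = cross_tabulate(null_invariant, unzr_state)
for (is_null_invariant, state), count in sorted(table.items()):
    print(is_null_invariant, state, count)
```

In this small sample every null-invariant measure lands in the "N" state, in line with the remark that null-invariant measures fail UNZR by definition.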
6 Empirical Studies
The work of [14] has established that empirical clustering of measures bears no meaningful relationship to the properties presented in [12] (which also cover three properties originally presented in [10]). While the properties UNAI and UNZR have been constructed to intuitively convey a certain mathematical aspect of the measure, an important motivation, and therefore a design requirement, was that they map meaningfully to the actual empirical behavior of measures. Our studies across a wide range of datasets, both synthetic and real, suggest that these two properties bear strong relationships with the empirical clusters. More interestingly, we find that the results are substantially more pronounced in certain environmental conditions. Specifically, we find that $UNZR_{f_{11}}$ and $UNAI_{f_{00}}$ are valuable in sparse datasets, and correspondingly $UNZR_{f_{00}}$ and $UNAI_{f_{11}}$ are better properties to consider in dense data. In the following sections, we present a detailed and illustrative analysis showing how the $UNZR_{f_{11}}$ classification of measures is useful in sparse datasets and how $UNZR_{f_{00}}$ is useful in dense datasets. The motivation for choosing the UNZR properties over the UNAI properties is that UNZR creates groups of more or less equal sizes. For instance, $UNZR_{f_{11}}$ splits the measures with 25 of them satisfying the property, 15 partially satisfying it, and 10 failing to satisfy it, whereas with $UNAI_{f_{00}}$ we see that 44 of the 50 measures satisfy the property. A similar comparison exists between $UNZR_{f_{00}}$ and $UNAI_{f_{11}}$.
We conduct our empirical studies by first considering synthetic contingency tables that mimic sparse and dense datasets, and in each case we explore further by choosing a real world dataset that is sparse and dense, respectively. Based on the rule ranking of the measures in the two environmental conditions, we then cluster the measures into sets and see how they correlate with the property of interest.
6.1 Sparse datasets
Sparse datasets are characterized by having a relatively high $f_{00}$ count with respect to $f_{11}$, primarily, and to a lesser extent $f_{10}$ and $f_{01}$. As discussed in the previous section, we choose to analyze the effect of the $UNZR_{f_{11}}$ property in this setting.
We mimic the rules from a synthetic dataset using artificially created sets of rules in the form of contingency tables. We do this specifically for the sparse setting. We achieve this environment by assigning low values to $f_{11}$ and high values to $f_{00}$, while $f_{10}$ and $f_{01}$ fall in between the two extremes. The $f_{11}$, $f_{00}$, $f_{10}$ and $f_{01}$ cells of the tables took the values {0, 1, 10, 11}, {1000, 5000, 10000, 25000, 50000, 75000, 100000}, {10, 100, 250, 500, 600, 800, 1000} and {10, 100, 250, 500, 600, 800, 1000}, respectively. This resulted in $4 \times 7 \times 7 \times 7 = 1372$ unique contingency tables, each representing a rule in a sparse dataset.
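The synthetic sparse tables can be enumerated with a Cartesian product of the four value sets. The sketch below assumes that every combination of the listed values is used and that the value sets map to the cells by magnitude (lowest values for copresence, highest for coabsence, as the text describes); the cell names `f11`, `f00`, `f10` and `f01` follow the notation of [11] and are an assumption about the paper's symbols.

```python
from itertools import product

# Value sets quoted in the text for the sparse setting.
F11_VALUES = (0, 1, 10, 11)
F00_VALUES = (1000, 5000, 10000, 25000, 50000, 75000, 100000)
F10_VALUES = (10, 100, 250, 500, 600, 800, 1000)
F01_VALUES = (10, 100, 250, 500, 600, 800, 1000)

# One dictionary per synthetic rule (assumes the full Cartesian product).
sparse_tables = [
    {"f11": f11, "f00": f00, "f10": f10, "f01": f01}
    for f11, f00, f10, f01 in product(
        F11_VALUES, F00_VALUES, F10_VALUES, F01_VALUES
    )
]

print(len(sparse_tables))  # 4 * 7 * 7 * 7 combinations
print(sparse_tables[0])
```

Swapping the first two value sets gives the corresponding dense setting, where copresence dominates coabsence.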
For the real world dataset, we chose the fairly popular ’Adult’ data set from the UCI Machine Learning archive [9]. This is essentially an extraction from a census database which has demographic and financial information of individuals. This includes features like age, employment, gender, native country, etc. In its native format there are a total of 14 features and more than 48,000 records. A detailed discretization and binarization of variables was carried out in conformance with the best practices suggested in [13]. This helps us create the transactional table, which has a total of 115 features. We confine the analysis to one-to-one rules. We use basic support-based pruning with a threshold close to 0, in order to get a full enumeration of all one-to-one rules while avoiding a variable mapping to itself. This results in a total of rules. Similar to [14], we choose a subset of the rules to compare. However, given the unique nature of our problem, unlike [14] we do not randomly select the rules. Instead we choose a subset of rules that are typically encountered in sparse data sets, by selecting cases where $f_{11}$ is lower than $f_{00}$. This results in rules.
In the next steps we follow the same procedure as [14]. Each rule is evaluated using each measure, and a rank ordering of rules is done for each measure. Using Spearman’s rank correlation, we create a matrix of pairwise distances between measures, which acts as the adjacency matrix for a complete graph. We create clusters by using a threshold value of 0.8 on the correlation coefficient. This process naturally creates groups of measures depending on the threshold used. While there are various other graph clustering algorithms that can be implemented, the simplicity of this approach is appealing.
Dataset  Cluster  Measures  N  P  Y 
50  10  15  25  
Synthetic  A  21  0  4  17 
B  20  4  9  7  
C  9  6  2  1  
Adult  A  36  2  12  22 
B  14  8  3  3 
Our study finds that there is a significant match between the three property states and the clusters that are formed for both the synthetic and real data sets. However, this is not a perfect overlap. We split the measures into three clusters in the synthetic setting and into two clusters in the ’Adult’ dataset’s rules. The cluster memberships are shown below:
Synthetic dataset: Cluster A: { Recall, Precision, Confidence, Jaccard, FMeasure, Odd’s Ratio, Sebag Schoenauer, Support, Lift, Ganascia, Kulczynski1, Relative Risk, Yule’s Q, Yule’s Y, Cosine, Odd Multiplier, Information Gain, Laplace, Zhang, Leverage, Examples and Counter Examples }, Cluster B: { Specificity, Negative Reliability, Accuracy, Descriptive Confirm, Causal Confirm, PiatetskyShapiro, Novelty, Causal Confidence, Certainty Factor, Loevinger, Conviction, Klosgen, 1Way Support, 2Way Support, Kappa, Putative Causal Dependency, Causal Confirm Confidence, Added Value, Collective Strength, Dependency }, Cluster C: { Mutual Information, Coverage, Prevalence, Least Contradiction, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, JMeasure }
’Adult’ dataset: Cluster A: { Recall, Precision, Confidence, Jaccard, FMeasure, Odd’s Ratio, Sebag Schoenauer, Support, Causal Confidence, Lift, Ganascia, Kulczynski1, Relative Risk, PiatetskyShapiro, Novelty, Yule’s Q, Yule’s Y, Cosine, Odd Multiplier, Certainty Factor, Loevinger, Conviction, Information Gain, Laplace, Klosgen, Zhang, 1Way Support, 2Way Support, Leverage, Kappa, Putative Causal Dependency, Examples and Counter Examples, Causal Confirm Confidence, Added Value, Collective Strength, Dependency }, Cluster B: { Mutual Information, Specificity, Negative Reliability, Accuracy, Coverage, Prevalence, Least Contradiction, Descriptive Confirm, Causal Confirm, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, JMeasure }
The relationship between empirical cluster memberships and property affiliations is summarized in Table 4. In the synthetic dataset, all of the 21 measures of cluster A satisfy $UNZR_{f_{11}}$, either completely or partially. The split is rather more even in cluster B, but cluster C is dominated by measures which do not satisfy $UNZR_{f_{11}}$. In the ’Adult’ dataset, cluster A again overwhelmingly consists of measures which satisfy $UNZR_{f_{11}}$, either partially or completely (34 out of 36), whereas the measures that do not satisfy $UNZR_{f_{11}}$ tend to fall in cluster B.
6.2 Dense datasets
We characterize a dense dataset as one which has a relatively higher $f_{11}$ count compared to the $f_{00}$ count, primarily, and to a lesser extent $f_{10}$ and $f_{01}$. As discussed earlier, we choose to study the effect of the $UNZR_{f_{00}}$ property in this environment.
The motivation for using synthetic tables is the same as in the sparse case. The values chosen for the $f_{11}$, $f_{00}$, $f_{10}$ and $f_{01}$ cells are {1000, 5000, 10000, 25000, 50000, 75000, 100000}, {0, 1, 10, 11}, {10, 100, 250, 500, 600, 800, 1000} and {10, 100, 250, 500, 600, 800, 1000}, respectively. This resulted in $7 \times 4 \times 7 \times 7 = 1372$ unique contingency tables.
For the real world dataset, we chose the ’Mushroom’ data set from the UCI Machine Learning archive [9]. This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family. The methodology of rule generation was identical to that of the ’Adult’ dataset, with the focus on creating rules from a dense environment (as opposed to the sparse environment in the ’Adult’ dataset). This process results in rules being used for the purpose of rule ranking.
Dataset       Cluster  Measures  N   P   Y
All measures  -        50        23  18  9
Synthetic     A        24        3   15  6
Synthetic     B        19        14  2   3
Synthetic     C        7         6   1   0
Mushroom      A        23        2   15  6
Mushroom      B        12        7   3   2
Mushroom      C        12        11  0   1
Mushroom      D        3         3   0   0
The synthetic dataset was split into 3 clusters while the ’Mushroom’ dataset was split into 4 clusters. The cluster memberships are shown below:
Synthetic dataset: Cluster A: { Recall, Odd’s Ratio, Specificity, Negative Reliability, Lift, Coverage, PiatetskyShapiro, Novelty, Yule’s Q, Yule’s Y, Odd Multiplier, Certainty Factor, Loevinger, Conviction, Information Gain, Klosgen, Zhang, 1Way Support, 2Way Support, Kappa, Putative Causal Dependency, Added Value, Collective Strength, Dependency } Cluster B: { Precision, Confidence, Jaccard, FMeasure, Sebag Schoenauer, Support, Accuracy, Causal Confidence, Ganascia, Kulczynski1, Prevalence, Relative Risk, Cosine, Least Contradiction, Descriptive Confirm, Causal Confirm, Laplace, Examples and Counter Examples, Causal Confirm Confidence } Cluster C: { Mutual Information, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, Leverage, JMeasure }
’Mushroom’ dataset: Cluster A: { Recall, Specificity, Negative Reliability, Lift, PiatetskyShapiro, Novelty, Yule’s Q, Yule’s Y, Odd Multiplier, Certainty Factor, Loevinger, Conviction, Information Gain, Klosgen, Zhang, 1Way Support, 2Way Support, Leverage, Kappa, Putative Causal Dependency, Added Value, Collective Strength, Dependency } Cluster B: { Mutual Information, Odd’s Ratio, Accuracy, Causal Confidence, Prevalence, Relative Risk, Least Contradiction, Descriptive Confirm, Causal Confirm, Normalized Mutual Information, Gini Index, JMeasure } Cluster C: { Precision, Confidence, Jaccard, FMeasure, Sebag Schoenauer, Support, Ganascia, Kulczynski1, Cosine, Laplace, Examples and Counter Examples, Causal Confirm Confidence } Cluster D: { Coverage, Implication Index, Goodman Kruskal }
The results from this analysis are summarized in Table 5. In the synthetic dataset, cluster A is populated by measures which satisfy the property (21 out of 24), either partially or completely. Clusters B (14 out of 19) and C (6 out of 7) are dominated by measures that do not satisfy it. In the ’Mushroom’ dataset, cluster A again consists of measures which satisfy the property, either partially or completely (21 out of 23). Cluster B is split between measures that satisfy the property and measures that do not (7 N’s vs 3 P’s and 2 Y’s). Clusters C and D consist overwhelmingly of measures which do not satisfy the property, with only 1 measure satisfying it among the 15 in both clusters combined. In general, it is evident that the clustering holds a clear mapping to the property for the selected rules in a dense setting.
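The proportions quoted above can be recomputed directly from the Table 5 counts. The snippet below is a small sketch for checking that arithmetic; the data layout and the helper name `satisfying_share` are illustrative choices, and "satisfying" is taken to mean P or Y (partially or completely), as in the text.

```python
# Counts from Table 5: (N, P, Y) = (does not satisfy, partially, completely).
TABLE_5 = {
    ("Synthetic", "A"): (3, 15, 6),
    ("Synthetic", "B"): (14, 2, 3),
    ("Synthetic", "C"): (6, 1, 0),
    ("Mushroom", "A"): (2, 15, 6),
    ("Mushroom", "B"): (7, 3, 2),
    ("Mushroom", "C"): (11, 0, 1),
    ("Mushroom", "D"): (3, 0, 0),
}


def satisfying_share(counts):
    """Fraction of a cluster's measures that satisfy the property (P or Y)."""
    n, p, y = counts
    return (p + y) / (n + p + y)


for (dataset, cluster), counts in TABLE_5.items():
    print(f"{dataset:9s} {cluster}: {sum(counts):2d} measures, "
          f"{satisfying_share(counts):.0%} satisfy partially or completely")
```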
7 Conclusions and Future work
This study presents a new property-based framework (RCA) for analyzing interestingness measures. This framework uses the partial derivative of an IM with respect to a frequency count, which shows how the IM changes when the frequency count is increased or decreased. This approach is then used to create two specific properties, which correspond to taking the partial derivative at two points, infinity and zero. The study then classifies a broad set of measures according to these properties and compares the result with the classification produced by other properties in the literature. The proposed properties assign measures to all property states, suggesting that they might be discerning some meaningful differences between the measures. The classifications through these properties are also fairly independent of those produced by preexisting properties, suggesting that something new is being captured. Finally, the study demonstrates the utility of classification through the new properties by conducting empirical analyses on both synthetic and real-world data sets, which relate the rule ranking behavior of the measures to two of the properties proposed. The findings suggest that the rule ranking behavior holds a clear relationship to the classification done by the property.
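To make the rate-of-change idea concrete, the sketch below estimates the partial derivative of a measure with respect to one cell count numerically. It is only an illustration under assumptions: the cell names `f11`, `f10`, `f01` and `f00` follow the usual contingency-table convention (co-presence, A without B, B without A, co-absence), Confidence and Lift are written in their standard forms, the helper `rate_of_change` is hypothetical, and a central finite difference stands in for the analytical derivative used in the framework. The choice of `f00` as the varying count is likewise illustrative.

```python
def confidence(f11, f10, f01, f00):
    """Confidence of A -> B: P(B | A)."""
    return f11 / (f11 + f10)


def lift(f11, f10, f01, f00):
    """Lift of A -> B: P(A, B) / (P(A) * P(B))."""
    n = f11 + f10 + f01 + f00
    return f11 * n / ((f11 + f10) * (f11 + f01))


def rate_of_change(measure, counts, cell, step=1.0):
    """Central finite-difference estimate of d(measure)/d(cell)."""
    up = dict(counts, **{cell: counts[cell] + step})
    down = dict(counts, **{cell: counts[cell] - step})
    return (measure(**up) - measure(**down)) / (2 * step)


counts = {"f11": 100, "f10": 50, "f01": 25, "f00": 1000}

# Probe the rate of change as the co-absence count grows large.
for f00 in (1_000, 100_000, 10_000_000):
    table = dict(counts, f00=f00)
    print(f00,
          rate_of_change(confidence, table, "f00"),
          rate_of_change(lift, table, "f00"))
```

In this example Confidence does not depend on `f00`, so its estimated rate of change is zero, while Lift is linear in `f00` with slope `f11 / ((f11 + f10) * (f11 + f01))` however large `f00` becomes.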
One of the major contributions of this research is the new framework (RCA) for analyzing measures using the rate of change idea through partial differentiation. This is markedly different from the property-based classification schemes that currently exist in the literature. Given this, we feel that there could be more extensions in the development of properties that build on this idea, going beyond the two that are proposed in this study. Also, the idea of using differentiation as a tool for defining properties opens up a plethora of characteristics that can be analyzed. One possible extension is to study the shape of the partial derivative curve (linear, polynomial, etc.).
Finally, the authors of this study agree with the view put forth in [14] that meaningful classification of measures also needs to be driven by similarity (or dissimilarity) in rule ranking that can be seen on empirical data sets. We would like to extend this argument by stating that the value of mathematical properties, derived from principled arguments, can be benchmarked across the board in this fashion (this study performs such an analysis exclusively for the two properties it proposes). This can also be extended beyond interestingness measures in ARM. Classification metrics (some of which are included in this analysis, such as accuracy, recall and specificity) can also be defined by the same contingency table (for two-class classification problems) and could therefore lend themselves to a representation and segmentation using a rate of change analysis.
Acknowledgments
This work was supported by funding from IIT Madras (CSE/1415/831/RFTP/BRAV).
References
 [1] Agrawal, R., Imieliński, T., and Swami, A. Mining association rules between sets of items in large databases. In ACM SIGMOD international conference on Management of data (1993), pp. 207–216.
 [2] Freitas, A. A. On rule interestingness measures. Knowledge-Based Systems 12, 5 (1999), 309–315.
 [3] Geng, L., and Hamilton, H. Choosing the right lens: Finding what is interesting in data mining. Quality Measures in Data Mining (2007), 3–24.
 [4] Guillaume, S., Grissa, D., and Mephu Nguifo, E. Categorization of interestingness measures for knowledge extraction, June 2012.

 [5] Hébert, C., and Crémilleux, B. A unified view of objective interestingness measures. In Machine Learning and Data Mining in Pattern Recognition. Springer, Berlin, Heidelberg, 2007, pp. 533–547.
 [6] Hilderman, R. J., and Hamilton, H. J. Evaluation of interestingness measures for ranking discovered knowledge. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (2001), pp. 247–259.
 [7] Huynh, H. X., Guillet, F., Blanchard, J., Kuntz, P., Briand, H., and Gras, R. A graph-based clustering approach to evaluate interestingness measures: A tool and a comparative study. Quality Measures in Data Mining 43 (2007), 25–50.
 [8] Jalali-Heravi, M., and Zaïane, O. R. A study on interestingness measures for associative classifiers. In ACM Symposium on Applied Computing (2010), p. 1039.
 [9] Lichman, M. UCI machine learning repository, 2013.
 [10] Piatetsky-Shapiro, G. Discovery, analysis, and presentation of strong rules. In Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley, Eds. AAAI/MIT, Menlo Park, CA, 1991, pp. 229–248.
 [11] Tan, P.-N., Kumar, V., and Srivastava, J. Selecting the right interestingness measure for association patterns. In ACM SIGKDD international conference on Knowledge discovery and data mining (2002).
 [12] Tan, P.-N., Kumar, V., and Srivastava, J. Selecting the right objective measure for association analysis. Information Systems 29, 4 (2004), 293–313. Knowledge Discovery and Data Mining (KDD 2002).
 [13] Tan, P.-N., Steinbach, M., and Kumar, V. Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.
 [14] Tew, C., Giraud-Carrier, C., Tanner, K., and Burton, S. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Mining and Knowledge Discovery 28, 4 (2014), 1004–1045.
 [15] Wu, T., Chen, Y., and Han, J. Association mining in large databases: A reexamination of its measures. In European Conference on Principles of Data Mining and Knowledge Discovery (2007), pp. 621–628.