Information cartography in association rule mining

by Iztok Fister Jr. et al.

Association Rule Mining is a data mining method for discovering interesting relations between attributes in huge transaction databases. Typically, algorithms for association rule mining generate a huge number of association rules, from which it is hard to extract structured knowledge and automatically present it in a form suitable for the user. Recently, information cartography has been proposed for creating structured summaries of information and visualizing them with a methodology called "metro maps". It has been applied to many problem domains. In the hope of widening its applicability domain, the aim of this study is to develop a method for the automatic creation of metro maps of information obtained by association rule mining. Although the proposed method consists of multiple steps, its core is metro map construction, which is defined in the study as an optimization problem and solved using an evolutionary algorithm. Finally, the method was applied to four well-known UCI Machine Learning datasets and one sport dataset. Visualizing the resulting metro maps shows not only that this is a suitable tool for presenting structured knowledge hidden in data, but also that the maps can tell stories to users.




1 Introduction

Association Rule Mining (ARM) is a data mining method for discovering interesting relations between attributes in huge transaction databases. The first algorithm for ARM was Apriori, proposed in 1994 by Agrawal [1]. Apriori is still the most popular algorithm for mining association rules, and it has been considered one of the top 10 algorithms in data mining [2]. Other well-established algorithms in this domain include Eclat [3], FP-Growth [4], Genetic Association Rules (GAR) [5], the Multi-Objective Differential Evolution algorithm for mining numeric Association Rules (MODENAR) [6], and BatMiner [7].

Usually, algorithms for association rule mining generate a huge number of association rules, collected in large datasets. Hence, we are confronted with the problem of how to extract structured knowledge from these large datasets and then automatically present this knowledge to the user in a suitable form [8]. In line with this, numerous attempts to simplify this task using various visualization methods have emerged.

In a nutshell, the visualization of association rules was tackled in the following papers. The authors in [9] presented a design that is able to handle hundreds of multiple-antecedent association rules in a three-dimensional display with minimum human interaction, a low occlusion percentage, and no screen swapping. The authors in [10] showed that Mosaic plots and their variant, called Double Decker plots, can be applied for visualizing association rules. Ong et al. [11] prototyped two visualizations, called grid view and tree view, for visualizing association rules in their application called CrystalClear. Appice and Buono [12] presented a graph-based visualization that supports data miners in the analysis of multi-level spatial association rules, while Herawan et al. [13] proposed an approach for visualizing soft maximal association rules. A very interesting interactive visualization technique, which lets the user navigate through a hierarchy of association rule groups, is presented in [14]. The authors in [15] explore Hasse diagrams for the visualization of Boolean association rules. Fister et al. [16] proposed a method for identifying dependencies among mined association rules based on population-based metaheuristics and complex networks.

Recently, the concept of metro maps of information has been developed [17], which is capable of creating structured summaries of information. The name was selected as a metaphor for a real cartographic map, i.e. in the same way that these maps help people understand their surroundings, metro maps help them understand the information landscape [18]. Moreover, visualization with metro maps (also called information cartography) can tell stories to users, on the one hand, and provide them with good directions, on the other. Indeed, a metro map consists of a set of lines, where each line interprets the same story from a different aspect. Metro stops on these lines introduce salient pieces of information, while the interrelations among these pieces shape the plot of the story. This methodology has recently been applied for understanding information in many areas [19, 18].

The aim of this study is to develop a method for automatically creating metro maps of ARM information. The proposed method works as follows: ARM information is hidden in ARM datasets produced by ARM algorithms in the form of implication rules $X \Rightarrow Y$. In general, these rules are represented as conjunctions of several antecedent and several consequent attributes. First, each complex rule is simplified into a set of simple ones, each consisting of one antecedent and one consequent. These rules serve as building blocks for the construction of an attribute graph, which consists of nodes representing attributes (e.g. $A$, $B$) and directed edges denoting an implication relation (e.g. $A \Rightarrow B$). The construction of a metro map is defined as an optimization problem that searches for the best metro lines within the attribute graph. The problem is solved using an Evolutionary Algorithm (EA). Finally, the metro map is visualized.

How, then, can a metro map of ARM information tell us a story? Each story consists of five elements [20]: characters, setting, plot, conflict, and resolution, and metro maps include the same elements too. Indeed, the attributes of the ARM dataset present the characters of the story, the setting is introduced by the ARM problem domain, the plot evolves the actual story from the beginning (i.e. the starting metro stop) through the middle (i.e. intermediate metro stops) to the end (i.e. the final metro stop), conflicts are launched by the interrelations between intermediate metro stops, and the resolution is the way the conflict is resolved, in which the point of the story is reflected. All that is needed is to connect these elements into an integral whole, and the story starts to flow by itself.

The paper is structured as follows: Section 2 introduces the basic information needed for understanding the subject that follows. In Section 3, the proposed method for information cartography in ARM is illustrated in detail. Section 4 presents the results of the method by constructing metro maps of information obtained from five well-known datasets, while the last section concludes the paper with a summary of the performed work and outlines directions for future work.

2 Basic information

The present section consists of two subsections. The former introduces the problem of ARM, while the latter gives the formal definition of the objectives necessary for defining information cartography in ARM as an optimization problem.

2.1 Association Rule Mining

ARM can formally be defined as follows: let us assume that a set of objects $O = \{o_1, \ldots, o_{N_o}\}$ and a transaction dataset $D = \{T_1, \ldots, T_N\}$ are given, where each transaction $T_i$ is a subset of objects, i.e. $T_i \subseteq O$. Then, an association rule is defined as an implication:

$$X \Rightarrow Y, \tag{1}$$

where $X \subset O$, $Y \subset O$, and $X \cap Y = \emptyset$. In order to estimate the quality of a mined association rule, two measures are defined: confidence and support. The confidence is defined as:

$$\mathrm{conf}(X \Rightarrow Y) = \frac{n(X \cup Y)}{n(X)}, \tag{2}$$

whereas support is:

$$\mathrm{supp}(X \Rightarrow Y) = \frac{n(X \cup Y)}{N}, \tag{3}$$

where the function $n(\cdot)$ calculates the number of repetitions of a particular rule within $D$, and $N = |D|$ is the total number of transactions in $D$. Let us emphasize that two additional variables are defined, i.e. minimum confidence $C_{\min}$ and minimum support $S_{\min}$. These variables denote threshold values that prevent association rules with lower confidence or support from being taken into consideration.
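The two quality measures can be illustrated with a minimal Python sketch. It assumes that transactions are represented as Python sets of item names; the function names (`count`, `support`, `confidence`, `passes_thresholds`) and the toy transactions are hypothetical and serve only to show how the counts combine.

```python
from typing import List, Set

Transaction = Set[str]


def count(itemset: Set[str], transactions: List[Transaction]) -> int:
    """n(.): number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: Set[str], y: Set[str], transactions: List[Transaction]) -> float:
    """supp(X => Y) = n(X u Y) / N, where N is the number of transactions."""
    return count(x | y, transactions) / len(transactions)


def confidence(x: Set[str], y: Set[str], transactions: List[Transaction]) -> float:
    """conf(X => Y) = n(X u Y) / n(X); defined here as 0 if X never occurs."""
    n_x = count(x, transactions)
    return count(x | y, transactions) / n_x if n_x else 0.0


def passes_thresholds(x, y, transactions, min_conf: float, min_supp: float) -> bool:
    """Keep a rule only if it reaches both the minimum confidence and minimum support."""
    return (confidence(x, y, transactions) >= min_conf
            and support(x, y, transactions) >= min_supp)


# Hypothetical toy transaction database with N = 4 transactions.
D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread"}, {"milk"}, D))     # 0.5
print(confidence({"bread"}, {"milk"}, D))  # 2/3
```

Here "bread" occurs in three transactions and "bread" together with "milk" in two, so the rule bread ⇒ milk has support 2/4 and confidence 2/3.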

2.2 Formal definition of objectives

The concept of a metro map is applied in order to visualize the archive of mined association rules [18]. In our study, the metro map is formally defined as $M = (G, \Pi)$, where $G = (V, E)$ denotes an attribute graph with vertices $V = \{v_1, \ldots, v_{N_v}\}$, representing attributes, and edges $E = \{e_1, \ldots, e_{N_e}\}$, representing simple rules, together with an incidence function $\psi: E \rightarrow V \times V$ that associates an ordered pair $(v_i, v_j)$ with a directed edge $e_k$ when there exists a simple association rule of the form $v_i \Rightarrow v_j$, and $\Pi$ represents a set of paths in $G$. In these definitions, the variables $N_v$ and $N_e$ denote the maximum number of vertices and the maximum number of edges, respectively. Thus, a simple association rule consists of only one antecedent and one consequent, where the former is mapped to the tail and the latter to the head of the corresponding directed edge in the attribute graph, while a path leads from a source node to a sink node.

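In general, the association rules in the archive consist of several antecedents and several consequents, in other words:

$$X_1 \wedge X_2 \wedge \cdots \wedge X_n \Rightarrow Y_1 \wedge Y_2 \wedge \cdots \wedge Y_m. \tag{4}$$

The simple association rules are obtained from the mined rules by pairing each antecedent with each consequent, in other words:

$$X_i \Rightarrow Y_j, \quad \text{for } i = 1, \ldots, n \text{ and } j = 1, \ldots, m. \tag{5}$$

In this process of simplifying rules, $n \cdot m$ simple rules are obtained, representing directed edges in the attribute graph.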
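The simplification step can be sketched in a few lines of Python. This is a minimal illustration, assuming that a mined rule is given as a pair (list of antecedent attributes, list of consequent attributes); the helper names and the attribute names A1, A2, C1, C2 are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

SimpleRule = Tuple[str, str]  # (antecedent, consequent) = one directed edge


def simplify(antecedents: Sequence[str], consequents: Sequence[str]) -> List[SimpleRule]:
    """Pair each antecedent with each consequent: n x m attributes give n*m simple rules."""
    return [(a, c) for a, c in product(antecedents, consequents)]


def simplify_all(rules: List[Tuple[Sequence[str], Sequence[str]]]) -> Set[SimpleRule]:
    """Collect the simple rules of all mined rules; duplicates collapse into one edge."""
    edges: Set[SimpleRule] = set()
    for antecedents, consequents in rules:
        edges.update(simplify(antecedents, consequents))
    return edges


# Hypothetical mined rule A1 ^ A2 => C1 ^ C2
print(simplify(["A1", "A2"], ["C1", "C2"]))
# [('A1', 'C1'), ('A1', 'C2'), ('A2', 'C1'), ('A2', 'C2')]
```

Collecting the pairs in a set reflects that the same simple rule may arise from several mined rules but yields only one edge in the attribute graph.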

Interestingly, attributes in ARM databases are typically denoted by the feature name and the type, joined by an underscore character. The type determines the domain of values that a particular feature can take. In the ARM domain, there are three types, i.e. categorical, numerical, and mixed. The categorical type consists of a domain of discrete values, while the numerical one uses a domain of continuous real values that must be discretized. The last type can employ elements from both of the aforementioned domains.

The attributes can appear in mined association rules as: (1) antecedent only, (2) consequent only, or (3) antecedent in one rule and consequent in another. Accordingly, they are divided into three subsets, i.e. the antecedent set $\mathcal{A}$, the consequent set $\mathcal{C}$, and the mixed set $\mathcal{I}$. In graph $G$, the attributes in the antecedent subset represent source nodes with indegree zero, the attributes in the consequent subset are sink nodes with outdegree zero, while the attributes in the mixed subset denote the intern nodes with both indegree and outdegree greater than zero. Indeed, the antecedent set consists of nodes suitable for the starting metro stops on the metro lines, the consequent set of nodes suitable for the final metro stops, while the mixed set determines the intermediate metro stops and outlines a definite path towards a certain end destination.
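The division into the three subsets follows directly from the in- and outdegrees of the nodes. The Python sketch below assumes that the simple rules are given as (antecedent, consequent) pairs; the function name `partition_attributes` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def partition_attributes(edges: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split attributes into antecedent-only (source), consequent-only (sink) and mixed (intern) sets."""
    edges = list(edges)
    tails = {a for a, _ in edges}    # attributes with outdegree > 0
    heads = {c for _, c in edges}    # attributes with indegree > 0
    antecedent_only = tails - heads  # indegree 0: candidate starting metro stops
    consequent_only = heads - tails  # outdegree 0: candidate final metro stops
    mixed = tails & heads            # both degrees > 0: candidate intermediate stops
    return antecedent_only, consequent_only, mixed
```

For example, the edges A ⇒ B, B ⇒ C, and A ⇒ C give A as the only source node, C as the only sink node, and B as an intern node.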

The algorithm for constructing the metro map for visualizing the association rules needs to fulfill the following four objectives:

  • minimum line coherence,

  • maximum map size,

  • high coverage,

  • high structure quality.

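The minimum line coherence limits the number of intermediate metro stops on each metro line and is expressed by the following relation:

$$s(\pi_k) \leq L_{\max}, \quad \text{for } k = 1, \ldots, K, \tag{6}$$

where $s(\pi_k)$ denotes the number of intermediate metro stops on the metro line $\pi_k \in \Pi$, and the variable $L_{\max}$ determines the maximum number of intermediate metro stops. The maximum map size refers to the number of metro lines $K = |\Pi|$, in other words:

$$K \leq K_{\max}. \tag{7}$$

Indeed, we are interested in covering our information domain using a number of metro lines as close to $K_{\max}$ as possible.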
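A feasibility check for the minimum line coherence and maximum map size constraints might look as follows. This sketch assumes that a metro line is stored as a list of consecutive (antecedent, consequent) pairs, so that a line of r rules has r - 1 intermediate stops; the parameter names `l_max` and `k_max` and the requirement of at least one line are illustrative assumptions.

```python
from typing import List, Tuple

Line = List[Tuple[str, str]]  # consecutive simple rules along one metro line


def intermediate_stops(line: Line) -> int:
    """Number of stops strictly between the starting and the final metro stop."""
    return max(len(line) - 1, 0)


def is_feasible(metro_map: List[Line], l_max: int, k_max: int) -> bool:
    """Check the maximum map size and the minimum line coherence of a metro map."""
    if not 0 < len(metro_map) <= k_max:
        return False
    return all(intermediate_stops(line) <= l_max for line in metro_map)
```

A single line A ⇒ B ⇒ C has one intermediate stop (B), so it is feasible for `l_max = 1` but not for `l_max = 0`.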

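The coverage estimates how well the selected metro line exploits the attributes in the transaction database. For this purpose, the lift measure of an association rule is used, which is expressed as:

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X) \cdot \mathrm{supp}(Y)}. \tag{8}$$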
Let it be noted that the higher the value of this measure, the stronger the association. Additionally, the coverage of the whole metro line is expressed as:

$$\mathrm{cov}(\pi_k) = \frac{1}{L_k} \sum_{j=1}^{L_k} \mathrm{lift}(r_{k,j}), \tag{9}$$
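where $r_{k,j}$ represents the particular simple association rule $X_j \Rightarrow Y_j$ at position $j$ of the metro line $\pi_k$, and $L_k$ is the number of rules on this line. Finally, the coverage of the metro map is a simple average over all the proposed metro lines, in other words:

$$\mathrm{cov}(M) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{cov}(\pi_k). \tag{10}$$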
The structure quality of the metro map refers to the diversity of its metro lines: we are interested in metro lines that differ in their intermediate stops as much as possible. This relation is expressed by the following equation:


where the variable counts the number of metro line interactions.
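The lift-based coverage measures of this subsection can be sketched in Python as follows. The sketch assumes that the coverage of a metro line is the mean lift of its simple rules and that the coverage of a metro map is the plain average over its lines; the function names and the toy transactions are hypothetical.

```python
from typing import List, Set, Tuple

Line = List[Tuple[str, str]]  # consecutive simple rules (X_j, Y_j) of one metro line


def supp(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Relative support of an itemset in the transaction database."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x: str, y: str, transactions: List[Set[str]]) -> float:
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y)); values > 1 mean positive association."""
    denominator = supp({x}, transactions) * supp({y}, transactions)
    return supp({x, y}, transactions) / denominator if denominator else 0.0


def line_coverage(line: Line, transactions: List[Set[str]]) -> float:
    """Assumption: mean lift of the simple rules along the metro line."""
    return sum(lift(x, y, transactions) for x, y in line) / len(line)


def map_coverage(metro_map: List[Line], transactions: List[Set[str]]) -> float:
    """Simple average of the coverages of all metro lines."""
    return sum(line_coverage(line, transactions) for line in metro_map) / len(metro_map)


# Hypothetical toy data: "a" and "b" always co-occur, as do "c" and "d" whenever "d" occurs.
D = [{"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
print(lift("a", "b", D))                                   # 2.0
print(map_coverage([[("a", "b")], [("c", "d")]], D))       # 2.0
```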

3 Proposed method

The proposed method for information cartography in ARM consists of the following five steps (Fig. 1):

  • creating the ARM database,

  • association rule simplification,

  • attribute graph definition,

  • metro map construction,

  • metro map visualization.

The ARM database is the result of an ARM algorithm, where modern stochastic population-based nature-inspired algorithms, like BatMiner, can also be used instead of classical approaches, like Apriori or FP-Growth.

On the other hand, the results of ARM algorithms are somewhat confusing for information cartography, in the sense that any attribute in the mined association rules can emerge as an antecedent in one rule and as a consequent in another. ARM algorithms treat all attributes as equally important, even those that represent the result of a classification process (i.e. class labels) and therefore should not be introduced as the antecedent of any rule. In a mathematical sense, we must ensure that any of the following three inequations: A, A, and A is valid.

The aforementioned conditions are satisfied in our study by proper filtering, where all association rules in which attributes resulting from classification appear as an antecedent are filtered out. When the collection of association rules does not include results of a classification process, the set with the maximum number of rules containing the same consequent is searched for using a linear programming algorithm, and the filtering is then performed as in the first case.

Figure 1: Concept of metro map creation in association rule mining.

The main characteristic of ARM databases is that the mined association rules are in a broad form, with many antecedent as well as consequent attributes. This form is not appropriate for the creation of attribute graphs, and therefore the rules need to be transformed into a simplified form. The rule simplification procedure was discussed in detail in the previous section and is therefore not repeated here.

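The third step is dedicated to creating the attribute graph $G$, where all the simplified association rules are incorporated into the adjacency matrix $\mathbf{A} = [a_{i,j}]$ of dimension $N_v \times N_v$, where $N_v$ is the number of attributes and:

$$a_{i,j} = \begin{cases} 1, & \text{if a simple rule } v_i \Rightarrow v_j \text{ exists},\\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

It should be mentioned that no loops are allowed in this graph, because $a_{i,i} = 0$ for $i = 1, \ldots, N_v$. The classification of attributes into three distinct sets (i.e. the antecedent set $\mathcal{A}$, the consequent set $\mathcal{C}$, and the mixed set $\mathcal{I}$) is also performed in this step.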
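Building the adjacency matrix from the simple rules can be sketched as below. The sketch assumes that attributes are indexed in sorted order and that any self-implication is simply skipped so that the diagonal stays zero; the function name `adjacency_matrix` is hypothetical.

```python
from typing import Iterable, List, Tuple


def adjacency_matrix(edges: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Build the N_v x N_v adjacency matrix of the attribute graph from simple rules."""
    edges = list(edges)
    nodes = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(nodes)}
    a = [[0] * len(nodes) for _ in nodes]
    for x, y in edges:
        if x != y:  # no loops: the diagonal entries a_ii remain 0
            a[index[x]][index[y]] = 1
    return nodes, a
```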

In the remainder of this section, the fourth step, in which the metro map is constructed, is described, while the visualization of the metro map is examined in the next section.

3.1 EA for metro map construction

The EA for metro map construction requires the following algorithm components to be adapted [21]:

  • representation of solution,

  • variation operators (i.e. crossover and mutation),

  • survivor and parent selection,

  • fitness function evaluation,

  • initialization.

The problem of metro map construction is constrained, because every feasible solution must satisfy two relations: minimum line coherence (according to Eq. (6)), and maximum map size (according to Eq. (7)).

3.1.1 Representation of solutions

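Each solution in the population of individuals represents a metro map that is encoded as follows:

$$\mathbf{x}_i = \{K_i, \pi_1, \ldots, \pi_{K_i}\}, \tag{13}$$

where the first part of the representation is dedicated to the control meta-parameter $K_i$, which determines the number of metro lines and makes the individuals of variable length. The second part consists of detailed descriptions of the particular metro lines, expressed as:

$$\pi_k = \{x_{k,1}, \ldots, x_{k,L_k}\}, \tag{14}$$

where each element $x_{k,j}$ for $j = 1, \ldots, L_k$ encodes a specific simple association rule $X_j \Rightarrow Y_j$, and $L_k$ determines the number of association rules within the metro line $\pi_k$. The elements are ordered into a sequence of implication rules:

$$X_1 \Rightarrow Y_1,\; X_2 \Rightarrow Y_2,\; \ldots,\; X_{L_k} \Rightarrow Y_{L_k}, \tag{15}$$

in such a way that the consequent of the $j$-th rule appears as the antecedent of the $(j+1)$-th rule, in other words: if consecutive elements $x_{k,j}$ and $x_{k,j+1}$ encode the rules $X_j \Rightarrow Y_j$ and $X_{j+1} \Rightarrow Y_{j+1}$ in the metro line $\pi_k$, then the relation $Y_j = X_{j+1}$ must hold.

According to standard rules of mathematical logic, Eq. (15) can be transformed as follows:

$$X_1 \Rightarrow X_2 \Rightarrow \cdots \Rightarrow X_{L_k} \Rightarrow Y_{L_k}, \tag{16}$$

which is easier to apply when interpreting the obtained results.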
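The chaining condition on consecutive rules, and the collapsed chain form used for interpretation, can be sketched as follows. The sketch assumes a metro line stored as a list of (antecedent, consequent) pairs; the function names are hypothetical, and the example line is taken from the Iris results reported in Table 4.

```python
from typing import List, Tuple

Line = List[Tuple[str, str]]  # consecutive simple rules (X_j, Y_j) of one metro line


def is_chained(line: Line) -> bool:
    """Y_j must equal X_{j+1} for every pair of consecutive rules."""
    return all(line[j][1] == line[j + 1][0] for j in range(len(line) - 1))


def as_implication_chain(line: Line) -> str:
    """Render a valid metro line as X_1 => X_2 => ... => Y_L."""
    if not line or not is_chained(line):
        raise ValueError("metro line is empty or its rules are not linked")
    stops = [line[0][0]] + [y for _, y in line]
    return " => ".join(stops)


iris_line = [("sepallength_(7-inf)", "petallength_(5.425-inf)"),
             ("petallength_(5.425-inf)", "Iris-virginica_Iris-virginica")]
print(as_implication_chain(iris_line))
# sepallength_(7-inf) => petallength_(5.425-inf) => Iris-virginica_Iris-virginica
```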

3.1.2 Variation operators

The proposed algorithm supports two variation operators, i.e. crossover and mutation. While crossover operates on metro lines as a whole, mutation is also capable of modifying the structure inside a particular metro line. The application of crossover is controlled by the probability of crossover $p_c$, and that of mutation by the probability of mutation $p_m$.


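Crossover is defined as follows: first, for each target metro map $\mathbf{x}_i$, a trial metro map $\mathbf{t}_i$ is created using the same control parameters (i.e. the same number of metro lines $K_i$). Then, a parent metro map $\mathbf{x}_p$ is selected randomly, and the metro lines of the trial are taken either from the target or from the parent metro map according to the probability $p_c$. Mathematically, this crossover is expressed as:

$$\pi^{(t)}_k = \begin{cases} \pi^{(p)}_k, & \text{if } \mathrm{rand}(0, 1) < p_c,\\ \pi^{(i)}_k, & \text{otherwise}, \end{cases} \tag{17}$$

for $k = 1, \ldots, K_i$, where $\pi^{(t)}_k$, $\pi^{(i)}_k$, and $\pi^{(p)}_k$ denote the $k$-th metro line of the trial, target, and parent, respectively. Consequently, the operator can produce infeasible solutions in two cases:

  • the size of the parent metro map is smaller than the size of the trial, i.e. $K_p < K_i$,

  • the first antecedent of a metro line, representing its starting metro stop, appears twice in the trial.

In the first case, the corresponding metro line is deleted from the trial solution, while in the second, the metro line from the parent solution is preferred.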
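The crossover and its two repair rules can be sketched in Python as below. This is one possible reading, not the authors' code: it assumes metro lines stored as lists of (antecedent, consequent) pairs, and it resolves a duplicated starting stop by dropping the conflicting line that came from the target. The function name `crossover` and the parameter `pc` are hypothetical.

```python
import random
from typing import List, Tuple

Line = List[Tuple[str, str]]  # one metro line as a chain of simple rules


def crossover(target: List[Line], parent: List[Line], pc: float, rng: random.Random) -> List[Line]:
    """Build a trial metro map line by line from the target and a random parent."""
    trial: List[Tuple[Line, bool]] = []  # (metro line, inherited from parent?)
    for k, line in enumerate(target):
        if rng.random() < pc:
            if k < len(parent):
                trial.append((parent[k], True))
            # else: the parent is smaller than the trial, so line k is deleted
        else:
            trial.append((line, False))
    # Repair duplicated starting stops, preferring lines inherited from the parent.
    parent_starts = {line[0][0] for line, from_parent in trial if from_parent}
    repaired: List[Line] = []
    seen = set()
    for line, from_parent in trial:
        start = line[0][0]
        if (not from_parent and start in parent_starts) or start in seen:
            continue
        seen.add(start)
        repaired.append(line)
    return repaired
```

Passing a seeded `random.Random` instance keeps runs reproducible; with `pc = 0` the trial is a copy of the target.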


Mutation modifies the structure of a metro line as follows: first, the position of mutation $j$ is selected randomly according to the probability of mutation $p_m$. Then, the antecedent $X_j$ is extracted from the corresponding association rule $X_j \Rightarrow Y_j$, and a new consequent $Y'_j$ is attached to the rule, chosen among all the possible consequents of $X_j$ as defined by the graph $G$. Finally, the remaining path from $Y'_j$ towards the drain (i.e. a sink node) is generated randomly.

Mathematically, the mutation is presented as follows: let us assume that the metro line $\pi_k = \{x_{k,1}, \ldots, x_{k,L_k}\}$ is given, where each element $x_{k,j}$ represents an association rule $X_j \Rightarrow Y_j$, and that a position of mutation $j \in \{1, \ldots, L_k\}$ is selected according to the probability of mutation $p_m$. The result of the mutation is expressed as follows:

$$\pi'_k = \{x_{k,1}, \ldots, x_{k,j-1}, x'_{k,j}, x'_{k,j+1}, \ldots, x'_{k,L'_k}\},$$

where $x'_{k,j}$ encodes the rule $X_j \Rightarrow Y'_j$, and all the modified values of the metro line are denoted by apostrophes. Let us mention that the operation also affects the metro line length $L'_k$, which can increase or decrease within the allowed maximum metro line length (i.e. $L'_k \leq L_{\max}$).
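The mutation can be sketched as follows. This is a minimal sketch under stated assumptions: the attribute graph is a hypothetical adjacency-list dictionary, the mutation position is the first position that passes the $p_m$ test, the random tail never revisits a stop, and the original line is kept whenever no sink is reachable within the length limit. The names `random_path`, `mutate`, and `max_rules` are illustrative.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # attribute -> successor attributes in the attribute graph
Line = List[Tuple[str, str]]


def random_path(graph: Graph, start: str, max_rules: int,
                rng: random.Random, visited: Set[str]) -> Optional[Line]:
    """Random walk from `start` towards a sink, avoiding already visited stops."""
    path: Line = []
    node, seen = start, set(visited) | {start}
    while graph.get(node):  # stop once a sink (outdegree 0) is reached
        choices = [v for v in graph[node] if v not in seen]
        if not choices or len(path) >= max_rules:
            return None  # dead end or length limit reached before a sink
        nxt = rng.choice(choices)
        path.append((node, nxt))
        seen.add(nxt)
        node = nxt
    return path


def mutate(line: Line, graph: Graph, pm: float, max_rules: int, rng: random.Random) -> Line:
    for j, (a, _) in enumerate(line):
        if rng.random() < pm:  # first position j that passes the p_m test
            prefix = line[:j]
            visited = {x for x, _ in prefix} | {a}
            candidates = [c for c in graph.get(a, []) if c not in visited]
            if not candidates:
                return line
            c = rng.choice(candidates)  # new consequent Y'_j
            tail = random_path(graph, c, max_rules - j - 1, rng, visited)
            if tail is None:
                return line  # no sink reachable within the limit: keep the original line
            return prefix + [(a, c)] + tail
    return line
```

Because the new tail is regenerated from the mutated rule onwards, the mutated line may become shorter or longer than the original, mirroring the change of $L'_k$ described above.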

3.1.3 Survivor and parent selection

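One-to-one selection, borrowed from Differential Evolution (DE) [22], is applied as the survivor selection operator. This selection works on the whole metro map. Mathematically, it is expressed as follows:

$$\mathbf{x}_i^{(g+1)} = \begin{cases} \mathbf{t}_i, & \text{if } f(\mathbf{t}_i) \geq f(\mathbf{x}_i^{(g)}),\\ \mathbf{x}_i^{(g)}, & \text{otherwise}, \end{cases} \tag{18}$$

where $g$ denotes the generation, and the better of the trial $\mathbf{t}_i$ and the target $\mathbf{x}_i^{(g)}$ is preserved as the candidate metro map that proceeds into the next generation.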

Parent selection is used by the crossover operator for generating the trial solution. Although many parent selection operators exist, the one implemented in our study selects the parent randomly among the other population members.
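Both selection operators are short enough to sketch directly. The sketch assumes a maximized fitness function passed in as a callable and, following the usual DE convention, lets the trial win ties; the function names are hypothetical.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def select_parent(population: List[T], target_idx: int, rng: random.Random) -> T:
    """Pick a parent uniformly at random among the other population members."""
    others = [i for i in range(len(population)) if i != target_idx]
    return population[rng.choice(others)]


def one_to_one(target: T, trial: T, fitness: Callable[[T], float]) -> T:
    """DE-style survivor selection for maximization: the trial replaces the target if not worse."""
    return trial if fitness(trial) >= fitness(target) else target
```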

3.1.4 Fitness function evaluation

The fitness function evaluates the constructed metro map according to two objectives: the coverage (according to Eq. (10)) and the quality (according to Eq. (11)). Both terms are combined linearly as follows:


where the weight variable $w$ indicates the influence of the second term on the total fitness value, and $K$ is the number of metro lines. However, each solution is subject to the minimum line coherence and maximum map size constraints, as already asserted. Accordingly, all infeasible solutions are eliminated, i.e. no constraint-handling mechanism is implemented.

The task of the optimization algorithm is to maximize the value of the fitness function. Because the quality of the metro map is measured by the similarity of the metro lines, which should be minimized, the inverse of this function is used in Eq. (19).

3.1.5 Initialization

Individuals in the population are initialized randomly. First, the number of metro lines $K$ is generated randomly from the interval $[K_{\min}, K_{\max}]$. Then, $K$ unique source nodes are selected from the set of source nodes $\mathcal{A}$. For each selected source node, a random path in the attribute graph is searched for, so that the maximum metro line length $L_{\max}$ is not exceeded.
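The random initialization can be sketched in Python as follows. The sketch assumes an adjacency-list graph, caps the number of lines at the number of available source nodes, and retries a bounded number of times (`max_tries`, a hypothetical parameter) when a random walk fails to reach a sink; all names are illustrative.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]  # attribute -> successor attributes
Line = List[Tuple[str, str]]


def random_path(graph: Graph, start: str, max_rules: int, rng: random.Random) -> Optional[Line]:
    """Random walk from `start` to a sink (outdegree 0) without revisiting nodes."""
    path: Line = []
    node, visited = start, {start}
    while graph.get(node):
        choices = [v for v in graph[node] if v not in visited]
        if not choices or len(path) >= max_rules:
            return None  # dead end or length limit reached before a sink
        nxt = rng.choice(choices)
        path.append((node, nxt))
        visited.add(nxt)
        node = nxt
    return path


def init_metro_map(graph: Graph, sources: Set[str], k_min: int, k_max: int, max_rules: int,
                   rng: random.Random, max_tries: int = 100) -> List[Line]:
    k = min(rng.randint(k_min, k_max), len(sources))  # number of metro lines
    metro_map: List[Line] = []
    for start in rng.sample(sorted(sources), k):  # unique starting metro stops
        for _ in range(max_tries):
            line = random_path(graph, start, max_rules, rng)
            if line:  # non-empty path ending in a sink
                metro_map.append(line)
                break
    return metro_map
```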

4 Experiments and results

The purpose of our experimental work was to show that the huge amount of data obtained by ARM algorithms can be automatically transformed into structured knowledge, called information cartography in ARM, and visualized by metro maps of ARM information. These maps can help users understand information in many knowledge domains. Accordingly, we applied the proposed method to five different datasets containing data from various domains, e.g. biology, chemistry, and sports.

In summary, using metro maps offers users several advantages, including the ability to [8]:

  • extract the most important association rules from a huge amount of data,

  • find the more salient pieces of information by automatically extracting information, and direct the users' attention to it,

  • promote a new visualization method for summarizing and presenting complex sets of interrelated concepts,

  • derive the story that metro maps narrate.

Information cartography in ARM is a very complex task that comprises five steps, as discussed in Section 3. Therefore, the results of each step (except association rule simplification) are illustrated in detail in the remainder of the paper. For each metro map, the derived story is also narrated to the user.

4.1 ARM datasets creation

In our study, the efficiency of the proposed method for information cartography in ARM was evaluated on four public datasets from the UCI Machine Learning repository [23] and one Sport dataset consisting of real data obtained from tracking devices worn by athletes (i.e. cyclists) during their training sessions [16].

The characteristics of these datasets are presented in Table 1. Let us emphasize that the ARM results for the first four datasets were obtained using the Apriori algorithm [1], while those for the Sport dataset were obtained using BatMiner [7].

Dataset     Instances   Features   Attributes   Mined rules (all)   Mined rules (filtered)
Mushroom    8,124       23         126          24,408              998
Iris        150         5          24           182                 88
Abalone     4,177       9          28           36,388              2,779
Wine        178         13         55           5,483               2,355
Sport       80          14         87           4,191               10
Table 1: Characteristics of the ARM datasets.

The table consists of two parts: the first indicates the characteristics of the particular datasets, i.e. the number of instances, features, and attributes, while the second contains the results of the ARM algorithm. These results are presented in two columns: ’All’ denotes the number of all mined association rules, while ’Filtered’ gives the number of rules remaining after filtering.

In a nutshell, the characteristics of the aforementioned datasets are as follows. The Mushroom dataset contains logical rules for mushrooms indicating whether a specific one is poisonous or edible. The dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms, where each species is identified as definitely edible, definitely poisonous, or not recommended. The last class was combined with the poisonous one. As a result, there are two final classes, i.e. ’at1_e’ denotes the edible mushrooms (included in 865 rules as a consequent), and ’at1_p’ the poisonous ones (included in 133 rules as a consequent).

The Iris dataset collects transactions describing characteristics for the classification of three different iris plants (i.e. ’Iris setosa’, ’Iris versicolour’, and ’Iris virginica’), where each transaction is constructed from five features with 21 numerical attributes. The transactions are equally divided among the three classes (33.3 % for each class).

The Abalone dataset is devoted to predicting the age of abalone from physical measurements [23]. The age of abalone is determined by cutting the shell through the cone, staining it, and then counting the number of rings through a microscope. Other measurements, which are easier to obtain, are used to predict the age. Three classes are originally included in the dataset, i.e. ’rings_(-inf-8]’, ’rings_(8-15]’, and ’rings_(15-22]’. The first one appears as a consequent in 69.95 % of the rules, the second one in 30.05 %, while the third one does not appear as a consequent in any association rule and is therefore eliminated from observation.

The data in the Wine dataset are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents (features) found in each of the three types of wines, denoted as ’class_1’ (included in 59 rules as a consequent), ’class_2’ (included in 71 rules as a consequent), and ’class_3’ (included in 48 rules as a consequent). In summary, the number of attributes is equal to 19.

The Sport dataset was produced from the TCX files of a professional male cyclist with many years of experience, who voluntarily donated his data for the purposes of this study. Every training session was tracked by a wearable sports watch, and the data were imported into a training dataset. In addition to these performance data, data for estimating the athlete’s psycho-physical condition were merged with each training session by the sports trainer. Let us emphasize that this dataset does not include any results of classification.

4.2 Attribute graph definition

In this step, the rules in the filtered ARM dataset are first transformed into simple rules. Because this step is trivial, it is omitted from detailed discussion here. The simple rules serve as building blocks for the attribute graph definition. The results of the attribute graph definition are presented in Table 2,

Dataset     Graph nodes   Graph edges   Source nodes   Intern nodes   Sink nodes
Mushroom    16            116           4              10             2
Iris        7             15            3              2              2
Abalone     22            124           8              12             2
Wine        21            66            11             8              2
Sport       21            52            5              7              14
Table 2: Characteristics of the created attribute graphs (source, intern, and sink nodes are the story important nodes).

which, in addition to the attribute graph specification for each dataset, also shows the sizes of the antecedent, consequent, and mixed sets, which represent the number of source, sink, and intern nodes in the attribute graph, respectively. Together, these nodes are also called story important nodes.

As can be seen from the table, the Iris dataset is relatively simple from the problem-solving point of view, because it consists of only seven nodes (i.e. attributes). This gives the metro map construction algorithm little room to maneuver during the search, especially since five of these nodes are source and sink nodes. The remaining attribute graphs are more complex due to the higher number of nodes as well as edges.

4.3 Metro map construction

The EA for metro map construction used the parameter settings shown in Table 3, from which it can be seen that a population size of 100 was applied in our experiments. This value ensures a good balance between exploration and exploitation and prevents the evolutionary search process from getting stuck in local optima. The number of metro lines is problem dependent and refers to the number of classes into which the instances of the observed datasets are classified. The length of metro lines is also problem specific and depends on the average path length of the constructed attribute graph; therefore, the maximum value of this parameter is reported in the table. All the other parameter values in the table were obtained experimentally.

Parameter                    Value
Number of generations        100
Maximum metro line length    10
Number of metro lines        [2, 10]
Population size              100
Probability of crossover     0.5
Probability of mutation      0.01
Weight value                 0.1
Table 3: Parameter settings of the EA for metro map construction.

Each dataset was optimized in 25 runs, and the best run for each of the five aforementioned datasets was selected for further analysis. These results are summarized in Table 4

(a) The best solution obtained by Mushroom dataset (fitness=2.96536).
Nr. Stories
1 at14_s at8_c at20_p at5_t at13_s at9_b at17_p at7_f at1_p
2 at6_n at11_t at18_w at8_c at19_o at20_p at5_t at13_s at1_e
(b) The best solution obtained by Iris dataset (fitness=4.75778).
Nr. Stories
1 sepallength_(7-inf) petallength_(5.425-inf) Iris-virginica_Iris-virginica
2 petallength_(2.475-3.95] petalwidth_(0.7-1.3] Iris-versicolor_Iris-versicolor
3 sepalwidth_(3.2-3.8] petallength_(-inf-2.475] petalwidth_(-inf-0.7] sepallength_(-inf-5.2] Iris-setosa_Iris-setosa
4 sepalwidth_(2.6-3.2] sepallength_(-inf-5.2] petalwidth_(-inf-0.7] petallength_(-inf-2.475] Iris-setosa_Iris-setosa
(c) The best solution obtained by Abalone dataset (fitness=2.76557).
Nr. Stories
1 length_(-inf-0.26] diameter_(-inf-0.20375] viscera_(-inf-0.190375] height_(-inf-0.2825] shell_(-inf-0.252375] whole_(-inf-0.707875] shucked_(-inf-0.37275] rings_(-inf-8]
2 viscera_(0.38025-0.570125] whole_(1.41375-2.119625] diameter_(0.50125-inf) length_(0.63-inf) shell_(0.252375-0.50325] shucked_(0.37275-0.7445] viscera_(0.190375-0.38025] height_(-inf-0.2825] rings_(8-15]
(d) The best solution obtained by Wine dataset (fitness=4.7319).
Nr. Stories
1 color_intensity_(7.14-10.07] flavanoids_(-inf-1.525] total_phenols_(-inf-1.705] hue_(-inf-0.7875] ash_(2.295-2.7625] alcohol_(12.93-13.88] total_phenols_(2.43-3.155] flavanoids_(2.71-3.895] color_intensity_(4.21-7.14] malic_acid_(-inf-2.005] hue_(0.7875-1.095] class_2
2 total_phenols_(1.705-2.43] color_intensity_(-inf-4.21] ash_(1.8275-2.295] hue_(0.7875-1.095] nonflavanoid_phenols_(0.2625-0.395] total_phenols_(2.43-3.155] malic_acid_(-inf-2.005] alcalinity_(15.45-20.3] flavanoids_(-inf-1.525] hue_(-inf-0.7875] class_3
3 alcohol_(-inf-11.98] color_intensity_(-inf-4.21] flavanoids_(1.525-2.71] nonflavanoid_phenols_(0.395-0.5275] alcalinity_(20.3-25.15] magnesium_(-inf-93] alcohol_(11.98-12.93] flavanoids_(-inf-1.525] total_phenols_(-inf-1.705] hue_(-inf-0.7875] class_3
4 alcalinity_(-inf-15.45] class_1
(e) The best solution obtained by Sport dataset (fitness=4.87292).
Nr. Stories
Table 4: The best metro maps obtained by automated construction using EA.

which is divided into five sub-tables, Tables 4(a) to 4(e), where each sub-table presents the best metro map obtained by optimizing the particular dataset.

The best metro maps in the tables are given as sequences of implications for each metro line, where the first attribute represents the starting metro stop and the last one the final metro stop of the metro line. Both of these attributes are denoted in bold in the tables. All the other attributes in the sequence (i.e. the intermediate metro stops), which must be true according to Eq. (16) if the final metro stop is to be reached, represent the path to the goal. Intermediate metro stops appearing simultaneously on different metro lines describe the interrelations and indicate that the attributes (i.e. species, objects, indicators, etc.) constructing the metro lines are non-separable.

4.4 Metro map visualization

In this subsection, the best results generated by the EA for ARM information are visualized, and the salient pieces of information (i.e. attributes) are then integrated into metro lines and subsequently into the whole story. From the story point of view, the attributes of a metro line, which are connected to each other by implication relations, represent the plot of the story. These relations are denoted by arrows that also indicate the direction of the plot. Conflicts, which are caused by interrelated metro stops on different metro lines, are designated by vertical arcs.
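The drawing convention just described (arrows for implications, arcs for conflicts) can be expressed, for example, in the Graphviz DOT language. This is only an illustration under that assumption: the text does not state which tool produced Figs. 2 to 6, and the function `to_dot` and the toy lines are hypothetical. Every stop gets its own node per metro line, implications become directed edges along the line, and identical intermediate stops on different lines are joined by a dashed, undirected edge standing in for the conflict arc.

```python
from itertools import combinations


def to_dot(lines):
    """Render metro lines as Graphviz DOT text (no Graphviz library needed)."""
    out = ["digraph metro_map {", "  rankdir=LR;"]
    for i, line in enumerate(lines, start=1):
        # One node per stop and per line, labelled with the attribute name.
        for k, attr in enumerate(line):
            out.append(f'  L{i}_{k} [label="{attr}"];')
        # Implication arrows: starting stop -> ... -> final stop.
        for k in range(len(line) - 1):
            out.append(f"  L{i}_{k} -> L{i}_{k + 1};")
    # Conflict arcs between identical intermediate stops on different lines.
    for (i, a), (j, b) in combinations(enumerate(lines, start=1), 2):
        for k in range(1, len(a) - 1):
            for m in range(1, len(b) - 1):
                if a[k] == b[m]:
                    out.append(f"  L{i}_{k} -> L{j}_{m} "
                               "[dir=none, style=dashed, constraint=false];")
    out.append("}")
    return "\n".join(out)


print(to_dot([["a_low", "c_mid", "class_1"], ["b_high", "c_mid", "class_2"]]))
```

Saving the printed text to a file such as `map.dot` and running `dot -Tpng map.dot -o map.png` renders it with Graphviz; `constraint=false` keeps the conflict edges from distorting the left-to-right layout of each line.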

Let us emphasize that a metro map of ARM information presents a generalization of all the association rules in the transaction database and therefore ignores special cases. Consequently, the association rule sequences extracted into metro lines highlight only the most general truths, and some conclusions obtained with this method may not hold in every case. Nevertheless, the method offers a new perspective on the visualization of ARM rules and can help users direct their attention to the more salient pieces of information.

This study focuses on visualizing the best results obtained by constructing metro maps of ARM information for the five previously mentioned datasets. The visualized metro maps, together with the stories they tell, are presented in the following subsections.

4.4.1 Mushroom dataset

The best run obtained by constructing a metro map from the ARM information of the Mushroom dataset is visualized in Fig. 2. The metro map consists of two metro lines: the first describes the sequence of association rules classifying poisonous mushrooms, and the second does the same for edible ones.

Figure 2: Visualization of the Mushroom dataset.

Interestingly, the metro line classifying poisonous mushrooms starts with ’stalk-surface-above-ring small’, denoted as attribute ’att14_s’, while the line classifying edible mushrooms starts with ’odor none’, denoted as ’att6_n’. The metro map contains one interrelated attribute (i.e. ’gill-spacing close’ as ’att5_s’) and one interrelated sequence of three attributes (i.e. ’ring-type pendant’ as ’att20_p’, ’bruises no’ as ’att5_n’, and ’stalk-surface-above-ring silky’ as ’att13_s’). Because they are shared by both classes, these attributes do not determine whether a particular mushroom is poisonous; rather, they highlight the fact that the dataset is non-separable.

Mushroom story. Logical rules (i.e. particular attributes set to true) that are common to both classes are not decisive for establishing whether a mushroom is poisonous. The decisive attributes are those found only in the poisonous class. This means that when a mushroom satisfies the following three logical rules, i.e. (1) ’gill-size broad’ (i.e. ’att9_b’), (2) ’veil-type partial’ (i.e. ’att17_p’), and (3) ’gill-attachment free’ (i.e. ’att7_f’), there is a high probability that it is poisonous.
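How likely a mushroom satisfying these three rules is to be poisonous can be quantified with the standard ARM measures of support and confidence. The following minimal sketch computes both for the rule {att9_b, att17_p, att7_f} ⇒ poisonous. The helper `support_confidence`, the class label `class_p`, and the four transactions are hypothetical toy data, not records from the Mushroom dataset, so the resulting numbers say nothing about the real data.

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    support    = |T containing antecedent and consequent| / |T|
    confidence = |T containing antecedent and consequent| / |T containing antecedent|
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical toy transactions: attribute codes from the text, invented records.
transactions = [
    {"att9_b", "att17_p", "att7_f", "class_p"},
    {"att9_b", "att17_p", "att7_f", "class_p"},
    {"att9_b", "att17_p", "att7_f", "class_e"},
    {"att6_n", "att17_p", "class_e"},
]
rule = ({"att9_b", "att17_p", "att7_f"}, {"class_p"})
print(support_confidence(transactions, *rule))  # (0.5, 0.6666666666666666)
```

Here three transactions satisfy the antecedent and two of them are poisonous, giving a confidence of 2/3; on real data, a confidence close to 1 would support the claim in the story.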

4.4.2 Iris dataset

The best result obtained by constructing a metro map from the ARM information of the Iris dataset is visualized in Fig. 3, which shows that the corresponding metro map consists of four metro lines. The first two metro lines classify ’Iris virginica’ and ’Iris versicolor’ plants, while the other two classify ’Iris setosa’.

Figure 3: Visualization of Iris dataset.

Although there are no interrelations between metro lines of different iris classes, interrelations can be found between the two metro lines classifying the same ’Iris setosa’ plant (metro lines 3 and 4). The metro map proposes two different metro lines for ’Iris setosa’, but the fourth metro line turns out to be the inverse of the third (i.e. it visits the attributes ’petallength_(-inf-2.475]’, ’petalwidth_(-inf-0.7]’, and ’sepallength_(-inf-5.2]’ in reverse order), and the two lines differ only in their final metro stops (i.e. ’sepalwidth_(3.2-3.8]’ and ’sepalwidth_(2.6-3.2]’). This means, on the one hand, that the ordering of these attributes does not matter for this dataset and, on the other hand, that the same iris plant can be classified by the same three attributes leading to different final metro stops.
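Whether one metro line is the inverse of another can be checked mechanically. In the minimal sketch below, the helper `is_inverse` is hypothetical, and the stop order of line 3 is assumed to follow the order in which the three attributes are listed above, since the exact order from Fig. 3 is not reproduced in the text. Final metro stops are excluded from the comparison because the two lines differ only there.

```python
def is_inverse(line_a, line_b):
    """True if line_b visits line_a's non-final stops in reverse order."""
    return line_a[:-1] == line_b[:-1][::-1]


# Assumed stop order for line 3 (as listed in the text); line 4 reverses it.
line3 = ["petallength_(-inf-2.475]", "petalwidth_(-inf-0.7]",
         "sepallength_(-inf-5.2]", "sepalwidth_(3.2-3.8]"]
line4 = ["sepallength_(-inf-5.2]", "petalwidth_(-inf-0.7]",
         "petallength_(-inf-2.475]", "sepalwidth_(2.6-3.2]"]
print(is_inverse(line3, line4))  # True
```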

Iris story. The Iris dataset actually contains two clusters: (1) one containing ’Iris setosa’, and (2) the other containing ’Iris virginica’ and ’Iris versicolor’. Based on this, the first two metro lines, which classify the second cluster, would be expected to be interrelated, but surprisingly this is not the case. Moreover, when the species information is used as proposed by Fisher [24], all three classes turn out to be linearly separable [25]. Interestingly, an advantage of our algorithm for generating metro maps of ARM information is that it already treats these classes as linearly separable by exposing the summarized information hidden in the data.

4.4.3 Abalone dataset

A visualization of the best metro map obtained with the proposed EA from the ARM information of the Abalone dataset is illustrated in Fig. 4. This metro map consists of two metro lines that classify the age of abalone ear shells, according to the number of rings, into the ’rings_(-inf-8]’ and ’rings_(8-15]’ classes. The former class has at most 8 rings, and its metro line starts with a length less than or equal to 0.26 mm, while the latter has more than 8 but at most 15 rings, and its metro line starts with a viscera weight between 0.38025 and 0.570125 grams.
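Attribute labels such as ’rings_(-inf-8]’ or ’viscera_(0.38025-0.570125]’ denote discretization intervals that are open on the left and closed on the right, so ’rings_(8-15]’ means 8 < rings ≤ 15; intervals with an infinite upper end are closed with a parenthesis, as in ’length_(0.63-inf)’ in Table 4. The minimal sketch below parses such labels and tests membership. The label grammar is inferred from the tables, and `parse_label` and `satisfies` are hypothetical helpers.

```python
import re

# Inferred label grammar: <name>_(<low>-<high>] or <name>_(<low>-inf)
LABEL = re.compile(
    r"(?P<name>.+)_\((?P<lo>-inf|-?\d+(?:\.\d+)?)-(?P<hi>inf|-?\d+(?:\.\d+)?)[\])]"
)


def parse_label(label):
    """Split a discretized attribute label into (name, low, high)."""
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not an interval label: {label!r}")
    # float() accepts "-inf" and "inf" directly.
    return m["name"], float(m["lo"]), float(m["hi"])


def satisfies(value, label):
    """True if value lies in the left-open, right-closed interval of label."""
    _, lo, hi = parse_label(label)
    return lo < value <= hi


print(parse_label("rings_(-inf-8]"))  # ('rings', -inf, 8.0)
print(satisfies(8, "rings_(8-15]"))   # False
print(satisfies(9, "rings_(8-15]"))   # True
```

Because the attribute name is matched greedily up to the last `_(`, names that themselves contain underscores, such as ’color_intensity’ in the Wine dataset, are handled as well.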

Figure 4: Visualization of Abalone dataset.

The metro lines have only one interrelation, indicating that abalones of both classes share a similar height (with meat in shell), i.e. less than or equal to 0.2825 mm.

Abalone story. Here, the two metro lines narrate their aspects of the story largely independently. The point of the story can be summarized as follows: although abalones of both classes can have similar heights, the other attributes on the metro lines clearly assign them to one of the classes. For instance, while the admissible length of abalones in the first class is less than or equal to 0.26 mm, the same length in the second class is higher than 0.63 mm (i.e. much longer).

4.4.4 Wine dataset

The metro map that emerged from the ARM information of the Wine dataset is depicted in Fig. 5. As can be seen from the figure, the metro map consists of four metro lines, each of which describes a sequence of attributes that must be satisfied to classify a test sample into one of three classes (i.e. ’class_1’, ’class_2’, and ’class_3’) denoting three types of wine.

Figure 5: Visualization of Wine dataset.

As can be seen from the figure, the four metro lines are largely independent of one another: there are only three interrelations, all of them among the first three metro lines. These interrelations refer to the following properties: flavanoids less than or equal to 1.525 mg/100g, total phenols less than or equal to 1.705 mg/100g, and hue less than or equal to 0.7875.

Wine story. This story is one of the most fascinating, and its point confirms the power of the selected visualization. These classes are known to be separable, but only Regularized Discriminant Analysis (RDA), proposed by Friedman [26], has achieved 100 % correct classification.

The story reveals another aspect of the situation. ’class_1’ is strongly separable, as can be seen from metro line four. More interesting are the remaining two classes, i.e. ’class_2’ and ’class_3’, where the interrelations between them could signify the beginning of the evolution of a new wine class.

Without any knowledge of what really happened in the specific region of Italy, our story speculates that there was a wine of ’class_3’ that evolved up to some point and then became a recognisable sort of wine. At this point, a sort of wine from the same cultivar, ’class_2’, started to evolve on the basis of high color intensity (i.e. between 7.14 and 10.07) and a minimum level of hue. ’class_3’ in metro line 2 influenced this by maintaining the minimum level of total phenols, while the same class in metro line 1 did so by maintaining the minimum level of flavanoids. On the other hand, ’class_2’ affected the existing ’class_3’ by maintaining the minimum level of total phenols with which this sort of wine originated. This interrelation is marked in the observed metro lines by red metro stops at different positions, which somewhat violates the rule of metro map construction that identical metro stops must be aligned across metro lines; the authors selected this kind of presentation intentionally in order to simplify narrating the story. The rest of the maturing of ’class_2’ proceeded independently, adding its own distinctive amounts of chemical constituents, which finally led to the origin of the new sort of wine. This is consistent with the separability of all three classes asserted by the RDA method.

4.4.5 Sport dataset

The metro map obtained from the ARM information hidden within the Sport dataset is visualized in Fig. 6, which shows that this map consists of five metro lines. Each metro line ends at one of the five final metro stops (i.e. attributes ’TYPE_INTER’, ’BEV_WATER’, ’CAL_MEDIUM’, ’REST_NO’, ’FOOD_FRUITS’) selected by a linear programming algorithm.

Figure 6: Visualization of Sport dataset.

The metro map has six interrelations: metro lines two and three are interrelated by two connections, lines three and four by one, and lines four and five by three. Most interrelations are induced through the metro stop ’HR_LOW’, denoting a low heart rate. Metro line one is separable.

Sport story. The aspects of the story shown by the individual metro lines are straightforward. Therefore, we describe only the first one here: it tells that interval training must be performed at a high heart rate. This assertion complies strongly with the theory of sport training. The other metro lines present demands on athletes that involve overcoming lighter training stresses.

In general, the goals of metro map analysis depend on the selected final metro stops. Accordingly, a sport trainer can select their own final metro stops instead of relying on linear programming, and thus obtain full control over the goals of the analysis and, consequently, over the construction of a specific metro map.

5 Conclusion

Nowadays, we are confronted with the large-scale creation of unstructured data that is hard to analyze manually. In line with this, many data mining methods have arisen with the purpose of discovering new information hidden within data. One of these methods is ARM, which is devoted to discovering interesting relations between attributes in huge transaction databases. Typically, ARM algorithms generate a huge number of association rules, collected in datasets in an unstructured form. From these datasets, it is not easy to extract structured knowledge and automatically present it in a form appropriate for ordinary users.

A metro map of information is a new visualization method inspired by real metro maps. Just as real metro maps help people understand their surroundings, metro maps of information help them understand their information landscape. Moreover, metro maps are structured so that they can even tell stories to users. Metro maps consist of metro lines that present different aspects of the same story. The metro lines contain metro stops that represent salient pieces of information. When these pieces are interrelated across different metro lines, conflicts arise. These conflicts lead the plot of the story towards its resolution (the final metro stop). On the other hand, each story must also have its own introduction (the starting metro stop).

This paper proposes a new method for automatically creating metro maps of ARM information (i.e. information cartography in ARM) that consists of five steps: creating an ARM dataset, association rule simplification, attribute graph definition, metro map construction, and metro map visualization.

The core of information cartography in ARM is the construction of a metro map, which is defined as an optimization problem and solved by an EA. The task of the EA is to find the best paths (representing metro lines) within the attribute graph according to predefined objectives. These objectives effectively define the structure of the designed metro map. Finally, the best metro map according to the value of the fitness function is visualized.
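To illustrate what searching for metro lines as paths in an attribute graph means, here is a deliberately simplified sketch. It is not the authors' EA, whose encoding and fitness function are defined earlier in the paper. Instead, it assumes a small hypothetical attribute graph `GRAPH`, a toy fitness that rewards long paths ending at a chosen final stop, and a simple loop of prefix-preserving mutation plus (μ + λ) selection. All names are hypothetical.

```python
import random

# Hypothetical attribute graph: an edge u -> v reads "u implies v".
# "goal" is the desired final metro stop; "f" is a dead end.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d", "f"],
    "c": ["d", "e"],
    "d": ["goal"],
    "e": ["d", "goal"],
    "f": [],
    "goal": [],
}


def grow(path, rng, max_len=6):
    """Extend a path by a random walk over not-yet-visited successors."""
    path = list(path)
    while len(path) < max_len:
        options = [v for v in GRAPH[path[-1]] if v not in path]
        if not options:
            break
        path.append(rng.choice(options))
    return path


def fitness(path, final_stop="goal"):
    """Toy objective: longer metro lines are better, but only if they
    end at the desired final stop; otherwise the fitness is zero."""
    return len(path) if path[-1] == final_stop else 0


def mutate(path, rng):
    """Keep a random non-empty prefix of the path and regrow the rest."""
    cut = rng.randrange(1, len(path) + 1)
    return grow(path[:cut], rng)


def evolve(generations=50, pop_size=8, seed=1):
    rng = random.Random(seed)
    nodes = list(GRAPH)
    population = [grow([rng.choice(nodes)], rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population]
        # (mu + lambda) selection: the best pop_size of parents and offspring survive.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population[0]


best = evolve()
print(best, fitness(best))
```

Because parents and offspring compete together, the best path found never gets worse between generations. The method described in this paper optimizes several metro lines together against its own predefined objectives, which this single-path toy does not attempt.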

Thus, the goal of the visualization process is not only to present information to the user in an understandable way, but also to assemble the salient pieces of information constituting the metro map into a whole and, on this basis, to form a story with its introduction, plot, and conflicts that lead to the final resolution. As our extensive experimental work revealed, such a story can tell users even more than many other standard visualizations. Moreover, this technology is capable of guiding the user’s decision-making process and simulating the consequences that wrong decisions can have.

However, while stories describe specific situations, our metro maps of ARM information stem from generalized extracted information, which sometimes does not hold entirely in specific situations. Despite this weakness, information cartography in ARM shows that there are new aspects in extracting the structured knowledge hidden in data and, especially, in transferring this knowledge to the user.

In the future, information cartography could be broadened to include other UCI Machine Learning datasets. Its application to the sports domain could be particularly interesting: this technology could be used for Interactive Machine Learning (iML) and thus help athletes optimize their learning behavior during a training session through interaction with the proposed information cartography.


  • [1] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, page 487–499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.
  • [2] Xindong Wu, Vipin Kumar, J Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J McLachlan, Angus Ng, Bing Liu, Philip S. Yu, et al. Top 10 algorithms in data mining. Knowledge and information systems, 14(1):1–37, 2008.
  • [3] Mohammed J Zaki. Scalable algorithms for association mining. Knowledge and Data Engineering, IEEE Transactions on, 12(3):372–390, 2000.
  • [4] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In ACM Sigmod Record, volume 29, pages 1–12. ACM, 2000.
  • [5] Jacinto Mata, José-Luis Alvarez, and José-Cristobal Riquelme. Discovering numeric association rules via evolutionary algorithm. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 40–51. Springer, 2002.
  • [6] Bilal Alatas, Erhan Akin, and Ali Karci. Modenar: Multi-objective differential evolution algorithm for mining numeric association rules. Applied Soft Computing, 8(1):646–656, 2008.
  • [7] Iztok Fister, Iztok Fister Jr., and Dušan Fister. BatMiner for Identifying the Characteristics of Athletes in Training, pages 201–221. Springer International Publishing, Cham, 2019.
  • [8] Dafna Shahaf, Carlos Guestrin, Eric Horvitz, and Jure Leskovec. A metro map can tell a story, as well as provide good directions. Communications of the ACM, 58(11):62–73, November 2015.
  • [9] Pak Chung Wong, Paul Whitney, and Jim Thomas. Visualizing association rules for text mining. In Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis’ 99), pages 120–123. IEEE, 1999.
  • [10] Heike Hofmann, Arno PJM Siebes, and Adalbert FX Wilhelm. Visualizing association rules with interactive mosaic plots. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 227–235, 2000.
  • [11] Hian-Huat Ong, Kok-Leong Ong, Wee-Keong Ng, and Ee Peng Lim. Crystalclear: Active visualization of association rules. In International Workshop on Active Mining AM 2002, in conjunction with the IEEE International Conference on Data Mining ICDM, 2002.
  • [12] Annalisa Appice and Paolo Buono. Analyzing multi-level spatial association rules through a graph-based visualization. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 448–458. Springer, 2005.
  • [13] Tutut Herawan, Iwan Tri Riyadi Yanto, and Mustafa Mat Deris. Smarviz: Soft maximal association rules visualization. In International Visual Informatics Conference, pages 664–674. Springer, 2009.
  • [14] Michael Hahsler and Sudheer Chelluboina. Visualizing association rules in hierarchical groups. In 42nd Symposium on the Interface: Statistical, Machine Learning, and Visualization Algorithms (Interface 2011). The Interface Foundation of North America. Citeseer, 2011.
  • [15] Baoqing Jiang, Chong Han, and Xiaohua Hu. A finite ranked poset and its application in visualization of association rules. In 2008 IEEE International Conference on Granular Computing, pages 322–325. IEEE, 2008.
  • [16] Iztok Fister Jr., Akemi Galvez, Eneko Osaba, Javier Del Ser, Andres Iglesias, and Iztok Fister. Discovering dependencies among mined association rules with population-based metaheuristics. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1668–1674, 2019.
  • [17] Dafna Shahaf, Carlos Guestrin, and Eric Horvitz. Trains of thought: Generating information maps. In Proceedings of the 21st International Conference on World Wide Web, WWW ’12, page 899–908, New York, NY, USA, 2012. Association for Computing Machinery.
  • [18] Dafna Shahaf, Jaewon Yang, Caroline Suen, Jeff Jacobs, Heidi Wang, and Jure Leskovec. Information cartography: Creating zoomable, large-scale maps of information. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, page 1097–1105, New York, NY, USA, 2013. Association for Computing Machinery.
  • [19] Dafna Shahaf, Carlos Guestrin, and Eric Horvitz. Metro maps of science. In Qiang Yang, Deepak Agarwal, and Jian Pei, editors, KDD, pages 1122–1130. ACM, 2012.
  • [20] W. Harris. Writing & Selling Short Stories & Personal Essays: The Essential Guide to Getting Your Work Published. Writer’s Digest Books, Cincinnati, Ohio, US, 2017.
  • [21] A E Eiben and James E Smith. Introduction to Evolutionary Computing. Springer Publishing Company, Incorporated, 2nd edition, 2015.
  • [22] Rainer Storn and Kenneth Price. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4):341–359, December 1997.
  • [23] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
  • [24] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
  • [25] A.N. Gorban, N.R. Sumner, and A.Y. Zinovyev. Topological grammars for data approximation. Applied Mathematics Letters, 20(4):382–386, Apr 2007.
  • [26] Jerome H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.