Comparing Apples and Oranges: Measuring Differences between Data Mining Results

by Nikolaj Tatti, et al.

Deciding whether the results of two different mining algorithms provide significantly different information is an important open problem in exploratory data mining. Whether the goal is to select the most informative result for analysis or to decide which mining approach is likely to provide the most novel insight, it is essential that we can tell how different the information provided by two results is. In this paper we take a first step towards comparing exploratory results on binary data. We propose to meaningfully convert results into sets of noisy tiles, and to compare these sets using Maximum Entropy modelling and Kullback-Leibler divergence. The measure we construct this way is flexible and allows us to naturally include background knowledge, so that differences between results can be measured from the perspective of what a user already knows. Furthermore, adding to its interpretability, it coincides with Jaccard dissimilarity when we only consider exact tiles. Our approach provides a means to study and tell apart the results of different data mining methods. As an application, we show that it can also be used to identify which parts of a result best redescribe other results. Experimental evaluation shows that our measure gives meaningful results, correctly identifies methods that are similar in nature, and automatically provides sound redescriptions of results.
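
To make the exact-tile special case concrete, the following is a minimal Python sketch of Jaccard dissimilarity between two results, each given as a set of exact tiles on a binary matrix. It is not the measure proposed in the paper (which relies on Maximum Entropy modelling and Kullback-Leibler divergence); it only illustrates the quantity the abstract says the measure coincides with. The representation of a tile as a pair of row and column index tuples, the choice to compare the sets of covered cells, and the names `tile_cells`, `covered_cells` and `jaccard_dissimilarity` are all assumptions made for this illustration.

```python
from itertools import product


def tile_cells(rows, cols):
    """Return the set of (row, column) cells covered by one exact tile."""
    return set(product(rows, cols))


def covered_cells(tiles):
    """Return the union of cells covered by a collection of exact tiles."""
    cells = set()
    for rows, cols in tiles:
        cells |= tile_cells(rows, cols)
    return cells


def jaccard_dissimilarity(tiles_a, tiles_b):
    """Jaccard dissimilarity between the cell sets covered by two results.

    Assumption of this sketch: two results are compared through the
    sets of matrix cells their exact tiles cover.
    """
    a = covered_cells(tiles_a)
    b = covered_cells(tiles_b)
    union = a | b
    if not union:
        # Two empty results carry the same (empty) information.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical example: two results, each consisting of one 2x2 tile,
# overlapping in row 1.
result_a = [((0, 1), (0, 1))]
result_b = [((1, 2), (0, 1))]
distance = jaccard_dissimilarity(result_a, result_b)
```

Here the two tiles share 2 of the 6 cells they jointly cover, so the dissimilarity is 1 - 2/6. Identical results give 0 and results covering disjoint cells give 1.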


