Maximum Entropy Based Significance of Itemsets

04/24/2019
by Nikolaj Tatti, et al.

We consider the problem of defining the significance of an itemset. We say that an itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy, and for measuring the deviation we use Kullback-Leibler divergence. A major advantage over previous methods is that we can use richer models, whereas previous approaches only measure the deviation from the independence model. We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that, for our real datasets, the independence assumption is too strong, whereas more flexible models lead to good results.
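The abstract gives no formulas, so the following is only a minimal sketch under our own assumptions, not the authors' definition or code. We assume the score of an itemset is the KL divergence from the empirical distribution over its items' presence patterns to the maximum entropy distribution constrained by the observed frequencies of a chosen family of sub-itemsets, and we fit that maximum entropy distribution by cyclic iterative scaling. All function names (`significance`, `max_entropy`, `singletons`, `proper_subsets`, etc.) are hypothetical. Constraining only single items gives the independence model; constraining every proper sub-itemset gives a richer model.

```python
import math
from itertools import combinations


def empirical_distribution(transactions, items):
    """Empirical distribution over the 2**k presence patterns of `items`.

    Pattern s has items[i] present iff bit i of s is set.
    """
    counts = [0] * (2 ** len(items))
    for t in transactions:
        s = sum(1 << i for i, item in enumerate(items) if item in t)
        counts[s] += 1
    return [c / len(transactions) for c in counts]


def frequency(dist, mask):
    """Probability that every item in the bitmask `mask` is present."""
    return sum(p for s, p in enumerate(dist) if s & mask == mask)


def max_entropy(freqs, k, max_iter=10_000, tol=1e-12):
    """Max-entropy distribution over 2**k patterns matching `freqs`.

    `freqs` maps an itemset bitmask to its target frequency. Cyclic
    iterative scaling from the uniform distribution: each step is the exact
    I-projection onto one constraint, rescaling patterns that contain the
    itemset by f / cur and the rest by (1 - f) / (1 - cur).
    """
    p = [1.0 / 2 ** k] * (2 ** k)
    for _ in range(max_iter):
        worst = 0.0  # largest constraint violation seen in this sweep
        for mask, f in freqs.items():
            cur = frequency(p, mask)
            worst = max(worst, abs(cur - f))
            if 0.0 < cur < 1.0:
                inside, outside = f / cur, (1.0 - f) / (1.0 - cur)
                p = [q * (inside if s & mask == mask else outside)
                     for s, q in enumerate(p)]
        if worst < tol:
            break
    return p


def kl_divergence(p, q):
    """KL(p || q) in nats; patterns with p[s] == 0 contribute nothing."""
    return sum(ps * math.log(ps / qs) for ps, qs in zip(p, q) if ps > 0)


def significance(transactions, items, family):
    """Hypothetical score: KL(empirical || max-ent estimate) on `items`.

    `family` lists sub-itemsets (tuples of positions into `items`) whose
    observed frequencies constrain the maximum entropy model.
    """
    emp = empirical_distribution(transactions, items)
    freqs = {}
    for sub in family:
        mask = sum(1 << i for i in sub)
        freqs[mask] = frequency(emp, mask)
    return kl_divergence(emp, max_entropy(freqs, len(items)))


def singletons(k):
    """Independence model: constrain only single-item frequencies."""
    return [(i,) for i in range(k)]


def proper_subsets(k):
    """Richer model: constrain every non-empty proper sub-itemset."""
    return [c for r in range(1, k) for c in combinations(range(k), r)]


# C = A xor B: every pair looks independent, so even the richer model
# cannot explain the triple and both scores are about log 2.
xor_data = [set(), {"B", "C"}, {"A", "C"}, {"A", "B"}]
print(significance(xor_data, ["A", "B", "C"], singletons(3)))      # ≈ 0.693
print(significance(xor_data, ["A", "B", "C"], proper_subsets(3)))  # ≈ 0.693

# f(A) = 0 forces f(AB) = 0, so AB is derivable and its score is zero.
print(significance([{"B"}, {"B"}, set(), set()], ["A", "B"], singletons(2)))  # 0.0
```

Iterative scaling converges quickly when the estimate has full support, but it can be slow when the constraints force some patterns to have zero probability (for example, when f(AB) = f(A) = f(B)); the iteration cap and tolerance above are arbitrary choices for the sketch.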

10/24/2022

The Entropy Method in Large Deviation Theory

This paper illustrates the power of the entropy method in addressing pro...
04/09/2022

Moment estimates in the first Borel-Cantelli Lemma with applications to mean deviation frequencies

We quantify the elementary Borel-Cantelli Lemma by higher moments of the...
07/09/2018

Process Monitoring Using Maximum Sequence Divergence

Process Monitoring involves tracking a system's behaviors, evaluating th...
07/11/2017

On the letter frequencies and entropy of written Marathi

We carry out a comprehensive analysis of letter frequencies in contempor...
10/01/2015

Similarity of symbol frequency distributions with heavy tails

Quantifying the similarity between symbolic sequences is a traditional p...
02/08/2019

Using Background Knowledge to Rank Itemsets

Assessing the quality of discovered results is an important open problem...
03/27/2013

Decisions with Limited Observations over a Finite Product Space: the Klir Effect

Probability estimation by maximum entropy reconstruction of an initial r...