Explicit probabilistic models for databases and networks

06/29/2009
by Tijl De Bie, et al.

Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, against which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches to estimate the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we demonstrate it for databases and networks.
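To make the idea of an explicitly represented MaxEnt model concrete, the following is a minimal sketch, not the authors' implementation. It assumes the prior information consists of the expected row and column sums of a binary database; the abstract does not specify which constraints are used, so this choice is an illustrative assumption. Under such constraints the MaxEnt distribution factorizes into independent Bernoulli entries with success probability sigmoid(lam[i] + mu[j]). The function names, the toy marginals, and the plain gradient-ascent fit are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_maxent(row_sums, col_sums, steps=5000, lr=0.1):
    """Fit entry probabilities p[i][j] = sigmoid(lam[i] + mu[j]) so that the
    expected row and column sums match the given targets.

    Assumes every target lies strictly between 0 and the row/column length,
    so that finite parameters exist.
    """
    n, m = len(row_sums), len(col_sums)
    lam = [0.0] * n  # one multiplier per row constraint
    mu = [0.0] * m   # one multiplier per column constraint
    for _ in range(steps):
        p = [[sigmoid(lam[i] + mu[j]) for j in range(m)] for i in range(n)]
        # Gradient ascent on the concave dual: target minus current expectation.
        for i in range(n):
            lam[i] += lr * (row_sums[i] - sum(p[i]))
        for j in range(m):
            mu[j] += lr * (col_sums[j] - sum(p[i][j] for i in range(n)))
    return [[sigmoid(lam[i] + mu[j]) for j in range(m)] for i in range(n)]


def sample_database(p, rng):
    """Draw one binary database; entries are independent under this model."""
    return [[1 if rng.random() < pij else 0 for pij in row] for row in p]


# Toy marginals (hypothetical): 3 rows, 4 columns, both summing to 6.
row_sums = [3.0, 2.0, 1.0]
col_sums = [2.0, 2.0, 1.0, 1.0]
probs = fit_maxent(row_sums, col_sums)
sample = sample_database(probs, random.Random(0))
```

Because the fitted model is an explicit matrix of probabilities, a random database can be drawn directly from it, with no need for a swap-based randomization chain. Note that this sketch preserves the marginals only in expectation, not exactly in every sample.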

research
06/26/2022

Spatiotemporal Data Mining: A Survey

Spatiotemporal data mining aims to discover interesting, useful but non-...
research
06/16/2020

Tell Me Something I Don't Know: Randomization Strategies for Iterative Data Mining

There is a wide variety of data mining methods available, and it is gene...
research
03/22/2017

Randomizing growing networks with a time-respecting null model

Complex networks are often used to represent systems that are not static...
research
05/12/2017

Proof Mining with Dependent Types

Several approaches exist to data-mining big corpora of formal proofs. So...
research
08/28/2019

Improving a State-of-the-Art Heuristic for the Minimum Latency Problem with Data Mining

Recently, hybrid metaheuristics have become a trend in operations resear...
research
10/11/2018

The Statistical Physics of Real-World Networks

Statistical physics is the natural framework to model complex networks. ...
research
06/11/2018

Randomized reference models for temporal networks

Many real-world dynamical systems can successfully be analyzed using the...
