Learning Bayesian Networks from Big Data with Greedy Search: Computational Complexity and Efficient Implementation

04/22/2018
by Marco Scutari, et al.

Learning the structure of Bayesian networks from data is known to be a computationally challenging, NP-hard problem. The literature has long investigated how to perform structure learning from data containing large numbers of variables, following a general interest in high-dimensional applications ("small n, large p") in systems biology and genetics. More recently, data sets with large numbers of observations (the so-called "big data") have become increasingly common, and many of these data sets are not high-dimensional, having only a few tens of variables. We revisit the computational complexity of Bayesian network structure learning in this setting, showing that the common choice of measuring it by the number of estimated local distributions leads to unrealistic time complexity estimates for the most common class of score-based algorithms, greedy search. We then derive more accurate expressions under common distributional assumptions. These expressions suggest that the speed of Bayesian network learning can be improved by taking advantage of the availability of closed-form estimators for local distributions with few parents. Furthermore, we find that using predictive instead of in-sample goodness-of-fit scores improves both speed and accuracy at the same time. We demonstrate these results on large real-world environmental data and on reference data sets available from public repositories.
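
To illustrate what a closed-form estimator for a local distribution with few parents can look like, here is a minimal sketch in Python. It assumes a Gaussian node with zero or one parent, where the maximum-likelihood fit follows directly from sample means, variances and the covariance, with no general least-squares solver needed. The function names and the toy data are hypothetical and chosen for illustration only; this is not the implementation described in the paper.

```python
import math


def gaussian_loglik_no_parent(x):
    """Log-likelihood of x under a Gaussian fitted by maximum likelihood."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # MLE variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def gaussian_fit_one_parent(x, y):
    """Closed-form fit of x = alpha + beta * y + noise (illustrative helper).

    Returns (alpha, beta, residual_variance, log_likelihood).
    Assumes y is not constant, so its variance is non-zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    beta = cov_xy / var_y                      # slope
    alpha = mean_x - beta * mean_y             # intercept
    resid_var = var_x - cov_xy ** 2 / var_y    # MLE residual variance
    loglik = -0.5 * n * (math.log(2 * math.pi * resid_var) + 1)
    return alpha, beta, resid_var, loglik


# Toy data (made up for this example): x depends roughly linearly on y.
y = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.1, 2.9, 5.2, 6.8, 9.0]

alpha, beta, resid_var, loglik_with_parent = gaussian_fit_one_parent(x, y)
loglik_without_parent = gaussian_loglik_no_parent(x)
```

Each fit is a single pass over the data to accumulate a handful of sums, so its cost grows linearly with the number of observations. On the toy data above, the log-likelihood with the parent is higher than without it, which is the kind of comparison a score-based search makes when it considers adding an arc.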


research
06/08/2022

Using Mixed-Effect Models to Learn Bayesian Networks from Related Data Sets

We commonly assume that data are a homogeneous set of observations when ...
research
10/16/2012

Local Structure Discovery in Bayesian Networks

Learning a Bayesian network structure from data is an NP-hard problem an...
research
06/30/2014

Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimised Implementations in the bnlearn R Package

It is well known in the literature that the problem of learning the stru...
research
06/08/2020

Approximate learning of high dimensional Bayesian network structures via pruning of Candidate Parent Sets

Score-based algorithms that learn Bayesian Network (BN) structures provi...
research
06/27/2012

Smoothness and Structure Learning by Proxy

As data sets grow in size, the ability of learning methods to find struc...
research
08/10/2021

A hydraulic model outperforms work-balance models for predicting recovery kinetics from intermittent exercise

Data Science advances in sports commonly involve "big data", i.e., large...
research
02/19/2022

Parallel Sampling for Efficient High-dimensional Bayesian Network Structure Learning

Score-based algorithms that learn the structure of Bayesian networks can...
