Dirichlet Bayesian Network Scores and the Maximum Entropy Principle
A classic approach for learning Bayesian networks from data is to select the maximum a posteriori (MAP) network. In the case of discrete Bayesian networks, the MAP network is selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties of BDeu arise from its underlying uniform prior, which makes structure learning computationally efficient, does not require the elicitation of prior knowledge from experts, and satisfies score equivalence. In this paper we discuss the impact of this uniform prior on structure learning from an information-theoretic perspective, showing how BDeu may violate the maximum entropy principle when applied to sparse data and how it may also be problematic from a Bayesian model selection perspective. On the other hand, the BDs score proposed in Scutari (2016) arises from a piecewise prior and does not appear to violate the maximum entropy principle, even though it is asymptotically equivalent to BDeu.
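To make the difference between the two scores concrete, the following is a minimal sketch of the local (single-node) log-score, assuming the usual BD form with imaginary sample size alpha. The abstract does not state the formulas, so the details here are assumptions: for BDeu, alpha is spread uniformly over all q parent configurations and r node states; for BDs, as we understand Scutari (2016), alpha is spread only over the parent configurations actually observed in the data. The function name `log_bd_local` and its arguments are hypothetical, chosen for illustration.

```python
import math


def log_bd_local(counts, alpha, sparse=False):
    """Log BD score of one node given its parents (illustrative sketch).

    counts[j][k] is the number of samples with parent configuration j
    and node state k. With sparse=False the hyperparameters follow the
    BDeu convention alpha / (r * q). With sparse=True they follow our
    reading of BDs: alpha / (r * q_obs), where q_obs counts only the
    parent configurations with at least one sample (an assumption).
    """
    r = len(counts[0])
    q = len(counts)
    q_obs = sum(1 for row in counts if sum(row) > 0)
    q_eff = q_obs if sparse else q
    if q_eff == 0:
        return 0.0  # no data at all: the marginal likelihood is 1

    a_jk = alpha / (r * q_eff)  # hyperparameter for each cell
    a_j = alpha / q_eff         # hyperparameter for each parent configuration

    score = 0.0
    for row in counts:
        n_j = sum(row)
        if n_j == 0:
            continue  # unobserved configurations contribute a factor of 1
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in row:
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score


# One of the two parent configurations is never observed (sparse data).
sparse_counts = [[2, 0], [0, 0]]
bdeu = log_bd_local(sparse_counts, alpha=2.0)
bds = log_bd_local(sparse_counts, alpha=2.0, sparse=True)
```

When every parent configuration is observed the two variants coincide in this sketch; they differ only when some configurations have no samples, which is the sparse-data setting the abstract refers to.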