SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks

01/11/2019 · Alejandro Molina et al. · Technische Universität Darmstadt, University of Waterloo, University of Cambridge, Max Planck Society, University of Bari Aldo Moro

We introduce SPFlow, an open-source Python library providing a simple interface to inference, learning and manipulation routines for deep and tractable probabilistic models called Sum-Product Networks (SPNs). The library allows one to quickly create SPNs both from data and through a domain specific language (DSL). It efficiently implements several probabilistic inference routines, such as computing marginals, conditionals and (approximate) most probable explanations (MPEs), as well as sampling, and it provides utilities for serializing and plotting SPNs and for computing structure statistics. Moreover, many of the algorithms proposed in the literature for learning the structure and parameters of SPNs are readily available in SPFlow. Furthermore, SPFlow is extremely extensible and customizable, allowing users to promptly distill new inference and learning routines by injecting custom code into a lightweight functional-oriented API framework. SPFlow achieves this by keeping an internal Python representation of the graph structure, which also enables practical compilation of an SPN into a TensorFlow graph or into custom C, CUDA or FPGA code, significantly speeding up computations.

1 Introduction

Recent years have seen a significant interest in tractable probabilistic representations that allow exact inference on highly expressive models in polynomial time, thus overcoming the shortcomings of classical probabilistic graphical models [1]. In particular, Sum-Product Networks (SPNs) [2], deep probabilistic models augmenting arithmetic circuits (ACs) [3] with a latent variable interpretation [4, 5], have been successfully employed as state-of-the-art models in many application domains such as computer vision [6, 7], speech [8], natural language processing [9, 10], and robotics [11].

Here, we introduce SPFlow (the latest source code and documentation are available at https://github.com/SPFlow), a Python library intended to enable researchers in probabilistic modeling to promptly leverage efficiently implemented SPN inference routines, while at the same time allowing them to easily adapt and extend the algorithms available in the SPN literature. In the following, we briefly review SPNs and some of the functions available in SPFlow. We also present a small example of how to extend the library.

Figure 1: An example of a valid SPN. Here, X1, X2 and X3 are random variables. The structure represents the joint distribution P(X1, X2, X3).

2 Sum-Product Networks

As illustrated in Fig. 1, an SPN is a rooted directed acyclic graph comprising sum, product and leaf nodes. The scope of an SPN is the set of random variables appearing in the network. An SPN can be defined recursively as follows: (1) a tractable univariate distribution is an SPN; (2) a product of SPNs defined over different scopes is an SPN; and (3) a convex combination of SPNs over the same scope is an SPN. Thus, a product node in an SPN represents a factorization over independent distributions defined over different random variables, while a sum node stands for a mixture of distributions defined over the same variables. From this definition, it follows that the joint distribution modeled by such an SPN is a valid probability distribution, i.e., each complete and partial evidence inference query produces a consistent probability value [2, 12].
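Concretely, reading Fig. 1 bottom-up and matching the structure constructed in Section 4, the network encodes a distribution of the form

P(X1, X2, X3) = 0.4 · p1(X1) · [0.3 · p2(X2) p3(X3) + 0.7 · p4(X2) p5(X3)] + 0.6 · p6(X1) p7(X2) p8(X3),

where each pi is a univariate leaf distribution (the pi labels are ours, introduced only to spell out the structure).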

To answer probabilistic queries in an SPN, we evaluate the nodes starting at the leaves. Given some evidence, the probability output of querying leaf distributions is propagated bottom-up following the respective operations. To compute marginals, i.e., the probability of partial configurations, we set the probability at the leaves for the marginalized variables to 1 and then proceed as before. To compute MPE states, we replace sum nodes by max nodes and evaluate the graph first with a bottom-up pass, but instead of weighted sums, we pass along the weighted maximum value. Finally, in a top-down pass, we select the paths that lead to the maximum value, finding approximate MPE states [2]. All these operations traverse the graph at most twice and therefore can be done in time linear in the size of the SPN.
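To make the bottom-up evaluation and the marginalization trick concrete, here is a minimal, self-contained sketch in plain Python; it illustrates the semantics only and is not SPFlow code. Nodes are tuples: ('leaf', var, pmf), ('prod', children), ('sum', weights, children).

def evaluate(node, evidence):
    kind = node[0]
    if kind == 'leaf':
        _, var, pmf = node
        # Marginalization: a variable absent from the evidence
        # contributes probability 1 at its leaf.
        return pmf[evidence[var]] if var in evidence else 1.0
    if kind == 'prod':
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, evidence)
        return result
    # Sum node: convex combination of children over the same scope.
    _, weights, children = node
    return sum(w * evaluate(c, evidence) for w, c in zip(weights, children))

# A tiny SPN over two binary variables X0 and X1.
toy_spn = ('sum', [0.3, 0.7],
           [('prod', [('leaf', 0, [0.2, 0.8]), ('leaf', 1, [0.4, 0.6])]),
            ('prod', [('leaf', 0, [0.5, 0.5]), ('leaf', 1, [0.9, 0.1])])])

print(evaluate(toy_spn, {0: 1, 1: 0}))  # joint P(X0=1, X1=0) = 0.411
print(evaluate(toy_spn, {0: 1}))        # marginal P(X0=1) = 0.59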

3 An Overview of the SPFlow Library

As most operations on SPNs are based on traversing the graph in a bottom-up or top-down fashion, we model the library as basic node structures and generic traversal operations on them. The rest of the SPN operations are then implemented as lambda functions that rely on the generic operations.
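This design can be illustrated with a small sketch (hypothetical names, not SPFlow's actual internals): each operation is a table mapping node types to functions, applied by a single generic bottom-up traversal.

# Hypothetical sketch of the dispatch pattern described above;
# names are illustrative and not SPFlow's actual API.
node_functions = {}  # one table per operation: node type -> function

def register(node_type, fn):
    node_functions[node_type] = fn

def apply_bottom_up(node, *args):
    # Generic traversal: evaluate all children first, then the node itself.
    child_results = [apply_bottom_up(c, *args)
                     for c in getattr(node, 'children', [])]
    return node_functions[type(node)](node, child_results, *args)

Registering one function per node type is then all that a new operation (or a new node type) requires; Section 5 shows this mechanism in actual SPFlow code via add_node_likelihood.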

Therefore, the SPFlow library puts the graph structure at the center. All other operations receive or produce a graph that can then be used by the other operations. This increases flexibility and broadens the potential uses. As an example, one can create a structure using different algorithms and then save it to disk. Later on, one can load it again, do parameter optimization using, e.g., TensorFlow [13], and then do inference to answer probabilistic queries. All these operations can be composed as they rely only on the given structure. More specifically, the functionality of SPFlow covers:

Modeling:
  • Domain specific language (DSL)
  • Structure Learning
  • Random Structures
  • Checks for consistency and completeness

Evaluation:
  • Joint queries
  • Marginal queries
  • Approximate most probable explanation (MPE)
  • Parameter Optimization

Sampling:
  • Ancestral Sampling
  • Conditional Sampling

Other:
  • Plotting
  • JSON and Text Formats
  • Standard Pickling
  • TensorFlow Graphs
  • Conversion to C code

4 SPFlow Programming Examples
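All examples in this section assume the library has been loaded. As a convenience, a plausible set of imports for the snippets below is sketched here; the module paths follow SPFlow's documented layout but are a best-effort reconstruction and may differ across versions:

import numpy as np
from numpy.random.mtrand import RandomState

# Paths below are a best-effort reconstruction of SPFlow's module
# layout and may differ across library versions.
from spn.structure.Base import Sum, Product, Context, assign_ids, rebuild_scopes_bottom_up
from spn.structure.leaves.parametric.Parametric import Categorical, Gaussian
from spn.algorithms.Inference import log_likelihood
from spn.algorithms.Sampling import sample_instances
from spn.algorithms.MPE import mpe
from spn.algorithms.LearningWrappers import learn_classifier, learn_parametric
from spn.io.Graphics import plot_spn
from spn.gpu.TensorFlow import eval_tf, optimize_tf  # TensorFlow bindings (location may vary)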

To create the SPN shown in Fig. 1, one simply writes the following code:

spn = (0.4 * (Categorical(p=[0.2, 0.8], scope=0) *
              (0.3 * (Categorical(p=[0.3, 0.7], scope=1) *
                      Categorical(p=[0.4, 0.6], scope=2))
             + 0.7 * (Categorical(p=[0.5, 0.5], scope=1) *
                      Categorical(p=[0.6, 0.4], scope=2))))
       + 0.6 * (Categorical(p=[0.2, 0.8], scope=0) *
                Categorical(p=[0.3, 0.7], scope=1) *
                Categorical(p=[0.4, 0.6], scope=2)))

Alternatively, we can create the same SPN using the following code:

p0 = Product(children=[Categorical(p=[0.3, 0.7], scope=1), Categorical(p=[0.4, 0.6], scope=2)])
p1 = Product(children=[Categorical(p=[0.5, 0.5], scope=1), Categorical(p=[0.6, 0.4], scope=2)])
s1 = Sum(weights=[0.3, 0.7], children=[p0, p1])
p2 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), s1])
p3 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), Categorical(p=[0.3, 0.7], scope=1)])
p4 = Product(children=[p3, Categorical(p=[0.4, 0.6], scope=2)])
spn = Sum(weights=[0.4, 0.6], children=[p2, p4])
assign_ids(spn)
rebuild_scopes_bottom_up(spn)

Actually, Fig. 1 itself was plotted by calling plot_spn(spn, 'basicspn.pdf'). To evaluate the likelihood of the SPN on some data, one can use Python or TensorFlow:

test_data = np.array([1.0, 0.0, 1.0]).reshape(-1, 3)
log_likelihood(spn, test_data)  # [[-1.90730501]]
eval_tf(spn, test_data)         # [[-1.90730501]]

To learn the parameters of the SPN using TensorFlow, one calls

optimized_spn = optimize_tf(spn, test_data)
log_likelihood(optimized_spn, test_data)  # [[-1.38152628]]

Marginal likelihoods just require setting the features to be marginalized to "nan" (np.nan):

log_likelihood(spn, np.array([1, 0, np.nan]).reshape(-1, 3))  # [[-1.2559681]]

Sampling creates instances where samples are obtained for the cells that contain "nan":

sample_instances(spn, np.array([np.nan, 0, 0]).reshape(-1, 3), RandomState(123))
# [[1. 0. 0.]]

To learn the structure of an SPN, say for binary classification, let us first create a 2D dataset with a binary label. An instance has label 0 when its features come from the generating Gaussian with mean 5, and label 1 when they come from the generating Gaussian with mean 15:

train_data = np.c_[np.r_[np.random.normal(5, 1, (500, 2)),
                         np.random.normal(15, 1, (500, 2))],
                   np.r_[np.zeros((500, 1)), np.ones((500, 1))]]

Now we specify the statistical types of the random variables and learn an SPN classifier:

ds_context = Context(parametric_type=[Gaussian, Gaussian, Categorical])
ds_context.add_domains(train_data)
spn = learn_classifier(train_data, ds_context, learn_parametric, 2)  # label in column 2

Performing MPE inference on the classification SPN yields the predicted labels:

mpe(spn, np.array([3.0, 4.0, np.nan, 12.0, 18.0, np.nan]).reshape(-1, 3))
# [[ 3.  4.  0.]
#  [12. 18.  1.]]

The third column holds the predicted labels, and we can see that the classifier behaves as expected on this synthetic example.

5 Extending the SPFlow library

To illustrate the flexibility of SPFlow, we show how to extend inference to new leaf types. Here we implement a Pareto leaf distribution, relying on the infrastructure already present:

from scipy.stats import pareto

class Pareto(Leaf):
    def __init__(self, a, scope=None):
        Leaf.__init__(self, scope=scope)
        self.a = a

def pareto_likelihood(node, data, dtype=np.float64):
    probs = np.ones((data.shape[0], 1), dtype=dtype)
    probs[:] = pareto.pdf(data[:, node.scope], node.a)
    return probs

# Register the likelihood function for the new leaf type.
add_node_likelihood(Pareto, pareto_likelihood)

spn = 0.3 * Pareto(2.0, scope=0) + 0.7 * Pareto(3.0, scope=0)
log_likelihood(spn, np.array([1.5]).reshape(-1, 1))
# [[-0.52324814]]

The same kind of extension is possible for all other operations. In this way, the library is easy to extend by adding new node types or even new operations; for instance, one could interface probabilistic programming languages and tools such as PyMC or Pyro. A sampling extension in the same style is sketched below.
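As one concrete instance, a sampling routine for the new Pareto leaf could be registered just like the likelihood above. The hook name add_node_sampling and the sampler signature used here are assumptions for illustration, not necessarily SPFlow's documented API:

from scipy.stats import pareto

def pareto_sample(node, n_samples, rand_gen):
    # Draw n_samples values from this leaf's Pareto distribution.
    # The signature mirrors pareto_likelihood above and is hypothetical.
    return pareto.rvs(node.a, size=n_samples, random_state=rand_gen)

add_node_sampling(Pareto, pareto_sample)  # hypothetical registration hook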

Acknowledgements. RP acknowledges support from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 797223 — HYBSPN. This work has benefited from the DFG project CAML (KE 1686/3-1), as part of the SPP 1999, and from the BMBF project MADESI (01IS18043B).


References

  • [1] Daphne Koller and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
  • [2] Hoifung Poon and Pedro Domingos. Sum-Product Networks: a New Deep Architecture. Proc. of UAI, 2011.
  • [3] Adnan Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM, 2003.
  • [4] Arthur Choi and Adnan Darwiche. On relaxing determinism in arithmetic circuits. In Proceedings of ICML, pages 825–833, 2017.
  • [5] Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro M. Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP, Issue 99, 2016.
  • [6] Robert Gens and Pedro Domingos. Discriminative Learning of Sum-Product Networks. In Advances in Neural Information Processing Systems 25, pages 3239–3247, 2012.
  • [7] Mohamed Amer and Sinisa Todorovic. Sum product networks for activity recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
  • [8] Matthias Zöhrer, Robert Peharz, and Franz Pernkopf. Representation learning for single-channel source separation and bandwidth extension. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12):2398–2409, 2015.
  • [9] Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, and Kian Ming Adam Chai. Language modeling with Sum-Product Networks. In INTERSPEECH 2014, pages 2098–2102, 2014.
  • [10] Alejandro Molina, Sriraam Natarajan, and Kristian Kersting. Poisson sum-product networks: A deep architecture for tractable multivariate Poisson distributions. In Proc. of AAAI, 2017.
  • [11] Andrzej Pronobis, Francesco Riccio, and Rajesh PN Rao. Deep spatial affordance hierarchy: Spatial knowledge representation for planning in large-scale environments. In ICAPS 2017 Workshop on Planning and Robotics, Pittsburgh, PA, USA, 2017.
  • [12] Robert Peharz, Sebastian Tschiatschek, Franz Pernkopf, and Pedro Domingos. On theoretical properties of sum-product networks. In Proc. of AISTATS, 2015.
  • [13] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.