Disentangling Interpretable Generative Parameters of Random and Real-World Graphs

10/12/2019 ∙ by Niklas Stoehr, et al. ∙ Microsoft, UCL

While a wide range of interpretable generative procedures for graphs exist, matching observed graph topologies with such procedures and choices of their parameters remains an open problem. Devising generative models that closely reproduce real-world graphs requires domain knowledge and time-consuming simulation. While existing deep learning approaches rely on less manual modelling, they offer little interpretability. This work approaches graph generation (decoding) as the inverse of graph compression (encoding). We show that in a disentanglement-focused deep autoencoding framework, specifically β-Variational Autoencoders (β-VAE), choices of generative procedures and their parameters arise naturally in the latent space. Our model is capable of learning disentangled, interpretable latent variables that represent the generative parameters of procedurally generated random graphs and real-world graphs. The degree of disentanglement is quantitatively measured using the Mutual Information Gap (MIG). When training our β-VAE model on ER random graphs, its latent variables have a near one-to-one mapping to the ER random graph parameters n and p. We deploy the model to analyse the correlation between graph topology and node attributes, measuring their mutual dependence without hand-picking topological properties.

1 Introduction

Motivation and Related Work

Conventional network analysis aims at finding interpretable models that explain interaction dynamics by examining graphs as discrete objects Barabási and Pósfai (2016). Random graph generator models Chakrabarti and Faloutsos (2006) like Erdős–Rényi random graphs (ER graphs) Erdös and Rényi (1959) are usually too generic to accurately represent the versatile linking patterns of real-world graphs Chakrabarti and Faloutsos (2006); Leskovec et al. (2010); Dorogovtsev and Mendes (2002). Devising models that reproduce characteristic topologies prevalent in social Leskovec et al. (2007b), biological Vazquez et al. (2003), internet Zhou and Mondragón (2004) or document Menczer (2004) graphs typically requires a thorough understanding of the domain and time-consuming graph simulations, thereby imposing strong assumptions and modelling bias. Recently, deep learning on non-Euclidean data such as graphs has received substantial attention Bronstein et al. (2017). As these techniques require little or no explicit modelling and capture complex graph structure Brugere et al. (2018); Wang et al., we propose to use them as a tool to obtain interpretable generative parameters of graphs. As a limiting factor, most existing models generate graphs sequentially based on concatenations of node embeddings. These are not only non-interpretable but also impose an artificial node ordering instead of considering a global representation of the entire graph Johnson (2017); You et al. (2018); Li et al. (2018); Liu et al.; Li et al. (2016); Simonovsky and Komodakis (2018). DisenGCN Ma et al. focuses on interpretability, but is limited to node-level linking mechanisms. The latent space of NetGAN Bojchevski et al. (2018) reveals topological properties instead of generative parameters. Some recent works on interpretable graph embeddings Wang et al.; Grover and Leskovec (2016); Cao et al.; Perozzi et al. (2016); Noutahi et al. (2019) provide visualizations for inspection, but no parameters suitable for a generative model.

In other domains, interest in model interpretability has caused a focus on the latent space of neural models Navlakha (2017). Intuitively, the aim is to shape the latent space such that the Euclidean distance between the latent representations of two data points corresponds to a “distance” between the actual data points Bengio et al. (2013). Latent variables describe probability distributions over the latent space. The goal of latent variable disentanglement can be understood as wanting to use each latent variable to encode one and only one data property in a one-to-one mapping Chen et al. (2018), making the latent space more interpretable. Varying one latent variable should then correspond to a change in one observable factor of variation in the data, while other factors remain invariant Bengio et al. (2013). Most work in this field has focused on visual and sequential data Chen et al. (2018, 2016); Kim and Mnih (2018); Stühmer et al. (2019).

Figure 1: Architecture Overview: We seek a continuous function mapping the disentangled latent variables into mutually independent, interpretable generative parameters.

Contributions

We assume that graphs are generated by a superposition of interpretable, generative procedures parameterized by generative parameters such as n and p in ER graphs. We hypothesize that these generative parameters can be encoded by a minimal set of disentangled latent variables in an unsupervised machine learning model. To this end, we apply the idea of β-Variational Autoencoders (β-VAE) Higgins et al. (2017) in the context of graphs. Intuitively, our autoencoder tries to compress (encode) a graph into a latent variable representation suitable for generating (decoding) it back into the original graph, as outlined in figure 1. If the number of latent variables is lower than the dimensionality of the input data, they force a compressed representation that prioritizes the most salient data properties. In this article, we
(1) discuss how to adapt the β-VAE model to graphs in section 2,
(2) apply it to recover parameters for topology-generating procedures in section 3.1, and
(3) leverage it to quantify dependencies between graph topology and node attributes in section 3.2.

2 Model

We instantiate the idea of β-VAEs Higgins et al. (2017) with graph-specific encoders and decoders. Our encoder model is a Graph Convolutional Network (GCN) Kipf and Welling (2017) and the decoder is a deconvolutional neural network. Hence, in our setting the encoder operates on the graph structure, whereas the decoder produces a graph by computing an adjacency matrix. We train this autoencoder in the β-VAE setting, in which the loss to minimize is

$\mathcal{L} = \mathcal{L}_{\mathrm{recon}} + \beta \, D_{\mathrm{KL}}\big(q(\mathbf{z} \mid A) \,\|\, p(\mathbf{z})\big).$

In the loss term, the reconstruction loss $\mathcal{L}_{\mathrm{recon}}$ is balanced against the KL regularization term using a parameter β. A higher value of β yields stricter alignment to the Gaussian prior $p(\mathbf{z})$, leading to an orthogonalization of the encoding in the latent space Chen et al. (2018, 2016); Kim and Mnih (2018). To further enforce disentangled representations, we attach an additional parameter decoder to the latent space that learns a direct mapping between the latent variables and the generative parameters. If this parameter decoder is implemented as a linear mapping, the latent space needs to align with the generative parameters, hence further favoring a disentangled output of the encoder. If the latent space is perfectly disentangled, there should exist a one-to-one, bijective mapping between latent variables and generative parameters.
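To make the architecture concrete, the following is a minimal PyTorch sketch of one possible instantiation, not the authors' released implementation: a dense single-step GCN encoder over padded adjacency matrices with one-hot node features, a transposed-convolution decoder emitting adjacency logits, and a linear parameter decoder. The maximum graph size N_MAX, the latent dimensionality, the value of β, and all layer widths are illustrative assumptions.

```python
# Minimal sketch of a graph beta-VAE; all sizes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_MAX, LATENT_DIM, N_PARAMS, BETA = 24, 4, 2, 4.0  # assumed hyperparameters

class GraphBetaVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # GCN encoder: one propagation step A·X·W (adjacency assumed normalized upstream).
        self.gcn = nn.Linear(N_MAX, 64)          # node features = one-hot node ids
        self.to_mu = nn.Linear(64, LATENT_DIM)
        self.to_logvar = nn.Linear(64, LATENT_DIM)
        # Deconvolutional decoder: latent vector -> N_MAX x N_MAX adjacency logits.
        self.fc = nn.Linear(LATENT_DIM, 32 * 6 * 6)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 6 -> 12
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),   # 12 -> 24
        )
        # Linear parameter decoder: latents -> generative parameters (e.g. n, p).
        self.param_decoder = nn.Linear(LATENT_DIM, N_PARAMS)

    def encode(self, adj, feats):
        h = F.relu(self.gcn(adj @ feats))          # dense GCN step
        h = h.mean(dim=1)                          # node-order-invariant readout
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z):
        h = self.fc(z).view(-1, 32, 6, 6)
        return self.deconv(h).squeeze(1)           # adjacency logits, shape (B, 24, 24)

    def forward(self, adj, feats):
        mu, logvar = self.encode(adj, feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decode(z), self.param_decoder(z), mu, logvar

def loss_fn(adj_logits, adj_true, params_hat, params_true, mu, logvar):
    recon = F.binary_cross_entropy_with_logits(adj_logits, adj_true)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I), averaged
    param = F.mse_loss(params_hat, params_true)                    # parameter decoder loss
    return recon + BETA * kl + param
```

A training loop would then feed padded adjacency matrices together with identity matrices as node features, plus the ground-truth (n, p) pairs, into loss_fn.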

For graphs for which we know the ground-truth generative parameters, we use the metric Mutual Information Gap (MIG) Chen et al. (2018) to quantify the degree of correlation between latent variables and generative parameters. MIG measures both the extent to which latent variables share mutual information with the generative parameters and the mutual independence of the latent variables from each other. The metric ranges between 0 and 1, where 1 represents a perfectly disentangled scenario in which there exists a deterministic, invertible one-to-one mapping between latent variables and generative parameters. MIG is computed by first identifying, for each generative parameter, the two latent variables with the highest mutual information (MI) with that parameter. The MIG score is then defined as the difference (gap) between the highest and second-highest MI, averaged over the generative factors.
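For reference, a minimal sketch of an MIG-style estimator on discretized latents and parameters is given below; the histogram binning and the entropy normalization follow Chen et al. (2018) in spirit, but the exact estimator and all names are illustrative assumptions.

```python
# Sketch of the Mutual Information Gap (MIG); binning choices are illustrative.
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(x, bins=20):
    """Map a continuous 1-D array to integer bin labels."""
    return np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

def mig(latents, params, bins=20):
    """latents: (num_graphs, num_latents); params: (num_graphs, num_params)."""
    scores = []
    for k in range(params.shape[1]):
        p_disc = discretize(params[:, k], bins)
        h_p = mutual_info_score(p_disc, p_disc)            # entropy H(param_k) in nats
        mi = [mutual_info_score(p_disc, discretize(latents[:, j], bins))
              for j in range(latents.shape[1])]
        top2 = sorted(mi, reverse=True)[:2]
        scores.append((top2[0] - top2[1]) / h_p)           # normalized gap for param_k
    return float(np.mean(scores))                          # averaged over parameters
```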

3 Evaluation

3.1 Modelling Graph Topology with Latent Variables

Figure 2: Disentangled latent representation of ER graphs. The latent space appears axis-aligned, with two latent variables orthogonally representing n and p. Changing one of these latent variables corresponds to a change in one generative parameter, n or p respectively, while being relatively invariant to changes in other parameters. The remaining latent variables are not utilized by the model.

First, we evaluate our approach on synthetically generated graphs, concretely, ER graphs Erdös and Rényi (1959). The ER generation procedure takes two parameters: the number of nodes n and a uniform linking probability p. Ideally, our model should be able to single out these independent generative parameters by utilizing only two latent variables that describe a one-to-one mapping. To test this hypothesis, we generate 10,000 ER graphs, varying n between 1 and 24 and p between 0 and 1. We use these to train our model with a fixed latent space size and β.
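Generating such a training set takes only a few lines; the sketch below (array layout, padding to 24 nodes, and the file name are assumptions) draws n uniformly between 1 and 24 and p uniformly between 0 and 1 for each of the 10,000 graphs.

```python
# Sketch of the ER training set: adjacency matrices padded to 24x24,
# with the ground-truth generative parameters (n, p) stored alongside.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
N_GRAPHS, N_MAX = 10_000, 24

adjacencies = np.zeros((N_GRAPHS, N_MAX, N_MAX), dtype=np.float32)
params = np.zeros((N_GRAPHS, 2), dtype=np.float32)           # columns: n, p

for i in range(N_GRAPHS):
    n = int(rng.integers(1, N_MAX + 1))                      # number of nodes
    p = float(rng.uniform(0.0, 1.0))                         # linking probability
    graph = nx.gnp_random_graph(n, p, seed=int(rng.integers(0, 2**31 - 1)))
    adjacencies[i, :n, :n] = nx.to_numpy_array(graph)        # pad to N_MAX x N_MAX
    params[i] = (n, p)

np.savez("er_graphs.npz", adj=adjacencies, params=params)    # hypothetical file name
```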

To inspect the latent space of the trained model, we sample from it in fixed-size steps and decode each sample through the decoder. Figure 2 shows graphs (on the left) and adjacency matrices (in the center) obtained by traversing two latent variables while keeping the other latents fixed. The adjacency matrices allow reading off topological properties of the graphs, such as degree distribution and assortativity, since nodes are sorted according to the extended BOSAM Guo et al. (2006) algorithm. Instead of decoding samples, we may also encode graph instances with known ground-truth generative parameters and observe the latents. We generate a new set of 1,000 ER graphs with varying n and p and feed these graphs to the trained model. In figure 2 (on the right), each row displays samples from one latent variable and the columns represent the generative parameters n and p. We find that a change in n or p results in a change in one corresponding latent variable, while the other latents remain relatively invariant. This is manifested in a MIG of 0.62, denoting moderate to strong disentanglement. The remaining latent variables do not show correlations with either n or p, emphasizing their "non-utilization". This shows that the latent variables of our model correctly discover that the underlying generative procedure of ER graphs has dimensionality 2.
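The latent traversal behind figure 2 can be sketched as follows, assuming a trained model with a decode method like the GraphBetaVAE sketch above and a 0.5 threshold for binarizing the decoded adjacency matrix; the step count and traversal range are arbitrary choices.

```python
# Sketch of a latent traversal: vary one latent, hold the others at zero,
# and decode each sample into a binary adjacency matrix.
import torch

@torch.no_grad()
def traverse(model, dim, steps=9, span=3.0, latent_dim=4):
    graphs = []
    for value in torch.linspace(-span, span, steps):
        z = torch.zeros(1, latent_dim)
        z[0, dim] = value
        adj = torch.sigmoid(model.decode(z))[0]        # edge probabilities
        graphs.append((adj > 0.5).int().numpy())       # binarized adjacency matrix
    return graphs
```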

We repeat the experiment on uni-, bi- and tri-parametric random graph models and two real-world graphs, presented in the appendix. The selected graphs are complete binary tree graphs, BA graphs Barabasi and Albert (1999), Small-World graphs Watts and Strogatz (1998), as well as the CORA McCallum (2017) and Wikipedia Hyperlink Rossi and Ahmed (2015) graphs.

3.2 Measuring Graph Topology-Node Attribute Dependence

In addition to the pure graph topology, we consider node-level attributes and measure the degree to which topology and attributes are mutually dependent. For example, in a co-authorship graph where nodes represent authors and undirected links represent joint papers between authors, each node may hold additional information about the author's overall citation count. We denote this additional information as node attributes. Intuitively, more collaborations and therefore a higher node degree encourage a higher citation count, though there may be numerous other hidden correlations between graph topology and node attributes. Most existing topology-based approaches cannot make a general statement about the extent to which graph topology and node attributes are correlated without hand-picking particular topological properties such as the node degree Zaki (2000).

We claim that the dependence between topological structure and attributes is encoded in the latent variables. If topology and attributes are generated by independent generative procedures, they may be described by two disentangled sets of latent variables Kim and Mnih (2018). Proposing a node attribute randomization approach, we work with two data sets, the original graphs and their attribute-randomized versions. Since random graph generators such as ER graphs Erdös and Rényi (1959) do not cover node attributes, we first have to generate synthetic node attributes. Independently of n and p, and hence of the topology, all nodes of an ER graph are assigned the same node attribute, a value drawn uniformly at random between 0 and 1. We train the modified β-VAE on this graph data set. After training, we randomize the node attributes, ending up with the randomized graph data. We vary the randomization degree r between 0 and 1, where r denotes the fraction of randomized nodes. Finally, we present both the original and the attribute-randomized graphs to the trained model in order to observe how the randomization affects the latent variables.
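The attribute assignment and randomization steps can be sketched as below; the function and array names are assumptions, but the logic mirrors the description above: one shared uniform attribute value per graph, then fresh uniform values for a fraction r of the nodes.

```python
# Sketch of the node-attribute setup: every node of a graph gets the same
# uniformly drawn attribute, then a fraction r of nodes is re-randomized.
import numpy as np

rng = np.random.default_rng(0)

def attach_uniform_attribute(n_nodes):
    """All nodes share one attribute value drawn uniformly from [0, 1]."""
    return np.full(n_nodes, rng.uniform(0.0, 1.0), dtype=np.float32)

def randomize_attributes(attrs, r):
    """Overwrite the attributes of a fraction r of nodes with fresh uniform values."""
    attrs = attrs.copy()
    k = int(round(r * len(attrs)))
    idx = rng.choice(len(attrs), size=k, replace=False)
    attrs[idx] = rng.uniform(0.0, 1.0, size=k)
    return attrs
```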

Figure 3: Latent representation of ER graphs with uniform node attributes. Node attribute values are indicated by the shade of blue. Traversing two of the latent variables while keeping the other latent variables fixed reveals a change in the topology, as n and p vary; these latents are invariant to the node attributes. The latent variable that is most volatile to the attribute randomization presumably models the node attributes.

In the case of topology-attribute independence, randomizing node attributes causes a shift in only those latent variables modelling the attributes. To indirectly quantify the dependence between topology and attributes, we measure the correlation between the randomization degree r and the absolute change it induces in the latent variables. If only one latent variable changes while the others are invariant, topology and attributes are generated from a fixed number of independent factors of variation Chen et al. (2018). Disentanglement between latent variables thus serves as a proxy for the dependence of the generative parameters. Figure 3 (left and center) displays manifolds of samples from the latent space. Traversing two latent variables while fixing the others reveals a change in the topology, as n and p change, but invariance to the node attributes. A separate latent variable models the attributes, which is supported by figure 3 (right) showing absolute shifts in the latents depending on the fraction of randomized nodes r.

Treating the randomization degree r as a generative parameter, we calculate the mutual information (MI) between r and the absolute change in every latent variable. The score $\mathrm{MIG}_{\Delta z}$ then computes the gap between the first and second highest MI, normalized by the entropy H(r):

$\mathrm{MIG}_{\Delta z} = \frac{1}{H(r)}\Big(I(\Delta z_{j^{*}};\, r) - \max_{j \neq j^{*}} I(\Delta z_{j};\, r)\Big), \qquad j^{*} = \arg\max_{j} I(\Delta z_{j};\, r),$

where $\Delta z_{j}$ denotes the absolute change of latent variable $z_{j}$ and $j^{*}$ denotes the index of the latent with the highest MI regarding r.
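A compact sketch of this score on discretized values is given below; as with the MIG sketch above, the binning-based MI estimator and all names are illustrative assumptions.

```python
# Sketch of MIG_dz: gap between the two largest MI values I(|dz_j|; r),
# normalized by the entropy H(r). Estimator details are illustrative.
import numpy as np
from sklearn.metrics import mutual_info_score

def _discretize(x, bins=20):
    return np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

def mig_dz(delta_z, r, bins=20):
    """delta_z: (num_graphs, num_latents) absolute latent shifts; r: (num_graphs,) randomization degrees."""
    r_disc = _discretize(r, bins)
    h_r = mutual_info_score(r_disc, r_disc)                    # entropy H(r) in nats
    mi = np.array([mutual_info_score(r_disc, _discretize(delta_z[:, j], bins))
                   for j in range(delta_z.shape[1])])
    top2 = np.sort(mi)[::-1][:2]                               # highest and second-highest MI
    return float((top2[0] - top2[1]) / h_r)
```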

One latent variable reacts most strongly to the randomization degree r. The $\mathrm{MIG}_{\Delta z}$ value corresponding to figure 3 is 0.335, indicating moderate disentanglement of topology and node attributes, as the topology-related latents are mostly invariant to the randomization. We repeat the experiment on the Microsoft Academic Graph (MAG) Sinha et al. (2015) and the Amazon Co-Purchasing Graph Leskovec et al. (2007a), presented in the appendix. In particular for the Microsoft Academic Graph, the analysis reveals a strong impact of the collaboration patterns (graph topology) on the citation count (node attributes).

4 Conclusion

This work demonstrates the potential of latent variable disentanglement in graph deep learning for unsupervised discovery of generative parameters of random and real-world graphs. Experiments have largely confirmed our hypotheses, but also revealed shortcomings. Future work should advance node order-independent graph decoders and target interpretability by exploiting generative models that do not sacrifice reconstruction fidelity for disentanglement.

References

  • A. Barabasi and R. Albert (1999) Emergence of scaling in random networks. Science 286. Cited by: §3.1, Figure 6.
  • A. Barabási and M. Pósfai (2016) Network science. Cambridge University Press, Cambridge. Cited by: §1.
  • Y. Bengio, A. Courville, and P. Vincent (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828. Cited by: §1.
  • A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann (2018) NetGAN: generating graphs via random walks. In International Conference on Machine Learning (ICML), Cited by: §1.
  • M. M. Bronstein, J. Bruna Estrach, Y. LeCun, A. Szlam, and P. Vandergheynst (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: §1.
  • I. Brugere, B. Gallagher, and T. Y. Berger-Wolf (2018) Network structure inference, a survey: motivations, methods, and applications. ACM Computing Survey 51 (2). Cited by: §1.
  • [7] S. Cao, W. Lu, and Q. Xu Deep neural networks for learning graph representations. In AAAI Conference on Artificial Intelligence (AAAI), Cited by: §1.
  • D. Chakrabarti and C. Faloutsos (2006) Graph mining: laws, generators, and algorithms. ACM Computing Survey 38. Cited by: §1.
  • T. Q. Chen, X. Li, R. B. Grosse, and D. K. Duvenaud (2018) Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2610–2620. Cited by: §1, §2, §2, §3.2.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 2172–2180. Cited by: §1, §2.
  • S.N. Dorogovtsev and J.F.F. Mendes (2002) Evolution of networks. Advances in Physics 51 (4). Cited by: §1.
  • P. Erdös and A. Rényi (1959) On random graphs. Publicationes Mathematicae Debrecen 6. Cited by: §1, §3.1, §3.2.
  • A. Grover and J. Leskovec (2016) Node2Vec: scalable feature learning for networks. In Knowledge Discovery and Data Mining (KDD), Cited by: §1, Figure 4.
  • Y. Guo, C. Chen, and S. Zhou (2006) BOSAM: a topology visualisation tool for large-scale complex networks. arXiv preprint cs/0602034. Cited by: §3.1.
  • I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner (2017) Beta-vae: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR), Vol. 1, pp. 370–378. Cited by: §1, §2.
  • D. D. Johnson (2017) Learning graphical state transitions. In International Conference on Learning Representations (ICLR), Vol. 34, pp. 370–378. Cited by: §1.
  • H. Kim and A. Mnih (2018) Disentangling by factorising. Proceedings of the 35th International Conference on Machine Learning (ICML) 80, pp. 2649–2658. Cited by: §1, §2, §3.2.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR) 34, pp. 34–42. Cited by: §2.
  • J. Leskovec, L. A. Adamic, and B. A. Huberman (2007a) The dynamics of viral marketing. ACM Transactions on the Web 5, pp. 340–350. Cited by: §3.2.
  • J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani (2010) Kronecker graphs: an approach to modeling networks. Journal of Machine Learning Research 11. Cited by: §1.
  • J. Leskovec, J. Kleinberg, and C. Faloutsos (2007b) Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 8, pp. 56–68. Cited by: §1.
  • Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia (2018) Learning deep generative models of graphs. In International Conference on Machine Learning (ICML), Cited by: §1.
  • Y. Li, R. Zemel, M. Brockschmidt, and D. Tarlow (2016) Gated graph sequence neural networks. In International Conference on Learning Representations (ICLR), Cited by: §1.
  • [24] Q. Liu, M. Allamanis, M. Brockschmidt, and A. Gaunt Constrained graph variational autoencoders for molecule design. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §1.
  • [25] J. Ma, P. Cui, K. Kuang, X. Wang, and W. Zhu Disentangled graph convolutional networks. In International Conference on Machine Learning (ICML), Cited by: §1.
  • A. McCallum (2017) Cora dataset. Texas Data Repository Dataverse. Cited by: §3.1, Figure 4.
  • F. Menczer (2004) Evolution of document networks. Proceedings of the National Academy of Sciences 101, pp. 5261–5265. Cited by: §1.
  • S. Navlakha (2017) Learning the structural vocabulary of a network. Neural Computing 29 (2), pp. 287–312. External Links: ISSN 0899-7667 Cited by: §1.
  • E. Noutahi, D. Beani, J. Horwood, and P. Tossou (2019) Towards interpretable sparse graph representation learning with laplacian pooling. CoRR abs/1905.11577. Cited by: §1.
  • B. Perozzi, V. Kulkarni, and S. Skiena (2016) Walklets: multiscale graph embeddings for interpretable network classification. CoRR abs/1605.02115. Cited by: §1.
  • R. A. Rossi and N. K. Ahmed (2015) The network data repository with interactive graph analytics and visualization. In AAAI, External Links: Link Cited by: §3.1, Figure 4.
  • M. Simonovsky and N. Komodakis (2018) GraphVAE: towards generation of small graphs using variational autoencoders. In Artificial Neural Networks and Machine Learning (ICANN), pp. 412–422. Cited by: §1.
  • A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B. (. Hsu, and K. Wang (2015) An overview of microsoft academic service (mas) and applications. In Proceedings of the 24th International Conference on World Wide Web (WWW), New York, NY, USA, pp. 243–246. Cited by: §3.2.
  • J. Stühmer, R. E. Turner, and S. Nowozin (2019) Independent subspace analysis for unsupervised learning of disentangled representations. arXiv abs/1909.05063. Cited by: §1.
  • A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani (2003) A global protein function prediction in protein-protein interaction networks. Nature Biotech, pp. 697–700. Cited by: §1.
  • [36] D. Wang, P. Cui, and W. Zhu Structural deep network embedding. In Knowledge Discovery and Data Mining (KDD), Cited by: §1.
  • D. J. Watts and S. H. Strogatz (1998) Collective dynamics of ’small-world’ networks. Nature 393 (6684), pp. 440–442. Cited by: §3.1, Figure 6.
  • J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec (2018) GraphRNN: A deep generative model for graphs. CoRR abs/1802.08773. Cited by: §1.
  • M. J. Zaki (2000) Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering 12 (3), pp. 372–390. Cited by: §3.2.
  • S. Zhou and R. J. Mondragón (2004) Accurately modeling the internet topology. Physical Review E 128, pp. 578–586. Cited by: §1.

Appendix

Figure 4: Latent representations of real-world graphs. Latent space of the β-VAE model trained on 10,000 sub-graphs from CORA McCallum (2017) and Wikipedia Rossi and Ahmed (2015), sampled using Biased Second-Order Random Walks Grover and Leskovec (2016). Plots (a) and (b) show manifolds of decoded instances, presented as graphs and adjacency matrices respectively. Plot (c) compares the normalized degree distribution with the distribution of the entire, original graph. Similarly, plot (d) shows the difference in clustering coefficient, degree assortativity and average degree.
Figure 5: Latent representations of real-world graphs with node attributes. Manifold of graph instances obtained from traversing latent variables and decoding the samples. In the Microsoft Academic Graph, topology and node attributes can hardly be disentangled. A reason lies in a strong correlation (0.4662) between the number of collaborations (graph topology) and citations (node attributes).
Figure 6: Disentangled latent representation of uni-, bi- and tri-parametric random graph generator models. Latent representation of the uni-parametric complete binary tree graph, bi-parametric Barabasi-Albert (BA) graphs Barabasi and Albert (1999) and tri-parametric Small-World (SW) graphs Watts and Strogatz (1998). For visualizing the tri-parametric SW graphs, we pick a fixed value for the latent variable modelling the number of nodes throughout all samples from the latent space; hence, all generated graphs in the manifolds are of fixed size. In compliance with intuition, the higher the degree of freedom in terms of generative parameters, the more difficult their successful disentanglement, manifested in a lower MIG value for the tri-parametric SW graphs. If a uni-parametric model is described by a single latent variable, MIG is not informative.