Motivation and Related Work
Conventional network analysis aims at finding interpretable models that explain interaction dynamics by examining graphs as discrete objects Barabási and Pósfai (2016). Random graph generator models Chakrabarti and Faloutsos (2006) like Erdős–Rényi random graphs (ER graphs) Erdös and Rényi (1959) are usually too generic to accurately represent the versatile linking patterns of real-world graphs Chakrabarti and Faloutsos (2006); Leskovec et al. (2010); Dorogovtsev and Mendes (2002). Devising models that reproduce characteristic topologies prevalent in social Leskovec et al. (2007b), biological Vazquez et al. (2003), internet Zhou and Mondragón (2004) or document Menczer (2004) graphs typically requires a thorough understanding of the domain and time-consuming graph simulations, thereby imposing strong assumptions and modelling bias. Recently, deep learning on non-Euclidean data such as graphs has received substantial attention Bronstein et al. (2017). As these techniques require little or no explicit modelling and capture complex graph structure Brugere et al. (2018); Wang et al., we propose to use them as a tool to obtain interpretable generative parameters of graphs. As a limiting factor, most existing models generate graphs sequentially based on concatenations of node embeddings. These are not only non-interpretable but also impose an artificial node ordering instead of considering a global representation of the entire graph Johnson (2017); You et al. (2018); Li et al. (2018); Liu et al.; Li et al. (2016); Simonovsky and Komodakis (2018). DisenGCN Ma et al. focuses on interpretability, but is limited to node-level linking mechanisms. The latent space of NetGAN Bojchevski et al. (2018) reveals topological properties instead of generative parameters. Some recent works on interpretable graph embeddings Wang et al.; Grover and Leskovec (2016); Cao et al.; Perozzi et al. (2016); Noutahi et al. (2019) provide visualizations for inspection, but no parameters suitable for a generative model.
In other domains, interest in model interpretability has caused a focus on the latent space of neural models Navlakha (2017). Intuitively, the aim is to shape the latent space such that the Euclidean distance between the latent representations of two data points corresponds to a “distance” between the actual data points Bengio et al. (2013). Latent variables describe probability distributions over the latent space. The goal of latent variable disentanglement can be understood as using each latent variable to encode one and only one data property in a one-to-one mapping Chen et al. (2018), making the latent space more interpretable. Varying one latent variable should then correspond to a change in one observable factor of variation in the data, while other factors remain invariant Bengio et al. (2013). Most work in this field has focused on visual and sequential data Chen et al. (2018, 2016); Kim and Mnih (2018); Stühmer et al. (2019).
We assume that graphs are generated by the superposition of interpretable, generative procedures parameterized by generative parameters such as $n$ and $p$ in ER graphs. We hypothesize that these generative parameters can be encoded by a minimal set of disentangled latent variables in an unsupervised machine learning model. To this end, we apply the idea of $\beta$-Variational Autoencoders ($\beta$-VAE) Higgins et al. (2017) in the context of graphs. Intuitively, our autoencoder tries to compress (encode) a graph into a latent variable representation suitable for generating (decoding) it back into the original graph, as outlined in figure 1. If the number of latent variables is lower than the dimensionality of the input data, they force a compressed representation that prioritizes the most salient data properties. In this article, we
(1) discuss how to adapt the -VAE model to graphs in section 2,
(2) apply it to recover parameters for topology-generating procedures in section 3.1, and
(3) leverage it to quantify dependencies between graph topology and node attributes in section 3.2.
The decoder is a deconvolutional neural network. Hence, in our setting the encoder operates on the graph structure, whereas the decoder produces a graph by computing an adjacency matrix. We train this autoencoder in the $\beta$-VAE setting, in which the loss to minimize is
$\mathcal{L} = \mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big] + \beta \, D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, p(z)\big)$.
In the loss term, the reconstruction loss is balanced with the KL regularization term using a parameter $\beta$. A higher value of $\beta$ yields stricter alignment to the Gaussian prior $p(z) = \mathcal{N}(0, I)$, leading to an orthogonalization of the encoding in $z$ Chen et al. (2018, 2016); Kim and Mnih (2018). To further enforce disentangled representations of $z$, we attach an additional parameter decoder to the latent space that learns a direct mapping between the latent variables $z$ and the generative parameters $\psi$. If this parameter decoder is implemented as a linear mapping, the latent space needs to align with the generative parameters $\psi$, hence further favoring the result of the encoder to be disentangled. If the latent space is perfectly disentangled, there should exist a one-to-one, bijective mapping between latent variables $z$ and generative parameters $\psi$.
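The loss can be illustrated numerically. The following is a minimal, framework-free sketch in plain Python (not our actual implementation), assuming a Bernoulli decoder over adjacency entries and a diagonal Gaussian posterior $q(z \mid x) = \mathcal{N}(\mu, \sigma^2)$; the function names are ours:

```python
import math

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior,
    # parameterized by the mean vector and log-variance vector
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def reconstruction_loss(adj_true, adj_prob):
    # Binary cross-entropy between the true adjacency matrix and the
    # decoder's per-entry link probabilities (negative Bernoulli log-likelihood)
    eps = 1e-12
    return -sum(
        a * math.log(p + eps) + (1 - a) * math.log(1 - p + eps)
        for row_t, row_p in zip(adj_true, adj_prob)
        for a, p in zip(row_t, row_p)
    )

def beta_vae_loss(adj_true, adj_prob, mu, log_var, beta):
    # L = reconstruction loss + beta * KL; a higher beta enforces stricter
    # alignment to the standard normal prior
    return reconstruction_loss(adj_true, adj_prob) + beta * kl_to_standard_normal(mu, log_var)
```

In practice both terms are computed over mini-batches inside a deep learning framework; the sketch only illustrates how $\beta$ trades reconstruction fidelity against alignment to the prior.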
For graphs of which we know the ground truth generative parameters $\psi$, we use the metric Mutual Information Gap (MIG) Chen et al. (2018) to quantify the degree of correlation between $z$ and $\psi$. MIG measures both the extent to which the latent variables $z$ share mutual information with the generative parameters $\psi$, and the mutual independence of the latent variables from each other. The metric ranges between 0 and 1, where 1 represents a perfectly disentangled scenario in which there exists a deterministic, invertible one-to-one mapping between $z$ and $\psi$. MIG is computed by first identifying the two latent variables of highest mutual information (MI) with each generative parameter. The MIG score is then defined as the difference (gap) between the highest and second-highest MI, averaged over the generative factors.
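For discrete samples, the MIG computation can be sketched as follows; the plug-in MI estimator over raw counts and the function names are illustrative simplifications, not the exact estimator of Chen et al. (2018):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Plug-in estimate of I(X; Y) in nats from paired discrete samples
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def entropy(xs):
    # Plug-in estimate of H(X) in nats
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mig(latents, params):
    # latents: one sample list per latent variable; params: one per parameter.
    # For each parameter, take the gap between its two highest-MI latents,
    # normalized by the parameter's entropy, then average over parameters.
    total = 0.0
    for theta in params:
        mis = sorted((mutual_information(z, theta) for z in latents), reverse=True)
        total += (mis[0] - mis[1]) / entropy(theta)
    return total / len(params)
```

A deterministic one-to-one mapping between one latent and the parameter, with all other latents independent of it, yields a MIG of 1.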
3.1 Modelling Graph Topology with Latent Variables
First, we evaluate our approach on synthetically generated graphs, concretely, ER graphs Erdös and Rényi (1959). The ER generation procedure takes two parameters: the number of nodes $n$ and a uniform linking probability $p$. Ideally, our model should be able to single out these independent generative parameters by utilizing only two latent variables that describe a one-to-one mapping. To test this hypothesis, we generate 10,000 ER graphs, varying $n$ between 1 and 24 and $p$ between 0 and 1. We use these to train our model with a latent space of four variables $z_1, \dots, z_4$ and a fixed $\beta$.
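The construction of this training set can be sketched as follows; the padding of every adjacency matrix to a fixed 24×24 size and the function names are our illustrative assumptions:

```python
import random

def generate_er_graph(n, p, max_nodes=24, rng=random):
    # Sample an ER graph G(n, p): each of the n*(n-1)/2 possible undirected
    # links exists with probability p; pad the adjacency matrix to max_nodes
    adj = [[0] * max_nodes for _ in range(max_nodes)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

def make_dataset(num_graphs=10000, max_nodes=24, rng=random):
    # Uniformly sample the generative parameters n in [1, 24] and p in [0, 1]
    # and keep them as ground truth labels next to each adjacency matrix
    data = []
    for _ in range(num_graphs):
        n, p = rng.randint(1, max_nodes), rng.random()
        data.append((generate_er_graph(n, p, max_nodes, rng), (n, p)))
    return data
```

The ground-truth pairs (n, p) are not used for training the autoencoder, only for the later disentanglement analysis.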
To inspect the latent space of the trained model, we sample from it in fixed-size steps and decode each sample through the decoder. Figure 2 shows graphs (on the left) and adjacency matrices (in the center) sampled from the latents $z_1$ and $z_2$ while keeping the other latents fixed. The adjacency matrices allow reading off topological properties of the graphs such as degree distribution and assortativity, since nodes are sorted according to the extended BOSAM Guo et al. (2006) algorithm. Instead of decoding samples, we may also encode graph instances with known ground-truth generative parameters and observe the latents $z$. We generate a new set of 1,000 ER graphs with varying $n$ and $p$ and feed these graphs to the trained model. In figure 2 (on the right), each row displays samples from one latent variable and the columns represent the generative parameters $n$ and $p$. We find that a change in $n$ or $p$ results in a change in exactly one corresponding latent variable, which in turn is invariant to changes in the other parameter. This is manifested in a MIG of 0.62, denoting moderate to strong disentanglement. The two remaining latents do not show correlations with either $n$ or $p$, emphasizing their "non-utilization". This shows that the latent variables of our model correctly discover the dimensionality 2 of the underlying generative procedure of ER graphs.
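BOSAM itself is a more elaborate visualisation algorithm; as a simplified stand-in, reordering nodes by degree already makes the degree distribution readable off the adjacency matrix:

```python
def sort_adjacency_by_degree(adj):
    # Reorder rows and columns by descending node degree (a simplified
    # stand-in for BOSAM-style node sorting, not the full algorithm)
    order = sorted(range(len(adj)), key=lambda i: -sum(adj[i]))
    return [[adj[i][j] for j in order] for i in order]
```

After sorting, high-degree nodes occupy the top-left corner, so dense hubs and degree heterogeneity become visible at a glance.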
We repeat the experiment on a uni-, bi- and tri-parametric random graph model and two real-world graphs presented in the appendix. The selected graphs are complete binary tree graphs, BA graphs Barabasi and Albert (1999), Small-World graphs Watts and Strogatz (1998) as well as the CORA McCallum (2017) and Wikipedia Hyperlink Rossi and Ahmed (2015) graph.
3.2 Measuring Graph Topology-Node Attribute Dependence
In addition to pure graph topology $T$, we consider node-level attributes $A$ and measure the degree to which $T$ and $A$ are mutually dependent. For example, in a co-authorship graph where nodes represent authors and undirected links represent joint papers between authors, each node may hold additional information about the author’s overall citation count. We denote this additional information as node attributes $A$. Intuitively, more collaborations and therefore a higher node degree encourage a higher citation count, though there may be numerous other hidden correlations between graph topology and node attributes. Most existing topology-based approaches cannot make a general statement about the extent to which graph topology and node attributes are correlated without hand-picking particular topological properties such as the node degree Zaki (2000).
We claim that the dependence between topological structure and attributes is encoded in the latent variables. If $T$ and $A$ are generated by independent generative procedures, they may be described by two disentangled sets of latent variables Kim and Mnih (2018). Proposing a node attribute randomization approach, we work with two data sets: the original graphs $G$ and their attribute-randomized versions $\tilde{G}$. Since random graph generators such as ER graphs Erdös and Rényi (1959) do not cover node attributes, we first have to generate synthetic node attributes. Independently from $n$ and $p$, and hence from the topology $T$, all nodes of an ER graph are uniformly at random assigned the same node attribute, a value between 0 and 1. We train the modified $\beta$-VAE on this graph data set $G$. After training, we randomize the node attributes, ending up with the randomized graph data $\tilde{G}$. We vary the randomization degree $r$ between 0 and 1, which denotes the fraction of randomized nodes. Finally, we present $G$ and $\tilde{G}$ to the trained model in order to observe how the randomization affects the latent variables $z$.
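The randomization step can be sketched as follows, assuming scalar uniform attributes; the helper name and the use of Python's random module are hypothetical details:

```python
import random

def randomize_attributes(attrs, r, rng=random):
    # Resample the attributes of a fraction r of the nodes uniformly from
    # [0, 1), leaving the remaining nodes untouched
    k = round(r * len(attrs))
    randomized = list(attrs)
    for i in rng.sample(range(len(attrs)), k):
        randomized[i] = rng.random()
    return randomized
```

At r = 0 the graph is unchanged; at r = 1 every node attribute is replaced by an independent uniform sample.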
In the case of $T$–$A$ independence, randomizing node attributes causes a shift in only those latent variables modelling $A$. To indirectly quantify the dependence between $T$ and $A$, we measure the correlation between $r$ and $\Delta z$, where $\Delta z$ describes the absolute change of the latents $z$ due to the randomization. If only one latent variable changes while the others are invariant, $T$ and $A$ are generated from a fixed number of independent factors of variation Chen et al. (2018). Disentanglement between latent variables thus serves as a proxy for the dependence of the generative parameters. Figure 3 (left and center) displays manifolds of samples from the latent space. Traversing the two topology-encoding latents while fixing the other latent variables reveals a change in $T$, as $n$ and $p$ change, but invariance to $A$. A separate latent is modelling $A$, which is supported by figure 3 (right) showing absolute shifts in the latents depending on the fraction of randomized nodes $r$.
Treating the randomization degree $r$ as a generative parameter, we calculate the mutual information (MI) between $r$ and the absolute change $\Delta z_i$ in every latent $z_i$. $\mathrm{MIG}_r$ then computes the gap between the first and second-highest MI, normalized by the entropy $H(r)$:
$\mathrm{MIG}_r = \frac{1}{H(r)} \Big( I(\Delta z_{i^*}; r) - \max_{i \neq i^*} I(\Delta z_i; r) \Big)$,
where $i^*$ denotes the index of the latent with highest MI regarding $r$.
The latent variable reacting most strongly to $r$ is the one modelling the attributes $A$. The $\mathrm{MIG}_r$ corresponding to figure 3 is 0.335, indicating moderate disentanglement of $T$ and $A$, as the remaining latents are mostly invariant to $r$. We repeat the experiment on the Microsoft Academic Graph (MAG) Sinha et al. (2015) and the Amazon Co-Purchasing Graph Leskovec et al. (2007a), presented in the appendix. In particular for the Microsoft Academic Graph, the analysis reveals a strong impact of the collaboration patterns (graph topology) on the citation count (node attributes).
This work demonstrates the potential of latent variable disentanglement in graph deep learning for unsupervised discovery of generative parameters of random and real-world graphs. Experiments have largely confirmed our hypotheses, but also revealed shortcomings. Future work should advance node order-independent graph decoders and target interpretability by exploiting generative models that do not sacrifice reconstruction fidelity for disentanglement.
- Emergence of scaling in random networks. Science 286.
- Network science. Cambridge University Press, Cambridge.
- Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828.
- NetGAN: generating graphs via random walks. In International Conference on Machine Learning (ICML).
- Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
- Network structure inference, a survey: motivations, methods, and applications. ACM Computing Survey 51 (2).
- Deep neural networks for learning graph representations. In AAAI Conference on Artificial Intelligence (AAAI).
- Graph mining: laws, generators, and algorithms. ACM Computing Survey 38.
- Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2610–2620.
- InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 2172–2180.
- Evolution of networks. Advances in Physics 51 (4).
- On random graphs. Publicationes Mathematicae Debrecen 6.
- Node2Vec: scalable feature learning for networks. In Knowledge Discovery and Data Mining (KDD).
- BOSAM: a topology visualisation tool for large-scale complex networks. arXiv preprint cs/0602034.
- Beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR), Vol. 1, pp. 370–378.
- Learning graphical state transitions. In International Conference on Learning Representations (ICLR), Vol. 34, pp. 370–378.
- Disentangling by factorising. Proceedings of the 35th International Conference on Machine Learning (ICML) 80, pp. 2649–2658.
- Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR) 34, pp. 34–42.
- The dynamics of viral marketing. ACM Transactions on the Web 5, pp. 340–350.
- Kronecker graphs: an approach to modeling networks. Journal of Machine Learning Research 11.
- Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 8, pp. 56–68.
- Learning deep generative models of graphs. In International Conference on Machine Learning (ICML).
- Gated graph sequence neural networks. In International Conference on Learning Representations (ICLR).
- Constrained graph variational autoencoders for molecule design. In Advances in Neural Information Processing Systems (NeurIPS).
- Disentangled graph convolutional networks. In International Conference on Machine Learning (ICML).
- Cora dataset. Texas Data Repository Dataverse.
- Evolution of document networks. Proceedings of the National Academy of Sciences 101, pp. 5261–5265.
- Learning the structural vocabulary of a network. Neural Computing 29 (2), pp. 287–312.
- Towards interpretable sparse graph representation learning with Laplacian pooling. CoRR abs/1905.11577.
- Walklets: multiscale graph embeddings for interpretable network classification. CoRR abs/1605.02115.
- The network data repository with interactive graph analytics and visualization. In AAAI.
- GraphVAE: towards generation of small graphs using variational autoencoders. In Artificial Neural Networks and Machine Learning (ICANN), pp. 412–422.
- An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web (WWW), New York, NY, USA, pp. 243–246.
- Independent subspace analysis for unsupervised learning of disentangled representations. arXiv abs/1909.05063.
- A global protein function prediction in protein-protein interaction networks. Nature Biotech, pp. 697–700.
- Structural deep network embedding. In Knowledge Discovery and Data Mining (KDD).
- Collective dynamics of ’small-world’ networks. Nature 393 (6684), pp. 440–442.
- GraphRNN: a deep generative model for graphs. CoRR abs/1802.08773.
- Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering 12 (3), pp. 372–390.
- Accurately modeling the internet topology. Physical Review E 128, pp. 578–586.
To allow experimenting with the code, we provide an interactive notebook at https://colab.research.google.com/drive/1M--YX4dOSt3imDPdecPbjVX-T6Ae0_OG.