GINA: Neural Relational Inference From Independent Snapshots

by   Gerrit Großmann, et al.
Universität Saarland

Dynamical systems in which local interactions among agents give rise to complex emerging phenomena are ubiquitous in nature and society. This work explores the problem of inferring the unknown interaction structure (represented as a graph) of such a system from measurements of its constituent agents or individual components (represented as nodes). We consider a setting where the underlying dynamical model is unknown and where different measurements (i.e., snapshots) may be independent (e.g., may stem from different experiments). We propose GINA (Graph Inference Network Architecture), a graph neural network (GNN) that simultaneously learns the latent interaction graph and, conditioned on it, the prediction of a node's observable state based on adjacent vertices. GINA is based on the hypothesis that the ground truth interaction graph – among all other potential graphs – allows us to predict the state of a node, given the states of its neighbors, with the highest accuracy. We test this hypothesis and demonstrate GINA's effectiveness on a wide range of interaction graphs and dynamical processes.





1 Introduction

Complex interacting systems are ubiquitous in nature and society. However, their analysis remains challenging. Traditionally, the analysis of complex systems is based on models of individual components. This reductionist perspective reaches its limitations when the interactions of the individual components—not the components themselves—become the dominant force behind a system’s dynamical evolution. Understanding a system’s functional organization given such data is relevant for the analysis [10, 32, 1, 9], design [42, 17, 28], control [16, 14], and prediction [23, 41] of many complex phenomena that are constrained by a graph topology or contact network.

Here, we focus on the internal interaction structure (i.e., graph or network) of a complex system. We propose a machine learning approach to infer this structure based on observational data of the individual components or constituent agents (i.e., nodes). We refer to these observations as snapshots and assume that the observable states of all components are measured simultaneously. However, we make no prior assumption about the relationship between snapshots. Specifically, measurements may be taken from different experiments with varying initial conditions and at arbitrary time points.

Most recent work focuses on time series data, where observations are time-correlated and the interaction graph is inferred from the joint time evolution of the node-states [41, 23]. Naturally, time series data typically contains more information on the system's interactions than snapshot data. However, in many cases, such data is not available. For instance, in some cases, one has to destroy a system to access its components (e.g., slice a brain [35], observe a quantum system [26], or terminate a cell [5]). Sometimes, the relevant time scale of the system is too small (e.g., in particle physics) or too large (e.g., in evolutionary dynamics) to be observed. Often, there is a trade-off between the spatial and temporal resolution of a measurement [36]. In addition, measurements may be temporally decoupled due to large observation intervals and thus become unsuitable for methods that exploit correlations in time. Yet, machine learning techniques for graph inference from independent data remain underexplored in the literature.


Figure 1: Schematic illustration of the problem setting. We are interested in inferring the latent interaction graph from observational data of the process.

In contrast to many state-of-the-art approaches, our approach is model-free, without any assumptions on the dynamical laws. Conceptually, our analysis is founded on identifying patterns in snapshots. We exploit that the identified patterns are likely the result of probabilistic local interactions and, thus, carry information about the underlying connectivity.

We propose an approach based on ideas recently popularized for time series based network inference [39, 23]. It provides an elegant way of formalizing the graph inference problem with minimal parametric assumptions on the underlying dynamical model. The core assumption is that the interaction graph best describing the observed data is the ground truth.

In the time series setting, this would be the graph that best enables time series forecasting. Instead, we aim at finding the graph that best enables us to predict a node's state based on its direct neighborhood within a snapshot, not at future times. To this end, we use a prediction model to learn a node's observable state (in the sequel: simply state or node-state) given the joint state of all adjacent nodes. Then we maximize the prediction accuracy by jointly optimizing the interaction graph and the prediction model. Loosely speaking, we assume that the information shared between a node and its complete neighborhood is higher in the ground truth graph than in other potential interaction topologies.

However, in a trivial implementation, the network that enables the best node-state prediction is the complete graph, because it provides all information available in a given snapshot. This necessitates a form of regularization in which edges that are present, but not necessary, reduce prediction quality. We achieve this using a neighborhood aggregation scheme that acts as a bottleneck on the information flow: for each node, we count the number of neighbors in each node-state and use these counts to predict the state probabilities of that node. Thus, the prediction model represents the conditional probability of a node-state given an aggregation of the complete neighborhood. Hence, irrelevant neighbors reduce the information content of the counting abstraction and consequently the prediction quality. This neighborhood aggregation provides a powerful description for many relevant dynamical models of interacting systems [13, 8].

The node-states can directly relate to the measurements or be the result of an embedding or a discretization. For the prediction, we use a simple node-specific multi-layer perceptron (MLP). This task can be framed as a simple single-layer graph neural network (GNN) architecture.

For an efficient solution to the graph inference problem, we propose GINA (Graph Inference Network Architecture). GINA tackles this problem by simultaneously optimizing the interaction graph and the dynamics. The representation employs a computationally simple neighborhood aggregation and an efficient weight-sharing mechanism between nodes. In combination with a differentiable graph representation, this enables the application of our methods to systems with hundreds of nodes.

In summary, we formalize and test the hypothesis that network reconstruction can be formulated as a prediction task. This contribution includes: (i) we propose a suitable neighborhood aggregation mechanism, (ii) we propose the neural architecture GINA to efficiently solve the prediction and reconstruction task, and (iii) we evaluate GINA on synthetically generated snapshots using various combinations of graphs and diffusion models.

2 Related work

Literature abounds with methods to infer the (latent) functional organization of complex systems that is often expressed using (potentially weighted, directed, temporal, multi-) graphs.

Most relevant for this work are previous approaches that use deep learning on time series data to learn the interaction dynamics and the interaction structure. Zhang et al. propose a two-component GNN architecture where a graph generator network proposes interaction graphs and a dynamics learner learns to predict the dynamical evolution using the interaction graph [41, 40]. Both components are trained alternately. Similarly, Kipf et al. learn the dynamics using an encoder-decoder architecture that is constrained by an interaction graph which is optimized simultaneously [23]. Huang et al. use a compression-based method for this purpose [21]. Another state-of-the-art approach for this problem, based on regression analysis instead of deep learning, is the ARNI framework by Casadiego et al. [4]. This method, however, requires time series data and hinges on a good choice of basis functions.

Other methods to infer interaction structures aim at specific dynamical models and applications. Examples include epidemic contagion [29, 7, 34], gene networks [24, 30], functional brain network organization [6], and protein-protein interactions [20]. In contrast, our approach assumes no prior knowledge about the laws that govern the system’s evolution.

Statistical methods provide an equally viable and often very robust alternative. These can be based on partial correlation, mutual information, or graphical lasso [38, 11]. Here, we rely not only on pair-wise correlations among components but also on the joint impact of all neighboring components, which is necessary in the presence of non-linear dynamical laws governing the system. Moreover, we directly infer binary (i.e., unweighted) graphs in order not to rely on (often unprincipled) threshold mechanisms.

3 Foundations and problem formulation


Figure 2: Schematic architecture on a small example graph. Nodes are color-coded; node-states are indicated by the filling (filled: I, blank: S). First, we compute the neighborhood counting vector for each node (stored as the matrix M), then we feed each counting vector into a predictor that predicts the original state.

The goal is to find the latent interaction graph of a complex system with n agents/nodes. A graph is represented as an adjacency matrix A of dimension n × n (with node set V = {1, ..., n}), where the entry A[i, j] indicates the presence (A[i, j] = 1) or absence (A[i, j] = 0) of an edge between nodes i and j. We assume that A is symmetric (the graph is undirected) and the diagonal entries are all zero (the graph has no self-loops). We use A* to denote the ground truth matrix.

Each snapshot assigns a node-state to each node. The finite set of possible node-states is denoted by S. For convenience, we assume that the node-states are represented using one-hot encoding. For instance, in an epidemic model nodes might be susceptible or infected; since there are two node-states, we use S = {(1, 0), (0, 1)}. Each snapshot can then conveniently be represented as a matrix X with n rows, where each row describes the corresponding one-hot encoded node-state (cf. Fig. 2 left). We use 𝒳 to denote the set of independent snapshots. We make no specific assumption about the underlying distribution or process behind the snapshots or their relationship to one another.

For a node v (and a fixed snapshot), we use m_v to denote the element-wise sum of all neighboring node-states, referred to as the neighborhood counting vector. For example, consider again the case S = {(1, 0), (0, 1)}: m_v = (2, 1) refers to a node with 2 susceptible and 1 infected neighbor. Note that the sum over the entries of m_v is the degree (i.e., number of neighbors) of that node (here, 3). The set of all possible neighborhood counting vectors is denoted by 𝓜. We can compute the neighborhood counting vectors of all nodes using the matrix product M = A X, where the v-th row of M equals m_v (cf. Fig. 2 center).

In the next step, we feed each counting vector m_v into a machine learning model that predicts the original state of v in that snapshot. Specifically, we learn a function f_v : 𝓜 → P(S), where P(S) denotes the set of all probability distributions over S (in the sequel, we make the mapping to a probability distribution explicit by adding a Softmax function). To evaluate f_v, we use some loss function to quantify how well the predicted distribution matches the true state, and we minimize this prediction loss. We assume that f_v is fully parameterized by a node-dependent weight matrix W_v. All weights in a network are given by the list W = (W_1, ..., W_n). We also define F as the node-wise application of the f_v to the rows of its input.

The hypothesis is that the ground truth adjacency matrix A* provides the best foundation for F, ultimately leading to the smallest prediction loss. Under this hypothesis, we can use the loss as a surrogate for the accuracy of a candidate matrix A (compared to the ground truth graph). The number of possible adjacency matrices of a system with n nodes (assuming no self-loops and symmetry) is 2^(n(n-1)/2). Thus, it is hopeless to simply try all possible adjacency matrices. Hence, we need to strengthen this hypothesis and assume that smaller distances between A and A* (we call this the graph loss) lead to smaller prediction losses (in a reasonable sense). This way, the optimization becomes feasible and we can follow the surrogate loss in order to arrive at A*.
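The combinatorial explosion is easy to make concrete: there is one binary choice per unordered node pair.

```python
# Number of simple undirected graphs (no self-loops) on n labeled nodes:
# one independent edge/no-edge choice per unordered node pair.
def num_graphs(n: int) -> int:
    return 2 ** (n * (n - 1) // 2)

print(num_graphs(5))   # 1024 candidate adjacency matrices for 5 nodes
print(num_graphs(20))  # already astronomically large (2**190)
```

Even for a few dozen nodes, exhaustive search over candidates is therefore out of the question.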

Graph neural network

Next, we formulate the graph inference problem using a graph neural network that loosely resembles an autoencoder architecture: in each snapshot X, we predict the node-state of each node using only the neighborhood of that node. Then, we compare the prediction with the actual (known) node-state.

For a given adjacency matrix A (graph) and list of weight matrices W, we define the GNN G_{A,W}, applied to a snapshot X, as

    G_{A,W}(X) = F(A X),

where F is applied row-wise. Thus, G_{A,W}(X) results in a matrix where each row corresponds to a node and models a distribution over node-states. Similar to the auto-encoder paradigm, input and output are of the same form, and the network learns to minimize the difference (note that the one-hot encoding can also be viewed as a valid probability distribution over node-states). The absence of self-loops in A is critical, as it means that the actual node-state of a node is not part of its own neighborhood aggregation. As we want to predict a node's state, the state itself cannot be part of the input.
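This forward pass can be sketched in a few lines; the linear predictor below is a hypothetical stand-in for the learned per-node MLP:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gnn_forward(A, X, predict):
    """One graph layer (M = A @ X) followed by a row-wise prediction layer."""
    M = A @ X                                  # neighborhood counting vectors
    return np.vstack([softmax(predict(m)) for m in M])

# Hypothetical linear predictor standing in for the learned MLP.
W = np.array([[0.5, -0.5], [-0.5, 0.5]])
predict = lambda m: m @ W

A = np.array([[0, 1], [1, 0]])                 # two connected nodes
X = np.array([[1.0, 0.0], [0.0, 1.0]])         # one-hot snapshot
Y = gnn_forward(A, X, predict)
assert np.allclose(Y.sum(axis=1), 1.0)         # each row is a distribution
```

Note that because the diagonal of A is zero, each output row depends only on the neighborhood, never on the node's own state.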

We will refer to the matrix multiplication A X as the graph layer and to the application of F as the prediction layer. We purposely perform only a single application of the graph layer, which means that only information from the immediate neighborhood can be used to predict a node-state. While using k-hop neighborhoods (k > 1) would increase the network's predictive power, it would be detrimental to graph reconstruction.

Most modern GNN architectures follow the message-passing scheme, where each layer performs an aggregate and a combine step. The aggregate step computes a neighborhood embedding based on a permutation-invariant function of all neighboring nodes. The combine step combines this embedding with the actual node-state. In our architecture, the aggregation is the element-wise sum. The combination, however, purposely ignores the node-state (in order for the prediction task to make sense) and only applies the prediction function f_v.

Prediction loss

We assume a loss function that is applied independently to each snapshot:

    L(X) = loss(X, G_{A,W}(X)).

The prediction loss compares the input (actual node-states) and output (predicted node-states) of the GNN G_{A,W}. We define the loss on a set 𝒳 of independent snapshots as the sum over all constituent snapshots:

    L(𝒳) = Σ_{X ∈ 𝒳} loss(X, G_{A,W}(X)).

In our experiments, we use the row-wise MSE loss.

Note that, in the above sum, all snapshots are treated equally, independent of their corresponding initial conditions or the time points at which they were taken (which we do not know anyway).
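A minimal sketch of this loss (row-wise MSE, summed over independent snapshots; the example matrices are made up):

```python
import numpy as np

def prediction_loss(X, Y):
    """Row-wise MSE between actual one-hot states X and predicted distributions Y."""
    return float(np.mean((X - Y) ** 2, axis=1).sum())

def total_loss(snapshots, predictions):
    # Independent snapshots: the per-snapshot losses simply add up.
    return sum(prediction_loss(X, Y) for X, Y in zip(snapshots, predictions))

X = np.array([[1.0, 0.0], [0.0, 1.0]])  # actual states
Y = np.array([[0.8, 0.2], [0.3, 0.7]])  # predicted distributions
print(prediction_loss(X, Y))
```

Because each term depends on one snapshot only, the ordering of snapshots (and any temporal relation between them) is irrelevant to the loss.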

Graph inference problem

We define the graph inference problem as follows: for a given set of snapshots 𝒳 (corresponding to n nodes), find an adjacency matrix A and a list of weight matrices W minimizing the prediction loss:

    A, W = argmin_{A, W} L(𝒳).

Note that, in general, we cannot guarantee that A is equal to the ground truth matrix A*. Regarding the computational complexity, it is known that network reconstruction for epidemic models based on time series data is NP-hard when formulated as a decision problem [33]. We believe that this carries over to our setting but leave a proof for future work.

4 Our method: Gina


Figure 3: Illustration of GINA. Snapshots are processed independently. Each input/snapshot associates each node with a state; each output associates each node with a distribution over states. The training process optimizes this distribution. The output is computed based on a multiplication with the current adjacency matrix candidate (stored as C) and the application of a node-wise MLP. Ultimately, we are interested in a binarized version of the adjacency matrix. Color/filling indicates the state; shape identifies nodes.

As explained in the previous section, it is not possible to solve the graph inference problem by iterating over all possible graphs/weights. Hence, we propose GINA (Graph Inference Network Architecture). GINA efficiently approximates the graph inference problem by jointly optimizing the graph and the prediction-layer weights.

Therefore, we adopt two tricks: firstly, we impose a relaxation on the graph adjacency matrix, representing its entries as real-valued numbers. Secondly, we share weights between the weight matrices belonging to different nodes. Specifically, each node gets its own MLP, but the weights of all layers except the last one are shared among nodes. This allows us to simultaneously optimize the graph and the weights using back-propagation. Apart from that, we still follow the architecture from the previous section: a graph layer maps a snapshot to neighborhood counting vectors, and each neighborhood counting vector is pushed through a node-based MLP.

Graph layer

Internally, we store the interaction graph as an upper triangular matrix C. In each step, we (i) compute C + Cᵀ to enforce symmetry, (ii) apply σ_s element-wise, and (iii) set all diagonal entries to zero, yielding the relaxed adjacency matrix Â. Here, σ_s is a differentiable function that maps real-valued entries to the interval (0, 1). It ensures that Â approximately behaves like a binary adjacency matrix while remaining differentiable (the hat-notation indicates relaxation). Specifically, we adopt a Sigmoid-type function that is parametrized by a sharpness parameter s:

    σ_s(x) = 1 / (1 + e^(−s·x)),

where we start with a small s and increase the sharpness of the step over the course of the training. Finally, the graph layer matrix Â is multiplied with the snapshot (i.e., Â X) and yields a relaxed version of the neighborhood counting vectors.
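A sketch of this relaxed graph layer; the exact Sigmoid form and the sharpness value here are illustrative assumptions:

```python
import numpy as np

def sigma(x, s):
    # Sharpened sigmoid: approaches a 0/1 step function as s grows.
    return 1.0 / (1.0 + np.exp(-s * x))

def relaxed_adjacency(C, s):
    """Symmetrize an upper-triangular parameter matrix, squash its entries
    into (0, 1), and zero the diagonal (no self-loops)."""
    A_hat = sigma(C + C.T, s)
    np.fill_diagonal(A_hat, 0.0)
    return A_hat

# Hypothetical upper-triangular parameters for a 3-node graph.
C = np.triu(np.array([[0.0, 2.0, -3.0],
                      [0.0, 0.0, 1.5],
                      [0.0, 0.0, 0.0]]))
A_hat = relaxed_adjacency(C, s=5.0)
assert np.allclose(A_hat, A_hat.T)    # symmetric
assert np.all(np.diag(A_hat) == 0.0)  # no self-loops
```

Gradually increasing s anneals the relaxed matrix toward a binary adjacency matrix while keeping every entry differentiable with respect to C.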

Prediction layer

In the matrix of relaxed counting vectors, each row corresponds to one node. Thus, we apply the MLPs independently to each row. We use m̂_v to denote the row corresponding to node v (i.e., the relaxed version of the neighborhood counting vector m_v). Let FC(d_in, d_out) denote a fully-connected (i.e., linear) layer with input dimension d_in and output dimension d_out. We use ReLU and Softmax activation functions. The whole prediction MLP contains four sub-layers. Only the last sub-layer contains node-specific weights; this layer enables a node-specific shift of the probability computed in the previous layers. Note that we use a comparably small dimension (i.e., 10) for internal embeddings, which has shown to be sufficient in our experiments.
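The weight-sharing scheme can be sketched as follows; the layer depth is reduced and all shapes and random weights are illustrative, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_states, hidden = 4, 2, 10

# Shared sub-layers (identical weights for all nodes) ...
W1 = rng.normal(size=(n_states, hidden))
W2 = rng.normal(size=(hidden, n_states))
# ... plus one small node-specific sub-layer for a per-node shift.
W_node = rng.normal(scale=0.1, size=(n_nodes, n_states, n_states))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(v, m):
    h = np.maximum(m @ W1, 0.0)    # shared layer, ReLU
    h = h @ W2                     # shared layer
    return softmax(h @ W_node[v])  # node-specific layer, Softmax

counts = rng.poisson(1.0, size=(n_nodes, n_states)).astype(float)
out = np.array([predict(v, m) for v, m in enumerate(counts)])
assert np.allclose(out.sum(axis=1), 1.0)
```

Sharing all but the last sub-layer keeps the parameter count nearly independent of the number of nodes, which is what makes joint training with the relaxed graph feasible.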


A natural approach for training would be to split the snapshots into a training and a test set, and to optimize the graph and the weights on the training set until the loss reaches a minimum on the test set; the loss on the test set then provides the best surrogate for the distance to the ground truth graph. However, we empirically find that over-fitting is not a problem and therefore do not use a test set. Another important aspect during training is the usage of mini-batches. For ease of notation, we have ignored batches so far; in practice, mini-batches are crucial for fast and robust training. A mini-batch of size b can be created by concatenating b snapshots (in the graph layer) and re-ordering the rows accordingly (in the prediction layer).


There are some relevant limitations to GINA. Firstly, we can provide no guarantees that the ground truth graph is actually the solution to the graph inference problem. In particular, simple patterns in the time domain (that would enable trivial graph inference using time series data) might correspond to highly non-linear patterns inside a single snapshot. Moreover, GINA is only applicable if statistical associations among adjacent nodes manifest themselves in a way that renders the counting abstraction meaningful. Statistical methods are more robust in the way they can handle different types of pair-wise interactions, but less powerful regarding non-linear combined effects of the complete neighborhood. Another relevant design decision is the use of one-hot encoding, which renders the forward pass extremely fast but will reach its limitations when the node-state domain becomes very complex. Finally, we assume relational homogeneity, i.e., that all agents behave reasonably similarly to one another; this enables weight sharing, which greatly increases the efficiency of the training and reduces the number of required samples.

5 Experiments

Figure 4: Exp. 2: Examples of typical equilibrium snapshots on a grid graph (panels: SIS, InvVoter, Majority-Flip, RPS). Different dynamics give rise to different types of cluster formations.

We conduct three experiments using synthetically generated snapshots. In Experiment 1, we analyze the underlying hypothesis that the ground truth graph enables the best node-state prediction. In Experiment 2, we study the importance of sample size for the reconstruction accuracy, and in Experiment 3, we compare GINA to statistical baselines. Our prototype of GINA is implemented using PyTorch and is executed on a standard desktop computer with an NVIDIA GeForce RTX 3080, 32 GB of RAM, and an Intel i9-10850K CPU. Open-source code (GNU GPLv3) is available on GitHub.

To measure the quality of an inferred graph, we compute its distance to the ground truth graph. We define this graph loss as the Manhattan distance of the upper triangular parts of the two adjacency matrices (i.e., the number of edges to add/remove). We always use a binarized version of the inferred matrix for comparison with the ground truth. All results are based on a single run of GINA; performing multiple runs and using the result with the lowest prediction loss might further improve GINA's performance.
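The graph loss used throughout the experiments is straightforward to compute:

```python
import numpy as np

def graph_loss(A, A_truth):
    """Manhattan distance of the upper-triangular parts of two binary
    adjacency matrices: the number of edges to add or remove to reach
    the ground truth."""
    iu = np.triu_indices_from(A, k=1)
    return int(np.abs(A[iu] - A_truth[iu]).sum())

A_truth = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])   # path graph 0-1-2
A_cand = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]])    # edge (1,2) replaced by (0,2)
print(graph_loss(A_cand, A_truth))  # one edge added + one removed = 2
```

Restricting the comparison to the upper triangle avoids counting each undirected edge twice.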

Dynamical models

We study six models; a precise description of the dynamics and parameters is provided in Appendix A.2. We focus on stochastic processes, as probabilistic decisions and interactions are essential for modeling uncertainty in real-world systems. The models include a simple SIS epidemic model where infected nodes can randomly infect susceptible neighbors or become susceptible again. In this model, susceptible nodes tend to be close to other susceptible nodes and vice versa, which renders network reconstruction comparably simple. In contrast, we also propose an Inverted Voter model (InvVoter) where nodes tend to maximize their disagreement with their neighborhood (influenced by the original Voter model [3]) (cf. Fig. 4). Nodes hold one of two opinions (A or B); nodes in A move to B faster the higher their number of A neighbors, and vice versa. To study the emergence of an even more complex pattern, we propose the Majority-Flip dynamics, where nodes tend to change their current state when the majority of their neighbors follows the same opinion (regardless of the node's current state); we refer the reader to Fig. 7 for a visualization of the corresponding node-state prediction. We also examine a rock-paper-scissors (RPS) model to study evolutionary dynamics [37] and the well-known Forest Fire model [2]. Finally, we test a deterministic discrete-time dynamical model: a coupled map lattice model (CML) [12, 22, 41] to study graph inference in the presence of chaotic behavior. As the CML model admits real-valued node-states, we discretized them into equally-spaced bins. For the stochastic models, we use numerical simulations to sample from the equilibrium distribution of the systems. For CML, we randomly sample an initial state and simulate it for a random time period. We do not explicitly add measurement errors, but all nodes are subject to internal noise.
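As an illustration of how equilibrium snapshots might be sampled, here is a toy SIS-style simulator; the rates and the asynchronous update scheme are hypothetical stand-ins, not the paper's exact parameters from Appendix A.2:

```python
import numpy as np

def sample_sis_snapshot(A, steps=2000, infect=0.3, recover=0.1, rng=None):
    """Toy SIS sampler: run an asynchronous simulation long enough to
    approximate equilibrium and return one snapshot of binary node-states
    (1 = infected, 0 = susceptible)."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    x = rng.integers(0, 2, size=n)  # random initial condition
    for _ in range(steps):
        v = rng.integers(n)         # pick a random node to update
        if x[v] == 1 and rng.random() < recover:
            x[v] = 0
        elif x[v] == 0:
            infected_neighbors = int(A[v] @ x)
            # Each infected neighbor independently attempts an infection.
            if rng.random() < 1 - (1 - infect) ** infected_neighbors:
                x[v] = 1
    return x

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # triangle graph
snap = sample_sis_snapshot(A, rng=np.random.default_rng(1))
```

Independent snapshots, as assumed in the problem setting, correspond to repeating this sampling from scratch with fresh random initial conditions.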

Experiment 1: Loss landscape

Figure 5: Exp. 1: Computing the loss landscape based on all possible 5-node graphs and the generated snapshots (panels: SIS, InvVoter, Majority-Flip, RPS). x-axis: graph distance to the ground truth adjacency matrix. y-axis: mean prediction loss of the corresponding graph candidates. Error bars denote min/max loss.

For this experiment, we generated snapshots corresponding to the dynamical models on the so-called bull graph (as illustrated in Fig. 1). We then trained the network and measured the prediction loss for all adjacency matrices that represent connected graphs. We observe a strong dependency between the prediction loss of a candidate matrix and the corresponding graph loss, except for the Majority-Flip model. Surprisingly, on larger graphs GINA still finds the ground truth graph with high accuracy given Majority-Flip snapshots, compared to the baselines.

Experiment 2: Sample size

Figure 6: Exp. 2: Influence of sample size on SIS-dynamics (5×5 and 7×7 grid graphs). x-axis: epoch number. y-axis: graph distance between the ground truth graph and the current state of GINA. Clearly, a higher sample size yields better performance. CIs are based on 3 independent training runs.

We tested the effect of the number of snapshots on the graph loss using SIS-dynamics. Unsurprisingly, a larger sample size yields better reconstruction accuracy, and the training procedure converges significantly faster. For this experiment, we used 5×5 and 7×7 grid graphs and compared results based on varying numbers of snapshots.

Experiment 3: Comparisons with baselines

Graph loss (smaller is better)

Model          Graph   GINA   Corr     MI   ParCorr
SIS            ER         0      0      0      0
               Geom      64     78     84     40
               Grid      12      0      0      0
               WS         0      0      0      0
Majority-Flip  ER        12     72     28     76
               Geom     247   1130    616   1168
               Grid     511    304     88    304
               WS         0    226     22    226
InvVoter       ER         0     88     12     88
               Geom       0   1692    116   1692
               Grid     502    360     90    360
               WS         0    226      4    226
RPS            ER         0      2      0      0
               Geom     151    164    172      2
               Grid      75      0      0      0
               WS         0     10     10      0
Forest Fire    ER        25      6      2     28
               Geom      36    852    300    906
               Grid      25      2      0      2
               WS         0      2      0      8
CML            ER         0      0      2      0
               Geom       2      0    102      0
               Grid       8      0      0      0
               WS         0      0      4      0

Table 1: Exp. 3: [Smaller is better] Graph loss of different graph inference methods (GINA is our method).

Next, we compare GINA with statistical baselines. We use mini-batches, start with a small sharpness parameter s, and increase it at regular intervals over the course of the training. We train for a fixed maximal number of epochs but stop early if the underlying (binarized) graph does not change for a given number of epochs. Moreover, we use PyTorch's Adam optimizer.

To generate the ground truth graphs, we use Erdős–Rényi (ER), geometric (Geom), and Watts–Strogatz (WS) random graphs, as well as a 2D grid graph. We use 50 thousand samples. Graphs were generated using the networkX package [18] (cf. Appendix A.3 for details).

As statistical baselines, we use the Python package netrd [19]. Specifically, we use the correlation (Corr), mutual information (MI), and partial correlation (ParCorr) methods. The baselines only return weighted matrices; hence, they need to be binarized using a threshold. To find the optimal threshold, we provide them with the number of edges of the ground truth graph. Notably, especially in sparse graphs, this leads to an unfair advantage and renders the results not directly comparable. Furthermore, netrd only accepts binary or real-valued node-states. This is a problem for the categorical models RPS and Forest Fire. As a solution, we simply map the three node-states to distinct real values, breaking statistical convention. Interestingly, the baselines handle this well and, in most cases, identify the ground truth graph nevertheless.
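The edge-count-based binarization granted to the baselines can be sketched as follows; this illustrates the thresholding idea and is not netrd code:

```python
import numpy as np

def binarize_top_k(W, k):
    """Keep the k highest-scoring node pairs as edges. In the experiments,
    k is taken from the ground truth edge count, which is the advantage
    the baseline methods receive."""
    n = W.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(W[iu, ju])[::-1][:k]   # indices of the k largest scores
    A = np.zeros_like(W, dtype=int)
    A[iu[order], ju[order]] = 1
    return A + A.T                            # symmetrize

# Hypothetical weighted similarity matrix from a baseline method.
W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.7],
              [0.2, 0.7, 0.0]])
print(binarize_top_k(W, k=2))
```

GINA avoids this step entirely, since the sharpened relaxation already drives its adjacency matrix toward binary values.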

We also compared our method to the recently proposed AIDD [40] (results not shown). However, AIDD requires consecutive observations, making it sensitive to the time resolution of the observations. Also, we find that AIDD has a significantly higher runtime than GINA (50 thousand samples were not feasible on our machine), which is presumably due to its more complicated architecture, particularly the alternating training of the interaction graph representation and the dynamics predictor.


The results clearly show that graph inference based on independent snapshots is possible and that GINA is a viable alternative to statistical methods. Compared to the baselines, GINA performed best most often, even though we gave the baseline methods the advantage of knowing the ground truth number of edges. GINA performed particularly well in the challenging cases where neighboring nodes do not tend to be in the same (or in similar) node-states. GINA even performed acceptably in the case of CML dynamics despite the discretization and the chaotic and deterministic nature of the process.

6 Conclusions and future work

We propose GINA, a model-free approach to infer the underlying graph structure of a dynamical system from independent observational data. GINA is based on the principle that local interactions among agents manifest themselves in specific local patterns, and that these patterns can be found and exploited. Our experiments show that the underlying hypothesis is a promising graph inference paradigm and that GINA efficiently solves the task. We believe that the main challenge for future work is to find ways of inferring graphs when the types of interaction differ largely among edges. Moreover, a deeper theoretical understanding of which processes produce meaningful statistical associations, not only over time but also within snapshots, would be desirable.

This work was partially funded by the DFG project MULTIMODE. We thank Thilo Krüger for his helpful comments on the manuscript.


  • [1] H. Amini, R. Cont, and A. Minca (2016) Resilience to contagion in financial networks. Mathematical finance 26 (2), pp. 329–365. Cited by: §1.
  • [2] P. Bak, K. Chen, and C. Tang (1990) A forest-fire model and some thoughts on turbulence. Physics letters A 147 (5-6), pp. 297–300. Cited by: §5.
  • [3] A. Campbell, G. Gurin, and W. E. Miller (1954) The voter decides. Cited by: §5.
  • [4] J. Casadiego, M. Nitzan, S. Hallerberg, and M. Timme (2017) Model-free inference of direct network interactions from nonlinear collective dynamics. Nature communications 8 (1), pp. 1–10. Cited by: §2.
  • [5] T. E. Chan, M. P. Stumpf, and A. C. Babtie (2017) Gene regulatory network inference from single-cell data using multivariate information measures. Cell systems 5 (3), pp. 251–267. Cited by: §1.
  • [6] I. M. de Abril, J. Yoshimoto, and K. Doya (2018) Connectivity inference from neural recording data: challenges, mathematical bases and research directions. Neural Networks 102, pp. 120–137. Cited by: §2.
  • [7] F. Di Lauro, J. Croix, M. Dashti, L. Berthouze, and I. Kiss (2020) Network inference from population-level observation of epidemics. Scientific Reports 10 (1), pp. 1–14. Cited by: §2.
  • [8] P. G. Fennell and J. P. Gleeson (2019) Multistate dynamical processes on networks: analysis through degree-based approximation frameworks. SIAM Review 61 (1), pp. 92–118. Cited by: §A.2, §1.
  • [9] K. R. Finn, M. J. Silk, M. A. Porter, and N. Pinter-Wollman (2019) The use of multilayer network analysis in animal behaviour. Animal behaviour 149, pp. 7–22. Cited by: §1.
  • [10] A. Fornito, A. Zalesky, and M. Breakspear (2015) The connectomics of brain disorders. Nature Reviews Neuroscience 16 (3), pp. 159–172. Cited by: §1.
  • [11] J. Friedman, T. Hastie, and R. Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), pp. 432–441. Cited by: §2.
  • [12] P. Garcia, A. Parravano, M. Cosenza, J. Jiménez, and A. Marcano (2002) Coupled map networks as communication schemes. Physical Review E 65 (4), pp. 045201. Cited by: §5.
  • [13] J. P. Gleeson (2011) High-accuracy approximation of binary-state dynamics on networks. Physical Review Letters 107 (6), pp. 068701. Cited by: §1.
  • [14] G. Großmann, M. Backenköhler, J. Klesen, and V. Wolf (2020) Learning vaccine allocation from simulations. In International Conference on Complex Networks and Their Applications, pp. 432–443. Cited by: §1.
  • [15] G. Großmann and L. Bortolussi (2019) Reducing spreading processes on networks to markov population models. In International Conference on Quantitative Evaluation of Systems, pp. 292–309. Cited by: §A.2.
  • [16] S. Gu, F. Pasqualetti, M. Cieslak, Q. K. Telesford, B. Y. Alfred, A. E. Kahn, J. D. Medaglia, J. M. Vettel, M. B. Miller, S. T. Grafton, et al. (2015) Controllability of structural brain networks. Nature communications 6 (1), pp. 1–10. Cited by: §1.
  • [17] A. Hagberg and D. A. Schult (2008) Rewiring networks for synchronization. Chaos: An interdisciplinary journal of nonlinear science 18 (3), pp. 037105. Cited by: §1.
  • [18] A. Hagberg, P. Swart, and D. S Chult (2008) Exploring network structure, dynamics, and function using networkx. Technical report Los Alamos National Lab.(LANL), Los Alamos, NM (United States). Cited by: §A.3, §5.
  • [19] H. Hartle, B. Klein, S. McCabe, A. Daniels, G. St-Onge, C. Murphy, and L. Hébert-Dufresne (2020) Network comparison and the within-ensemble graph distance. Proceedings of the Royal Society A 476 (2243), pp. 20190744. Cited by: §5.
  • [20] S. Hashemifar, B. Neyshabur, A. A. Khan, and J. Xu (2018) Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34 (17), pp. i802–i810. Cited by: §2.
  • [21] K. Huang, S. Li, P. Dai, Z. Wang, and Z. Yu (2020) SDARE: a stacked denoising autoencoder method for game dynamics network structure reconstruction. Neural Networks 126, pp. 143–152. Cited by: §2.
  • [22] K. Kaneko (1992) Overview of coupled map lattices. Chaos: An Interdisciplinary Journal of Nonlinear Science 2 (3), pp. 279–282. Cited by: §A.2, §5.
  • [23] T. Kipf, E. Fetaya, K. Wang, M. Welling, and R. Zemel (2018) Neural relational inference for interacting systems. In International Conference on Machine Learning, pp. 2688–2697. Cited by: §1, §1, §1, §2.
  • [24] K. Kishan, R. Li, F. Cui, Q. Yu, and A. R. Haake (2019) GNE: a deep learning framework for gene network inference by aggregating biological information. BMC systems biology 13 (2), pp. 38. Cited by: §2.
  • [25] I. Z. Kiss, J. C. Miller, P. L. Simon, et al. (2017) Mathematics of epidemics on networks. Cham: Springer 598. Cited by: §A.2.
  • [26] J. A. Martínez, O. Cerri, M. Spiropulu, J. Vlimant, and M. Pierini (2019) Pileup mitigation at the large hadron collider with graph neural networks. The European Physical Journal Plus 134 (7), pp. 333. Cited by: §1.
  • [27] R. M. May (2004) Simple mathematical models with very complicated dynamics. The Theory of Chaotic Attractors, pp. 85–93. Cited by: §A.2.
  • [28] R. Memmesheimer and M. Timme (2006) Designing complex networks. Physica D: Nonlinear Phenomena 224 (1-2), pp. 182–201. Cited by: §1.
  • [29] M. E. Newman (2018) Estimating network structure from unreliable measurements. Physical Review E 98 (6), pp. 062321. Cited by: §2.
  • [30] N. Omranian, J. M. Eloundou-Mbebi, B. Mueller-Roeber, and Z. Nikoloski (2016) Gene regulatory network inference using fused lasso on multiple data sets. Scientific reports 6 (1), pp. 1–14. Cited by: §2.
  • [31] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §5.
  • [32] B. A. Prakash, J. Vreeken, and C. Faloutsos (2012) Spotting culprits in epidemics: how many and which ones?. In 2012 IEEE 12th International Conference on Data Mining, pp. 11–20. Cited by: §1.
  • [33] B. Prasse and P. Van Mieghem (2018) Maximum-likelihood network reconstruction for sis processes is np-hard. arXiv preprint arXiv:1807.08630. Cited by: §3.
  • [34] B. Prasse and P. Van Mieghem (2020) Network reconstruction and prediction of epidemic outbreaks for general group-based compartmental epidemic models. IEEE Transactions on Network Science and Engineering. Cited by: §2.
  • [35] P. Rossini, R. Di Iorio, M. Bentivoglio, G. Bertini, F. Ferreri, C. Gerloff, R. Ilmoniemi, F. Miraglia, M. Nitsche, F. Pestilli, et al. (2019) Methods for analysis of brain connectivity: an ifcn-sponsored review. Clinical Neurophysiology 130 (10), pp. 1833–1858. Cited by: §1.
  • [36] S. Sarraf and J. Sun (2016) Advances in functional brain imaging: a comprehensive survey for engineers and physical scientists. International Journal of Advanced Research 4 (8), pp. 640–660. Cited by: §1.
  • [37] G. Szabó and G. Fath (2007) Evolutionary games on graphs. Physics reports 446 (4-6), pp. 97–216. Cited by: §5.
  • [38] R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §2.
  • [39] H. Zhang, F. Xu, Z. Bao, and C. Ma (2018) Reconstructing of networks with binary-state dynamics via generalized statistical inference. IEEE Transactions on Circuits and Systems I: Regular Papers 66 (4), pp. 1608–1619. Cited by: §1.
  • [40] Y. Zhang, Y. Guo, Z. Zhang, M. Chen, S. Wang, and J. Zhang (2021) Automated discovery of interactions and dynamics for large networked dynamical systems. arXiv preprint arXiv:2101.00179. Cited by: §2, §5.
  • [41] Z. Zhang, Y. Zhao, J. Liu, S. Wang, R. Tao, R. Xin, and J. Zhang (2019) A general deep learning framework for network reconstruction and dynamics learning. Applied Network Science 4 (1), pp. 1–17. Cited by: §A.2, §1, §1, §2, §5.
  • [42] M. Zitnik, M. Agrawal, and J. Leskovec (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34 (13), pp. i457–i466. Cited by: §1.



  1. For all authors…

    1. Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

    2. Did you describe the limitations of your work? See paragraph Limitations in Section 4

    3. Did you discuss any potential negative societal impacts of your work?

    4. Have you read the ethics review guidelines and ensured that your paper conforms to them?

  2. If you are including theoretical results…

    1. Did you state the full set of assumptions of all theoretical results? We provide no theoretical results.

    2. Did you include complete proofs of all theoretical results?

  3. If you ran experiments…

    1. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? See the GitHub repo. Reproduction might be subject to statistical noise.

    2. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? See Section


    3. Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Results are based on a single training iteration.

    4. Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? We report the machine environment and runtime in paragraph Experiment 3: Comparisons with baselines and Table 1.

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…

    1. If your work uses existing assets, did you cite the creators?

    2. Did you mention the license of the assets?

    3. Did you include any new assets either in the supplemental material or as a URL?

    4. Did you discuss whether and how consent was obtained from people whose data you’re using/curating?

    5. Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content?

  5. If you used crowdsourcing or conducted research with human subjects…

    1. Did you include the full text of instructions given to participants and screenshots, if applicable?

    2. Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable?

    3. Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

Appendix A

A.1 Prediction layer visualization


Figure 7: Output of the prediction layer for a random node in the Watts–Strogatz network. Shown is the predicted probability of the node being in state S (SIS model) or state A (Inverted Voter and Majority-Flip models).

We can visualize the prediction layer for the 2-state models. The input of the prediction layer of a node is its (relaxed) 2-dimensional neighborhood counting vector. The probability distribution over the two node-states is fully specified by a single probability. In Fig. 7, we visualize the probabilities given by the prediction layer for all possible neighborhood counting vectors (for nodes of a fixed degree). The results are given for a Watts–Strogatz graph and the same node for all three models. We observe that the prediction layer captures a roughly linear boundary for the SIS and the Inverted Voter model. The Majority-Flip dynamics lead to a more intricate predictive structure.

We see that the network successfully learns to represent complicated conditional probability distributions of node-states.
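To make the counting-vector input concrete, the sketch below enumerates all neighborhood counting vectors of a degree-k node in a 2-state model and queries a toy predictor. The predictor function and its rate parameters are our own illustrative assumptions, not GINA's actual learned layer:

```python
def counting_vectors(k):
    """All neighborhood counting vectors (m_S, m_I) of a degree-k node
    in a 2-state model; there are exactly k + 1 of them."""
    return [(m, k - m) for m in range(k + 1)]

def toy_predictor(m_s, m_i, rate_recover=1.0, rate_infect=0.5):
    """Illustrative stand-in for a learned prediction layer: probability
    of a node being in state S given its neighborhood counts."""
    return rate_recover / (rate_recover + rate_infect * m_i)

# Probability grid over all counting vectors of a degree-4 node,
# analogous to what Fig. 7 visualizes for the learned layer.
grid = [(v, toy_predictor(*v)) for v in counting_vectors(4)]
```

A learned layer replaces `toy_predictor` with a neural network; the input space it is evaluated on is the same.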

A.2 Dynamical models

Except for the CML, we use continuous-time stochastic processes with a discrete state space to generate snapshots. Specifically, these models have Markov jump process or continuous-time Markov chain (CTMC) semantics. Moreover, each node/agent occupies one of several node-states at each point in time. Nodes change their state stochastically according to their neighborhood (more precisely, according to their neighborhood counting vector). We assume that all nodes obey the same rules/local behavior. We refer the reader to [25, 8, 15] for a detailed description of the CTMC construction in the context of epidemic spreading processes.


SIS

Nodes are susceptible (S) or infected (I). Infected nodes can infect their susceptible neighbors or spontaneously become susceptible again. In other words, I-nodes become S-nodes with a fixed reaction rate, and S-nodes become I-nodes with a rate proportional to the number of infected neighbors of the node. Moreover, for all models, we add a small amount of stochastic noise to the dynamics. The noise not only emulates measurement errors but also prevents the system from getting stuck in a trap state where no rule is applicable (e.g., all nodes are susceptible).

In the sequel, we use the corresponding notation

The reaction rate refers to the exponentially distributed residence times and a higher rate is associated with a faster state transition. When the reaction rate is zero (e.g., when no neighbors are infected), the state transition is impossible.

The parameterization is , , and .
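The CTMC semantics can be sketched as a Gillespie-style simulation step. This is a minimal stdlib sketch, not the paper's implementation; the adjacency encoding, function name, and default rate values are our assumptions:

```python
import random

def sis_gillespie_step(state, adj, recover=1.0, infect=0.6, rng=random):
    """One Gillespie step of SIS dynamics on a graph.
    state: dict node -> 'S' or 'I'; adj: dict node -> list of neighbors.
    Mutates `state` in place and returns the elapsed (exponential) time."""
    # Per-node reaction rates: I-nodes recover at a fixed rate,
    # S-nodes become infected at infect * (#infected neighbors).
    rates = {}
    for v, s in state.items():
        if s == 'I':
            rates[v] = recover
        else:
            rates[v] = infect * sum(state[u] == 'I' for u in adj[v])
    total = sum(rates.values())
    if total == 0.0:
        return float('inf')  # trap state: no rule is applicable
    dt = rng.expovariate(total)
    # Choose the firing node with probability proportional to its rate.
    r = rng.uniform(0.0, total)
    acc = 0.0
    for v, rate in rates.items():
        acc += rate
        if r <= acc:
            state[v] = 'S' if state[v] == 'I' else 'I'
            break
    return dt
```

Repeatedly applying this step and recording the node-states at chosen time points yields snapshots of the process; the small noise term mentioned above would be added on top.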

Inverted Voter

The Inverted Voter model describes two competing opinions (A and B), where nodes always tend to maximize their disagreement with their neighbors.

We use .


Majority-Flip models two competing opinions where nodes tend to flip (i.e., change) their state when the majority of their neighbors are in the same state. A slight asymmetry makes the problem solvable.

The indicator variables identify what counts as a majority. They are defined as

If an indicator is not one, it is zero. We use .

Rock Paper Scissors

This model provides simple evolutionary dynamics in which three species overtake one another in a cyclic (ring-like) relationship.

We use .

Forest Fire

Spots/nodes are either empty (E), on fire (F), or have a tree on them (T). Trees grow with a growth rate . Random lightning starts a fire on tree-nodes with rate . The fire on a node goes extinct with rate , leaving the node empty. Finally, fire spreads to neighboring tree-nodes with rate .

The parameterization is , , , , and .

Coupled Map Lattice

Let be the value of node at time-step . Each node starts with a random value (uniform in ). At each time step all nodes are updated based on a linear combination of the node-value and the node-values of neighboring nodes [22]:

where is the degree of , denotes the set of (indices of) nodes adjacent to , is the coupling strength, and is the local map. Like [41], we use the logistic map [27]:

where modulates the complexity of the dynamics.

We use and .
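The update rule can be sketched as follows. The exact equation and parameter values are elided in the text above, so this is a hedged reconstruction assuming the standard coupled map lattice form of [22] with a logistic local map; the symbol names `s` (coupling strength) and `r` (logistic parameter) and the default values are our assumptions:

```python
def logistic(x, r=3.57):
    """Local map f(x) = r * x * (1 - x); r modulates the complexity
    of the dynamics (r = 3.57 is an illustrative chaotic-regime value)."""
    return r * x * (1.0 - x)

def cml_step(x, adj, s=0.2, f=logistic):
    """One synchronous CML update: each node's new value is a linear
    combination of its own mapped value and its neighbors' mapped values.
    x: dict node -> value in [0, 1]; adj: dict node -> neighbor list."""
    return {
        v: (1.0 - s) * f(x[v]) + (s / len(adj[v])) * sum(f(x[u]) for u in adj[v])
        for v in x
    }
```

With `s = 0` the nodes decouple and each simply iterates the local map; larger `s` mixes in more of the neighborhood, which is what makes the interaction graph recoverable from snapshots.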

A.3 Random interaction graphs

We use the Python NetworkX [18] package to generate a single instance (variate) of each random graph model and test GINA and the baselines on a large number of snapshots generated using this graph. In particular, we use the Erdős–Rényi (ER) graph model with n=25 nodes and connection probability p=0.15:

    nx.erdos_renyi_graph(25, 0.15, seed=43)

(the seed 43 was used in order to obtain a connected graph). We also use a random geometric graph with n=200 nodes and radius 0.125:

    nx.random_geometric_graph(200, 0.125, seed=42)

and a Watts–Strogatz graph with n=50 nodes, where each node has 4 neighbors and the re-wiring probability is 0.15:

    nx.newman_watts_strogatz_graph(50, 4, 0.15, seed=42)

After generation, the node-ids are randomly shuffled in order to guarantee that they do not leak information about connectivity to the training model.