1 Introduction
Complex interacting systems are ubiquitous in nature and society. However, their analysis remains challenging. Traditionally, the analysis of complex systems is based on models of individual components. This reductionist perspective reaches its limitations when the interactions of the individual components—not the components themselves—become the dominant force behind a system’s dynamical evolution. Understanding a system’s functional organization given such data is relevant for the analysis [10, 32, 1, 9], design [42, 17, 28], control [16, 14], and prediction [23, 41] of many complex phenomena that are constrained by a graph topology or contact network.
Here, we focus on the internal interaction structure (i.e., graph or network) of a complex system. We propose a machine learning approach to infer this structure based on observational data of the individual components or constituent agents (i.e., nodes). We refer to these observations as
snapshots and assume that the observable states of all components are measured simultaneously. However, we make no prior assumption about the relationship between snapshots. Specifically, measurements may be taken from different experiments with varying initial conditions and at arbitrary time points. Most recent work focuses on time series data, where observations are time-correlated and the interaction graph is inferred from the joint time evolution of the node-states [41, 23]. Naturally, time series data typically contains more information about a system's interactions than snapshot data. However, in many cases, such data is not available. For instance, one sometimes has to destroy a system to access its components (e.g., slice a brain [35], observe a quantum system [26], or terminate a cell [5]). Sometimes, the relevant time scale of the system is too small (e.g., in particle physics) or too large (e.g., in evolutionary dynamics) to be observed. Often, there is a trade-off between the spatial and temporal resolution of a measurement [36]. In addition, measurements may be temporally decoupled due to large observation intervals and thus become unsuitable for methods that exploit correlations in time. Yet, machine learning techniques for graph inference from independent data remain underexplored in the literature.
In contrast to many state-of-the-art approaches, our approach is model-free, making no assumptions about the underlying dynamical laws. Conceptually, our analysis is founded on identifying patterns in snapshots. We exploit the fact that identified patterns are likely the result of probabilistic local interactions and, thus, carry information about the underlying connectivity.
We propose an approach based on ideas recently popularized for time-series-based network inference [39, 23]. It provides an elegant way of formalizing the graph inference problem with minimal parametric assumptions about the underlying dynamical model. The core assumption is that the interaction graph that best describes the observed data is the ground truth.
In the time series setting, this would be the graph that best enables time series forecasting. Instead, we aim at finding the graph that best enables us to predict a node's state based on its direct neighborhood within a snapshot, not at future times. To this end, we use a prediction model to learn a node's observable state (in the sequel: simply state or node-state) given the joint state of all adjacent nodes. Then we maximize the prediction accuracy by jointly optimizing the interaction graph and the prediction model. Loosely speaking, we assume that the information shared between a node and its complete neighborhood is higher in the ground truth graph than in other potential interaction topologies.
However, in a trivial implementation, the network that enables the best node-state prediction is the complete graph, because it provides all information available in a given snapshot. This necessitates a form of regularization in which edges that are present, but not necessary, reduce prediction quality. We achieve this using a neighborhood aggregation scheme that acts as a bottleneck on information flow: for each node, we count the number of neighbors in each node-state and use these counts to predict the node's state probabilities. Thus, the prediction model represents the conditional probability of a node-state given an aggregation of the complete neighborhood. Hence, irrelevant neighbors reduce the information content of the counting abstraction and consequently the prediction quality. This neighborhood aggregation provides a powerful description for many relevant dynamical models of interacting systems [13, 8]. The node-states can directly relate to the measurements or result from an embedding or a discretization. For the prediction, we use a simple node-specific multilayer perceptron (MLP). This task can be framed as a single-layer graph neural network (GNN) architecture.
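Concretely, the counting abstraction amounts to a single matrix product of the adjacency matrix with the one-hot snapshot matrix. A minimal sketch with a toy graph (all names and values are illustrative, not from the paper's code):

```python
import numpy as np

# Toy illustration of the counting abstraction: 4 nodes on a path graph
# 0-1-2-3, two one-hot node-states.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Snapshot: each row is a one-hot node-state,
# [1, 0] = susceptible, [0, 1] = infected.
X = np.array([
    [1, 0],  # node 0: susceptible
    [0, 1],  # node 1: infected
    [0, 1],  # node 2: infected
    [1, 0],  # node 3: susceptible
])

# Row i of M is the element-wise sum of the one-hot states of node i's
# neighbors; with a zero diagonal, a node's own state never enters.
M = A @ X
# e.g. node 1 (neighbors 0 and 2) has counts [1, 1]:
# one susceptible neighbor, one infected neighbor.
```

Note that the row sums of `M` recover the node degrees, so the counting vector carries both the degree and the composition of the neighborhood.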
For an efficient solution to the graph inference problem, we propose GINA (Graph Inference Network Architecture). GINA tackles this problem by simultaneously optimizing the interaction graph and the dynamics. The representation employs a computationally simple neighborhood aggregation and an efficient weight-sharing mechanism between nodes. In combination with a differentiable graph representation, this enables the application of our method to systems with hundreds of nodes.
In summary, we formalize and test the hypothesis that network reconstruction can be formulated as a prediction task. Our contributions are: (i) a suitable neighborhood aggregation mechanism, (ii) the neural architecture GINA, which efficiently solves the prediction and reconstruction task, and (iii) an evaluation of GINA on synthetically generated snapshots using various combinations of graphs and diffusion models.
2 Related work
Literature abounds with methods to infer the (latent) functional organization of complex systems, which is often expressed using (potentially weighted, directed, temporal, or multi-layer) graphs.
Most relevant for this work are previous approaches that use deep learning on time series data to learn the interaction dynamics and the interaction structure. Zhang et al. propose a two-component GNN architecture in which a graph generator network proposes interaction graphs and a dynamics learner learns to predict the dynamical evolution using the interaction graph [41, 40]. Both components are trained alternately. Similarly, Kipf et al. learn the dynamics using an encoder–decoder architecture that is constrained by an interaction graph which is optimized simultaneously [23]. Huang et al. use a compression-based method for this purpose [21]. Another state-of-the-art approach for this problem, based on regression analysis instead of deep learning, is the ARNI framework by Casadiego et al. [4]. This method, however, requires time series data and hinges on a good choice of basis functions. Other methods to infer interaction structures target specific dynamical models and applications. Examples include epidemic contagion [29, 7, 34], gene networks [24, 30], functional brain network organization [6], and protein–protein interactions [20]. In contrast, our approach assumes no prior knowledge about the laws that govern the system's evolution.
Statistical methods provide an equally viable and often very robust alternative. These can be based on partial correlation, mutual information, or the graphical lasso [38, 11]. Here, we rely not only on pairwise correlations among components but also on the joint impact of all neighboring components, which is necessary in the presence of nonlinear dynamical laws governing the system. Moreover, we directly infer binary (i.e., unweighted) graphs in order not to rely on (often unprincipled) threshold mechanisms.
3 Foundations and problem formulation
The goal is to find the latent interaction graph of a complex system with $n$ agents/nodes. A graph is represented as an adjacency matrix $A$ of dimension $n \times n$ (with node set $\mathcal{V} = \{v_1, \dots, v_n\}$), where $a_{ij}$ indicates the presence ($a_{ij} = 1$) or absence ($a_{ij} = 0$) of an edge between nodes $v_i$ and $v_j$. We assume that $A$ is symmetric (the graph is undirected) and that the diagonal entries are all zero (the graph has no self-loops). We use $A^*$ to denote the ground truth matrix.
Each snapshot assigns a node-state to each node. The finite set of possible node-states is denoted by $\mathcal{S}$. For convenience, we assume that the node-states are represented using one-hot encoding. For instance, in an epidemic model, nodes might be susceptible or infected. Since there are two node-states, we use $\mathcal{S} = \{(1,0), (0,1)\}$. Each snapshot can then conveniently be represented as a matrix $X$ with $n$ rows, where each row describes the corresponding one-hot encoded node-state (cf. Fig. 2 left). We use $\mathcal{X}$ to denote the set of independent snapshots. We make no specific assumption about the underlying distribution or process behind the snapshots or their relationship to one another.

For a node $v_i$ (and a fixed snapshot), we use $m_i$ to denote the (element-wise) sum of all neighboring node-states, referred to as the neighborhood counting vector. For example, consider again the case $\mathcal{S} = \{(1,0), (0,1)\}$: $m_i = (4, 2)$ refers to a node with four susceptible and two infected neighbors. Note that the sum over the entries of $m_i$ is the degree (i.e., number of neighbors) of that node (here, $6$). The set of all possible neighborhood counting vectors is denoted by $\mathcal{M}$. We can compute the neighborhood counting vectors of all nodes using the matrix product $M = AX$, where the $i$-th row of $M$ equals $m_i$ (cf. Fig. 2 center).

In the next step, we feed each counting vector $m_i$ into a machine learning model that predicts the original state of $v_i$ in that snapshot. Specifically, we learn a function $f_i : \mathcal{M} \to \mathcal{P}(\mathcal{S})$, where $\mathcal{P}(\mathcal{S})$ denotes the set of all probability distributions over $\mathcal{S}$ (in the sequel, we make the mapping to a probability distribution explicit by adding a Softmax function). For evaluation, we use a loss function to quantify how well the predicted distribution matches the true state and minimize this prediction loss. We assume that $f_i$ is fully parameterized by a node-dependent weight matrix $W_i$, and we collect all weights of a network in the list $W = [W_1, \dots, W_n]$. We also define $f_W$ as the node-wise application of the $f_i$.

The hypothesis is that the ground truth adjacency matrix provides the best foundation for $f_W$, ultimately leading to the smallest prediction loss. Under this hypothesis, we can use the loss as a surrogate for the accuracy of a candidate matrix $A$ (compared to the ground truth graph). The number of possible adjacency matrices of a system with $n$ nodes (assuming no self-loops and symmetry) is $2^{n(n-1)/2}$. Thus, it is hopeless to simply try all possible adjacency matrices. Hence, we need to strengthen the hypothesis and assume that a smaller distance between $A$ and $A^*$ (we call this distance the graph loss) leads to a smaller prediction loss (in a reasonable sense). This way, the optimization becomes feasible and we can follow the surrogate loss in order to arrive at $A^*$.
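The growth of the search space is easy to make concrete (function name illustrative):

```python
def num_graphs(n: int) -> int:
    """Number of simple undirected graphs (no self-loops) on n labeled
    nodes: one binary choice per potential edge."""
    return 2 ** (n * (n - 1) // 2)

# Already for 20 nodes, exhaustive enumeration is hopeless:
sizes = {n: num_graphs(n) for n in (3, 5, 20)}
```

For $n = 20$, this already exceeds $10^{57}$ candidate matrices, which motivates the gradient-based search described in Section 4.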
Graph neural network
Next, we formulate the graph inference problem using a graph neural network (GNN) that loosely resembles an autoencoder architecture: in each snapshot $X$, we predict the node-state of each node using only the neighborhood of that node. Then, we compare the prediction with the actual (known) node-state. For a given adjacency matrix (graph) $A$ and list of weight matrices $W$, we define the GNN $\mathcal{F}_{A,W}$, applied to a snapshot $X$, as
$$\mathcal{F}_{A,W}(X) = f_W(AX),$$
where $f_W$ is applied row-wise. Thus, $\mathcal{F}_{A,W}(X)$ results in a matrix in which each row corresponds to a node and models a distribution over node-states. Similar to the autoencoder paradigm, input and output are of the same form, and the network learns to minimize the difference (note that the one-hot encoding can also be viewed as a valid probability distribution over node-states). The absence of self-loops in $A$ is critical, as it means that the actual node-state of a node is not part of its own neighborhood aggregation. As we want to predict a node's state, the state itself cannot be part of the input.
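The forward pass can be sketched in PyTorch as follows; layer sizes and names are our own illustrative choices, not the reference implementation:

```python
import torch
import torch.nn as nn

class SnapshotGNN(nn.Module):
    """Single graph layer followed by a row-wise prediction MLP."""

    def __init__(self, num_states: int, hidden: int = 10):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_states),
            nn.Softmax(dim=-1),
        )

    def forward(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        M = A @ X           # graph layer: neighborhood counting vectors
        return self.mlp(M)  # prediction layer, applied row-wise

# Path graph 0-1-2-3 with two one-hot node-states.
A = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
X = torch.tensor([[1., 0], [0, 1], [0, 1], [1, 0]])
out = SnapshotGNN(num_states=2)(A, X)  # each row: distribution over states
```

Each output row sums to one, so the output can be compared directly against the one-hot input snapshot.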
We will refer to the matrix multiplication $AX$ as the graph layer and to the application of $f_W$ as the prediction layer. We purposely perform only a single application of the graph layer, which means that only information from the immediate neighborhood can be used to predict a node-state. While using multi-hop neighborhoods would increase the network's predictive power, it would be detrimental to graph reconstruction.
Most modern GNN architectures can be written in the message-passing scheme, where each layer performs an aggregate and a combine step. The aggregate step computes a neighborhood embedding based on a permutation-invariant function of all neighboring nodes. The combine step merges this embedding with the actual node-state. In our architecture, the aggregation is the element-wise sum. The combination, however, purposely ignores the node-state (in order for the prediction task to make sense) and applies $f_W$.
Prediction loss
We assume a loss function that is applied independently for each snapshot:
$$\mathcal{L}(A, W, X) = \mathrm{loss}\big(\mathcal{F}_{A,W}(X),\, X\big).$$
The prediction loss compares the input (actual node-states) and the output (predicted node-states) of the GNN $\mathcal{F}_{A,W}$. We define the loss on a set $\mathcal{X}$ of independent snapshots as the sum over all constituent snapshots:
$$\mathcal{L}(A, W, \mathcal{X}) = \sum_{X \in \mathcal{X}} \mathcal{L}(A, W, X).$$
In our experiments, we use the row-wise MSE loss.
Note that, in the above sum, all snapshots are treated equally, independent of their corresponding initial conditions or the time points at which they were taken (which we do not know anyway).
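Assuming the row-wise MSE loss mentioned above, the per-snapshot and total losses can be sketched as follows (the exact reduction is our assumption):

```python
import numpy as np

def prediction_loss(X, Y):
    """Row-wise MSE between one-hot node-states X and predicted
    distributions Y, averaged within each row and summed over nodes."""
    return float(((X - Y) ** 2).mean(axis=1).sum())

def total_loss(snapshots, predictions):
    """Sum of per-snapshot losses over a set of independent snapshots."""
    return sum(prediction_loss(X, Y) for X, Y in zip(snapshots, predictions))

X = np.array([[1.0, 0.0], [0.0, 1.0]])  # two nodes, two one-hot states
Y = np.array([[0.5, 0.5], [0.5, 0.5]])  # maximally uncertain prediction
```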
Graph inference problem
We define the graph inference problem as follows: for a given set of snapshots $\mathcal{X}$ (corresponding to $n$ nodes), find an adjacency matrix $A'$ and a list of weight matrices $W'$ that minimize the prediction loss:
$$(A', W') \in \operatorname*{arg\,min}_{A,\, W} \mathcal{L}(A, W, \mathcal{X}).$$
Note that, in general, we cannot guarantee that the inferred adjacency matrix equals the ground truth matrix $A^*$. Regarding computational complexity, it is known that network reconstruction for epidemic models based on time series data is hard when formulated as a decision problem [33]. We believe that this carries over to our setting but leave a proof for future work.
4 Our method: GINA
As explained in the previous section, it is not possible to solve the graph inference problem by iterating over all possible graphs and weights. Hence, we propose GINA (Graph Inference Network Architecture). GINA efficiently approximates the graph inference problem by jointly optimizing the (relaxed) interaction graph and the prediction-layer weights.
To this end, we adopt two tricks: firstly, we impose a relaxation on the graph adjacency matrix, representing its entries as real-valued numbers. Secondly, we share weights between the weight matrices belonging to different nodes. Specifically, each node gets its own MLP, but the weights of all layers, except the last one, are shared among nodes. This allows us to simultaneously optimize the graph and the weights using backpropagation. Apart from that, we follow the architecture from the previous section: a graph layer maps a snapshot to neighborhood counting vectors, and each neighborhood counting vector is pushed through a node-based MLP.
Graph layer
Internally, we store the interaction graph as an upper triangular matrix $C$. In each step, we (i) compute $C + C^\top$ to enforce symmetry, (ii) apply $\sigma_s$ element-wise, and (iii) set all diagonal entries of the result to zero, which yields the relaxed adjacency matrix $\hat{A}$. Here, $\sigma_s$ is a differentiable function that maps real-valued entries to the interval $(0, 1)$. It ensures that $\hat{A}$ approximately behaves like a binary adjacency matrix while remaining differentiable (the hat notation indicates relaxation). Specifically, we adopt a Sigmoid-type function parametrized by a sharpness parameter $s$:
$$\sigma_s(x) = \frac{1}{1 + e^{-s x}},$$
where we increase the sharpness of the step over the course of the training. Finally, the graph layer matrix $\hat{A}$ is multiplied with the snapshot (i.e., $\hat{M} = \hat{A}X$), which yields a relaxed version of the neighborhood counting vectors.
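A sketch of this graph layer under the sigmoid form we assume above (the exact parametrization and sharpness schedule may differ from the paper's implementation):

```python
import numpy as np

def sigma_s(x, s):
    """Sigmoid-type squashing; larger sharpness s -> closer to a 0/1 step."""
    return 1.0 / (1.0 + np.exp(-s * x))

def relaxed_adjacency(C, s):
    """(i) symmetrize the upper triangular matrix C, (ii) squash entries
    into (0, 1) element-wise, (iii) remove self-loops."""
    A_hat = sigma_s(C + C.T, s)
    np.fill_diagonal(A_hat, 0.0)
    return A_hat

C = np.triu(np.array([[0.0, 2.0, -3.0],
                      [0.0, 0.0, 0.5],
                      [0.0, 0.0, 0.0]]))
soft = relaxed_adjacency(C, s=1)   # soft, clearly non-binary entries
hard = relaxed_adjacency(C, s=50)  # nearly binary entries
```

Increasing `s` during training anneals the relaxation, so gradients can flow early on while the final matrix is effectively binary.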
Prediction layer
In $\hat{M}$, each row corresponds to one node. Thus, we apply the MLPs independently to each row. We use $\hat{m}_i$ to denote the row corresponding to node $v_i$ (i.e., the relaxed neighborhood counting vector of $v_i$). Let $\mathrm{FC}_{a,b}$ denote a fully-connected (i.e., linear) layer with input dimension $a$ and output dimension $b$. We use ReLU and Softmax activation functions. The whole prediction MLP contains four sub-layers; only the last sub-layer contains node-specific weights. This layer enables a node-specific shift of the probabilities computed in the previous layers. Note that we use a comparably small dimension (i.e., 10) for the internal embeddings, which has proven sufficient in our experiments.
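A PyTorch sketch of the weight-sharing scheme; modeling the node-specific last sub-layer as an additive per-node shift is our simplifying assumption, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Node-wise prediction MLP: all sub-layers share weights across nodes
    except a final node-specific shift before the Softmax."""

    def __init__(self, num_nodes: int, num_states: int, hidden: int = 10):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(num_states, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_states),
        )
        # One learnable shift per node (the only node-specific weights).
        self.node_shift = nn.Parameter(torch.zeros(num_nodes, num_states))

    def forward(self, M_hat: torch.Tensor) -> torch.Tensor:
        # M_hat: (num_nodes, num_states) relaxed neighborhood counts.
        return torch.softmax(self.shared(M_hat) + self.node_shift, dim=-1)

probs = PredictionLayer(num_nodes=4, num_states=2)(torch.rand(4, 2))
```

Sharing all but the per-node shift keeps the parameter count nearly independent of the number of nodes, which is what makes training on larger systems tractable.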
Training
We empirically find that overfitting is not a problem and therefore do not use a test set. A natural extension would be to split the snapshots into a training and a test set and to optimize the graph and weights on the training set until the loss reaches its minimum on the test set; the test loss would then provide the best surrogate for the distance to the ground truth graph. Another important aspect of training is the use of mini-batches. For ease of notation, we have ignored batches so far. In practice, mini-batches are crucial for fast and robust training. A mini-batch of size $b$ can be created by concatenating $b$ snapshots (in the graph layer) and reordering the rows accordingly (in the prediction layer).
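Because the same (relaxed) adjacency matrix applies to every snapshot, a mini-batch can also be processed with one batched product; a sketch (shapes and names assumed):

```python
import numpy as np

# 3 nodes; A_hat is the (relaxed) adjacency matrix shared by all snapshots.
A_hat = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

batch = np.stack([
    np.array([[1, 0], [0, 1], [1, 0]], dtype=float),  # snapshot 1
    np.array([[0, 1], [0, 1], [0, 1]], dtype=float),  # snapshot 2
])  # shape: (batch, nodes, states)

# One einsum yields the neighborhood counting vectors of every snapshot;
# the prediction layer is then applied row-wise as before.
M_batch = np.einsum('ij,bjs->bis', A_hat, batch)
```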
Limitations
There are some relevant limitations to GINA. Firstly, we cannot guarantee that the ground truth graph is actually the solution to the graph inference problem. In particular, simple patterns in the time domain (that enable trivial graph inference from time series data) might correspond to highly nonlinear patterns within a single snapshot. Moreover, GINA is only applicable if statistical associations among adjacent nodes manifest themselves in a way that renders the counting abstraction meaningful. Statistical methods are more robust in how they handle different types of pairwise interactions but less powerful regarding nonlinear combined effects of the complete neighborhood. Another relevant design decision is the use of one-hot encoding, which renders the forward pass extremely fast but will reach its limitations when the node-state domain becomes very complex. Finally, we assume relational homogeneity, i.e., that all agents behave reasonably similarly to one another; this enables weight sharing, which greatly increases the efficiency of the training and reduces the number of required samples.
5 Experiments
We conduct three experiments using synthetically generated snapshots. In Experiment 1, we analyze the underlying hypothesis that the ground truth graph enables the best node-state prediction. In Experiment 2, we study the importance of sample size for the reconstruction accuracy, and in Experiment 3, we compare GINA to statistical baselines. Our prototype of GINA
is implemented using PyTorch
[31] and is executed on a standard desktop computer with an NVIDIA GeForce RTX 3080, 32 GB of RAM, and an Intel i9-10850K CPU. Open-source code (GNU GPLv3) is available at GitHub (github.com/GerritGr/GINA). To measure the quality of an inferred graph, we compute its distance to the ground truth graph. We define this graph loss as the (Manhattan) distance between the upper triangular parts of the two adjacency matrices (i.e., the number of edges to add/remove). We always use a binarized version of $\hat{A}$ for the comparison with $A^*$. All results are based on a single run of GINA; performing multiple runs and keeping the result with the lowest prediction loss might further improve GINA's performance.
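The graph loss can be computed directly on the binarized upper triangles; a minimal sketch (the threshold value is an assumption):

```python
import numpy as np

def graph_loss(A_hat, A_gt, threshold=0.5):
    """Manhattan distance between the upper triangular parts of the
    binarized candidate and the ground truth adjacency matrix, i.e. the
    number of edges to add or remove."""
    A_bin = (A_hat > threshold).astype(int)
    iu = np.triu_indices_from(A_gt, k=1)  # upper triangle, no diagonal
    return int(np.abs(A_bin[iu] - A_gt[iu]).sum())

A_gt = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_hat = np.array([[0.0, 0.9, 0.8],
                  [0.9, 0.0, 0.1],
                  [0.8, 0.1, 0.0]])
# One spurious edge (0, 2) and one missing edge (1, 2) -> loss 2.
```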
Dynamical models
We study six models; a precise description of the dynamics and parameters is provided in Appendix A.2. We focus on stochastic processes, as probabilistic decisions and interactions are essential for modeling uncertainty in real-world systems. The models include a simple SIS epidemic model, where infected nodes can randomly infect susceptible neighbors or become susceptible again. In this model, susceptible nodes tend to be close to other susceptible nodes and vice versa, which renders network reconstruction comparably simple. In contrast, we also propose an Inverted Voter model (InvVoter), where nodes tend to maximize their disagreement with their neighborhood (inspired by the original Voter model [3]) (cf. Fig. 4). Nodes hold one of two opinions (A or B); nodes in A tend to move to B faster the higher their number of A neighbors, and vice versa. To study the emergence of an even more complex pattern, we propose the MajorityFlip dynamics, where nodes tend to change their current state when the majority of their neighbors follows the same opinion (regardless of the node's current state). We refer the reader to Fig. 7 for a visualization of the node-state prediction conditioned on the neighborhood. We also examine a rock–paper–scissors (RPS) model to study evolutionary dynamics [37] and the well-known Forest Fire model [2]. Finally, we test a deterministic discrete-time dynamical model, a coupled map lattice model (CML) [12, 22, 41], to study graph inference in the presence of chaotic behavior. As the CML model admits real node-values, we discretized them into equally-spaced bins. For the stochastic models, we use numerical simulations to sample from the equilibrium distribution of the systems. For CML, we randomly sample an initial state and simulate it for a random time period. We do not explicitly add measurement errors, but all nodes are subject to internal noise.
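As an illustration of how such snapshots can be produced, here is a hypothetical discrete-time SIS simulation; the rates, update scheme, and horizon are illustrative, not the paper's exact setup:

```python
import random

def sis_snapshot(adj, steps=200, p_infect=0.3, p_recover=0.1, seed=None):
    """Run a discrete-time SIS process on an adjacency list and return the
    final state (0 = susceptible, 1 = infected) as one snapshot."""
    rng = random.Random(seed)
    n = len(adj)
    state = [rng.randint(0, 1) for _ in range(n)]  # random initial condition
    for _ in range(steps):
        new = state[:]
        for v in range(n):
            if state[v] == 1:
                if rng.random() < p_recover:
                    new[v] = 0
            else:
                infected_neighbors = sum(state[u] for u in adj[v])
                # Each infected neighbor transmits independently.
                if any(rng.random() < p_infect
                       for _ in range(infected_neighbors)):
                    new[v] = 1
        state = new
    return state

# Path graph 0-1-2-3 as adjacency list; collect independent snapshots.
adj = [[1], [0, 2], [1, 3], [2]]
snapshots = [sis_snapshot(adj, seed=s) for s in range(5)]
```

Running the chain for many steps from a fresh random initial condition per sample approximates independent draws from the equilibrium distribution, matching the snapshot assumption above.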
Experiment 1: Loss landscape
For this experiment, we generated snapshots corresponding to the dynamical models on the so-called bull graph (as illustrated in Fig. 1). We then trained the network and measured the prediction loss for all adjacency matrices that represent connected graphs. We observe a strong dependence between the prediction loss of a candidate matrix and the corresponding graph loss, except for the MajorityFlip model. Surprisingly, on larger graphs GINA still finds the ground truth graph with high accuracy given MajorityFlip snapshots, compared to the baselines.
Experiment 2: Sample size
We tested the effect of the number of snapshots on the graph loss using SIS dynamics. Unsurprisingly, a larger sample size yields a better reconstruction accuracy, and the training procedure converges significantly faster. For this experiment, we used a grid graph and compared results based on varying numbers of snapshots.
Experiment 3: Comparisons with baselines
Graph loss (number of edges to add/remove) for each combination of dynamical model and ground truth graph:

| Model | Graph | GINA (our method) | Corr | MI | ParCorr |
| --- | --- | --- | --- | --- | --- |
| SIS | ER | 0 | 0 | 0 | 0 |
| SIS | Geom | 64 | 78 | 84 | 40 |
| SIS | Grid | 12 | 0 | 0 | 0 |
| SIS | WS | 0 | 0 | 0 | 0 |
| MajorityFlip | ER | 12 | 72 | 28 | 76 |
| MajorityFlip | Geom | 247 | 1130 | 616 | 1168 |
| MajorityFlip | Grid | 511 | 304 | 88 | 304 |
| MajorityFlip | WS | 0 | 226 | 22 | 226 |
| InvVoter | ER | 0 | 88 | 12 | 88 |
| InvVoter | Geom | 0 | 1692 | 116 | 1692 |
| InvVoter | Grid | 502 | 360 | 90 | 360 |
| InvVoter | WS | 0 | 226 | 4 | 226 |
| RPS | ER | 0 | 2 | 0 | 0 |
| RPS | Geom | 151 | 164 | 172 | 2 |
| RPS | Grid | 75 | 0 | 0 | 0 |
| RPS | WS | 0 | 10 | 10 | 0 |
| Forest Fire | ER | 25 | 6 | 2 | 28 |
| Forest Fire | Geom | 36 | 852 | 300 | 906 |
| Forest Fire | Grid | 25 | 2 | 0 | 2 |
| Forest Fire | WS | 0 | 2 | 0 | 8 |
| CML | ER | 0 | 0 | 2 | 0 |
| CML | Geom | 2 | 0 | 102 | 0 |
| CML | Grid | 8 | 0 | 0 | 0 |
| CML | WS | 0 | 0 | 4 | 0 |
Next, we compare GINA with statistical baselines. We train with mini-batches. We start with a moderate sharpness parameter $s$ and increase it by one after a fixed number of epochs. We train for a fixed maximal number of epochs but stop early if the underlying (binarized) graph does not change for a given number of epochs (measured at regular intervals). Moreover, we use PyTorch's Adam optimizer with a fixed initial learning rate.
To generate ground truth graphs, we use Erdős–Rényi (ER), geometric (Geom), and Watts–Strogatz (WS) random graphs, as well as a 2D grid graph. We use 50 thousand samples. Graphs were generated using the networkX package [18] (cf. Appendix A.3 for details).
As statistical baselines, we use the Python package netrd [19]. Specifically, we use its correlation (Corr), mutual information (MI), and partial correlation (ParCorr) methods. The baselines only return weighted matrices, which therefore need to be binarized using a threshold. To find the optimal threshold, we provide them with the number of edges of the ground truth graph. Notably, especially in sparse graphs, this gives them an unfair advantage and renders the results not directly comparable. Furthermore, netrd only accepts binary or real-valued node-states, which is a problem for the categorical models RPS and Forest Fire. As a solution, we simply map the three node-states to distinct real values, breaking statistical convention. Interestingly, the baselines handle this well and, in most cases, identify the ground truth graph nevertheless.
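For intuition, a correlation baseline of this kind can be sketched as follows; this is our own minimal version, not the netrd implementation, and it already uses the ground-truth edge count as the threshold, mirroring the advantage described above:

```python
import numpy as np

def correlation_graph(snapshots, k):
    """Score each node pair by the absolute Pearson correlation of their
    scalar states across snapshots and keep the k strongest pairs as edges.
    snapshots: (num_snapshots, n) array."""
    n = snapshots.shape[1]
    corr = np.abs(np.corrcoef(snapshots, rowvar=False))
    iu = np.triu_indices(n, k=1)          # candidate pairs (i < j)
    keep = np.argsort(corr[iu])[-k:]      # indices of the k strongest pairs
    A = np.zeros((n, n), dtype=int)
    A[iu[0][keep], iu[1][keep]] = 1
    return A + A.T                        # symmetric binary adjacency

# Nodes 0 and 1 are perfectly correlated across snapshots; node 2 is not.
snapshots = np.array([[0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]], dtype=float)
A = correlation_graph(snapshots, k=1)
```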
We also compared our method to the recently proposed AIDD [40] (results not shown). However, AIDD requires consecutive observations, making it sensitive to the time resolution of the observations. Moreover, we find that AIDD has a significantly higher runtime than GINA (50 thousand samples were not feasible on our machine), presumably due to its more complicated architecture, in particular the alternating training of the interaction graph representation and the dynamics predictor.
Discussion
The results clearly show that graph inference based on independent snapshots is possible and that GINA is a viable alternative to statistical methods. Compared to the baselines, GINA performed best most often, even though we gave the baseline methods the advantage of knowing the ground truth number of edges. GINA performed particularly well in the challenging cases where neighboring nodes do not tend to be in the same (or in similar) nodestates. GINA even performed acceptably in the case of CML dynamics despite the discretization and the chaotic and deterministic nature of the process.
6 Conclusions and future work
We propose GINA, a model-free approach to infer the underlying graph structure of a dynamical system from independent observational data. GINA is based on the principle that local interactions among agents manifest themselves in specific local patterns. These patterns can be found and exploited. Our experiments show that the underlying hypothesis is a promising graph inference paradigm and that GINA efficiently solves the task. We believe that the main challenge for future work is to find ways of inferring graphs when the types of interaction differ largely among the edges. Moreover, a deeper theoretical understanding of which processes produce meaningful statistical associations, not only over time but also within snapshots, would be desirable.
This work was partially funded by the DFG project MULTIMODE. We thank Thilo Krüger for his helpful comments on the manuscript.
References
 [1] (2016) Resilience to contagion in financial networks. Mathematical finance 26 (2), pp. 329–365. Cited by: §1.
 [2] (1990) A forest-fire model and some thoughts on turbulence. Physics Letters A 147 (5–6), pp. 297–300. Cited by: §5.
 [3] (1954) The voter decides.. Cited by: §5.
 [4] (2017) Model-free inference of direct network interactions from nonlinear collective dynamics. Nature Communications 8 (1), pp. 1–10. Cited by: §2.
 [5] (2017) Gene regulatory network inference from single-cell data using multivariate information measures. Cell Systems 5 (3), pp. 251–267. Cited by: §1.
 [6] (2018) Connectivity inference from neural recording data: challenges, mathematical bases and research directions. Neural Networks 102, pp. 120–137. Cited by: §2.
 [7] (2020) Network inference from population-level observation of epidemics. Scientific Reports 10 (1), pp. 1–14. Cited by: §2.
 [8] (2019) Multistate dynamical processes on networks: analysis through degree-based approximation frameworks. SIAM Review 61 (1), pp. 92–118. Cited by: §A.2, §1.
 [9] (2019) The use of multilayer network analysis in animal behaviour. Animal behaviour 149, pp. 7–22. Cited by: §1.
 [10] (2015) The connectomics of brain disorders. Nature Reviews Neuroscience 16 (3), pp. 159–172. Cited by: §1.

 [11] (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), pp. 432–441. Cited by: §2.
 [12] (2002) Coupled map networks as communication schemes. Physical Review E 65 (4), pp. 045201. Cited by: §5.
 [13] (2011) High-accuracy approximation of binary-state dynamics on networks. Physical Review Letters 107 (6), pp. 068701. Cited by: §1.
 [14] (2020) Learning vaccine allocation from simulations. In International Conference on Complex Networks and Their Applications, pp. 432–443. Cited by: §1.
 [15] (2019) Reducing spreading processes on networks to markov population models. In International Conference on Quantitative Evaluation of Systems, pp. 292–309. Cited by: §A.2.
 [16] (2015) Controllability of structural brain networks. Nature communications 6 (1), pp. 1–10. Cited by: §1.
 [17] (2008) Rewiring networks for synchronization. Chaos: An interdisciplinary journal of nonlinear science 18 (3), pp. 037105. Cited by: §1.
 [18] (2008) Exploring network structure, dynamics, and function using networkx. Technical report Los Alamos National Lab.(LANL), Los Alamos, NM (United States). Cited by: §A.3, §5.
 [19] (2020) Network comparison and the withinensemble graph distance. Proceedings of the Royal Society A 476 (2243), pp. 20190744. Cited by: §5.
 [20] (2018) Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34 (17), pp. i802–i810. Cited by: §2.

 [21] (2020) SDARE: a stacked denoising autoencoder method for game dynamics network structure reconstruction. Neural Networks 126, pp. 143–152. Cited by: §2.
 [22] (1992) Overview of coupled map lattices. Chaos: An Interdisciplinary Journal of Nonlinear Science 2 (3), pp. 279–282. Cited by: §A.2, §5.
 [23] (2018) Neural relational inference for interacting systems. In International Conference on Machine Learning, pp. 2688–2697. Cited by: §1, §1, §1, §2.
 [24] (2019) GNE: a deep learning framework for gene network inference by aggregating biological information. BMC systems biology 13 (2), pp. 38. Cited by: §2.
 [25] (2017) Mathematics of epidemics on networks. Cham: Springer 598. Cited by: §A.2.
 [26] (2019) Pileup mitigation at the large hadron collider with graph neural networks. The European Physical Journal Plus 134 (7), pp. 333. Cited by: §1.
 [27] (2004) Simple mathematical models with very complicated dynamics. The Theory of Chaotic Attractors, pp. 85–93. Cited by: §A.2.
 [28] (2006) Designing complex networks. Physica D: Nonlinear Phenomena 224 (12), pp. 182–201. Cited by: §1.
 [29] (2018) Estimating network structure from unreliable measurements. Physical Review E 98 (6), pp. 062321. Cited by: §2.
 [30] (2016) Gene regulatory network inference using fused lasso on multiple data sets. Scientific reports 6 (1), pp. 1–14. Cited by: §2.
 [31] (2019) PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703. Cited by: §5.
 [32] (2012) Spotting culprits in epidemics: how many and which ones?. In 2012 IEEE 12th International Conference on Data Mining, pp. 11–20. Cited by: §1.
 [33] (2018) Maximum-likelihood network reconstruction for SIS processes is NP-hard. arXiv preprint arXiv:1807.08630. Cited by: §3.
 [34] (2020) Network reconstruction and prediction of epidemic outbreaks for general group-based compartmental epidemic models. IEEE Transactions on Network Science and Engineering. Cited by: §2.
 [35] (2019) Methods for analysis of brain connectivity: an IFCN-sponsored review. Clinical Neurophysiology 130 (10), pp. 1833–1858. Cited by: §1.
 [36] (2016) Advances in functional brain imaging: a comprehensive survey for engineers and physical scientists. International Journal of Advanced Research 4 (8), pp. 640–660. Cited by: §1.
 [37] (2007) Evolutionary games on graphs. Physics reports 446 (46), pp. 97–216. Cited by: §5.
 [38] (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §2.
 [39] (2018) Reconstructing networks with binary-state dynamics via generalized statistical inference. IEEE Transactions on Circuits and Systems I: Regular Papers 66 (4), pp. 1608–1619. Cited by: §1.
 [40] (2021) Automated discovery of interactions and dynamics for large networked dynamical systems. arXiv preprint arXiv:2101.00179. Cited by: §2, §5.
 [41] (2019) A general deep learning framework for network reconstruction and dynamics learning. Applied Network Science 4 (1), pp. 1–17. Cited by: §A.2, §1, §1, §2, §5.
 [42] (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34 (13), pp. i457–i466. Cited by: §1.
Checklist

For all authors…

Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

Did you describe the limitations of your work? See paragraph Limitations in Section 4

Did you discuss any potential negative societal impacts of your work?

Have you read the ethics review guidelines and ensured that your paper conforms to them?


If you are including theoretical results…

Did you state the full set of assumptions of all theoretical results? We provide no theoretical results.

Did you include complete proofs of all theoretical results?


If you ran experiments…

Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? See the GitHub repo. Reproduction might be subject to statistical noise.

Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? See Section
5 
Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Results are based on a single training iteration.

Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? We report the machine environment and runtime in paragraph Experiment 3: Comparisons with baselines and Table 1.


If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…

If your work uses existing assets, did you cite the creators?

Did you mention the license of the assets?

Did you include any new assets either in the supplemental material or as a URL?

Did you discuss whether and how consent was obtained from people whose data you’re using/curating?

Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content?


If you used crowdsourcing or conducted research with human subjects…

Did you include the full text of instructions given to participants and screenshots, if applicable?

Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable?

Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

Appendix A Appendix
A.1 Prediction layer visualization
We can visualize the prediction layer for the 2-state models. The input of the prediction layer of node is the (relaxed) 2-dimensional neighborhood counting vector . The probability distribution over the two node-states is fully specified by a single probability. In Fig. 7, we visualize the probabilities given by the prediction layer for all possible neighborhood counting vectors (for nodes with degree ). The results are given for a Watts–Strogatz graph and the same node for all three models. We observe that the prediction layer captures a roughly linear boundary for the SIS and the Inverted Voter models. The MajorityFlip dynamics leads to a more intricate predictive structure.
We see that the network successfully learns to represent complicated conditional probability distributions of node-states.
A.2 Dynamical models
Except for the CML, we use continuous-time stochastic processes with a discrete state-space to generate snapshots. Specifically, these models have Markov jump process or continuous-time Markov chain (CTMC) semantics. Moreover, each node/agent occupies one of several node-states (denoted
) at each point in time. Nodes change their state stochastically according to their neighborhood (more precisely, according to their neighborhood counting vector). We assume that all nodes obey the same rules/local behavior. We refer the reader to [25, 8, 15] for a detailed description of the CTMC construction in the context of epidemic spreading processes.
SIS
Nodes are susceptible (S) or infected (I). Infected nodes can infect their susceptible neighbors or spontaneously become susceptible again. In other words, I-nodes become S-nodes with a fixed reaction rate and S-nodes become I-nodes with a rate , where denotes the number of infected neighbors of the node and is the reaction rate parameter. Moreover, for all models, we add a small amount of stochastic noise to the dynamics. The noise not only emulates measurement errors but also prevents the system from getting stuck in a trap state where no rule is applicable (e.g., all nodes are susceptible).
In the sequel, we use the corresponding notation
The reaction rate refers to the exponentially distributed residence times; a higher rate is associated with a faster state transition. When the reaction rate is zero (e.g., when no neighbors are infected), the state transition is impossible.
The parameterization is , , and .
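The CTMC semantics described above can be made concrete with a minimal Gillespie-style simulator. This is a sketch, not the paper's implementation; the rate constants `lam`, `mu`, and `eps` and the initial infection probability are illustrative placeholders, since the actual parameterization did not survive extraction.

```python
import random
import networkx as nx

def simulate_sis_snapshot(G, t_end, lam=0.5, mu=1.0, eps=0.01, seed=0):
    """Draw one SIS snapshot at time t_end via Gillespie simulation.
    lam: infection rate per infected neighbor, mu: recovery rate,
    eps: small noise rate that prevents trap states (placeholder values)."""
    rng = random.Random(seed)
    # Random initial condition: each node is infected with probability 0.1.
    state = {v: ("I" if rng.random() < 0.1 else "S") for v in G}
    t = 0.0
    while True:
        # Flip rate of every node, derived from its neighborhood counting vector.
        rates = {}
        for v in G:
            if state[v] == "I":
                rates[v] = mu + eps
            else:
                m = sum(state[u] == "I" for u in G[v])  # infected neighbors
                rates[v] = lam * m + eps
        total = sum(rates.values())
        t += rng.expovariate(total)  # exponential residence time
        if t > t_end:
            return state
        # Choose the node that fires, proportional to its rate.
        r, acc = rng.uniform(0.0, total), 0.0
        for v, rate in rates.items():
            acc += rate
            if acc >= r:
                state[v] = "S" if state[v] == "I" else "I"
                break
```

Because every node always carries the noise rate `eps`, the total rate never vanishes and the simulation cannot get stuck in a trap state.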
Inverted Voter
The Inverted Voter model describes two competing opinions (A and B), where nodes always tend to maximize their disagreement with their neighbors.
We use .
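One plausible way to encode this tendency — our reading, not necessarily the paper's exact rate function — is to let a node flip with a rate proportional to the number of neighbors that currently agree with it; the rate constant `lam` is a hypothetical placeholder.

```python
import networkx as nx

def inverted_voter_flip_rate(state, v, G, lam=1.0):
    """Hypothetical flip rate for node v: proportional to the number of
    agreeing neighbors, so nodes drift toward maximal disagreement."""
    agreeing = sum(state[u] == state[v] for u in G[v])
    return lam * agreeing
```

A node surrounded entirely by the opposite opinion then has flip rate zero, i.e., it already maximizes its disagreement.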
MajorityFlip
MajorityFlip models two competing opinions, where nodes tend to flip (i.e., change) their state when the majority of their neighbors are in the same state. A slight asymmetry makes the problem solvable.
The indicator variables identify what counts as a majority. They are defined as
If the indicator is not one, it is zero. We use .
Rock Paper Scissors
The Rock Paper Scissors model provides simple evolutionary dynamics in which three species overtake one another in a ring-like relationship.
We use .
Forest Fire
Spots/nodes are either empty (E), on fire (F), or have a tree on them (T). Trees grow with a growth rate . Random lightning starts a fire on tree-nodes with rate . The fire on a node goes extinct with rate , leaving the node empty. Finally, fire spreads to neighboring tree-nodes with rate .
The parameterization is , , , , and .
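The transition structure can be sketched as a per-node rate table over the three states. The numeric defaults below are hypothetical placeholders, as the actual parameter values did not survive extraction.

```python
def forest_fire_rates(state, v, G, growth=0.1, lightning=0.01,
                      extinction=0.5, spread=1.0):
    """Outgoing transition rates of node v (parameter values are placeholders).
    E -> T: tree growth; T -> F: lightning plus spread from burning neighbors;
    F -> E: fire extinction."""
    if state[v] == "E":
        return {"T": growth}
    if state[v] == "T":
        burning = sum(state[u] == "F" for u in G[v])
        return {"F": lightning + spread * burning}
    return {"E": extinction}  # state[v] == "F"
```

Only the tree-to-fire transition depends on the neighborhood counting vector; growth and extinction are purely local.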
Coupled Map Lattice
Let be the value of node at timestep . Each node starts with a random value (uniform in ). At each time step, all nodes are updated based on a linear combination of the node-value and the node-values of neighboring nodes [22]:
where is the degree of , denotes the set of (indices of) nodes adjacent to , is the coupling strength, and is the local map. Like [41], we use the logistic function [27]:
where modulates the complexity of the dynamics.
We use and .
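A minimal update step, assuming the standard diffusive CML form in which each node mixes its own logistic-map value with the mean logistic-map value of its neighbors. The coupling strength `s` and logistic parameter `r` are placeholder values, since the concrete values used in the paper did not survive extraction.

```python
import numpy as np
import networkx as nx

def cml_step(G, x, s=0.2, r=3.57):
    """One coupled-map-lattice update on graph G (nodes assumed 0..n-1).
    Each node combines its own logistic-map value with the mean
    logistic-map value of its neighbors, weighted by coupling strength s."""
    f = lambda y: r * y * (1.0 - y)  # logistic local map
    x_new = np.empty_like(x)
    for v in G:
        nbrs = list(G[v])
        neighbor_term = np.mean([f(x[u]) for u in nbrs]) if nbrs else f(x[v])
        x_new[v] = (1.0 - s) * f(x[v]) + s * neighbor_term
    return x_new
```

Iterating `cml_step` from a uniform random initial vector and recording the state at arbitrary steps yields the continuous-valued snapshots used for this model.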
A.3 Random interaction graphs
We use the Python NetworkX [18] package to generate a single instance (variate) of each random graph model and test GINA and the baselines on a large number of snapshots generated using this graph. In particular, we use an Erdős–Rényi (ER) graph model with n = 25 nodes and connection probability p = 0.15:
nx.erdos_renyi_graph(25, 0.15, seed=43)
(seed 43 was used in order to obtain a connected graph). We also use a random geometric graph with n = 200 nodes and radius 0.125:
nx.random_geometric_graph(200, 0.125, seed=42)
and a Watts–Strogatz graph with n = 50 nodes, where each node has 4 neighbors and the rewiring probability is p = 0.15:
nx.newman_watts_strogatz_graph(50, 4, 0.15, seed=42)
After generation, the node-ids are randomly shuffled in order to guarantee that they do not leak information about connectivity to the training model.
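The shuffling step can be implemented as a random relabeling; the `seed` below is an arbitrary illustrative choice, not the one used in the paper.

```python
import random
import networkx as nx

def shuffle_node_ids(G, seed=0):
    """Return a copy of G whose node ids are randomly permuted, so that ids
    carry no information about how the graph was generated."""
    rng = random.Random(seed)
    nodes = list(G.nodes())
    permuted = nodes[:]
    rng.shuffle(permuted)
    return nx.relabel_nodes(G, dict(zip(nodes, permuted)))
```

The relabeled graph is isomorphic to the original, so only the uninformative id assignment changes while the topology is preserved.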