Network data is commonly observed in a variety of real-world applications ranging from social networks observing interactions between pairs of individuals to biological networks such as protein-protein interactions. This has led to a growing interest on probabilistic models for network data which not only offer a generative mechanism capturing empirically observed network effects, but are also easily estimable using existing statistical approaches. We study the setting where multiple distinct networks on the same set of nodes are observed. These may correspond to a collection of networks over an ordered set such as time (e.g., dynamic networks), or an unordered set such as networks from different subjects observed at a fixed point in time (cross-sectional networks). Given such datasets it is natural to ask: how does the structure in networks change or evolve within the collection?
A nonparametric approach to modeling undirected network data is achieved by the graphon model [2, 18, 10, 11, 33], estimation of which has received a lot of attention (e.g. [55, 38, 51, 13, 48, 40, 30, 31, 39]). However, this has mostly focused on estimation in the setting where only a single network is observed. Nonparametric modeling and estimation for multiple networks, in general assumed to be non-identically distributed, has largely been ignored. In many applications observing multiple networks, estimation under the general assumption of non-identically distributed networks seems natural. For example, consider a network of individuals with edges determined by similarity in political views, observed at multiple time points. Then in addition to a baseline model where political views are determined by a signature specific to each individual, a second source of variability arises from change in opinions over time as new information becomes available. Without incorporating the second source of time-specific variability, we would average out important features which possibly characterise and differentiate interaction behavior at different time points. With this view, we propose a natural extension of the standard graphon model to incorporate network heterogeneity in addition to nodal heterogeneity via a multi-graphon function. Further, we show how information from multiple networks on the same set of nodes can be leveraged to enable estimation via standard nonparametric regression techniques for both replicated (i.i.d networks) and heterogeneous (independent but non-identically distributed) collection of networks.
The data consists of a collection of distinct undirected networks without self-loops, on the same set of nodes, represented using adjacency matrices . These networks may be binary with each , and indicating the presence of an edge between nodes and in the th network; or weighted with recording the count of interactions between nodes and in the th network. Given a single undirected binary network , it is standard to assume that for , are independent Bernoulli trials, where
are edge probabilities determined by an underlying two dimensional function, known as the graphon, e.g. . For a non-identically distributed collection of networks, we consider a natural extension of this model where are independent Bernoulli trials, with denoting edge-probability for node pair in the th network. We achieve this via a three-dimensional bounded measurable function , we called multi-graphon, where the third dimension allows for network-specific effects via network positions and thus different interaction probabilities in different networks. Further, the multi-graphon function by design is such that averaging over network-specific effects brings us back to the standard graphon model for replicated networks i.e., as independent Bernoulli trials where is now determined by the flattened multi-graphon where . Change in interactions over distinct networks may arise as a result of a series of small changes occurring between consecutively observed networks or as a result of ‘jumps’ (for example with as a stepfunction in ). In this paper, we focus on estimation of the multi-graphon array for heterogeneous networks assumed to be generated from smooth kernels and hence of a ‘slowly-varying’ type.
A key challenge in graphon estimation using standard nonparametric regression is the latency of nodal positions corresponding to the observed response of pairwise interactions. This has led to a variety of contributions focusing on histogram approximations to the graphon function, and more specifically, graphon matrix estimation (e.g. [56, 39, 1, 15, 55, 38, 51, 13, 48, 40, 31]). One of the main objectives in these methods is suitable identification of neighborhoods, either combinatorially (e.g. [38, 51]); through assumptions like strict monotonicity of the degree sequence ; or through a construction of distances between node pairs ([1, 56, 39]), each approach allowing a locally-averaged estimator. The method of  is particularly attractive as it allows neighbors to vary from node to node resulting in a local moving average estimator. Given the adaptive neighborhood choice, it is closer to a Nadaraya-Watson type estimator with uniform weighting in each neighborhood, than a standard histogram with fixed neighborhoods. While this offers a significant improvement over local-constant or histogram estimators, in general, it lacks the flexibility and advantages offered by the vast literature on standard nonparametric methods [50, 20, 45] (with different smoothing techniques : local vs global, automatic smoothing parameter selection, direct implementation, to name a few). Further, with the exception of , these methods are designed for graphon estimation from a single network. The method of  provides a blockmodel approximation to graphon function using multiple i.i.d. networks and thus corresponds to the special case of replicated networks.
We propose a two step multi-graphon estimator where the first step uses the similarity of interactions between node pairs to construct an embedding of nodes in the Euclidean space, and the second step achieves estimation via nonparametric regression using estimated nodal positions from the first step as design points. Intuitively, embedding of nodes in the first step is based on the idea that for a smooth collection of networks, nodes ‘closer’ to each other, must connect ‘similarly’. This leads to a concept of distances between pairs of nodes, first studied in  to cluster nodes into a fixed number of blocks, leading to a histogram approximation to graphon. Similar distance-based approaches have subsequently been used for adaptive neighborhood selection , and more recently by  to allow estimation via fused lasso. Unlike these existing approaches, we study the use of pairwise distance comparisons of the form to identify nodal positions via the ordinal embedding approach of . Using classical Fréchet bounds, we show that our pairwise nodal distance estimates concentrate jointly at exponential rates. Further using results related to the “broken stick” theorem, we prove that our maximal error in latent position estimate is . Leveraging this result, we find that our proposed method achieves, in a range of data sampling regimes — this in terms of number of network observations, number of nodes they contain, and average network density — the optimal convergence rate of an oracle estimator that observes the true latent positions.
In the special case of replicated networks arising from a common distribution, we are concerned with estimation of the standard two-dimensional graphon model and hence nonparametric regression is achieved easily using the estimated nodal positions. In the case of heterogeneous networks observed over time, it is assumed that network positions correspond to equi-spaced time points i.e., , where , and our model reduces to the dynamic graphon model of . For heterogeneous cross-sectional networks, estimation of multi-graphon relies on the availability of network-level covariates which are modeled as noisy measurements of unobserved network positions. Intuitively, this is motivated from the empirical observation that networks with similar traits (such as age or creativity scores of subjects in brain networks) often interact in ways similar to each other [5, 49], and following related work such as  modeling dependence between node covariates and unobserved node positions; covariates to explain link homophily .
Finite sample performance studied via Monte Carlo simulations demonstrate that our method is comparable to existing methods for the case of replicated networks but performs significantly better when heterogeneous collection of networks are observed. Useful insights on the performance of our two-step approach are offered by comparisons with oracle versions of our estimator obtained using knowledge of the true node and network positions. This offers a benchmark for comparison under a fixed choice of smoothing technique. Further, we find that even with moderately informative network-level covariates (signal-to-noise ratio of one), the proposed estimator leads to significant improvements over existing methods in most cases.
We illustrate the usefulness of our approach using two real-world data sets: a contact network of ants observed over a period of days , and human connectome networks from multiple subjects [42, 29]. Our results reveal interesting insights on the division of labor among ant workers over time and on the link between brain region interactions and creativity levels. The multi-graphon model leads to newer insights which are lost when estimation is performed under the simplified assumption of replicated networks. Our multi-graphon estimates for the dynamic ant contact network suggest that changes in intensity of interaction between ant workers over time is possibly linked to changes in occupation of ant workers as they age (e.g., with younger nurse ants becoming cleaners over time). Multi-graphon estimates for the connectome networks revealed that intensities of interactions between certain brain region pairs may significantly increase and subsequently decrease (or vice versa) with increase in creativity scores, suggesting that high level analyses achieved via clustering of brain networks into low and high creativity groups (e.g. ), must be fine tuned to achieve a more accurate account of changes in brain region interactions with increase in creativity levels. An application of the estimated multi-graphon model to resampling brain networks shows that our estimated model captures the well-known small-world behavior of high creativity brains.
2 Model Elicitation
A probabilistic generative mechanism for a collection of heterogenous undirected networks, each on nodes, represented via adjacencies , where each is elicited via a multi-graphon defined below.
Definition 1 (Multi-graphon).
We call multi-graphon a function , such that for any given , is a graphon in the conventional sense, i.e., integrable and .
Definition 2 (Generalized random graph model ).
Let be a random vector sampled from a distribution
be a random vector sampled from a distributionsupported on . Further, let denote a random vector sampled from a distribution supported on . Given a multi-graphon , conditional on the sampled positions , we model for all , , as independent Bernoulli trials with
For identifiability of , it is assumed that for any . Then, clearly, for a binary network , and may be estimated as the average proportion of non-zero edges in each network, i.e.,
A significant proportion of the literature on dynamic (or multi-graph) network models are extensions of single network models augmented with a Markovian assumption to describe network evolution over time [32, 34]. Other related work includes latent space approaches modeling node and network dynamics through a single latent variable [22, 44, 53, 43]. On the other hand, our model assumes a common latent nodal space and a separate network-specific latent variable which allows varying interaction probabilities across network samples for any given pair of nodes. This feature allows a simple but flexible approach to capturing network-specific effects in a collection of slowly-varying networks, and has been studied in the context of multi-graph SBM [25, 27] and more recently .
Following the Bernoulli model for given above, weighted edges between nodes and in the
th network are conditionally independent Binomial random variables with success probabilities.
Definition 3 (Flattened ).
For a multi-graphon , the flattened multi-graphon denoted as , is such that .
3 Latent position estimation via embedding
The main goal of this section is to show how latent nodal positions can be inferred consistently using a pairwise distance measure together with the ordinal embedding approach of . We begin with the construction of a distance between pairs of nodes under the generalized random graph model with a smooth multi-graphon . Subsequently, in Proposition 1 we show that this distance can be estimated consistently from adjacencies . Further, we note that this distance, a semi-metric, corresponds to a metric on the purified graphon space. An important consequence of this fact is that nodal positions (or neighborhoods, e.g. ) obtained via this distance correspond to positions of nodes in the purified graphon space.
3.1 Distance between node pairs
The concept of a distance between nodes of a network follows naturally for smooth multi-graphons: for node pairs closer to each other i.e., if is close to , then for most and , and should also be close (e.g. ). With this idea, the distance between multi-graphon planes at and may be used to quantify distance between nodes and as
However, as we want to focus on the distance between vertices, which under the generalized random graph model (see Definition 2) can be recovered through the flattened graphon , it is sufficient to consider the distance based on the flattened graphon , i.e.,
This distance can be estimated exactly using the adjacencies alone via Algorithm 1 given below, which is a generalization of the algorithm in  (see Section 3.1.1), to allow robust estimation for networks, which may not necessarily be dense.
Proposition 1 (Consistency).
For , if and , then using Algorithm 1, and asymptotically in and ,
where and .
From Proposition 1 it is evident that we must have for to be consistent. For slowly increasing, this requires ; i.e., the average degree growing at least as fast as . Put another way, it requires the number of paths of length two between any two nodes to behave like a , and we need the mean to be large enough to carry a Normal approximation. It follows that we are assuming that the total number of paths of length two between any pair of nodes across network replicates is in general larger than 20. This assumption could be unrealistic for some sparse networks . In case the assumption cannot be met we suggest the following modification to Algorithm 1: instead of counting paths of length between nodes and to define in Step 1., use paths of length , for integer ; i.e., set
Then, it is possible to first show with a direct walk counting argument that, (see e.g. )
Then, by the exact same steps as in the proof of Proposition 1, we obtain that with ,
This reduces the density requirement to, for finite, at the cost of a coarser distance. There is also a computational cost. Indeed, while both the space and computational complexity of Algorithm 1 are , the modified version above has the same space complexity, but computation are (with the complexity of the matrix product.)
3.1.2 Pure graphons
A characterization of the distance given by 3.2 follows through its association with a metric induced by . With a distribution of latent nodal positions on and a graphon, let
Then, is a semi-metric on [33, Section 13]. For example, in our case, noting that is a positive symmetric operator, and assuming to be of finite rank , we may write where the form an orthogonal basis; i.e., for all , . Then, writing , and setting
the uniform distribution on, we observe that
thereby proving that is the Euclidean distance between the images of and projected by . However, by [33, Subsection 13.3], can be transformed into a metric via purification of . Specifically, for a graphon , there exist maps and such that:
almost everywhere for i.i.d. , and
is a metric on ,
and is referred to as the purified graphon corresponding to . In [33, Section 13], arguments are presented motivating the assumption that graphons, except some pathological cases, can be purified in such a way that is of dimension one.
3.2 Node embedding
As discussed above, our goal is to obtain nodal positions satisfying distance comparisons implied by . While we could, for instance use the Gram operator, the quality of the estimate would only scale, at best, with , as seen in Proposition 1. We note that this rate can be significantly improved through ordinal embedding [46, 3]. To justify the use of ordinal embedding we must first show that our distance estimator will order the distances appropriately with high probability. This is the case in our setting, as shown below in Proposition 2. We establish consistency of our nodal position estimator (up to a similarity transformation) in Theorem 1.
For , if and , then using Algorithm 1 there exists such that for and large enough
Then, by the Fréchet inequality, asymptotically in and ,
From the characterization of distance via the purified graphon, it follows that using ordinal embedding on will yield an estimate of the the latent positions under the purified graphon (the s). Theorem 1 shows that this estimate is consistent, up to similarity transform, with an error bounded by .
Ordinal embedding with produces consistent (up to similarity transform) estimators of the latent vertex location under the purified graphon, with a maximal error of order .
4 Multi-graphon estimation
The algorithm for multi-graphon estimation based on embedding nodal positions is included below. Theorem 2 shows that the resulting multi-graphon estimator is consistent for a family of piecewise Lipschitz graphon functions. Further, our estimator achieves the optimal rate of , as if the latent positions were observed. Given an matrix , let denote vectorization of into an length column vector obtained by stacking the transposed rows of , on top of one another.
Fix a smooth multi-graphon function . Assume that we observe , noisy measurements of the true network positions , such that has finite second moments. Call the joint distribution of a pair of latent
has finite second moments. Call
the joint distribution of a pair of latent’s and . Set such that is symmetric in its first two arguments, linear in the fourth, and that and its first derivatives are finite almost everywhere. Then, if and , asymptotically in and ,
Theorem 2 shows that in the setting we consider (in effect, independent observations from a smooth multi-graphon with ), estimation of the latent nodal positions comes at negligible accuracy cost. Indeed, the rate of convergence we obtain is , which is the same rate as the optimal rate we could obtain if the latent position were observed . This naturally raises the question of what concretely this regime encompasses, and its limits.
The first case to consider is when remains small, which corresponds most closely to the setting where only a single adjacency matrix is observed. Then, our assumption translates into an assumption on the density of the network — specifically — which will be unrealistic in some settings; e.g., social network observations tend to be much sparser in practice, with in the range of to . However, other applications, such as connectome networks could accommodate such a density regime . This point puts into perspective Section 3.1.1, which allows to relax the assumption for Theorem 2 in this setting to for any , at a computational and bias cost.
Next, consider the case where network density is in the range of to , as has been observed in many settings . Then our assumption translates to , which is demanding, especially when is large. Here we note that while is indeed demanding, it is not unreasonable in the sense that Theorem 2 provides local graph statistics, specifically point-wise estimate of all edges probabilities, and it could easily incorporate node specific covariates. If the goal of estimation was instead to evaluate global estimates, say averaged across nodes or edges such as motif counts , then the assumption could be relaxed.
Based on the results and remarks included above, we provide the recommended estimation approach when lie outside the regime of Theorem 2:
If , then one should perform
regressions, one for each pair of vertices, where the response variable are the observed edges between the selected vertices. Thus, for each fixed node pair, as the length response vector corresponding to . The achieved rate will match ours in that regime, namely , but will be much lighter computationally, and fully parallelizable. Intuitively, the idea is to borrow information from ‘neighboring’ networks (in time or with similar traits) rather than neighboring nodes due to being much larger than
If , then one should estimate a graphon for each observed network separately (using an existing approach for single networks, e.g. [38, 56]), and subsequently perform local regressions, one for each pair of vertices with the estimated edge intensities as response i.e., and network level covariates as regressors.
This follows from Section 3.1.1, and the achieved rate will depend on the smoothness of the multi-graphon, but the said rate will be affected by the sparsity ; e.g., a graphon estimate with blocks (a standard choice for number of blocks ), will converge at most at rate , much slower than the rate under the regime of Theorem 2. Note that Section 3.1.1 allows for Theorem 2 to apply to cases where for some .
Therefore, we conclude that Theorem 2 yields optimal rates for local graph statistics in the regimes it applies to.
In the special case of replicated or i.i.d networks, we are concerned with estimation of a common network generating process or the standard two-dimensional graphon . Using the aggregated adjacency and the estimated nodal positions as above, we arrive at a special case of Theorem 2 given by Theorem 3 in Appendix A, which shows that local regression with as the length response vector with estimated nodal positions as the regressors, leads to a graphon estimator which enjoys the same properties as the multi-graphon estimator.
Further, our algorithm for multi-graphon estimation with kernel regression using a uniform kernel in Step 3. may be viewed as an extension to the neighborhood smoothing approach of  (designed for single networks) to the setting of multiple networks, with neighborhood identification based on ordinal embedding. In general, our approach has the key advantage of enabling standard nonparametric regression techniques due to the availability of nodal position estimates.
5 Finite sample performance
We conducted simulations to study finite sample performance of the proposed two-step multi-graphon estimator for a synthetic collection of networks, each on nodes, generated using functions with different degrees of smoothness and in general, with network-specific variability. Consider the following three multi-graphon functions:
, where and , (number of blocks), and , ,
where in each example, setting allows for heterogeneity across network samples through the network specific positions , whereas implies a replicated network sample where , for each arises from a common distribution specified by . Given , heterogeneous networks were generated using , respectively. Intuitively, our choice of ’s is such that it prevents the extremely smooth product kernel () from approaching a constant with increasing , and on the other hand, allows the discrete-blockmodel () to gain some smoothness across blocks with increasing . The structures implied by multi-graphons with increasing network positions, precisely, (a) , (b) and (c) , are displayed in Fig. 1.
The first multi-graphon determines links between pairs of nodes based on the product of node-specific factors and with additive network-specific effects via , implying a smooth surface. The smooth structure of
appears ideal for nonparametric regression, however, this may also lead to a high variance in nodal position estimates due to similar distances between subsets of nodes. This example is designed to understand the trade-off between these two aspects. The second graphonhas a Robinsonian form (e.g., Hubert et al. (1998)) with a peak on the diagonal and decreasing intensity as one moves away from the diagonal on either side. The third graphon is a simple stochastic blockmodel with blocks in the case of replicated networks i.e., . Clearly, the probability of interaction between nodes across blocks is determined via with the network-specific factor interacting with node-specific positions . Thus, across-block probabilities increase non-uniformly across nodes, whereas, within-block probabilities determined via (no interaction term) decrease uniformly across all nodes within the two blocks.
Given , a generalized random graph sample comprising adjacencies is simulated via independent Bernoulli trials following Definition 2. We use uniformly distributed latent nodal and network positions i.e., , and . Further, network-level covariates are sampled as noisy measurements of the corresponding unobserved network-specific positions , i.e.,
Clearly, the quality of network-specific covariates as measurements of the unobserved latent positions is a function of the noise variance . Since in our simulation set-up, we chose implying a signal-to-noise ratio (SNR) of . An SNR of unity implies that the ‘signal’ (covariate) is only as strong as noise and thus allows us to examine the performance and robustness of our method in settings where the observed covariates may not be ideal measurements of the true latent network-specific positions.
We compare the performance of our two step multi-graphon estimator with competing methods of SBA , SAS , USVT  and NBS . The algorithm of SBA achieves graphon estimation from a sample of multiple i.i.d. networks and hence corresponds to our case of replicated networks (). In order to compare with SAS, USVT and NBS, designed to work with a single adjacency matrix, we report results obtained using the aggregated adjacency . As far as we are aware, no competing methods exist for nonparametric estimation of the heterogeneous network generating process given a collection of independent, non-identically distributed networks. Noting this, we report comparisons of estimates obtained with our approach under oracle settings described below.
The simulations are conducted with a view to understand the performance of our approach for a given choice of nonparametric regression in Step 3. of Algorithm 2. This may not always lead to the smallest possible MSE using our method but shall give us a view of the general finite sample performance. We report results obtained with orthogonal series estimation with thin plate regression splines as basis functions . This was implemented in R using bam in package gam. In general, our approach can be easily implemented in R using other smoothing techniques e.g., the Nadaraya Watson estimator which may be implemented using kernreg in package gplm.
5.1 Replicated networks ()
A comparison of our approach with existing methods based on MSE averaged over replications are reported in Table 1; visual comparisons from a single run are displayed in Fig. 2. To interpret performance of our two-step estimator, we consider an oracle setting where the oracle informs order-statistics of the true latent node-specific positions (rather than the exact nodal positions). This information is used to directly construct oracle nodal position estimates denoted as [38, 17], using which nonparametric regression is performed following Step 3. of Algorithm 2. We refer to this as the oracle graphon estimator. Note that our oracle set-up does not assume the nodal positions to be known and is designed to be closer to the actual set-up involving unobserved design points.
First, comparing MSEs for estimates from the proposed method under the non-oracle setting (‘Proposed’) with the oracle setting (‘Proposed’), we note significant differences between the two for , and negligible difference for across all sample sizes . This indicates that the first step of latent position estimation performs poorly for and extremely well for . This is what we expect due to the smooth structure of leading to subsets of nodes with similar distances and hence resulting in latent position estimates with high variance. The discrete structure of on the other hand, allows clearer separation between node pairs corresponding to the two blocks due to significantly different distances, implying robust latent position estimates (as far as blockmodel estimation is concerned). Similarly, comparing oracle and non-oracle MSEs for indicate that latent position estimation works reasonably well for these networks.
In comparison to existing approaches, our actual proposed estimator (non-oracle) leads to the smallest MSE for in all cases except when . A significant reduction in the MSE of is observed as is increased from to , suggesting that nodes are insufficient to perform reliable estimation for For , our approach consistently leads to the smallest MSE with NBS leading to the second best performance. The relatively higher variance of estimates from our approach is due to high variance in nodal position estimation across replications. As discussed earlier, this is due to the smooth structure of (interestingly, heterogeneity across networks reduces the variance in nodal position estimates significantly for and : see results reported in Section 5.2). In practice, we recommend re-running the first ordinal embedding step a few times and subsequently selecting the nodal embedding with the lowest stress , as this resulted in a reduced overall variance of estimates from the proposed method. For , SBA, USVT and NBS lead to the best results with SAS leading to the highest MSE. The relatively higher MSEs from our approach for is due to the choice of nonparametric regression, precisely splines as basis functions which are clearly not ideal for estimation of a discrete blockmodel. This is evident from MSEs under the oracle setting, which are also high and comparable to MSEs under the actual non-oracle setting.
We observe that MSEs decrease with increase in for fixed in all cases, however, this is not necessarily the case with increase in and fixed, for and . This appears to be an artefact of estimation being performed with a different number of adjacencies (precisely ) aggregated in each case, generated from functions with high degree of smoothness ( and ).
5.2 Heterogeneous networks ()
We report simulation results for the general setting of cross-sectional networks observed with network-level covariates . Two oracle settings are considered: (i) oracle informing order statistics of the true node-specific positions , i.e., such that and the true network-specific positions , and (ii) oracle which again informs order statistics of the true node-specific positions exactly as oracle , however, gives no information on the network specific positions. Under both oracles provide oracle estimates of nodal positions, and our algorithm for multi-graphon estimation reduces to nonparametric regression using , and with the exact network positions under oracle , whereas with network-level covariate measurements under oracle . Thus, oracle indicates the best case performance which could be achieved for finite samples if the true set of neighboring nodes were observed, however with imperfect nodal locations . Oracle indicates the increase in error (over oracle ) resulting from the use of network-level covariates instead of the true network positions .
A comparison of our multi-graphon estimates with existing methods using MSE averaged over replications is displayed in Table 2; visual comparisons of estimates from the proposed method, SBA and NBS are displayed in Fig. 3. Unlike for and , the estimated nodal positions implied the same structure as of the true suggesting that the purified flattened graphon in these cases is identical to the actual flattened graphon . Intuitively, this is what we expect given the smooth structure of , , and the mixed structure of . To allow comparisons for , we plot our proposed estimate of with rows and columns permuted to match the true node ordering, i.e. .
Table 2 reports MSEs of the multi-graphon array averaged separately for networks generated with weak and strong network-specific effects, precisely, and , respectively. From this table, we note the relatively higher MSEs of estimates under oracle in comparison to oracle . This increase in MSE results from the use of covariates employed as noisy measurements for unobserved network positions, as expected. Further, comparing MSEs of oracle estimates with actual non-oracle estimates across and , it is apparent that suffers the most due to relatively poor estimation of latent nodal positions. As discussed earlier, this is due to it’s extremely smooth structure. Further, we see that our method leads to notably lower MSE for in all cases except when . Due the smooth structure of , nodes prove insufficient for nodal position estimation resulting in a higher MSE. For , our method consistently leads to the smallest MSE, with SBA and NBS leading to the second best performance. For , our proposed estimator is comparable to the best performing approaches of USVT, NBS and SBA for networks with stronger network-specific effects (higher values of ) but has a relatively higher MSE otherwise. This is due to the fact that is simply a discrete block model for smaller values of and thus estimation with splines as basis functions even with the true nodal locations does not lead to improved estimation. This is apparent from the MSEs corresponding to the oracle settings of the proposed method which also have higher MSEs for smaller values of . Noting the good performance of NBS for , we recommend using our approach with kernel regression (e.g. with a uniform kernel) rather than splines, for multi-graphon estimation of networks with discrete structure.