1 Introduction
The problem of obtaining predictions for unknown entries of a matrix, given restrictions on the row and column sums, comes with many labels. Without drawing a sharp distinction between names and fields, some non-exhaustive examples of keywords related to very similar settings are Network Tomography and Traffic Matrix Estimation in Computer Science and Machine Learning (e.g. Vardi, 1996, Coates et al., 2002, Hazelton, 2010, Airoldi and Blocker, 2013, Zhou et al., 2016), Input-Output Analysis in Economics (e.g. Bacharach, 1965, Miller and Blair, 2009), Network Reconstruction in Finance and Physics (e.g. Sheldon and Maurer, 1998, Squartini and Garlaschelli, 2011, Mastrandrea et al., 2014, Gandy and Veraart, 2018), Ecological Inference in Political Science (e.g. King, 2013, Klima et al., 2016), Matrix Balancing in Operations Research (e.g. Schneider and Zenios, 1990) and many more.
An old but nevertheless popular solution to problems of this kind is the so-called iterative proportional fitting procedure (IPFP), first introduced by Deming and Stephan (1940) as a means to obtain consistency between sampled data and population-level information. In essence, this simple procedure iteratively adjusts the estimated entries until the row and column sums of the estimates match the desired ones. In the statistics literature, this procedure is frequently used as a tool to obtain maximum likelihood estimates for log-linear models in problems involving three-way and higher-order tables (Fienberg et al., 1970, Bishop et al., 1975, Haberman, 1978, 1979). In parallel, the empirical economics literature, concerned with the estimation of input-output matrices, proposed a very similar approach (Bacharach, 1970), often called the RAS algorithm. Here, the entries of the matrix must be consistent with the inputs and the outputs. The solution builds on the existence of a prior matrix that is iteratively transformed into a final matrix that is similar to the initial one but matches the input-output requirements. Although the intention is somewhat different, the algorithm is effectively identical to IPFP (Onuki, 2013). The popularity of the procedure can also be explained by the fact that it provides a solution to the so-called maximum entropy (ME) problem (Malvestuto, 1989, Upper, 2011, Elsinger et al., 2013).
In Computer Science, flows within router networks are often estimated using raking and so-called gravity models (see Zhang et al., 2003). Raking is in fact identical to IPFP, and the latter can be interpreted as a special case of the former.
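Since IPFP recurs throughout the paper, a minimal sketch may help to fix ideas. The following Python snippet (our own illustrative code, not the implementation used in the paper) alternately rescales the rows and columns of a positive prior matrix until both sets of marginals are matched:

```python
import numpy as np

def ipfp(prior, row_sums, col_sums, tol=1e-10, max_iter=1000):
    """Iteratively rescale a positive prior matrix until its row and
    column sums match the given targets (the paper additionally
    excludes the diagonal; ignored here for brevity)."""
    x = prior.astype(float).copy()
    for _ in range(max_iter):
        # scale rows to match the target row sums
        x *= (row_sums / x.sum(axis=1))[:, None]
        # scale columns to match the target column sums
        x *= (col_sums / x.sum(axis=0))[None, :]
        if np.allclose(x.sum(axis=1), row_sums, rtol=tol):
            break
    return x
```

Starting from a uniform prior, the procedure reproduces the gravity-type solution; starting from an arbitrary positive prior, it converges to the closest matrix (in the Kullback-Leibler sense) that satisfies the marginals.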
In this paper, we propose an estimation algorithm that builds on augmented Lagrangian optimization (Powell, 1969, Hestenes, 1969) and can provide the same predictions as IPFP but is flexible enough to be extended toward more general concepts. In particular, we propose to include exogenous covariates and random effects to improve the predictions of the missing matrix entries. Furthermore, we compare our approach with competing models using real data. To do so, we look at an international financial network of claims and liabilities where we pretend that the inner part of the matrix is unknown. Since in the data at hand the full matrix is in fact available, we can carry out a competitive comparison with alternative routines. Note that commonly the inner part of the financial network remains unknown, but finding good estimates for the matrix entries is essential for central banks and financial regulators, since it is a necessary prerequisite for evaluating systemic risk within the international banking system. See e.g. the very recent study by researchers from 15 different central banks (Anand et al., 2018), where the question of how to estimate financial network linkages was identified as crucial for contagion models and stress tests. Our proposal therefore makes a direct practical contribution.
The paper is structured as follows. In Section 2 we relate maximum entropy, maximum likelihood and IPFP. In Section 3 we introduce our model and discuss estimation and inference as well as potential extensions. After a short data description in Section 4, we apply the approach, compare different models and present a small simulation study that illustrates the properties of the estimator. Section 5 concludes the paper.
2 Modelling approach
2.1 Notation
Our interest is in predicting nonnegative, directed dyadic variables $x_{ij}^t$ among $i,j = 1, \dots, n$ observational units at time points $t = 1, \dots, T$. The restriction to nonnegative entries is henceforth referred to as the nonnegativity constraint. We do not allow for self-loops and leave the elements $x_{ii}^t$ undefined. Hence, the number of unknown variables at each time point is given by $N = n(n-1)$. Let $x^t = (x_{12}^t, x_{13}^t, \dots, x_{n,n-1}^t)^\top$ be an $N$-dimensional column vector and define $\mathcal{I} = \{(1,2), (1,3), \dots, (n, n-1)\}$ as the corresponding ordered index set. We denote the $i$th row sum by $r_i^t = \sum_{j \neq i} x_{ij}^t$ and the $j$th column sum by $c_j^t = \sum_{i \neq j} x_{ij}^t$. Stacking the row and column sums results in the $2n$-dimensional column vector $m^t = (r_1^t, \dots, r_n^t, c_1^t, \dots, c_n^t)^\top$. Furthermore, we define the known binary routing matrix $A \in \{0,1\}^{2n \times N}$ such that the linear relation
$$A x^t = m^t \quad (1)$$
holds.
holds. Henceforth, we will refer to relation (1) as marginal restrictions. Furthermore, we denote each row of by the row vector . Hence, we can represent the marginal restrictions row wise by
Note that in cases where some elements of $m^t$ are zero, the number of unknown variables to predict decreases and the matrix $A$ must be rearranged accordingly. In the following we ignore this issue and suppress the time superscript for ease of notation. Random variables and vectors are indicated by upper case letters, realizations by lower case letters.
2.2 Maximum entropy, iterative proportional fitting and maximum likelihood
Besides the long-known relation between maximum entropy (ME) and IPFP, there also exists an intimate relation between maximum entropy and maximum likelihood, formalized for example by Golan and Judge (1996) and known as the duality theorem, see for example Brown (1986) and Dudík et al. (2007). Also in so-called configuration models (Squartini and Garlaschelli, 2011, Mastrandrea et al., 2014), the connection between maximum entropy and maximum likelihood is a central ingredient for network reconstruction.
In the following, (i) we rely on the work of Golan and Judge (1996), Squartini and Garlaschelli (2011) and Muñoz-Cobo et al. (2017) in order to briefly derive the ME distribution in the given setting. (ii) After that, we show that IPFP indeed maximizes the likelihood of the ME distribution. (iii) Based on the first two results, we show that we can arrive at the same result as IPFP by constrained maximization of a likelihood where each matrix entry comes from an exponential distribution.
(i) Maximum entropy distribution: We formalize the problem by defining the Shannon entropy functional of the system as
$$H[f] = -\int_{\mathcal{S}} f(x) \log f(x) \, dx,$$
where we make it explicit in the notation that the functional takes the density function $f$ as input. The support of $f$ is given by $\mathcal{S} = [0, \infty)^N$, ensuring the nonnegativity constraint. Furthermore, we require that the density function integrates to unity,
$$\int_{\mathcal{S}} f(x) \, dx = 1. \quad (2)$$
We denote the expectation of the random vector $X$ by $\mu = \mathbb{E}(X)$ and formulate the marginal restrictions in terms of linear restrictions on $\mu$, which we specify as
$$A \mu = m. \quad (3)$$
Combining the constraints (2) and (3) results in the Lagrangian functional
$$\mathcal{L}[f] = H[f] + \lambda_0 \Big( 1 - \int_{\mathcal{S}} f(x)\, dx \Big) + \sum_{k=1}^{2n} \lambda_k \Big( m_k - \int_{\mathcal{S}} a_k x \, f(x)\, dx \Big) \quad (4)$$
with Lagrange multipliers $\lambda_k$ for $k = 0, \dots, 2n$. The solution can be found using the Euler-Lagrange equation (Dym and Shames, 2013), stating that a functional of the form $\int F(x, f, f')\, dx$ is stationary (i.e. its first order derivative is zero) if
$$\frac{\partial F}{\partial f} = \frac{d}{dx} \frac{\partial F}{\partial f'}. \quad (5)$$
If the Lagrangian functional does not depend on the derivative of $f$, the right hand side of equation (5) is zero, so that no derivative appears. For the Lagrangian functional (4) this provides
$$-\log f(x) - 1 - \lambda_0 - \sum_{k=1}^{2n} \lambda_k a_k x = 0. \quad (6)$$
Rearranging the terms in (6) results in the maximum entropy distribution
$$f(x) = \exp\Big(-1 - \lambda_0 - \sum_{k=1}^{2n} \lambda_k a_k x\Big). \quad (7)$$
In order to ensure restriction (2) we set $\exp(-1 - \lambda_0) = \kappa(\theta)^{-1}$, where $\theta = (\lambda_1, \dots, \lambda_{2n})^\top$ is the parameter vector and
$$\kappa(\theta) = \int_{\mathcal{S}} \exp(-\theta^\top A x) \, dx = \prod_{(i,j) \in \mathcal{I}} (\theta_i + \theta_{n+j})^{-1},$$
where $\theta_i + \theta_{n+j} > 0$ for $(i,j) \in \mathcal{I}$ ensures integration to a finite value. Taken together, this leads to the exponential family distribution
$$f(x; \theta) = \exp\big(-\theta^\top A x - \log \kappa(\theta)\big). \quad (8)$$
Apparently, the sufficient statistics in (8) result through $T(x) = A x = m$, and hence we can characterize the $N$-dimensional random variable $X$ in terms of the $2n$ parameters $\theta$. Using (8), the second order condition results from
$$\frac{\partial^2 \mathcal{L}}{\partial f^2} = -\frac{1}{f(x)} < 0$$
and ensures that (7) is indeed a maximizer.
(ii) IPFP and the maximum entropy distribution: In order to solve for the parameters of the maximum entropy distribution, we take the first derivative of the log-likelihood obtained from (8), i.e.
$$\ell(\theta) = -\theta^\top A x - \log \kappa(\theta) = -\theta^\top m - \log \kappa(\theta). \quad (9)$$
Since (8) is an exponential family distribution, we can use the relation
$$-\frac{\partial \log \kappa(\theta)}{\partial \theta} = \mathbb{E}_\theta(A X) = A \mu(\theta), \qquad \mu_{ij}(\theta) = (\theta_i + \theta_{n+j})^{-1},$$
and the maximum likelihood estimator results from the score equations
$$A \mu(\theta) = m. \quad (10)$$
Equation (10) suggests solving for the means by fixed-point iteration (Dahmen and Reusken, 2006). That is, for constraint $k$ we fix the right hand side to $m_k$ and rescale the current mean estimates so that the constraint is met exactly,
$$\mu_{ij}^{(s+1)} = \mu_{ij}^{(s)} \, \frac{m_k}{a_k \mu^{(s)}} \quad \text{for all } (i,j) \text{ entering constraint } k, \quad (11)$$
cycling through $k = 1, \dots, 2n$.
But this is in fact iterative proportional fitting, a procedure that iteratively rescales the estimates until they match the marginal constraints. Convergence is achieved when $a_k \mu = m_k$ holds for all $k$, satisfying the score equations (10). More generally, the log-likelihood (9) is monotonically non-decreasing in each update step (11), and convergence of (11) is achieved only if the log-likelihood is maximized (Koller et al., 2009, Theorem 20.5).
(iii) IPFP and constrained maximum likelihood: If we resort the sufficient statistics and relabel the elements of $\theta$, we obtain the entry-wise rates
$$\lambda_{ij} = \theta_i + \theta_{n+j} > 0 \quad \text{for } (i,j) \in \mathcal{I}.$$
This leads to
$$f(x; \theta) = \prod_{(i,j) \in \mathcal{I}} \lambda_{ij} \exp(-\lambda_{ij} x_{ij}), \quad (12)$$
where, with (10), $\mu_{ij} = \lambda_{ij}^{-1}$. Hence, we can represent the whole system as the product of densities of exponentially distributed random variables $X_{ij}$ for $(i,j) \in \mathcal{I}$. That is,
$$f(x) = \prod_{(i,j) \in \mathcal{I}} \frac{1}{\mu_{ij}} \exp\Big(-\frac{x_{ij}}{\mu_{ij}}\Big), \quad (13)$$
with observed margins $r_i = \sum_{j \neq i} x_{ij}$ and $c_j = \sum_{i \neq j} x_{ij}$.
3 Maximum likelihood-based estimation strategy
3.1 Parametrization and estimation
From result (13) it follows that we can use a distributional framework in order to build a generalized regression model. We exemplify this with a model which includes a sender effect, denoted by $\phi_i$, and a receiver effect $\psi_j$. We stack the coefficients in a parameter vector $\beta = (\phi_1, \dots, \phi_n, \psi_1, \dots, \psi_n)^\top$ such that the following log-linear expectation results
$$\log \mu_{ij} = \phi_i + \psi_j, \quad (i,j) \in \mathcal{I}. \quad (14)$$
Based on this structural assumption, we can now maximize the likelihood derived from (13) with respect to $\beta$, subject to the observed values $m$ and the moment condition $A \mu(\beta) = m$. In the given formulation, the moment condition is linear in $\mu$ but not in $\beta$. Consequently, the numerical solution to the problem might be burdensome. We therefore propose to use an iterative procedure that is somewhat similar to the Expectation Conditional Maximization (ECM, Meng and Rubin, 1993) algorithm, since it involves iteratively forming the expectation of $X$ based on the previous parameter estimate (E-step) and constrained maximization afterwards (M-step). To be specific, define the design matrix $Z$ that contains indices for sender and receiver effects. Matrix $Z$ has $N$ rows indexed by $(i,j) \in \mathcal{I}$, where for row $(i,j)$ the $i$th and the $(n+j)$th elements equal one and all other elements equal zero. Starting with an initial estimate $\hat{\beta}^{(0)}$ that satisfies the moment condition, we form the expectation of the log-likelihood, $Q(\beta; \hat{\beta}^{(s)})$. Then, the maximization problem in the M-step is given by
$$\hat{\beta}^{(s+1)} = \arg\max_{\beta} \, Q(\beta; \hat{\beta}^{(s)}) \quad \text{subject to} \quad A \exp(Z\beta) = m. \quad (15)$$
A suitable optimizer for nonlinear constraints is available through the augmented Lagrangian (Hestenes, 1969, Powell, 1969)
$$\mathcal{L}_A(\beta; \lambda, \rho) = -Q(\beta) + \lambda^\top \big(A \exp(Z\beta) - m\big) + \frac{\rho}{2}\,\big\|A \exp(Z\beta) - m\big\|^2, \quad (16)$$
where $Q$ denotes the objective of the M-step and $\lambda$ and $\rho$ are auxiliary parameters. The augmented Lagrangian method decomposes the constrained problem (15) into iteratively solving unconstrained problems. In each iteration we start from the current parameter value in order to find a preliminary solution; then we update $\lambda$ in order to increase the accuracy of the estimate. In case of slow convergence, $\rho$ can also be increased. An implementation in R is given by the package nloptr by Johnson (2014).
3.2 Confidence and prediction intervals
Considering the data entries as exponentially distributed allows for a quantification of the uncertainty of the estimates. We pursue this by bootstrapping (Efron and Tibshirani, 1994). Given a converged estimate $\hat{\mu}$, we can draw each matrix entry from an exponential distribution with expectation $\hat{\mu}_{ij}$ in order to obtain bootstrap samples $x^{*(1)}, \dots, x^{*(B)}$. For each bootstrap sample we calculate the marginals $m^{*(b)}$ and rerun the constrained estimation procedure, resulting in vectors of estimated means $\hat{\mu}^{*(b)}$. Consequently, the moment condition holds for all mean estimates of the bootstrap and, by model construction, the expected marginal restrictions from the bootstrap samples match the observed ones.
Based on the bootstrap estimates, we can easily derive confidence intervals for $\hat{\mu}_{ij}$ using the variability of $\hat{\mu}^{*(b)}_{ij}$ over $b = 1, \dots, B$. Additionally, we define the prediction error as $e^{*(b)}_{ij} = x^{*(b)}_{ij} - \hat{\mu}^{*(b)}_{ij}$ and construct prediction intervals for the unknown $x_{ij}$ based on the quantiles of the empirical distribution of these prediction errors.
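The bootstrap scheme can be sketched as follows (Python, illustrative; the `refit` argument stands in for the constrained estimation procedure and may be any function mapping marginals to fitted means):

```python
import numpy as np

def bootstrap_prediction_intervals(mu_hat, refit, B=200, alpha=0.05, seed=1):
    """Parametric bootstrap for the exponential model: simulate entries
    with the fitted means, recompute marginals, refit, and form
    prediction intervals from the empirical prediction errors."""
    rng = np.random.default_rng(seed)
    errors = np.empty((B,) + mu_hat.shape)
    for b in range(B):
        x_star = rng.exponential(mu_hat)          # draw entries with mean mu_hat
        mu_star = refit(x_star.sum(axis=1), x_star.sum(axis=0))
        errors[b] = x_star - mu_star              # prediction error e*
    lo = mu_hat + np.quantile(errors, alpha / 2, axis=0)
    hi = mu_hat + np.quantile(errors, 1 - alpha / 2, axis=0)
    return lo, hi
```

Confidence intervals for the fitted means follow analogously from the quantiles of the refitted means rather than of the prediction errors.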
3.3 Extensions with exogenous information and random effects
The regression framework allows us to extend the model by including exogenous information. We again consider model (13) and parametrize the expectation through
$$\log \mu_{ij} = \phi_i + \psi_j + w_{ij}^\top \gamma, \quad (17)$$
with $\phi_i$ and $\psi_j$ again being the subject-specific sender and receiver effects. Furthermore, $w_{ij}$ represents a $q$-dimensional covariate vector and $\gamma$ is the corresponding parameter vector. We can use the augmented Lagrangian approach from above to estimate the $(2n+q)$-dimensional parameter vector. It is important to note here that only dyadic covariates have the potential to increase the predictive performance of the approach. If we only include subject-specific (monadic) information $w_{ij} = (u_i^\top, v_j^\top)^\top$, the expectation can be multiplicatively decomposed and the model collapses back to the IPFP model (14). This is easily seen through
$$\mu_{ij} = \exp(\phi_i + u_i^\top \gamma_1) \exp(\psi_j + v_j^\top \gamma_2) = \exp(\tilde{\phi}_i) \exp(\tilde{\psi}_j).$$
Nevertheless, the inclusion of subjectspecific information may be valuable if it is the goal to forecast future networks based on new covariate information. This holds in particular in dynamic networks. We give an example for predictions based on lagged covariates in the next section.
We can also easily add additional structure to model (17) and assume a distributional form for some or all coefficients. A simple extension arises if we assume random effects. This occurs through the inclusion of normally distributed sender and receiver effects:
$$(\phi_i, \psi_i)^\top \sim N\big(0, \Sigma(\sigma)\big), \quad i = 1, \dots, n, \quad (18)$$
where we take $\sigma$ as the vector of parameters that determines the covariance matrix of the random effects. The latter could be parametrized for example with $\sigma = (\sigma_\phi^2, \sigma_\psi^2, \sigma_{\phi\psi})^\top$ such that
$$\Sigma(\sigma) = \begin{pmatrix} \sigma_\phi^2 & \sigma_{\phi\psi} \\ \sigma_{\phi\psi} & \sigma_\psi^2 \end{pmatrix}, \quad (19)$$
where we assume separate variance components for the sender and the receiver effects, respectively, together with a covariance between them.
Breslow and Clayton (1993). Details are given in Appendix A.
4 Application
4.1 Data description
Table 1: The countries included in the analysis.
AT  Austria  ES  Spain  JP  Japan
AU  Australia  FI  Finland  KR  South Korea 
BE  Belgium  FR  France  NL  Netherlands 
CA  Canada  GB  United Kingdom  SE  Sweden 
CH  Switzerland  GR  Greece  TR  Turkey 
CL  Chile  IE  Ireland  TW  Taiwan 
DE  Germany  IT  Italy  US  United States of America 
The dataset under study is provided by the Bank for International Settlements (BIS) and freely available from their homepage. In general, the locational banking statistics (LBS) provide information about international banking activity by aggregating the financial activities (in million USD) to the country level. For each country, the LBS account for the outstanding claims of internationally active banks located in the reporting countries (conceptualized as a valued dyad consisting of all claims of banks from one country toward banks of another country) and their liabilities (conceptualized as the reverse direction). We have selected the 21 most important countries (see Table 1) for the time period from January 2005 to December 2017 as a quarterly series for the subsequent analysis. In Figure 1, the density of the network (number of existing edges relative to the number of possible edges) is shown on the left, the share of zero-valued marginals in the middle and the development of the aggregated exposures on the right. Especially in the first years, some marginals of the financial networks are zero and the corresponding matrix entries are therefore not included in the estimation problem. Correspondingly, it can be seen that most countries do have some claims and liabilities toward other countries but, especially in the beginning, many dyads are zero-valued.
Since it is plausible that financial interactions are related to the economic size of a country, we consider the annual Gross Domestic Product (GDP, in current USD billions) as a covariate. The data is provided for the years 2005-2017 by the International Monetary Fund on their homepage. Furthermore, there might be a relationship between trade in commercial goods and financial transfers, and we use data on dyadic trade flows (in current USD) between states as an additional covariate. The data is available annually for the years 2005 to 2014 from the Correlates of War Project online. We do not have information on trade for the years 2015, 2016 and 2017 and therefore extrapolate the previous values using an autoregressive regression model. Apparently, by doing so we have covariate information which is subject to uncertainty; we ignore this issue subsequently. In order to have an example of uninformative dyadic information, we use time-invariant data on the dyadic distance in kilometres between the capital cities of the countries under study (provided by Gleditsch, 2013). Finally, in some matrix reconstruction problems, the matrix entries of previous time points become known after some time. Typically, lagged values are strongly correlated with the actual ones. We therefore also consider the matrix entries, lagged by one quarter, as covariates. See Table 2 for an overview of the variables, together with the overall correlation between the actual claims and the respective covariate. In the subsequent analysis we include all covariates in logarithmic form.
Table 2: Overview of the variables, together with the overall correlation between the actual claims and the respective covariate.
Variable  Description  Type  Correlation
Claim  Claim from one country to another  dyad-specific
GDP (sender)  Gross Domestic Product of the sending country  node-specific
GDP (receiver)  Gross Domestic Product of the receiving country  node-specific
Trade  Bilateral trade flows of commercial goods  dyad-specific
Distance  Distance between the capital cities of the two countries  dyad-specific
Lagged claim  Claim from one country to another, lagged by one quarter  dyad-specific
4.2 Model performance
Table 3: Predictive performance of the different model specifications.
Method  Covariates  Rand. eff.  Model  overall  overall  average  SE  average  SE
1  IPFP      (11, 14)
2  Regression    yes  (14, 19)
3  Regression  GDP, GDP, trade    (17)  3 300.242  56.794  63.466  7.802
4  Regression  GDP, GDP, trade  yes  (17, 19)  9.222  1.052
5  Regression  GDP, GDP, distance    (17)
6  Regression  GDP, GDP, distance  yes  (17, 19)
7  Regression  GDP, GDP, lagged claim    (17)  2 280.235  41.805  43.851  5.591  1.549
8  Regression  GDP, GDP, lagged claim  yes  (17, 19)  11.715
We evaluate the proposed models in terms of their prediction errors. The corresponding results are provided in Table 3. As a baseline specification, all models contain sender and receiver effects. In the first row, we provide the maximum entropy model (14) that coincides with the IPFP solution (11). The second row shows model (14) together with the random effects structure (19). In the third row, we provide the errors for model (17), where we included the logarithmic GDP of the sender and the receiver as well as the logarithmic trade data as covariates. In row four, we use the same model as in row three but additionally add the random effects structure from (19). In rows five and six, the same models as in rows three and four are used but with the logarithmic distance instead of trade as dyadic explanatory variable. In the last two rows we consider models with lagged claims, with and without random effects. This comparison might be somewhat unfair because of the strong correlation and because it is not clear whether it can safely be assumed that such data are always available. Therefore, we have separated this specification from the other models.
In the first four columns, the different specifications together with the related equations are provided. Columns five and six show the errors aggregated over all 52 quarters, and the last four columns show the errors averaged over all years together with their corresponding standard errors.
It can be seen that the first two models provide the same predictions; the inclusion of the random effects has no impact other than giving estimates for the variances of the sender and receiver effects as well as their correlation, shown in Figure 2. It becomes visible that the variation of the receiver effect is much higher than the variation of the sender effect, which is almost constant. The correlation between the sender and the receiver effect is consistently positive and increases strongly within the first years.
Furthermore, Table 3 shows that in the four models that include exogenous information (rows three to six), the extension toward the random effects structure has an impact on the predictive quality: it decreases in the model that includes trade and increases in the one that includes distance. Nevertheless, the models with and without random effects are rather close to each other and in fact statistically indistinguishable with respect to their differences. While the model with the distance covariate performs even worse than the IPFP solution, the model that includes trade but no random effects (row three) gives superior predictions relative to all other models in the upper part of Table 3. However, the two models that use the information on the lagged values give by far the best predictions. We nevertheless continue with the best model from the upper part of Table 3, since lagged data are not necessarily available.
The corresponding fitted values are provided as time series in Figure 4, and in Figure 3 we provide the estimates for the coefficients of the model. In the first row, the estimated sender effects (left) and receiver effects (right) are shown as time series. In the second row of Figure 3, the estimates for the coefficients on the exogenous covariates can be seen. The estimated coefficients provide the intuitive result that the claims from one country to another increase with the GDPs of both countries and the trade volume between them. It is reassuring that the ordering of the average size of the coefficients approximately matches the order of the correlations reported in Table 2. Note, however, that the size of the coefficients is to be interpreted with care because of the limited information available on the unknown claims.
We also provide prediction intervals in Figure 5, based on the share of real values located in the interval whose bounds are the 2.5% and 97.5% quantiles derived from the bootstrap distribution. On the left, we plot the real values against the predicted ones together with grey 95% prediction intervals for the most recent network. Observations that do not fall within the prediction interval are indicated by red circles. Because of the quadratic mean-variance relation of the exponential distribution, it is much easier to capture high values within the prediction intervals than low ones, a circumstance that materializes in the fact that exclusively small values fall outside the prediction intervals. The share of real values within the prediction intervals over time is shown on the right hand side of Figure 5. On average over all time periods, our prediction intervals cover the bulk of the true values, and we therefore regard the bootstrap approach as satisfactory.
4.3 Comparison to alternative routines
Gravity model: A standard solution to the problem is the gravity model (e.g. Sheldon and Maurer, 1998). In essence, it represents a multiplicative independence model,
$$\hat{\mu}_{ij} = \frac{r_i \, c_j}{\sum_{k} r_k}. \quad (20)$$
The model is simple, easy to implement and very intuitive. In situations where diagonal elements are not restricted to be zero it even coincides with the maximum entropy solution.
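A minimal sketch of the gravity predictor (Python, names ours): because each fitted entry is the product of its row and column totals divided by the grand total, both sets of marginals are reproduced by construction.

```python
import numpy as np

def gravity(row_sums, col_sums):
    """Multiplicative independence model: entry (i, j) is proportional
    to the product of the i-th row total and the j-th column total."""
    total = row_sums.sum()   # equals col_sums.sum() for consistent marginals
    return np.outer(row_sums, col_sums) / total
```

Note that the sketch fills the whole matrix; excluding the diagonal, as required in our setting, breaks this closed form and is exactly where the iterative procedures become necessary.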
Tomogravity model: An extension of the gravity model is given by the tomogravity approach of Zhang et al. (2003). The model was initially designed to estimate point-to-point traffic volumes from dyadic loads and builds on minimizing the loss function
$$\|m - A x\|^2 + \lambda \sum_{(i,j) \in \mathcal{I}} x_{ij} \log\frac{x_{ij}}{\hat{g}_{ij}} \quad (21)$$
subject to the nonnegativity constraint. Here, the gravity model (20) serves as the null model $\hat{g}$ in the penalization term, and the strength of penalization is given by $\lambda$. The approach is implemented in the R package tomogravity (see Blocker et al., 2014). Zhang et al. (2003) show in a simulation study that a default choice of $\lambda$ is reasonable if no training set is available. In our competitive comparison we optimize the tuning parameter by grid search in order to minimize the overall error.
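The following Python sketch conveys the idea of the tomogravity loss under our assumptions; the exact loss of Zhang et al. (2003) differs in details, and we use a generalized Kullback-Leibler penalty toward the gravity solution here:

```python
import numpy as np
from scipy.optimize import minimize

def tomogravity(A, m, g, lam):
    """Fit nonnegative entries x to the marginal equations A x ~ m while
    penalizing divergence from the gravity solution g (the null model);
    lam controls the penalty strength."""
    eps = 1e-12
    def loss(x):
        fit = np.sum((m - A @ x) ** 2)
        # generalized Kullback-Leibler divergence from the null model g
        kl = np.sum(x * np.log((x + eps) / (g + eps)) - x + g)
        return fit + lam * kl
    res = minimize(loss, x0=g.astype(float), bounds=[(0.0, None)] * len(g))
    return res.x
```

When the gravity solution already satisfies the marginal equations, both loss terms vanish at `x = g` and the optimizer returns the null model itself; the penalty only matters when the marginal restrictions pull the solution away from it.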
Nonnegative LASSO: The LASSO (Tibshirani, 1996) was already applied to predict flows in bike sharing networks by Chen et al. (2017) and works best with sparse networks. Using this approach, we minimize the loss function
$$\|m - A x\|^2 + \lambda \sum_{(i,j) \in \mathcal{I}} x_{ij} \quad (22)$$
subject to the nonnegativity constraint, where we can drop the absolute value in the penalization term because of that constraint (see Wu et al., 2014 for the nonnegative LASSO). In order to use the approach in the competitive comparison, we optimize the penalty parameter on a grid for the minimum error. The models are estimated using the R package glmnet by Friedman et al. (2009).
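Because the entries are nonnegative, the L1 penalty in (22) reduces to a linear term, so a generic bound-constrained optimizer suffices. A Python sketch (illustrative, not the glmnet implementation):

```python
import numpy as np
from scipy.optimize import minimize

def nonneg_lasso(A, m, lam):
    """Minimize ||m - A x||^2 + lam * sum(x) over x >= 0.
    For nonnegative x the L1 penalty is just a linear term, so a
    bound-constrained quasi-Newton solver is sufficient."""
    n = A.shape[1]
    obj = lambda x: np.sum((m - A @ x) ** 2) + lam * x.sum()
    res = minimize(obj, x0=np.ones(n), bounds=[(0.0, None)] * n)
    return res.x
```

The linear penalty shrinks every entry by the same amount and sets small entries exactly to zero, which is why the approach favors sparse solutions.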
Ecological regression: In Ecological Inference (see e.g. Klima et al., 2016, King, 2013), it is often assumed that the observations at hand are independent realizations of a linear model, parametrized by time-constant transition shares. Let $r^t$ denote the stacked row sums and $c^t$ the stacked column sums at time $t$. Then, the model can be represented as
$$c^t = B^\top r^t, \quad (23)$$
where the matrix $B$ contains the transition shares. In the given form, the problem is not identified for one time period, and it must be assumed that multiple time points can be interpreted as independent realizations. Additionally, the model is not symmetric, implying that the solution to equation (23) does not coincide with the solution to
$$r^t = \tilde{B}^\top c^t. \quad (24)$$
Since the estimated transition shares are neither guaranteed to be nonnegative nor to sum to one, they must be post-processed to fulfil these conditions. Both models are fitted via least squares in R.
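A sketch of the column-sum variant (23) in Python (illustrative; the row variant (24) follows by swapping the roles of the two marginal series):

```python
import numpy as np

def ecological_regression(R, C):
    """Estimate time-constant transition shares B from T observed
    row-sum vectors R (T x n) and column-sum vectors C (T x n),
    under the linear model C ~ R @ B.  The least-squares estimate
    is post-processed to be nonnegative with rows summing to one."""
    B, *_ = np.linalg.lstsq(R, C, rcond=None)
    B = np.clip(B, 0.0, None)             # enforce nonnegativity
    B /= B.sum(axis=1, keepdims=True)     # rows are shares, sum to one
    return B
```

With enough time points and sufficiently varying marginals, the least-squares step identifies the shares; the clipping and renormalization only bite when the raw estimates violate the share constraints.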
Hierarchical Bayesian models: Gandy and Veraart (2017) propose to use simulation-based methods. In their hierarchical models, the first step consists of estimating the link probabilities $p_{ij}$ and, given that there is a link, the weight is sampled from an exponential distribution:
$$X_{ij} \mid \{\text{link } i \to j\} \sim \text{Exp}(\lambda_{ij}), \qquad P(\text{link } i \to j) = p_{ij}. \quad (25)$$
In order to estimate the link probabilities $p_{ij}$, knowledge of the density or a desired target density is needed. In their basic model it is proposed to use an Erdős-Rényi model with a constant link probability consistent with the target density. In an extension of the model, inspired by Graphon models, Gandy and Veraart (2018) propose a so-called empirical fitness model. Here the link probability is determined by the logistic function
$$p_{ij} = \frac{1}{1 + \exp\big(-(\alpha + f_i + f_j)\big)}, \quad (26)$$
with $\alpha$ being some constant that is estimated for consistency with the target density. For the fitness variables $f_i$, the authors propose an empirical Bayes approach, incorporating the information of the row and column sums. An implementation of both models is given by the R package systemicrisk. In order to make the approach as competitive as possible, we use for each quarter the real (but in principle unknown) density of the networks. Because the results of the method differ between individual runs, we average the estimates and evaluate the combined dataset.
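The two-stage sampling logic can be sketched as follows (Python; the fitness scores and all constants below are illustrative assumptions, not the exact calibration of Gandy and Veraart):

```python
import numpy as np

def sample_fitness_network(fitness, alpha, mean_weight, rng):
    """One draw from a fitness-type model: a link i -> j exists with a
    logistic probability driven by the fitness scores, and an existing
    link carries an exponentially distributed weight."""
    n = len(fitness)
    eta = alpha + fitness[:, None] + fitness[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))               # logistic link probability
    x = (rng.random((n, n)) < p) * rng.exponential(mean_weight, (n, n))
    np.fill_diagonal(x, 0.0)                     # no self-loops
    return x

# averaging many draws, as done for the competitive comparison
rng = np.random.default_rng(3)
f = np.log(np.array([1.0, 2.0, 4.0]))            # fitness from marginals (assumption)
est = np.mean([sample_fitness_network(f, -1.0, 5.0, rng) for _ in range(100)], axis=0)
```

Averaging the draws mirrors the way we combine the simulation output before computing the error measures.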
4.4 Competitive comparison
Table 4: Competitive comparison of the different estimation routines.
Method  Model  overall  overall  average  SE  average  SE
1  Regression (GDP, GDP, trade)  (17)  3 300.242  56.794  63.466  9.246  7.802  1.085
2  Gravity model  (20)
3  Tomogravity model  (21)
4  Nonnegative LASSO  (22)
5  Ecological regression, columns  (23)
6  Ecological regression, rows  (24)
7  Hierarchical, Erdős-Rényi  (25)
8  Hierarchical, fitness  (26)
Again, we compare the different algorithms in terms of their prediction errors in Table 4. In the first row of Table 4 we show the restricted maximum likelihood model with the best predictions from Table 3, and in the following rows the models introduced in Section 4.3 above. The results can be separated roughly into three blocks. The models that fundamentally build on some kind of least squares criterion without referring in some way to the gravity model or the maximum entropy solution (the nonnegative LASSO and the two ecological regressions in rows four, five and six) have the highest errors. Somewhat better are the hierarchical Bayesian models (rows seven and eight), which can be considered the second block. However, although they provide better predictions than the models in the first block, we used the real density of the network in order to calibrate them, which gives them an unrealistic advantage. The third group is given by the gravity and tomogravity models (rows two and three). These are statistically indistinguishable from each other and provide considerably better results than the models from the former blocks. Nevertheless, the regression model that uses exogenous information on GDP and trade (first row) yields the best predictions in this comparison.
4.5 Performance of the estimator
We hope to see improvements in the predictions if we include informative exogenous variables in the model. Informative means in this context that variation in the covariate $w_{ij}$ is able to explain variation in the unknown $x_{ij}$. Conversely, including information with a low association to $x_{ij}$ simply adds noise to the estimation procedure; in this case we expect inferior predictions as compared to the IPFP solution. We illustrate the properties of the estimator using a simulation study with the following data generating process
$$X_{ij} \sim \text{Exp}(\mu_{ij}^{-1}), \qquad \log \mu_{ij} = \phi_i + \psi_j + \gamma w_{ij}. \quad (27)$$
Since the association between $w_{ij}$ and the unknown $x_{ij}$ is crucial, we vary the parameter $\gamma$ around zero and denote by $\mu(\gamma)$ the mean based on $\gamma$. For each parameter value we rerun the data generating process (27) repeatedly and calculate for the $s$th simulation the IPFP solution $\hat{\mu}^{\text{IPFP}}_s$ and the restricted maximum likelihood solution $\hat{\mu}^{\text{RML}}_s$. Based on that, we calculate in each simulation the ratio of the squared errors
$$\rho_s(\gamma) = \frac{\sum_{(i,j) \in \mathcal{I}} \big(\mu_{ij}(\gamma) - \hat{\mu}^{\text{IPFP}}_{s,ij}\big)^2}{\sum_{(i,j) \in \mathcal{I}} \big(\mu_{ij}(\gamma) - \hat{\mu}^{\text{RML}}_{s,ij}\big)^2}.$$
This ratio is smaller than one if the IPFP estimates yield a lower mean squared error than the restricted maximum likelihood estimates, and higher than one if the exogenous information improves the predictive quality in terms of the mean squared error.
In Figure 6, we show the median (solid black) and the mean (solid grey) of the error ratio for different values of the association parameter, together with a horizontal line indicating the value one (dashed black) and a vertical line (dashed black). It can be seen that the mean and the median of the ratio are below one for parameter values close to zero but increase strongly with higher absolute values. Apparently, the distribution of the ratio is skewed with a long tail, since the mean is much higher than the median. For very high or low parameter values, the median of the relative mean squared error becomes more volatile and partly decreases.
Furthermore, we investigate how well the estimated expectations match the true ones. By construction, the moment condition holds, and consequently the regression-based approach as well as IPFP assume that the sum of the realized values equals the sum of the true expectations. Nevertheless, the moment condition does not imply that each estimated mean equals its true counterpart. In order to investigate potential bias, we fix the true expectations, repeatedly draw new data from the data generating process (27), and re-estimate. Apparently, estimating the true expectations is a hard task, as only sums of random variables with different expectations are available. Consequently, the variation of the mean estimates is rather high, and we report boxplots of the normalized differences between the true values and the mean estimates for the regression-based approach and, accordingly, for IPFP. In Figure 7 we illustrate three different cases of the association parameter, increasing in absolute value from the top to the bottom panel. On the left hand side, boxplots are shown for the regression-based model and on the right hand side for IPFP; outliers are excluded from the boxplot representation for better clarity. The solid black line represents zero and the dashed black lines are at plus and minus 1.96. The results for the case at the top match the previous analysis illustrated in Figure 6 and show that IPFP identifies the true expectations somewhat better than the regression-based approach when the exogenous information is noninformative. In such a case, including the covariate adds noise to the estimation procedure, resulting in a greater variance around the true expectations. However, this changes strongly if the covariate is informative. Especially in the bottom panel of Figure 7, some estimates obtained from IPFP are seriously biased because this procedure does not have the ability to account for the dyad-specific heterogeneity. The regression-based method, however, does a reasonable job in recovering the unknown true expectations.
5 Discussion
In this paper we propose a method that allows for network reconstruction within a regression framework. This approach makes it easy to add and interpret exogenous information. It also allows the construction of bootstrap prediction intervals to quantify the uncertainty of the estimates. Furthermore, the framework is flexible enough to deal with problems that involve partial information. For example, if some elements of the network are known, or if we have information on the binary network, then we can model the expected values of the matrix entries conditional on this information simply by changing the routing matrix and the E-step.
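The bootstrap prediction intervals mentioned above can be sketched with a minimal percentile bootstrap. This is a generic illustration in Python rather than the paper's R workflow, and the helper names (`predict`, `resample`) are hypothetical placeholders:

```python
import numpy as np

def percentile_interval(predict, resample, B=999, alpha=0.05, seed=None):
    """Generic percentile bootstrap: re-estimate on B resampled data
    sets and take empirical quantiles of the resulting predictions.
    `predict` maps a data set to a scalar prediction; `resample` draws
    one bootstrap replicate (both are hypothetical placeholders)."""
    rng = np.random.default_rng(seed)
    preds = np.array([predict(resample(rng)) for _ in range(B)])
    return np.quantile(preds, [alpha / 2, 1 - alpha / 2])
```

For instance, plugging in a function that re-runs the reconstruction on a resampled data set would yield an interval for each predicted matrix entry.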
However, we also want to list some shortcomings of the method. An obvious drawback is its derivation from the maximum entropy principle, which tries to allocate the matrix entries as evenly as possible and is therefore not very suitable for sparse networks, as long as the sparseness cannot be inferred from the marginals. Furthermore, the estimated coefficients must be interpreted with care, as they are estimated from a data situation with much less information than in usual regression settings. As a last but most important point, if the association between the exogenous explanatory variable(s) and the unknown matrix entries is low, the method is likely to deliver predictions that are worse than simple IPFP. It is therefore highly recommended to use expert knowledge when selecting the exogenous dyadic covariates for regression-based network reconstruction.
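For reference, the IPFP benchmark discussed throughout admits a very short implementation: alternately rescale the rows and columns of a positive prior matrix until both margins match. The sketch below is a generic textbook version in Python (function name, tolerance, and iteration cap are our own), not the code from the supplementary workflow:

```python
import numpy as np

def ipfp(prior, row_sums, col_sums, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: alternately rescale the rows and
    columns of a positive prior matrix until both margins match the
    targets (which must share the same grand total)."""
    X = prior.astype(float).copy()
    for _ in range(max_iter):
        X *= (row_sums / X.sum(axis=1))[:, None]   # fit the row sums
        X *= (col_sums / X.sum(axis=0))[None, :]   # fit the column sums
        if np.allclose(X.sum(axis=1), row_sums, atol=tol):
            break
    return X
```

Starting from a uniform prior, the fixed point is the maximum entropy allocation consistent with the margins; an informative prior, as in the RAS variant, keeps the result close to that prior.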
Appendix A Estimation with random effects
In order to fit a model of the form
(28) 
we fix to some value and follow a Laplace approximation estimation strategy similar to Breslow and Clayton (1993), see also Tutz and Groll (2010). The moment condition implies a restriction for but not for and, given some starting value , we can maximize the penalized log-likelihood (constant terms omitted)
subject to the moment condition . Therefore, the new optimization problem is given by
(29) 
Define as the design matrix for the fixed effects and as the () random effects design matrix; this allows us to write the generic mean as . Given some estimate of , call it , we can estimate the variance parameters with an approximation of the marginal restricted log-likelihood:
(30) 
where , with and and consequently . The pseudo-observations are given by . Estimators can be obtained by optimizing first (29) and then (30) in each iteration until convergence.
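The alternating scheme above, first maximizing the penalized criterion in (29) for the effects and then updating the variance parameters via (30), can be illustrated on a toy Gaussian model. The code below is a simplified ridge-type stand-in assuming an identity link and Gaussian errors, not the paper's exact likelihood; all names are illustrative:

```python
import numpy as np

def fit_mixed_toy(y, X, Z, n_iter=50):
    """Toy Gaussian model y = X @ beta + Z @ u + eps (illustrative only).
    Alternates between (i) penalized least squares for the fixed and
    random effects at the current variance parameters and (ii) simple
    moment updates of the variance parameters."""
    p, q = X.shape[1], Z.shape[1]
    s2_u, s2_e = 1.0, 1.0            # starting values for the variances
    C = np.hstack([X, Z])
    for _ in range(n_iter):
        # (i) ridge-type solve: penalize only the random effects block
        pen = np.zeros(p + q)
        pen[p:] = s2_e / s2_u
        theta = np.linalg.solve(C.T @ C + np.diag(pen), C.T @ y)
        beta, u = theta[:p], theta[p:]
        # (ii) update variance parameters from residuals and effects
        resid = y - C @ theta
        s2_e = resid @ resid / len(y)
        s2_u = max(u @ u / q, 1e-8)  # guard against collapse to zero
    return beta, u, s2_u, s2_e
```

Each pass solves a penalized least squares problem at the current variance ratio and then re-estimates the variances from the fitted residuals and random effects, mirroring the iterate-until-convergence recipe above.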
SUPPLEMENTARY MATERIAL
 Data Sets:

BIS bilateral exposures dataset (BIS_bilateral_exposures.csv), capital distances dataset (capdist.csv), dyadic trade flows (Dyadic_COW_4.0.csv), GDP values (gdp.csv)
 Code:

A complete workflow (Workflow.zip) that allows reproduction of the whole analysis with all relevant figures and tables, i.e., code for reading the raw data into R, estimation routines for all models mentioned in the paper, and functions to create the relevant plots and tables.
References
 Airoldi and Blocker (2013) Airoldi, E. M. and A. W. Blocker (2013). Estimating latent processes on a network from indirect measurements. Journal of the American Statistical Association 108(501), 149–164.
 Anand et al. (2018) Anand, K., I. van Lelyveld, Ádám Banai, S. Friedrich, R. Garratt, G. Hałaj, J. Fique, I. Hansen, S. M. Jaramillo, H. Lee, J. L. Molina-Borboa, S. Nobili, S. Rajan, D. Salakhova, T. C. Silva, L. Silvestri, and S. R. S. de Souza (2018). The missing links: A global study on uncovering financial network structures from partial data. Journal of Financial Stability 35, 107–119.
 Bacharach (1965) Bacharach, M. (1965). Estimating nonnegative matrices from marginal data. International Economic Review 6(3), 294–310.
 Bacharach (1970) Bacharach, M. (1970). Biproportional matrices and input-output change. Cambridge: Cambridge University Press.

 Bishop et al. (1975) Bishop, Y. M., P. W. Holland, and S. E. Fienberg (1975). Discrete multivariate analysis: theory and practice. Cambridge: MIT Press.
 Blocker et al. (2014) Blocker, A. W., P. Koullick, and E. Airoldi (2014). networkTomography: Tools for network tomography. R package version 0.3.

 Breslow and Clayton (1993) Breslow, N. E. and D. G. Clayton (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88(421), 9–25.
 Brown (1986) Brown, L. D. (1986). The dual to the maximum likelihood estimator. In G. S. Shanti (Ed.), Fundamentals of statistical exponential families with applications in statistical decision theory, Chapter 6, pp. 174–207. Hayward: Institute of Mathematical Statistics.
 Chen et al. (2017) Chen, L., X. Ma, G. Pan, J. Jakubowicz, et al. (2017). Understanding bike trip patterns leveraging bike sharing system open data. Frontiers of Computer Science 11(1), 38–48.
 Coates et al. (2002) Coates, A., A. O. Hero III, R. Nowak, and B. Yu (2002). Internet tomography. IEEE Signal Processing Magazine 19(3), 47–65.
 Dahmen and Reusken (2006) Dahmen, W. and A. Reusken (2006). Numerik für Ingenieure und Naturwissenschaftler. Berlin, Heidelberg: Springer-Verlag.
 Deming and Stephan (1940) Deming, W. E. and F. F. Stephan (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics 11(4), 427–444.
 Dudík et al. (2007) Dudík, M., S. J. Phillips, and R. E. Schapire (2007). Maximum entropy density estimation with generalized regularization and an application to species distribution modeling. Journal of Machine Learning Research 8(6), 1217–1260.
 Dym and Shames (2013) Dym, C. L. and I. H. Shames (2013). Introduction to the calculus of variations. In Solid Mechanics, Chapter 2, pp. 71–116. New York: Springer.
 Efron and Tibshirani (1994) Efron, B. and R. J. Tibshirani (1994). An introduction to the bootstrap. Boca Raton: CRC press.
 Elsinger et al. (2013) Elsinger, H., A. Lehar, and M. Summer (2013). Network models and systemic risk assessment. In J.P. Fouque and J. A. Langsam (Eds.), Handbook on Systemic Risk, Chapter IV, pp. 287–305. Cambridge: Cambridge University Press.

 Fienberg et al. (1970) Fienberg, S. E. et al. (1970). An iterative procedure for estimation in contingency tables. The Annals of Mathematical Statistics 41(3), 907–917.
 Friedman et al. (2009) Friedman, J., T. Hastie, and R. Tibshirani (2009). glmnet: Lasso and elastic-net regularized generalized linear models. R package version 2.01.6.
 Gandy and Veraart (2017) Gandy, A. and L. A. Veraart (2017). A Bayesian methodology for systemic risk assessment in financial networks. Management Science 63(12), 4428–4446.
 Gandy and Veraart (2018) Gandy, A. and L. A. M. Veraart (2018). Adjustable network reconstruction with applications to CDS exposures. Journal of Multivariate Analysis, in press.
 Gleditsch (2013) Gleditsch, K. S. (2013). Distance between capital cities. http://privatewww.essex.ac.uk/~ksg/data5.html. Accessed: 20170407.
 Golan and Judge (1996) Golan, A. and G. Judge (1996). Recovering information in the case of underdetermined problems and incomplete economic data. Journal of Statistical Planning and Inference 49(1), 127–136.
 Haberman (1978) Haberman, S. J. (1978). Analysis of qualitative data 1: Introductory topics. New York: Academic Press.
 Haberman (1979) Haberman, S. J. (1979). Analysis of qualitative data 2: New Developments. New York: Academic Press.
 Hazelton (2010) Hazelton, M. L. (2010). Statistical inference for transit system origindestination matrices. Technometrics 52(2), 221–230.
 Hestenes (1969) Hestenes, M. R. (1969). Multiplier and gradient methods. Journal of Optimization Theory and Applications 4(5), 303–320.
 Johnson (2014) Johnson, S. G. (2014). The NLopt nonlinear-optimization package. R package version 1.2.1.
 King (2013) King, G. (2013). A solution to the ecological inference problem: Reconstructing individual behavior from aggregate data. Princeton: Princeton University Press.
 Klima et al. (2016) Klima, A., P. W. Thurner, C. Molnar, T. Schlesinger, and H. Küchenhoff (2016). Estimation of voter transitions based on ecological inference: An empirical assessment of different approaches. AStA Advances in Statistical Analysis 100(2), 133–159.
 Koller et al. (2009) Koller, D., N. Friedman, and F. Bach (2009). Probabilistic graphical models: principles and techniques. Cambridge: MIT Press.

 Malvestuto (1989) Malvestuto, F. (1989). Computing the maximum-entropy extension of given discrete probability distributions. Computational Statistics & Data Analysis 8(3), 299–311.
 Mastrandrea et al. (2014) Mastrandrea, R., T. Squartini, G. Fagiolo, and D. Garlaschelli (2014). Enhanced reconstruction of weighted networks from strengths and degrees. New Journal of Physics 16(4), 043022.
 Meng and Rubin (1993) Meng, X.-L. and D. B. Rubin (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80(2), 267–278.
 Miller and Blair (2009) Miller, R. E. and P. D. Blair (2009). Input-output analysis: foundations and extensions. Cambridge: Cambridge University Press.

 Muñoz-Cobo et al. (2017) Muñoz-Cobo, J.-L., R. Mendizábal, A. Miquel, C. Berna, and A. Escrivá (2017). Use of the principles of maximum entropy and maximum relative entropy for the determination of uncertain parameter distributions in engineering applications. Entropy 19(9), 486.
 Onuki (2013) Onuki, Y. (2013). Extension of the iterative proportional fitting procedure and its evaluation using agent-based models. In M. Tadahiko, T. Takao, and T. Shingo (Eds.), Agent-Based Approaches in Economic and Social Complex Systems VII, pp. 215–226. Berlin, Heidelberg: Springer.
 Powell (1969) Powell, M. J. (1969). A method for nonlinear constraints in minimization problems. In R. Fletcher (Ed.), Optimization, pp. 283–298. New York: Academic Press.
 Schneider and Zenios (1990) Schneider, M. H. and S. A. Zenios (1990). A comparative study of algorithms for matrix balancing. Operations Research 38(3), 439–455.
 Sheldon and Maurer (1998) Sheldon, G. and M. Maurer (1998). Interbank lending and systemic risk: An empirical analysis for Switzerland. Swiss Journal of Economics and Statistics 134(4), 685–704.
 Squartini and Garlaschelli (2011) Squartini, T. and D. Garlaschelli (2011). Analytical maximumlikelihood method to detect patterns in real networks. New Journal of Physics 13(8), 083001.
 Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 58(1), 267–288.
 Tutz and Groll (2010) Tutz, G. and A. Groll (2010). Generalized linear mixed models based on boosting. In Statistical Modelling and Regression Structures, pp. 197–215. Berlin, Heidelberg: Springer.
 Upper (2011) Upper, C. (2011). Simulation methods to assess the danger of contagion in interbank markets. Journal of Financial Stability 7(3), 111–125.
 Vardi (1996) Vardi, Y. (1996). Network tomography: Estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91(433), 365–377.
 Wu et al. (2014) Wu, L., Y. Yang, and H. Liu (2014). Nonnegative-lasso and application in index tracking. Computational Statistics & Data Analysis 70, 116–126.
 Zhang et al. (2003) Zhang, Y., M. Roughan, C. Lund, and D. Donoho (2003). An information-theoretic approach to traffic matrix estimation. In Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pp. 301–312. Association for Computing Machinery.

 Zhou et al. (2016) Zhou, H., L. Tan, Q. Zeng, and C. Wu (2016). Traffic matrix estimation: A neural network approach with extended input and expectation maximization iteration. Journal of Network and Computer Applications 60, 220–232.