
Reconciling econometrics with continuous maximum-entropy models

In the study of economic networks, econometric approaches interpret the traditional Gravity Model specification as the expected link weight coming from a probability distribution whose functional form can be chosen arbitrarily, while statistical-physics approaches construct maximum-entropy distributions of weighted graphs, constrained to satisfy a given set of measurable network properties. In a recent companion paper, we integrated the two approaches and applied them to the World Trade Web, i.e. the network of international trade among world countries. While the companion paper dealt only with discrete-valued link weights, the present paper extends the theoretical framework to continuous-valued link weights. In particular, we construct two broad classes of maximum-entropy models, namely the integrated and the conditional ones, defined by different criteria to derive and combine the probabilistic rules for placing links and loading them with weights. In the integrated models, both rules follow from a single, constrained optimization of the continuous Kullback-Leibler divergence; in the conditional models, the two rules are disentangled and the functional form of the weight distribution follows from a conditional optimization procedure. After deriving the general functional form of the two classes, we turn each of them into a proper family of econometric models via a suitable identification of the econometric function relating the corresponding expected link weights to macroeconomic factors. After testing the two classes of models on World Trade Web data, we discuss their strengths and weaknesses.


I Introduction

Over the last couple of decades, the growth of network science has impacted several disciplines by establishing new empirical facts about many real-world systems, in terms of the structural network properties typically found in such systems. In the context of trade economics, the growing availability of data about import/export relationships among world countries has prompted researchers to explore and model the architecture of the international trade network, or World Trade Web (WTW) [1, 2, 3, 4, 5, 6, 7, 12, 8, 9, 10, 11].

This approach has complemented, and in many ways enriched, the traditional econometric exercise of modelling individual trade flows, i.e. relating the volume of individual trade exchanges to the most relevant covariates (generally, macroeconomic factors) they may depend on. The earliest example of an econometric model for international trade is the celebrated Gravity Model (GM) [13], which predicts that the expected value of the trade volume \langle w_{ij} \rangle from country i to country j can be expressed via the econometric function

(1)   \langle w_{ij} \rangle = \beta_0 \, (\omega_i \omega_j)^{\beta_1} \, d_{ij}^{\beta_2}

where \omega_i is the GDP of country i divided by the arithmetic mean of the GDPs of all countries, d_{ij} is the geographic distance between (generally the capitals of) countries i and j, and \beta = (\beta_0, \beta_1, \beta_2) is a vector of parameters (notice that the specification is symmetric under the exchange of i and j, since the direction of the exchanges is disregarded, as we will do throughout the paper, and that \beta_0 takes care of dimensional units). Equation 1 has a long tradition of successfully explaining the existing (i.e. positive) trade volumes between pairs of countries. Outside economics, the gravity equation has also been extensively employed in studies concerning transportation, migration [14] and the maximization of utility functions constrained to satisfy requirements on the rate of information acquisition [15].
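As a purely illustrative companion to Eq. 1, the Python sketch below evaluates the gravity specification on hypothetical GDP and distance data; the parameter values beta0, beta1, beta2 are placeholders and are not estimates taken from the paper.

import numpy as np

# Hypothetical inputs: raw GDPs and a symmetric matrix of pairwise distances (km).
gdp = np.array([1.2, 0.8, 2.5, 0.5])
omega = gdp / gdp.mean()                    # GDPs rescaled by their arithmetic mean
dist = np.array([[0.0,   600., 1200.,  800.],
                 [600.,   0.0,  700.,  400.],
                 [1200., 700.,   0.0,  900.],
                 [800.,  400.,  900.,   0.0]])

beta0, beta1, beta2 = 0.01, 0.9, -0.7       # placeholder parameters

# Gravity specification of Eq. 1: <w_ij> = beta0 * (omega_i*omega_j)^beta1 * d_ij^beta2
d = dist.copy()
np.fill_diagonal(d, 1.0)                    # avoid 0**beta2 on the diagonal
expected_w = beta0 * np.outer(omega, omega) ** beta1 * d ** beta2
np.fill_diagonal(expected_w, 0.0)           # no self-trade

print(expected_w)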

Although accurate in reproducing the positive trade volumes, the traditional GM cannot replicate structural network properties, unless the topology is completely fixed via a separate approach [16]. Indeed, if W denotes the weighted adjacency matrix of the WTW, where the entry w_{ij} represents the trade volume from country i to country j, Eq. 1 predicts that the expected matrix \langle W \rangle has no off-diagonal zeroes, i.e. that an expected positive trade relationship exists between all pairs of countries. This means that, when interpreted as the expected value of a regression with small, symmetric, zero-mean (e.g. Gaussian) noise, Eq. 1 predicts a fully connected network: if a_{ij} denotes the generic entry of the binary adjacency matrix A (equal to 1 if a positive trade volume from country i to country j is present, i.e. w_{ij} > 0, and equal to 0 if a zero trade volume is, instead, observed, i.e. w_{ij} = 0), then Eq. 1 predicts a_{ij} = 1 almost surely, for all i \neq j. This result is in obvious contrast with empirical data, which show that the WTW has a rich topological architecture, characterized by a broad degree distribution, (dis)assortative and clustering patterns and other properties [1, 2, 3, 4, 5, 6, 7, 12, 8, 9, 10, 11].

In order to overcome such a limitation, the plain Gravity Model needs to be ‘dressed’ with a probability distribution that produces \langle w_{ij} \rangle as the expected value while, at the same time, accounting for null outcomes as well (i.e. the entries reading w_{ij} = 0 and representing missing links in the network) [17]. Clearly, the support of such a probability distribution should not include matrices with negative entries. From the Sixties on, the GM has indeed been interpreted as the expected value of a probability distribution whose functional form needs to be determined.

Trade econometrics models the tendency of countries to establish trade relationships by relating it to accepted macroeconomic determinants (the so-called ‘factors’), such as GDP and geographic distance, as in the expression of Eq. 1. Econometricians have considered increasingly flexible distributions, the most recent versions of which are capable of disentangling the estimation of the presence of a trade exchange from the estimation of the traded amount. This has led to the definition of two distinct classes of models, i.e. zero-inflated models [19] and hurdle models [20]. Zero-inflated (ZI) models have been introduced to handle the following two scenarios: the possibility of exchanging a zero amount of goods even after having established a trade partnership (e.g. because of a limited trading capacity); the possibility of establishing a very small amount of trade (in fact, so small as to be compatible with statistical noise and, as such, removed). A general drawback of employing ZI models is that of predicting sparser-than-observed network structures [18]; moreover, only discrete distributions (specifically, either the Poisson or the negative-binomial one [19]) have been considered, so far, to carry out the proper weight-estimation step. Hurdle models, introduced to overcome the limitations affecting zero-inflated models, can predict zeros only at the first step [20]: in any case, the presence of links is established by employing either a logit or a probit estimation step.

Network science has tackled the aforementioned inference problem using techniques rooted in statistical physics. The most prominent examples descend from the Maximum-Entropy Principle (MEP) [37, 38, 39] applied to network ensembles [21, 22], prescribing to maximize Shannon entropy [23, 24] in a constrained fashion to obtain the maximally unbiased distribution of networks compatible with a chosen set of structural constraints. This approach is formally equivalent to the construction of so-called Exponential Random Graphs (ERGs) for social network analysis [25] but differs in the typical choice of the constraints: in particular, when the enforced constraints are local, such as the degree (number of links) and the strength (total link weight) of each node, maximum-entropy network models have been shown to successfully replicate both the topology and the weights of many economic and financial networks, including the WTW [5, 6, 26, 27, 28, 29, 30, 31, 32, 18]. The entire framework can also accommodate possibly degenerate, discrete-valued, single- or multi-edges [33].

Although maximum-entropy models have also been studied from an economic perspective (see [34] for a discussion of the economic relevance of the constraints defining the Poisson and the geometric network models), it is only recently that progress has been made in reconciling the two approaches above, either by letting economic factors parametrize the maximum-entropy probability distribution producing links and weights [5, 6, 29, 30, 31, 32, 18] or by introducing network-related statistics into otherwise purely econometric models [35]. On the one hand, the novel framework enriches the methods developed by network scientists with an econometric interpretation; on the other, it enlarges the list of candidate distributions usable for econometric purposes.

With this contribution, we refine the theoretical picture provided in a companion paper [18], introducing models to infer the topology and the weights of undirected networks defined by continuous-valued data. In order to do so, we present a theoretical, physics-inspired framework capable of accommodating both integrated and conditional, continuous models, our goal being threefold: 1) testing the performance of both classes of models on the WTW in order to understand which one is best suited for the task; 2) offering a principled derivation of currently available, conditional, econometric models; 3) enlarging the list of continuous-valued distributions to be used for econometric purposes. From an econometric point of view, our work moves along the methodological guidelines defining the class of Generalized Linear Models (GLMs) [36] while enriching it with distributions defined by both econometric and structural parameters. From a statistical physics point of view, our work expands the class of maximum-entropy network models [21] or weighted ERGs [25] and endows them with macroeconomic factors replacing certain model parameters.

The rest of the paper is organized as follows: in Sec. II, after introducing the basic quantities, we derive the class of conditional models; in Sec. III we derive the class of integrated models; in Sec. IV we apply all models to the analysis of WTW data; in Sec. V we discuss the results and provide our concluding remarks.

II Conditional Models

Discrete maximum-entropy models can be derived by performing a constrained maximization of Shannon entropy [37, 38, 39]. However, unlike the companion paper [18], our focus, here, is on continuous probability distributions. In such a case, mathematical problems are known to affect the definition of Shannon entropy and the resulting inference procedure. To restore the framework, one has to introduce the Kullback-Leibler (KL) divergence of a distribution P from a prior distribution Q and re-interpret the maximization of the entropy of P as the minimization of the divergence of P from the given prior Q. In formulas, the KL divergence is defined as

(2)   D_{KL}(P \,\|\, Q) = \int_{\mathcal{W}} P(W) \ln \frac{P(W)}{Q(W)} \, \mathrm{d}W

where W is one of the possible values of a continuous random variable (in our setting, an entire network with continuous-valued link weights), \mathcal{W} is the set of possible values that W can take, P(W) is the (multivariate) probability density function to be estimated and Q(W) plays the role of prior distribution, the divergence of P from which must be minimized. Such an optimization scheme embodies the so-called Minimum Discrimination Information Principle (MDIP), originally proposed by Kullback and Leibler [40] and implementing the idea that, given a prior distribution Q and new information that becomes available, an updated distribution P should be chosen in order to make its discrimination from Q as hard as possible. In other words, the MDIP demands that new data produce an information gain that is as small as possible. The use of the KL divergence is widespread in the fields of information theory [23] and machine learning [41], e.g. as a loss function within the Generative Adversarial Network (GAN) scheme (the aim of the ‘generating’ neural network being that of producing samples that cannot be distinguished, by the ‘discriminating’ neural network, from those constituting the training set).
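As a toy numerical illustration of Eq. 2 (not taken from the paper), the snippet below evaluates the continuous KL divergence between two exponential densities by quadrature and checks it against the known closed-form value; the rates a and b are arbitrary.

import numpy as np
from scipy.integrate import quad

a, b = 2.0, 1.0                              # arbitrary rates: P ~ Exp(a), prior Q ~ Exp(b)

p = lambda w: a * np.exp(-a * w)             # density P(w)
q = lambda w: b * np.exp(-b * w)             # prior density Q(w)

# D_KL(P || Q) = integral over the support of P(w) * ln(P(w)/Q(w))
kl_numeric, _ = quad(lambda w: p(w) * np.log(p(w) / q(w)), 0.0, np.inf)

kl_closed = np.log(a / b) + b / a - 1.0      # closed form for two exponential densities
print(kl_numeric, kl_closed)                 # both approximately 0.1931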

In order to introduce the class of conditional models, we write the posterior distribution P(W) as

(3)   P(W) = P(A) \, P(W|A)

where A \equiv A(W) denotes the adjacency matrix of the binary projection of the weighted network W. The above equation allows us to split the KL divergence into the following sum of three terms

(4)   D_{KL}(P \,\|\, Q) = -S[P(A)] - S[P(W|A)] + S_{\times}[P, Q]

where

(5)   S[P(A)] \equiv -\sum_{A \in \mathbb{A}} P(A) \ln P(A)

is the Shannon entropy of the probability distribution describing the binary projection of the network structure,

(6)   S[P(W|A)] \equiv -\sum_{A \in \mathbb{A}} P(A) \int_{\mathbb{W}_A} P(W|A) \ln P(W|A) \, \mathrm{d}W

is the conditional Shannon entropy of the probability distribution of the weighted network structure given the binary projection and

(7)   S_{\times}[P, Q] \equiv -\sum_{A \in \mathbb{A}} \int_{\mathbb{W}_A} P(W) \ln Q(W) \, \mathrm{d}W

is the cross entropy quantifying the amount of information required to identify a weighted network sampled from the distribution P by employing the distribution Q. When continuous models are considered, S_{\times}[P, Q] is defined by a first sum running over all the binary configurations A \in \mathbb{A} within the ensemble and an integral over all the weighted configurations that are compatible with each, specific, binary structure - embodied by the adjacency matrix A, i.e. the set \mathbb{W}_A of matrices W such that A(W) = A.

The expression for the cross entropy S_{\times}[P, Q] can be further manipulated as follows. Upon separating the prior distribution itself into a purely binary part and a conditional, weighted one, one can write

(8)   Q(W) = Q(A) \, Q(W|A)

an expression that allows us to write S_{\times}[P, Q] as

(9)   S_{\times}[P, Q] = -\sum_{A \in \mathbb{A}} P(A) \ln Q(A) - \sum_{A \in \mathbb{A}} P(A) \int_{\mathbb{W}_A} P(W|A) \ln Q(W|A) \, \mathrm{d}W

which, in turn, allows the KL divergence to be rewritten as

(10)   D_{KL}(P \,\|\, Q) = D_{KL}[P(A) \,\|\, Q(A)] + D_{KL}[P(W|A) \,\|\, Q(W|A)]

i.e. as a sum of two terms, one of which involves conditional distributions; specifically,

(11)   D_{KL}[P(A) \,\|\, Q(A)] = \sum_{A \in \mathbb{A}} P(A) \ln \frac{P(A)}{Q(A)}
(12)   D_{KL}[P(W|A) \,\|\, Q(W|A)] = \sum_{A \in \mathbb{A}} P(A) \int_{\mathbb{W}_A} P(W|A) \ln \frac{P(W|A)}{Q(W|A)} \, \mathrm{d}W

with Q(A) representing the binary prior and Q(W|A) representing the conditional, weighted one. In what follows, we will deal with completely uninformative priors: this amounts to considering the somewhat ‘simplified’ expression

(13)   D_{KL}(P \,\|\, Q) = -S[P(A)] - S[P(W|A)] + \text{const}

with

(14)   S[P(A)] = -\sum_{A \in \mathbb{A}} P(A) \ln P(A)
(15)   S[P(W|A)] = -\sum_{A \in \mathbb{A}} P(A) \int_{\mathbb{W}_A} P(W|A) \ln P(W|A) \, \mathrm{d}W

The (independent) constrained optimization of S[P(A)] and S[P(W|A)] represents the starting point for deriving the members of the class of conditional models.

II.1 Choosing the binary constraints

The functional form controlling for the binary part of conditional models can be derived by carrying out a constrained maximization of the binary Shannon entropy

(16)   S[P(A)] = -\sum_{A \in \mathbb{A}} P(A) \ln P(A)

leading to a probability mass function reading

(17)   P(A) = \frac{e^{-H(A)}}{\sum_{A' \in \mathbb{A}} e^{-H(A')}}

where the functional form of the Hamiltonian reads H(A) = \sum_{i<j} \theta_{ij} a_{ij}. This choice induces a factorization of the probability mass function P(A), which becomes

(18)   P(A) = \prod_{i<j} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}}

i.e. a product of a number of Bernoulli-like probability mass functions with p_{ij} = \frac{e^{-\theta_{ij}}}{1 + e^{-\theta_{ij}}}, \forall\, i<j, where \theta_{ij} is the Lagrange multiplier controlling for the generic entry a_{ij} of the adjacency matrix A.

In what follows, we will consider the specification of the tensor-like Hamiltonian introduced above reading \theta_{ij} = \theta_i + \theta_j, \forall\, i<j, a choice inducing the Undirected Binary Configuration Model (UBCM), characterized by the following, pair-specific probability coefficient

(19)   p_{ij}^{\mathrm{UBCM}} = \frac{x_i x_j}{1 + x_i x_j}, \qquad x_i \equiv e^{-\theta_i}

and ensuring that the entire degree sequence of the network at hand is reproduced.
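The paper does not spell out the numerical routine used to fit the UBCM; the sketch below shows one common option, a maximum-likelihood fixed-point iteration on the multipliers x_i of Eq. 19, and should be read as an illustration rather than as the authors' implementation.

import numpy as np

def fit_ubcm(A, n_iter=5000, tol=1e-10):
    """Fixed-point iteration for the UBCM: find x_i such that the expected
    degrees sum_{j != i} x_i x_j / (1 + x_i x_j) match the observed ones."""
    k = A.sum(axis=1).astype(float)              # observed degrees
    x = k / np.sqrt(k.sum() + 1.0)               # rough initial guess
    for _ in range(n_iter):
        xx = np.outer(x, x)
        # sum_{j != i} x_j / (1 + x_i x_j), excluding the j = i term
        denom = (x / (1.0 + xx)).sum(axis=1) - x / (1.0 + x * x)
        x_new = k / np.maximum(denom, 1e-12)
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new
    p = np.outer(x, x) / (1.0 + np.outer(x, x))  # connection probabilities, Eq. 19
    np.fill_diagonal(p, 0.0)
    return x, p

# toy usage on a small, hand-made network
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
x, p = fit_ubcm(A)
print(p.sum(axis=1), A.sum(axis=1))              # expected vs observed degrees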

The econometric reparametrization of the UBCM can be achieved by posing x_i \equiv \sqrt{z}\,\omega_i, \forall\, i, a choice inducing the so-called fitness model (FM), characterized by the pair-specific probability coefficient

(20)   p_{ij}^{\mathrm{FM}} = \frac{z\, \omega_i \omega_j}{1 + z\, \omega_i \omega_j}

and requiring the estimation of a single, global parameter only, i.e. z. The FM represents a particular case of the logit model [42], being defined by a vector of external properties (the ‘fitnesses’) that replace the information provided by some kind of (otherwise) purely structural properties [43, 5]: in fact, Eq. 20 can be equivalently rewritten in logit form, i.e. p_{ij}^{\mathrm{FM}} = \frac{e^{\theta + \ln(\omega_i \omega_j)}}{1 + e^{\theta + \ln(\omega_i \omega_j)}}, with \theta \equiv \ln z. The global constant z can be determined by imposing the total number of links L as the only constraint. Remarkably, the fitness model has been proven to reproduce the (binary) properties of a wide spectrum of real-world systems [26, 6] as accurately as the UBCM, although requiring much less information.
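Since the FM has a single unknown, fixing the expected number of links to the observed value L reduces the estimation to a one-dimensional root-finding problem; a possible sketch, with made-up fitness values, is the following.

import numpy as np
from scipy.optimize import brentq

def fit_fitness_model(omega, L_obs):
    """Find z such that sum_{i<j} z*om_i*om_j / (1 + z*om_i*om_j) equals L_obs (Eq. 20)."""
    iu = np.triu_indices(len(omega), k=1)
    ww = np.outer(omega, omega)[iu]                       # omega_i * omega_j, for i < j

    def excess_links(z):
        return np.sum(z * ww / (1.0 + z * ww)) - L_obs

    # the expected number of links grows monotonically with z: bracket and bisect
    z = brentq(excess_links, 1e-12, 1e12)
    p = z * np.outer(omega, omega) / (1.0 + z * np.outer(omega, omega))
    np.fill_diagonal(p, 0.0)
    return z, p

# toy usage: GDP-like fitnesses and a target number of links
omega = np.array([0.5, 1.0, 1.5, 2.0, 3.0]) / 1.6
z, p = fit_fitness_model(omega, L_obs=6)
print(z, p[np.triu_indices(5, k=1)].sum())                # the sum reproduces L_obs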

In what follows, we will consider both the UBCM and the FM specifications.

II.2 Choosing the weighted constraints

The constrained maximization of S[P(W|A)] proceeds by specifying the following set of weighted constraints

(21)
(22)

the first condition ensuring the normalization of the probability distribution and the vector of the remaining ones representing the ‘proper’ set of weighted constraints (weights are, now, treated as continuous random variables, i.e. w_{ij} \in [0, +\infty), \forall\, i<j). They induce the distribution reading

(23)   P(W|A) = \frac{e^{-H(W)}}{Z_A}

where H(W) is the so-called Hamiltonian, listing the constrained quantities, and Z_A is the partition function, conditional on the ‘fixed topology’ A.

The explicit functional form of P(W|A) can be obtained only once the functional form of the constraints has been specified. In what follows, we will deal with the Hamiltonian reading

(24)

with the Lagrange multipliers satisfying the following requirements:

  • , where is the Lagrange multiplier associated with the total weight and encodes the dependence on purely econometric quantities;

  • will be kept either in its dyadic form, to constrain the logarithm of each weight, or in its global form, , to constrain the sum of the logarithms of the weights, i.e. ;

  • plays the role of the Lagrange multiplier associated with (a function of) the total variance of the logarithms of the weights, i.e.

    .

II.2.1 Conditional exponential model.

Let us start by considering the simplest, conditional model, defined by the positions and inducing the Hamiltonian

(25)

inserting the expression above into Eq. 23 leads to the distribution

(26)

and each node pair-specific distribution induces a (conditional) expected weight reading

(27)

From a purely topological point of view, constraining each weight and their total sum is redundant. However, this is no longer true when turning the conditional exponential model into a proper econometric one. Its econometric reparametrization should be consistent with the literature on trade, stating that the weights are monotonically increasing functions of the gravity specification; for this reason, the link function usually associated with the exponential distribution prescribes to identify the linear predictor with the inverse of the purely econometric parameter of the model, i.e.

(28)

a position that turns Eq. 27 into

(29)

notice that the only structural constraint is, now, represented by the total weight (see also Appendix A).

II.2.2 Conditional gamma model.

Let us, now, consider a different Hamiltonian, constraining each weight, their total sum and the sum of their logarithms, i.e.

(30)

it induces the distribution reading

(31)

each node pair-specific distribution is characterized by a (conditional) expected weight reading

(32)

and by a (conditional) expected logarithmic weight reading

(33)

where \psi(\cdot) is the so-called digamma function.

Such a model can be turned into a proper, econometric one by considering the inference scheme of the gamma model with inverse response function, which allows us to identify the linear predictor with the inverse of the purely econometric parameter of the model, i.e.

(34)

a position that, in turn, leads to the expressions

(35)

(allowing the conditional, exponential model to be recovered in case , i.e. when the constraint on the sum of the logarithms of the weights is switched-off) and

(36)

(see also Appendix A).
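To make the GLM analogy concrete, here is a sketch, under simplifying assumptions that are not the paper's exact specification, of fitting a gamma model to the positive weights: the mean is tied to a log-linear gravity predictor (a log link is used purely for numerical convenience) and the parameters are obtained by direct maximization of the gamma log-likelihood with scipy.

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def fit_gamma_gravity(w, log_ww, log_d):
    """Fit a gamma model for positive weights w with mean
    mu = exp(b0 + b1*log(omega_i*omega_j) + b2*log(d_ij)) and shape alpha
    (an illustrative parametrization, not the paper's exact one)."""
    X = np.column_stack([np.ones_like(w), log_ww, log_d])

    def nll(theta):
        b, log_alpha = theta[:3], theta[3]
        alpha = np.exp(log_alpha)
        mu = np.exp(X @ b)
        # negative log-likelihood of a gamma with mean mu and shape alpha
        return np.sum(gammaln(alpha) - alpha * np.log(alpha / mu)
                      - (alpha - 1.0) * np.log(w) + alpha * w / mu)

    res = minimize(nll, x0=np.zeros(4), method="L-BFGS-B")
    return res.x[:3], np.exp(res.x[3])        # regression coefficients, shape

# toy usage with synthetic data
rng = np.random.default_rng(0)
n = 200
log_ww, log_d = rng.normal(size=n), rng.normal(size=n)
mu_true = np.exp(0.5 + 0.8 * log_ww - 0.6 * log_d)
w = rng.gamma(shape=2.0, scale=mu_true / 2.0)             # mean equal to mu_true
print(fit_gamma_gravity(w, log_ww, log_d))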

II.2.3 Conditional Pareto model.

Constraining a slightly more complex function of the weights, i.e. their logarithm, leads to the Hamiltonian

(37)

which, in turn, induces the distribution

(38)

where is the minimum, node pair-specific weight allowed by the model. Each node pair-specific distribution is characterized by a (conditional) expected weight reading

(39)

Such a model can be turned into a proper, econometric one by considering the positions

(40)

ensuring that the expected weights are monotonically increasing functions of the gravity specification and leading to the expression

(41)

where is the empirical, minimum weight (see also Appendix A).

Let us explicitly notice that the derivation of the gamma and Pareto distributions within the maximum-entropy framework has already been studied in [44]; here, however, we aim at making a step further, by identifying a suitable redefinition of these models' parameters capable of turning them into proper econometric ones.

II.2.4 Conditional log-normal model.

Adding a global constraint on (a function of) the total variance of the logarithms of the weights to the Hamiltonian defining the Pareto model leads to the expression

(42)

the Hamiltonian above induces a distribution reading

(43)

each node pair-specific distribution is characterized by a (conditional) expected weight reading

(44)

by a (conditional) expected, logarithmic weight reading

(45)

and by a (conditional) logarithmic weight whose squared expectation reads

(46)

Such a model can be turned into a proper, econometric one by considering the position

(47)

ensuring that the expected weights are monotonically increasing functions of the gravity specification and leading to the expressions

(48)
(49)
(50)

(see also Appendix A).

III Integrated Models

The MDIP can also be implemented in a straightforward way, by carrying out a constrained optimization of the KL divergence as a whole. In this second case, the following set of constraints

(51)
(52)

can be specified, with obvious meaning of the symbols. Differentiating the corresponding Lagrangean functional with respect to P(W) and equating the result to zero leads to

(53)

where H(W) is, again, the Hamiltonian and Z is the ‘integrated’ partition function.

The explicit functional form of P(W) can be obtained only once the functional form of both the prior distribution and the constraints has been specified. In what follows, we will deal with completely uninformative priors, a choice that amounts to considering the simplified expression

(54)

notice that the result above could also have been derived by carrying out a constrained minimization of

(55)

i.e. of (minus) the functional named differential entropy, into which the KL divergence ‘degenerates’ when completely uninformative priors are considered.

III.1 Choosing the constraints

Let us, now, specify the functional form of the constraints. In what follows, we will deal with a specific instance of the generic Hamiltonian

(56)

in particular, one could pose , a choice that would lead to constraining the total number of links, or , a choice that would lead to constraining the whole degree sequence. If not specified otherwise, in what follows we will employ the second functional form and pose , where is the Lagrange multiplier associated with the total weight and encodes the dependence on purely econometric quantities. Our choices induce the Hamiltonian of the so-called integrated exponential model, i.e.

that leads to the distribution

(57)

where . The generic node pair-specific distribution induces a probability for nodes i and j to be connected reading

(58)

besides, the corresponding expected weight reads

(59)

Equation III.1 clarifies why the models considered in the present section are classified as ‘integrated’: each node pair-specific probability of connection is a function of the parameters controlling for both topological and weighted properties. Models of this kind are, thus, capable of ‘integrating’ the information concerning a network's structure with the information concerning its weights, hence employing them in a joint fashion to define both inference steps.

The recipe for the econometric reparametrization of the integrated exponential model reads as that of its conditional counterpart, i.e.

(60)

a position that turns Eq. 59 into

(61)

where

(62)

(see also Appendix B).

IV Results

The effectiveness of the two classes of models considered in the present paper in reproducing the topological properties of the World Trade Web has been tested on two different datasets, i.e. the Gleditsch one (covering 11 years, from 1990 to 2000 [45]) and the BACI one (covering 11 years, from 2007 to 2017 [46]). To carry out our analyses, we have built the ensemble induced by each model as follows. First, the presence of a link connecting any two nodes i and j is established with probability p_{ij}: as usual, a real number u_{ij} is drawn from the uniform distribution whose support coincides with the unit interval, i.e. u_{ij} \sim U[0,1], and compared with p_{ij}; if u_{ij} \leq p_{ij}, nodes i and j are linked, otherwise they are not. Once the presence of a link is established, it is loaded with a weight by employing the inverse transform sampling technique. According to it, it is possible to generate random variates by considering the complementary cumulative distribution

(63)   F_{ij}(w) = \int_w^{+\infty} q_{ij}(w') \, \mathrm{d}w'

since F_{ij}(w) behaves like a random variable, say u_{ij}, uniformly distributed between 0 and 1, the equation u_{ij} = F_{ij}(w) can be inverted to obtain w_{ij}, i.e. the weight to be assigned to the pair i, j. Each ensemble is constituted by a number of configurations amounting to ; the error accompanying the estimate of any quantity of interest is quantified via the confidence intervals (CI) induced by the ensemble distribution of that quantity.
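The sampling scheme just described can be condensed into a few lines; in the sketch below the weights of realized links are drawn, for illustration only, from pair-specific exponential distributions, whose survival function is inverted analytically, while the actual conditional distribution depends on the model at hand.

import numpy as np

def sample_configuration(p, rate, rng):
    """Draw one network from the ensemble: a_ij ~ Bernoulli(p_ij), and, where
    a_ij = 1, w_ij is obtained by inverting the (here exponential) survival
    function F_ij(w) = exp(-rate_ij * w), i.e. w = -ln(u)/rate_ij."""
    n = p.shape[0]
    A = np.zeros((n, n))
    W = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    u_link = rng.random(len(iu[0]))
    a = (u_link <= p[iu]).astype(float)
    u_w = rng.random(len(iu[0]))
    w = np.where(a == 1.0, -np.log(u_w) / rate[iu], 0.0)
    A[iu], W[iu] = a, w
    A += A.T
    W += W.T
    return A, W

# toy usage: 4 nodes, uniform probabilities and rates
rng = np.random.default_rng(42)
p = np.full((4, 4), 0.6)
rate = np.full((4, 4), 2.0)
A, W = sample_configuration(p, rate, rng)
print(A, W)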

Dataset Model
Gleditsch I-Exp
Gleditsch UBCM
Gleditsch FM
Dataset Model
BACI I-Exp
BACI UBCM
BACI FM
Dataset Model
Gleditsch I-Exp
Gleditsch C-Exp
Gleditsch C-Pareto
Gleditsch C-Gamma
Gleditsch C-Lognormal
Dataset Model
BACI I-Exp
BACI C-Exp
BACI C-Pareto
BACI C-Gamma
BACI C-Lognormal
Table 1: Compatibility between the distributions of the expected values of the statistics output by the models considered in the present work and their empirical counterparts, for the Gleditsch (left) and the BACI (right) dataset. A large value of the KS compatibility frequency indicates a large percentage of years for which the distribution of the network statistic predicted by a given model is compatible with the empirical one. While all models seem to perform quite well in reproducing the binary statistics on the Gleditsch dataset, this is no longer true when considering the BACI dataset, on which the UBCM outperforms the FM - a result that leads us to prefer the former as the ‘first-step algorithm’ of our conditional models. For what concerns the set of weighted statistics, the models constraining the total weight (i.e. the integrated exponential model, the conditional exponential model and the conditional gamma model) clearly outperform the others.

IV.1 Model selection via statistical indicators

Let us consider two measures of goodness-of-fit, i.e. the reconstruction accuracy and the Kolmogorov-Smirnov (KS) compatibility frequency: while the former is defined as the percentage of node-specific values of a statistic falling within the CIs induced by model m, at a given significance level, the latter measures the percentage of times the distribution of a given expected statistic, under model m, is compatible with the empirical one, according to the two-sample Kolmogorov-Smirnov test; for example, a large value of the KS compatibility frequency indicates that the distribution of the expected values of the statistic, under model m, is found to be compatible with the empirical one for a large fraction of the years in the dataset, at the chosen significance level.
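A minimal sketch of how the KS compatibility frequency could be computed, assuming that, for each year, both the empirical node-level values of a statistic and their ensemble-expected counterparts are available; the 5% threshold is an illustrative default.

import numpy as np
from scipy.stats import ks_2samp

def ks_compatibility_frequency(empirical_by_year, expected_by_year, alpha=0.05):
    """Fraction of years for which the two-sample KS test does not reject the
    compatibility between empirical and expected node-level statistics."""
    compatible = [ks_2samp(emp, exp).pvalue >= alpha
                  for emp, exp in zip(empirical_by_year, expected_by_year)]
    return float(np.mean(compatible))

# toy usage: two years of (synthetic) degree sequences
rng = np.random.default_rng(1)
emp = [rng.poisson(10, 150), rng.poisson(12, 150)]
exp = [rng.poisson(10, 150), rng.poisson(20, 150)]
print(ks_compatibility_frequency(emp, exp))   # e.g. 0.5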

The network statistics for which the two indicators have been computed are the degree sequence

(64)   k_i = \sum_{j \neq i} a_{ij}

which gives information about the tendency of single nodes to link with other trade partners, the average nearest neighbors degree

(65)   k_i^{nn} = \frac{\sum_{j \neq i} a_{ij} k_j}{k_i}

which gives information about the degree correlations, the clustering coefficient

(66)   c_i = \frac{\sum_{j \neq i} \sum_{k \neq i,j} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}

which counts the percentage of a node's partners that are also partners of one another.

Figure 1: Reconstruction accuracy for the statistics of interest, calculated as the percentage of node-specific values of the statistics falling within the CI induced by model m, at the given significance level; yearly percentages are, then, averaged. The whiskers represent the percentiles of each distribution, across different years. Overall, all models perform quite well in reproducing the degrees, with the only exception of the FM - whence our choice of employing the UBCM as the ‘first-step algorithm’ of our conditional models. For what concerns higher-order, binary statistics, both the UBCM and the I-Exp perform quite well while the performance of the FM is much poorer - a result that holds true for both the Gleditsch (left panels) and the BACI (right panels) dataset. For what concerns the weighted statistics, only the average nearest neighbors strength is satisfactorily recovered - however, only by the (integrated and conditional) exponential models and the conditional gamma one. (a) Reconstruction accuracy for the binary network statistics. (b) Reconstruction accuracy for the weighted network statistics.

For what concerns the weighted statistics, we have considered the strength sequence

(67)   s_i = \sum_{j \neq i} w_{ij}

which gives information about the trade flow of a country, the average nearest neighbors strength

(68)   s_i^{nn} = \frac{\sum_{j \neq i} a_{ij} s_j}{k_i}

which gives information about the strength correlations, and the weighted clustering coefficient

(69)

which weighs the closed triangular patterns a node establishes with its trade partners.
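For reference, the statistics of Eqs. 64-68 can be computed directly from the binary and weighted adjacency matrices; a plain numpy sketch follows (the weighted clustering coefficient is omitted here, since several variants of it exist in the literature).

import numpy as np

def network_statistics(A, W):
    """Node-level statistics used in the comparison: degree, average nearest
    neighbors degree, clustering coefficient, strength and average nearest
    neighbors strength (A binary, W weighted, both symmetric, zero diagonal)."""
    k = A.sum(axis=1)                                          # degree, Eq. 64
    s = W.sum(axis=1)                                          # strength, Eq. 67
    with np.errstate(divide="ignore", invalid="ignore"):
        knn = np.where(k > 0, (A @ k) / k, 0.0)                # ANND, Eq. 65
        snn = np.where(k > 0, (A @ s) / k, 0.0)                # ANNS, Eq. 68
        triangles = np.diag(A @ A @ A)                         # ordered closed triples through i (numerator of Eq. 66)
        c = np.where(k > 1, triangles / (k * (k - 1.0)), 0.0)  # clustering, Eq. 66
    return k, knn, c, s, snn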

Figure 2: Reconstruction accuracy for the Gleditsch dataset (left panel) and the BACI dataset (right panel). All models perform quite well in reproducing the weights (across all years, on both datasets) with the only exception of the conditional Pareto model. Overall, the best-performing model on the Gleditsch dataset is the integrated exponential while the best-performing model on the BACI dataset is the conditional gamma model.

IV.1.1 KS compatibility frequency

Table 1 lists the values of the KS compatibility frequency for both binary and weighted network statistics. For what concerns the binary statistics, we report the performance of three different models, i.e. the UBCM, the FM and the integrated exponential model (denoted as I-Exp).

For what concerns the Gleditsch dataset, compatibility is observed for every year; for what concerns the BACI dataset, instead, this is no longer true: in fact, the FM outputs predictions that are not compatible with the empirical values for a large number of years, irrespective of the considered quantity; the UBCM and the I-Exp (i.e. the models constraining the degrees), instead, output predictions whose compatibility depends on the considered quantity: higher-order statistics are the ones for which the two aforementioned models ‘fail’ to the largest extent. Overall, these results lead us to prefer the UBCM as the ‘first-step algorithm’ of our conditional models.

Let us, now, comment on the performance of our models in reproducing the weighted statistics. As can be appreciated by looking at Table 1, the only models outputting predictions whose distributions are compatible with the empirical analogues are the integrated exponential one, the conditional exponential one and the conditional gamma one. On the other hand, employing only logarithmic constraints (as is the case for the conditional Pareto model and the conditional log-normal model) does not help improve the accuracy of the description of the system at hand.

IV.1.2 Reconstruction accuracy

So far, we have inspected the compatibility of the distributions of the empirical values of each network statistic with those of their expected values under each of our models. Let us, now, quantify the extent to which each model is able to recover node-wise information by computing the reconstruction accuracy. Figure 1 shows the temporal average of the latter (i.e. across the years covered by our datasets), with the whiskers representing its variation, i.e. an indication of the stability of each model's performance.

For what concerns the binary statistics (see Fig. 1a), both the UBCM and the I-Exp perform quite well in reproducing them; on the other hand, the performance of the FM is much poorer. For what concerns the weighted statistics (see Fig. 1b), only the average nearest neighbors strength is satisfactorily recovered, and only by the (integrated and conditional) exponential models and the conditional gamma one. Still, these models are found to perform poorly on the other statistics, i.e. the strength, which is only recovered in distribution on both datasets, and the weighted clustering coefficient, which is only recovered in distribution on the BACI dataset. For what concerns the log-normal model, it performs better than its competitors in reproducing the strength and the weighted clustering coefficient on the Gleditsch dataset but worse on the BACI dataset, which makes its behavior dataset-dependent.

Finally, let us inspect the reconstruction accuracy of our models for what concerns our networks' link weights. Specifically, let us consider the percentage of empirical weights falling within the CI induced by model m, at the chosen significance level [48]. Here, we have proceeded numerically, i.e. by considering the appropriate percentiles induced by the ensemble distribution of each node pair-specific weight. As Fig. 2 shows, all models perform quite well in reproducing the weights (across all years, on both datasets) with the only exception of the conditional Pareto model. Overall, the best-performing model on the Gleditsch dataset is the integrated exponential one while the best-performing model on the BACI dataset is the conditional gamma one.

IV.1.3 Confusion matrix

The UBCM and the integrated exponential model perform similarly in reproducing the binary statistics, on both datasets. Let us, now, compare them in reproducing the four indicators composing the so-called confusion matrix, i.e. the true positive rate (measuring the percentage of links correctly recovered by a given reconstruction method), the specificity (measuring the percentage of zeros correctly recovered by a given reconstruction method), the positive predictive value (measuring the percentage of links correctly recovered by a given reconstruction method with respect to the total number of links predicted by it) and the accuracy (measuring the overall performance of a given reconstruction method in correctly placing both links and zeros).

The results are reported in Table 2, which shows the increments of the four indicators, defined as the differences between the values attained by the integrated exponential model and those attained by the UBCM. Notice that each entry of the table is positive, a result signalling that the integrated exponential model consistently performs better than the UBCM. This is further confirmed by the (non-parametric) Wilcoxon rank-sum test on the ensemble distributions of the statistics to compare: all increments are statistically significant.
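One common way, in this literature, of evaluating the four indicators is to compute their expected values analytically from the link probabilities; the sketch below follows that convention, although the paper may equally well average over sampled configurations.

import numpy as np

def confusion_indicators(A_emp, p):
    """Expected confusion-matrix indicators given the empirical adjacency
    matrix A_emp and the model link probabilities p (both symmetric)."""
    iu = np.triu_indices(A_emp.shape[0], k=1)
    a, prob = A_emp[iu], p[iu]
    tp = np.sum(a * prob)                    # expected true positives
    fn = np.sum(a * (1.0 - prob))            # expected false negatives
    fp = np.sum((1.0 - a) * prob)            # expected false positives
    tn = np.sum((1.0 - a) * (1.0 - prob))    # expected true negatives
    tpr = tp / (tp + fn)                     # true positive rate (sensitivity)
    spc = tn / (tn + fp)                     # specificity
    ppv = tp / (tp + fp)                     # positive predictive value
    acc = (tp + tn) / (tp + fn + fp + tn)    # accuracy
    return tpr, spc, ppv, acc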

IV.2 Model selection via statistical tests

Let us now rigorously test whether constraining the entire degree sequence leads to a significantly better description of our data than the one obtainable by just constraining the total number of links L.

Upon solving the model constraining the entire degree sequence and the one constraining the total number of links, we are able to construct a vector of paired values, in which one entry is either the reconstruction accuracy or the KS compatibility frequency of a given statistic under the ‘degree-constrained version’ of a model and the other entry is the corresponding quantity under the ‘L-constrained version’ of the same model - naturally, both values have been considered for the same year, keeping the same set of weighted constraints. Pairing the statistics as described above allows us to employ the (non-parametric) Wilcoxon signed-rank test for testing the hypothesis that the models just constraining L perform better, in reproducing a given statistic, than those constraining the entire degree sequence.

Our results let us conclude that, for both datasets, constraining the degree sequence leads to a significant improvement of the reconstruction accuracy of the average nearest neighbors degree, the clustering coefficient, the strengths and the average nearest neighbors strength; on the other hand, constraining the degree sequence does not lead to any significant improvement of the reconstruction accuracy of the weighted clustering coefficient. For what concerns the KS compatibility frequency, a significant improvement is observed in the description accuracy of the average nearest neighbors degree, the clustering coefficient and the average nearest neighbors strength.
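A sketch of the paired test described above, using scipy; the per-year accuracy values are hypothetical placeholders, and the one-sided alternative asks whether the L-constrained accuracies tend to be smaller than the degree-constrained ones.

import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-year reconstruction accuracies for the L-constrained and the
# degree-sequence-constrained versions of the same model (paired by year).
acc_L   = np.array([0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.74, 0.70, 0.69, 0.71, 0.72])
acc_deg = np.array([0.83, 0.81, 0.85, 0.82, 0.84, 0.80, 0.86, 0.83, 0.82, 0.84, 0.85])

# A small p-value favours constraining the whole degree sequence.
stat, pval = wilcoxon(acc_L, acc_deg, alternative="less")
print(stat, pval)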

IV.3 Model selection via information criteria

Let us, now, compare the performance of our models in a more general fashion. To this aim, let us consider the Akaike Information Criterion (AIC) [49], reading

(70)   \mathrm{AIC}_m = 2 k_m - 2 \ln \mathcal{L}_m

where k_m is the number of free parameters of model m and \ln \mathcal{L}_m is its log-likelihood, evaluated at its maximum.
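A small helper for Eq. 70; the number of parameters and the log-likelihood values used in the toy comparison are placeholders, not figures from the paper.

def aic(num_params, max_loglik):
    """Akaike Information Criterion of Eq. 70: AIC = 2k - 2*ln(L_max)."""
    return 2.0 * num_params - 2.0 * max_loglik

# toy comparison of two hypothetical models
scores = {"C-Exp": aic(3, -1520.4), "C-Gamma": aic(4, -1498.7)}
best = min(scores, key=scores.get)
print(scores, {m: scores[m] - scores[best] for m in scores})   # AIC and AIC differences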

Dataset
Gleditsch 90
Gleditsch 91
Gleditsch 92
Gleditsch 93
Gleditsch 94
Gleditsch 95
Gleditsch 96
Gleditsch 97
Gleditsch 98
Gleditsch 99
Gleditsch 00
Dataset
BACI 07
BACI 08
BACI 09
BACI 10
BACI 11
BACI 12
BACI 13
BACI 14
BACI 15
BACI 16
BACI 17
Table 2: Increments of the four indicators composing the confusion matrix, i.e. the true positive rate (TPR), the specificity (SPC), the positive predictive value (PPV) and the accuracy (ACC), when passing from the UBCM to the integrated exponential model, for the Gleditsch (left table) and the BACI (right table) datasets. All increments are statistically significant, according to the (non-parametric) Wilcoxon rank-sum test on the ensemble distributions of the statistics to compare.

The purely binary log-likelihood induced by model m is readily obtained from Eq. (18) and reads

(71)   \ln \mathcal{L}_m = \sum_{i<j} \left[ a_{ij} \ln p_{ij}^{(m)} + (1 - a_{ij}) \ln \left( 1 - p_{ij}^{(m)} \right) \right]

where a_{ij} is the generic entry of the empirical adjacency matrix and p_{ij}^{(m)} is the model-dependent probability that node i and node j