Dyadic Regression

Dyadic data, where outcomes reflecting pairwise interaction among sampled units are of primary interest, arise frequently in social science research. Regression analyses with such data feature prominently in many research literatures (e.g., gravity models of trade). The dependence structure associated with dyadic data raises special estimation and, especially, inference issues. This chapter reviews currently available methods for (parametric) dyadic regression analysis and presents guidelines for empirical researchers.

1 Population and sampling framework

Let $i \in \mathbb{N}$ index agents in some (infinite) population of interest. In what follows I will refer to agents as, equivalently, nodes, vertices, units and/or individuals. Let $X_i \in \mathbb{X} = \{x_1, \ldots, x_L\}$ be an observable attribute which partitions this population into $L = |\mathbb{X}|$ subpopulations or “types”; $\mathbb{N}(x) = \{i : X_i = x\}$ equals the index set associated with the subpopulation where $X_i = x$. While $L$ may be very large, the size of each subpopulation is assumed infinite. In practice $\mathbb{X}$ will typically enumerate different combinations of distinct agent-specific attributes (e.g., a particular $x \in \mathbb{X}$ may correspond to former British colonies in the tropics with per capita GDP below \$3,000). Heuristically we can think of $\mathbb{X}$ as consisting of the support points of a multinomial approximation to a (possibly continuous) underlying covariate space as in Chamberlain (1987).

The indexing of agents within subpopulations homogenous in $X_i$ is arbitrary; from the standpoint of the researcher all vertices of the same type are exchangeable. Similar exchangeability assumptions underlie most cross-sectional microeconometric procedures. For each (ordered) pair of agents, or directed dyad, there exists an outcome of interest $Y_{ij}$. The first subscript in $Y_{ij}$ indexes the directed dyad’s ego, or “sending” agent, while the second indexes its alter, or “receiving” agent. The adjacency matrix $\mathbf{Y} = [Y_{ij}]_{i,j \in \mathbb{N}}$ collects all such outcomes into an (infinite) random array. Within-type exchangeability of agents implies a particular form of joint exchangeability of the adjacency matrix.

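To describe this exchangeability condition let $\sigma_x : \mathbb{N} \rightarrow \mathbb{N}$ be any permutation of indices satisfying the restriction

$$[X_{\sigma_x(i)}]_{i \in \mathbb{N}} = [X_i]_{i \in \mathbb{N}}. \tag{3}$$

Condition (3) restricts relabelings to occur among agents of the same type (i.e., within the index sets $\mathbb{N}(x)$, $x \in \mathbb{X}$). Following Crane & Towsner (2018) a network is relatively exchangeable with respect to $X$ (or $X$-exchangeable) if, for all permutations $\sigma_x$,

$$[Y_{\sigma_x(i)\sigma_x(j)}]_{i,j \in \mathbb{N}} \overset{D}{=} [Y_{ij}]_{i,j \in \mathbb{N}}, \tag{4}$$

where $\overset{D}{=}$ denotes equality of distribution.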

If we regard $\mathbf{Y}$ as a (weighted) directed network and $X_i$ as vertex $i$’s “color”, then (4) is equivalent to the statement that all colored graph isomorphisms are equally probable. Since there is nothing in the researcher’s information set which justifies attaching different probabilities to graphs which are isomorphic (as vertex colored graphs), any probability model for the adjacency matrix should satisfy (4). If $X_i$ encodes all the vertex information observed by the analyst, then $X$-exchangeability is a natural a priori modeling restriction.

Condition (4) allows for the invocation of very powerful de Finetti (1931) type representation results for random arrays. These results provide an “as if” (nonparametric) data generating process for the network adjacency matrix. This, in turn, facilitates various probabilistic calculations (e.g., computing expectations and variances) and gives (tractable) structure to the dependence across the elements of $\mathbf{Y}$.

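Let $\alpha$, $\{U_i\}_{i \in \mathbb{N}}$ and $\{(V_{ij}, V_{ji})\}_{i < j}$ be i.i.d. random variables. We may normalize $\alpha$, $U_i$ and $V_{ij}$ to be $\mathcal{U}[0,1]$ (uniform on the unit interval) without loss of generality. We do allow for within-dyad dependence across $V_{ij}$ and $V_{ji}$; the role of such dependence will become apparent below. Next consider the random array generated according to the rule

$$Y_{ij} = \tilde{h}(\alpha, X_i, X_j, U_i, U_j, V_{ij}). \tag{5}$$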

Data generating process (DGP) (5) has a number of useful features. First, any pair of outcomes, $Y_{i_1 i_2}$ and $Y_{j_1 j_2}$, sharing at least one index in common are dependent. This holds true even conditional on their types $(X_{i_1}, X_{i_2})$ and $(X_{j_1}, X_{j_2})$. Second, if $Y_{i_1 i_2}$ and $Y_{j_1 j_2}$ share exactly one index in common, say $i_1 = j_1$, then they are independent once the shared agent’s latent attribute $U_{i_1}$ (and the mixing parameter $\alpha$) is additionally conditioned on. Third, if they share both indices in common, as in $Y_{ij}$ and $Y_{ji}$, then there may be dependence even conditional on $U_i$ and $U_j$ due to the within-dyad dependence across $V_{ij}$ and $V_{ji}$. These patterns of structured dependence and conditional independence will be exploited below to derive the limit distribution of parametric dyadic regression coefficient estimates. Shalizi (2016) helpfully calls models like (5) conditionally independent dyad (CID) models (see also Chandrasekhar (2015)).
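The dependence pattern implied by a CID model can be checked by simulation. The sketch below is only an illustration: the additive function `h`, the single-type design (all $X_i$ equal) and the independent draws of $V_{ij}$ and $V_{ji}$ are assumptions made here for simplicity, not part of the original text. It compares the correlation of two outcomes which share an agent with that of two outcomes which do not.

```python
import random


def h(x_i, x_j, u_i, u_j, v_ij):
    # Hypothetical graphon, chosen only for illustration: additive in the
    # ego attribute, the alter attribute and the dyad-level shock.
    return x_i + x_j + u_i + u_j + v_ij


def simulate_cid(x, rng):
    """Draw one array Y_ij = h(X_i, X_j, U_i, U_j, V_ij) for all i != j."""
    n = len(x)
    u = [rng.random() for _ in range(n)]      # agent-level U_i ~ U[0, 1]
    y = [[None] * n for _ in range(n)]        # diagonal stays empty (no self loops)
    for i in range(n):
        for j in range(n):
            if i != j:
                v_ij = rng.random()           # dyad-level V_ij ~ U[0, 1]
                y[i][j] = h(x[i], x[j], u[i], u[j], v_ij)
    return y


def correlation(a, b):
    """Sample correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)
x = [0.0, 0.0, 0.0, 0.0]                      # a single type, so X is held fixed
draws = [simulate_cid(x, rng) for _ in range(4000)]
y12 = [d[0][1] for d in draws]
y13 = [d[0][2] for d in draws]
y34 = [d[2][3] for d in draws]
corr_shared = correlation(y12, y13)           # dyads (1,2) and (1,3) share agent 1
corr_disjoint = correlation(y12, y34)         # dyads (1,2) and (3,4) share no agent
```

Under this particular `h` the population correlation between outcomes sharing one agent is 1/3, while outcomes with no agent in common are independent, so `corr_shared` should be clearly positive and `corr_disjoint` close to zero.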

Crane & Towsner (2018), extending Aldous (1981) and Hoover (1979), show that, for any random array $\mathbf{Y} = [Y_{ij}]_{i,j \in \mathbb{N}}$ satisfying (4), there exists another array $\mathbf{Y}^* = [Y^*_{ij}]_{i,j \in \mathbb{N}}$, generated according to (5), such that

$$\mathbf{Y} \overset{D}{=} \mathbf{Y}^*. \tag{6}$$

Rule (5) can therefore be regarded as a nonparametric data generating process for $\mathbf{Y}$. Equation (6) implies that we may proceed ‘as if’ our $X$-exchangeable network was generated according to (5). In the spirit of Diaconis & Janson (2008), Bickel & Chen (2009) and others, call $\tilde{h}$ a graphon. Here $\alpha$ is an unidentifiable mixing parameter, analogous to the one appearing in de Finetti’s (1931) classic representation result for exchangeable binary sequences. Since I will focus on inference which is conditional on the empirical distribution of the data, $\alpha$ can be safely ignored and I will write $h(X_i, X_j, U_i, U_j, V_{ij}) = \tilde{h}(\alpha, X_i, X_j, U_i, U_j, V_{ij})$ in what follows (cf., Bickel & Chen, 2009; Menzel, 2017).

The Crane & Towsner (2018) representation result implies that a very particular type of dependence structure is associated with $X$-exchangeability. Namely, as discussed earlier, $Y_{i_1 i_2}$ and $Y_{j_1 j_2}$ are (conditionally) independent when $\{i_1, i_2\}$ and $\{j_1, j_2\}$ share no indices in common and dependent when they do. This type of dependence structure, which is very much analogous to that which arises in the theory of U-statistics, is tractable and allows for the formulation of Laws of Large Numbers and Central Limit Theorems. The next few sections will show how to use this insight to develop asymptotic distribution theory for dyadic regression.

Sampling assumption

I will regard $\mathbf{Y}$ as the adjacency matrix of an infinite random (weighted) graph $G_{\infty}$, with nodes $\mathbb{N}$ and (weighted) edges given by the non-zero elements of $\mathbf{Y}$. Let $\mathcal{V}$ be a random sample of size $N$ from $\mathbb{N}$. Let $G_N$ be the subgraph of $G_{\infty}$ indexed by $\mathcal{V}$. We assume that the observed network corresponds to the one induced by a random sample of agents from the larger (infinite) graph. The sampling distribution of any statistic of $G_N$ is induced by this (perhaps hypothetical) random sampling of agents from $G_{\infty}$.

If the infinite graph is relatively exchangeable, then the sampled subgraph will be as well. We can thus proceed ‘as if’

$$Y_{ij} = h(X_i, X_j, U_i, U_j, V_{ij})$$

for $i, j = 1, \ldots, N$ with $i \neq j$. In what follows we assume that we observe $X_i$ for each sampled agent and, for each pair of sampled agents, we observe both $Y_{ij}$ and $Y_{ji}$. The presentation here rules out self loops (i.e., $Y_{ii} \equiv 0$); however, incorporating them is natural in some empirical settings and what follows can be adapted to handle them. Similarly the extension to undirected outcomes, where $Y_{ij} = Y_{ji}$, is straightforward.

2 Composite likelihood

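Let $f_{Y_{12}|X_1,X_2}(y_{12} \mid x_1, x_2; \theta)$ be a parametric family for the conditional density of $Y_{12}$ given $X_1$ and $X_2$. This family is chosen by the researcher. Let $l_{12}(\theta)$ denote the corresponding log-likelihood. As an example to help fix ideas, return to the variant of the gravity model of trade introduced in the introduction. Following Santos Silva & Tenreyro (2006) we set

$$l_{ij}(\theta) = Y_{ij} W_{ij}'\theta - \exp(W_{ij}'\theta),$$

where $W_{ij}$ is a vector of dyad-level regressors constructed from $X_i$ and $X_j$. This equals (up to a term not varying with $\theta$) the log likelihood of a Poisson random variable with mean $\exp(W_{ij}'\theta)$. We choose $\hat{\theta}$ to maximize

$$L_N(\theta) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} l_{ij}(\theta). \tag{7}$$

The maximizer of (7) coincides with a maximum likelihood estimate based upon the assumption that the $\{Y_{ij}\}_{i \neq j}$ are independent Poisson random variables conditional on $X_1, \ldots, X_N$.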

In practice, trade flows are unlikely to be well-described by a Poisson distribution and independence of the summands in (7) is even less likely. As discussed earlier any two summands in (7) will be dependent if they share an index in common. The likelihood contribution associated with exports from Vanuatu to Fiji is not independent of that associated with exports from Fiji to Bangladesh. Dependencies of this type mean that proceeding ‘as if’ (7) is a correctly specified log-likelihood (or even an M-estimation criterion function) will lead to incorrect inference.
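To make the estimation step concrete, the following sketch maximizes a Poisson composite log-likelihood of the form (7) by Newton's method with step halving. It is an illustration under stated assumptions, not the procedure used in the paper: the regressor vector `(1, x_i, x_j)`, the parameter values and the noise-free outcomes are all hypothetical choices, and the helper names (`solve`, `composite_loglik`, `fit`) are introduced here only for the example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    k = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][k] / m[r][r] for r in range(k)]


def composite_loglik(theta, dyads):
    """Average of l_ij = Y_ij * W_ij'theta - exp(W_ij'theta) over directed dyads."""
    total = 0.0
    for y, w in dyads:
        index = sum(wk * tk for wk, tk in zip(w, theta))
        total += y * index - math.exp(index)
    return total / len(dyads)


def fit(dyads, k, iterations=50):
    """Newton iterations with step halving, so the criterion never decreases."""
    theta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]   # minus the Hessian of the criterion
        for y, w in dyads:
            mu = math.exp(sum(wk * tk for wk, tk in zip(w, theta)))
            for a in range(k):
                score[a] += (y - mu) * w[a]
                for b in range(k):
                    info[a][b] += mu * w[a] * w[b]
        step = solve(info, score)
        base = composite_loglik(theta, dyads)
        scale = 1.0
        while scale > 1e-8:
            candidate = [t + scale * s for t, s in zip(theta, step)]
            if composite_loglik(candidate, dyads) >= base:
                theta = candidate
                break
            scale /= 2.0
    return theta


# Hypothetical design: six agents, W_ij = (1, x_i, x_j), noise-free outcomes.
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
theta_0 = [0.5, 1.0, -0.5]
dyads = []
for i in range(len(x)):
    for j in range(len(x)):
        if i != j:
            w = [1.0, x[i], x[j]]
            y = math.exp(sum(wk * tk for wk, tk in zip(w, theta_0)))
            dyads.append((y, w))

theta_hat = fit(dyads, k=3)
```

Because the outcomes are set equal to their conditional means, the score is exactly zero at `theta_0` and the fit should recover it up to numerical error. With real trade flows the same criterion applies, but the point estimate alone says nothing about its sampling precision, which is the subject of the sections that follow.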

If there exists some $\theta_0$ such that $f_{Y_{12}|X_1,X_2}(y_{12} \mid x_1, x_2; \theta_0)$ is the true density, then (7) corresponds to what is called a composite likelihood (e.g., Lindsey, 1988; Cox & Reid, 2004; Bellio & Varin, 2005). Because it does not correctly reflect the dependence structure across dyads, (7) is not a correctly specified log-likelihood function in the usual sense. If, however, the marginal density of $Y_{ij}$ given $X_i$ and $X_j$ is correctly specified, then $\hat{\theta}$ will generally be consistent for $\theta_0$. That is, we may have that the true conditional density of $Y_{ij}$ given $(X_i, X_j) = (x_i, x_j)$ equals

$$f_{Y_{12}|X_1,X_2}(y_{ij} \mid x_i, x_j; \theta_0)$$

for some $\theta_0$ (i.e., the marginal likelihood is correctly specified), but it is not the case that the true joint conditional density of all $N(N-1)$ sampled outcomes given $X_1, \ldots, X_N$ equals the product

$$\prod_{i=1}^{N} \prod_{j \neq i} f_{Y_{12}|X_1,X_2}(y_{ij} \mid x_i, x_j; \theta_0)$$

due to dependence across dyads sharing agents in common (i.e., the joint likelihood is not correctly specified). A composite log-likelihood is constructed by summing together a collection of component log-likelihoods; each such component is a log-likelihood for a portion of the sample (in this case a single directed dyad) but, because the joint dependence structure may not be modeled appropriately, the summation of all these components may not be the correct log likelihood for the sample as a whole.

If the marginal likelihood is itself misspecified, then (7) corresponds to what might be called a pseudo-composite-log-likelihood; “pseudo” in the sense of Gourieroux et al. (1984) and “composite” in the sense of Lindsey (1988). In what follows I outline how to conduct inference on the probability limit of $\hat{\theta}$ (denoted by $\theta_0$ in all cases); the interpretation of this limit will, of course, depend on whether the pairwise likelihood is misspecified or not. In the context of the Santos Silva & Tenreyro (2006) gravity model example, if the true conditional mean equals $\exp(W_{ij}'\theta_0)$ for some $\theta_0$, then $\hat{\theta}$ will be consistent for it (under regularity conditions). The key challenge is to characterize this estimate’s sampling precision.

3 Limit distribution

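To characterize the limit properties of $\hat{\theta}$ begin with a mean value expansion of the first order condition associated with the maximizer of (7). This yields, after some re-arrangement,

$$\sqrt{N}(\hat{\theta} - \theta_0) = \left[-H_N(\bar{\theta})\right]^{+} \sqrt{N} S_N(\theta_0),$$

with $\bar{\theta}$ a mean value between $\hat{\theta}$ and $\theta_0$ which may vary from row to row, the $+$ superscript denoting a Moore-Penrose inverse, and a “score” vector of

$$S_N(\theta) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} s(Z_{ij}, \theta) \tag{8}$$

with $s(Z_{ij}, \theta) = \partial l_{ij}(\theta) / \partial \theta$ for $Z_{ij} = (Y_{ij}, X_i', X_j')'$ and $H_N(\theta) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} \partial^2 l_{ij}(\theta) / \partial \theta \partial \theta'$. In what follows I will just assume that $H_N(\bar{\theta}) \overset{p}{\rightarrow} \Gamma_0$, with $\Gamma_0$ invertible (see Graham (2017) for a formal argument in a related setting and Eagleson & Weber (1978) and Davezies et al. (2019) for more general results).

If the Hessian matrix converges in probability to $\Gamma_0$, as assumed, then

$$\sqrt{N}(\hat{\theta} - \theta_0) = -\Gamma_0^{-1} \sqrt{N} S_N(\theta_0) + o_p(1)$$

so that the asymptotic sampling properties of $\sqrt{N}(\hat{\theta} - \theta_0)$ will be driven by the behavior of $\sqrt{N} S_N(\theta_0)$. As pointed out by Fafchamps & Gubert (2007) and others, (8) is not a sum of independent random variables, hence a basic central limit theorem (CLT) cannot be (directly) applied.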

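My analysis of $S_N = S_N(\theta_0)$ borrows from the theory of U-statistics (e.g., Ferguson, 2005; van der Vaart, 2000). To make these connections clear it is convenient to re-write $S_N$ as

$$S_N = \binom{N}{2}^{-1} \sum_{i < j} \frac{s_{ij} + s_{ji}}{2}$$

where $s_{ij} = s(Z_{ij}, \theta_0)$.

Let $\bar{s}_{ij} = \bar{s}(X_i, U_i, X_j, U_j) = \mathbb{E}[s_{ij} \mid X_i, U_i, X_j, U_j]$, and next decompose $S_N$ as follows

$$S_N = U_N + V_N$$

where $U_N$ equals the projection of $S_N$ onto $\mathbf{X} = (X_1, \ldots, X_N)$ and $\mathbf{U} = (U_1, \ldots, U_N)$:

$$U_N = \binom{N}{2}^{-1} \sum_{i < j} \frac{\bar{s}_{ij} + \bar{s}_{ji}}{2} \tag{9}$$

and $V_N$ is the corresponding projection error:

$$V_N = \binom{N}{2}^{-1} \sum_{i < j} \left\{ \frac{s_{ij} + s_{ji}}{2} - \frac{\bar{s}_{ij} + \bar{s}_{ji}}{2} \right\}. \tag{10}$$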

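Observe that $U_N$ and $V_N$ are uncorrelated by construction. Furthermore $U_N$ is a U-statistic, albeit defined (partially) in terms of the latent variable $U_i$. Although we cannot numerically evaluate $U_N$, we can characterize its sampling properties as $N \rightarrow \infty$. In order to do so we further decompose $U_N$ into a Hájek projection and a second remainder term:

$$U_N = U_{1N} + U_{2N}$$

where, defining $\bar{s}_1^e(x, u) = \mathbb{E}[\bar{s}(x, u, X_1, U_1)]$ and $\bar{s}_1^a(x, u) = \mathbb{E}[\bar{s}(X_1, U_1, x, u)]$,

$$U_{1N} = \frac{2}{N} \sum_{i=1}^{N} \frac{\bar{s}_1^e(X_i, U_i) + \bar{s}_1^a(X_i, U_i)}{2},$$

$$U_{2N} = \binom{N}{2}^{-1} \sum_{i < j} \left\{ \frac{\bar{s}_{ij} + \bar{s}_{ji}}{2} - \frac{\bar{s}_1^e(X_i, U_i) + \bar{s}_1^a(X_i, U_i)}{2} - \frac{\bar{s}_1^e(X_j, U_j) + \bar{s}_1^a(X_j, U_j)}{2} \right\}.$$

The superscript in $\bar{s}_1^e$ stands for ‘ego’ since $\bar{s}_1^e(x, u)$ corresponds to the expected value of a (generic) dyad’s contribution to the composite likelihood’s score vector holding its ego’s attributes fixed. Similarly the superscript in $\bar{s}_1^a$ stands for ‘alter’, since it is her attributes being held fixed in that average.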

Putting things together yields the score decomposition

$$S_N = U_{1N} + U_{2N} + V_N.$$

The limit distribution of $\sqrt{N} S_N$ depends on the joint behavior of $U_{1N}$, $U_{2N}$ and $V_N$ as $N \rightarrow \infty$. A similar type of double projection argument was utilized by Graham (2017) to characterize the limit distribution of the Tetrad Logit estimator. It is also implicit in the analysis of Bickel et al. (2011). The analyses of Menzel (2017) and Graham et al. (2019) both utilize a similar decomposition.

Variance calculation

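In this section I first derive the sampling variance of $S_N$ and then provide an interpretation of it. I begin by calculating the variance of $U_N$.

Let

$$\Sigma_q = \mathbb{C}\left( \frac{\bar{s}_{i_1 i_2} + \bar{s}_{i_2 i_1}}{2}, \frac{\bar{s}_{j_1 j_2} + \bar{s}_{j_2 j_1}}{2} \right)$$

when the dyads $\{i_1, i_2\}$ and $\{j_1, j_2\}$ share $q = 0, 1, 2$ indices in common. A Hoeffding (1948) variance decomposition gives

$$\mathbb{V}(U_N) = \mathbb{V}(U_{1N}) + \mathbb{V}(U_{2N}).$$

Direct calculation yields (see Appendix A)

$$\mathbb{V}(U_{1N}) = \frac{4 \Sigma_1}{N} \tag{11}$$

with

$$\Sigma_1 = \mathbb{V}\left( \frac{\bar{s}_1^e(X_1, U_1) + \bar{s}_1^a(X_1, U_1)}{2} \right).$$

Similarly we have

$$\mathbb{V}(U_{2N}) = \frac{2}{N(N-1)} \left( \Sigma_2 - 2 \Sigma_1 \right) \tag{12}$$

and, in an abuse of notation, letting $\Sigma_3$ denote the average conditional variance of a symmetrized score,

$$\mathbb{V}(V_N) = \frac{2}{N(N-1)} \Sigma_3 \tag{13}$$

where

$$\Sigma_3 = \mathbb{E}\left[ \mathbb{V}\left( \left. \frac{s_{12} + s_{21}}{2} \right| X_1, U_1, X_2, U_2 \right) \right].$$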

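From (11), (12) and (13) we have, collecting terms, a variance of $S_N$ equal to

$$\mathbb{V}(S_N) = \frac{4}{N} \Sigma_1 + \frac{2}{N(N-1)} \left( \Sigma_2 - 2 \Sigma_1 \right) + \frac{2}{N(N-1)} \Sigma_3 = \frac{4(N-2)}{N(N-1)} \Sigma_1 + \frac{2}{N(N-1)} \left( \Sigma_2 + \Sigma_3 \right). \tag{14}$$

To understand (14) note that there are exactly $N(N-1)(N-2)$ (ordered) pairs of dyads sharing one agent in common. Consequently, applying the variance operator to $S_N$ yields a total of $N(N-1)(N-2)$ non-zero covariance terms across the $\binom{N}{2}$ summands in $S_N$. It is these covariance terms which account for the leading term in (14). The terms in $\Sigma_2$ and $\Sigma_3$ in (14) arise from the variances of the $\binom{N}{2}$ summands in $S_N$. Indeed, it is helpful to note that

$$\mathbb{V}\left( \frac{s_{12} + s_{21}}{2} \right) = \mathbb{V}\left( \mathbb{E}\left[ \left. \frac{s_{12} + s_{21}}{2} \right| X_1, U_1, X_2, U_2 \right] \right) + \mathbb{E}\left[ \mathbb{V}\left( \left. \frac{s_{12} + s_{21}}{2} \right| X_1, U_1, X_2, U_2 \right) \right]$$

and hence that

$$\Sigma_2 + \Sigma_3 = \mathbb{V}\left( \frac{s_{12} + s_{21}}{2} \right). \tag{15}$$

Although it may be that $\Sigma_1 \leq \Sigma_2 + \Sigma_3$ (in a positive definite sense), the larger number of non-zero covariance terms generated by applying the variance operator to $S_N$ contributes more to its variability than the smaller number of own variance terms. Inspecting (14) it is clear that multiplying $S_N$ by $\sqrt{N}$ stabilizes the variance such that

$$\mathbb{V}(\sqrt{N} S_N) = 4 \Sigma_1 + O(N^{-1})$$

and hence

$$\mathbb{V}\left( \sqrt{N}(\hat{\theta} - \theta_0) \right) \rightarrow 4 \Gamma_0^{-1} \Sigma_1 \Gamma_0^{-1}$$

as $N \rightarrow \infty$.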
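The counting argument behind the leading variance term can be verified by brute force for small networks. The sketch below (function names are introduced here only for illustration) enumerates all ordered pairs of distinct undirected dyads sharing exactly one agent and checks that dividing this count by the squared number of dyads gives a weight on the shared-agent covariance of $4/N - 4/(N(N-1))$, which is $4/N$ to leading order.

```python
from fractions import Fraction
from itertools import combinations


def count_pairs_sharing_one_agent(n):
    """Count ordered pairs of undirected dyads with exactly one agent in common."""
    dyads = list(combinations(range(n), 2))
    count = 0
    for a in dyads:
        for b in dyads:
            if len(set(a) & set(b)) == 1:
                count += 1
    return count


def covariance_weight(n):
    """Number of shared-agent covariance terms divided by (N choose 2) squared."""
    n_dyads = Fraction(n * (n - 1), 2)
    return Fraction(count_pairs_sharing_one_agent(n)) / n_dyads ** 2


counts = {n: count_pairs_sharing_one_agent(n) for n in (4, 5, 6)}
weights = {n: covariance_weight(n) for n in (4, 5, 6)}
```

Exact rational arithmetic (`Fraction`) is used so that the identity can be checked without rounding error. The number of own-variance terms is only of order $N^2$, which is why they enter the variance at a higher order.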

If a researcher uses standard software, for example a Poisson regression program, to maximize the composite log-likelihood (7) and then chooses to report robust Huber (1967) type standard errors, this corresponds to assuming that

$$\Sigma_1 = 0 \quad \text{and} \quad \mathbb{C}(s_{12}, s_{21}) = 0.$$

This approach would ignore the dominant variance term and part of the higher order term as well. If, instead, the researcher clustered her standard errors on dyads, as in, for example, Santos Silva & Tenreyro (2010), then this corresponds to assuming that

$$\Sigma_1 = 0$$

but allowing $\Sigma_2$ and/or $\Sigma_3$ to differ from zero. This approach would still erroneously ignore the dominant variance term. In both cases reported confidence intervals are likely to undercover the true parameter, perhaps by a substantial margin. This is shown, by example, via Monte Carlo simulation below.

Variance estimation

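Graham (TBD) provides a comprehensive discussion of variance estimation for dyadic regression. Let $\hat{s}_{ij} = s(Z_{ij}, \hat{\theta})$ and $\hat{t}_{ij} = (\hat{s}_{ij} + \hat{s}_{ji})/2$. One approach to variance estimation he reviews shows that $\Sigma_1$ can be estimated by the analog covariance estimate

$$\hat{\Sigma}_1 = \binom{N}{3}^{-1} \sum_{i < j < k} \frac{1}{3} \left\{ \frac{\hat{t}_{ij} \hat{t}_{ik}' + \hat{t}_{ik} \hat{t}_{ij}'}{2} + \frac{\hat{t}_{ij} \hat{t}_{jk}' + \hat{t}_{jk} \hat{t}_{ij}'}{2} + \frac{\hat{t}_{ik} \hat{t}_{jk}' + \hat{t}_{jk} \hat{t}_{ik}'}{2} \right\}$$

where the summation is over all triads in the sampled network. Each triad can itself be partitioned into three different pairs of dyads, each sharing an agent in common.

It turns out, as inspection of (15) suggests, that it is easiest to estimate the sum of $\Sigma_2$ and $\Sigma_3$ jointly by

$$\widehat{\Sigma_2 + \Sigma_3} = \binom{N}{2}^{-1} \sum_{i < j} \hat{t}_{ij} \hat{t}_{ij}'.$$

The Jacobian matrix, $\Gamma_0$, may be estimated by $\hat{\Gamma} = H_N(\hat{\theta})$, which is typically available as a by-product of estimation in most commercial software. Putting things together gives a variance estimate for $\sqrt{N}(\hat{\theta} - \theta_0)$ of

$$\hat{\Gamma}^{-1} \left( 4 \hat{\Sigma}_1 + \frac{2}{N-1} \left( \widehat{\Sigma_2 + \Sigma_3} - 2 \hat{\Sigma}_1 \right) \right) \hat{\Gamma}^{-1}. \tag{16}$$

Graham (TBD) shows that (16) is numerically equivalent, up to a finite sample correction, to the variance estimator proposed by Fafchamps & Gubert (2007). This variance estimator includes estimates of asymptotically negligible terms. Although these terms are negligible when the sample is large enough, they may be sizable in real world settings.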
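The structure of a dyadic-robust variance estimate can be illustrated for a scalar parameter. The sketch below is an assumption-laden illustration rather than the paper's implementation: it takes a symmetric matrix `t` of symmetrized scalar scores, computes a triad-based estimate of $\Sigma_1$ and a dyad-based estimate of $\Sigma_2 + \Sigma_3$, and combines them as $4\hat{\Sigma}_1 + \frac{2}{N-1}(\widehat{\Sigma_2 + \Sigma_3} - 2\hat{\Sigma}_1)$. It then checks that this equals a direct sum over all pairs of dyads sharing at least one agent, which is the form usually associated with Fafchamps & Gubert (2007). The sandwich step with the Hessian is omitted, and all function names are hypothetical.

```python
from itertools import combinations


def sigma1_hat(t):
    """Triad-based estimate of Sigma_1 from a symmetric matrix of scalar scores."""
    n = len(t)
    total, n_triads = 0.0, 0
    for i, j, k in combinations(range(n), 3):
        # The three pairs of dyads within the triad that share one agent.
        total += (t[i][j] * t[i][k] + t[i][j] * t[j][k] + t[i][k] * t[j][k]) / 3.0
        n_triads += 1
    return total / n_triads


def sigma23_hat(t):
    """Joint estimate of Sigma_2 + Sigma_3: average squared symmetrized score."""
    n = len(t)
    squares = [t[i][j] ** 2 for i, j in combinations(range(n), 2)]
    return sum(squares) / len(squares)


def score_variance(t):
    """Estimate of V(sqrt(N) S_N), including the higher order terms."""
    n = len(t)
    s1, s23 = sigma1_hat(t), sigma23_hat(t)
    return 4.0 * s1 + 2.0 / (n - 1) * (s23 - 2.0 * s1)


def score_variance_direct(t):
    """Same quantity as a sum over all pairs of dyads sharing at least one agent."""
    n = len(t)
    dyads = list(combinations(range(n), 2))
    total = 0.0
    for a in dyads:
        for b in dyads:
            if set(a) & set(b):
                total += t[a[0]][a[1]] * t[b[0]][b[1]]
    return n * total / len(dyads) ** 2


# Tiny worked example with three agents: t_12 = 1, t_13 = 2, t_23 = 3.
t3 = [[0.0, 1.0, 2.0],
      [1.0, 0.0, 3.0],
      [2.0, 3.0, 0.0]]
v3 = score_variance(t3)
```

For the three-agent example the single triad gives $\hat{\Sigma}_1 = 11/3$ and the three dyads give $\widehat{\Sigma_2 + \Sigma_3} = 14/3$, so the combined estimate is $44/3 - 8/3 = 12$. The triad loop costs order $N^3$ operations; for larger networks the row-sum identity $\sum_i \left[ (\sum_j t_{ij})^2 - \sum_j t_{ij}^2 \right]$ gives the same cross-product total more cheaply.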

Limit distribution

The variance calculations outlined above imply that $\sqrt{N} S_N = \sqrt{N} U_{1N} + o_p(1)$ and hence that

$$\sqrt{N}(\hat{\theta} - \theta_0) = -\Gamma_0^{-1} \sqrt{N} U_{1N} + o_p(1).$$

Since $U_{1N}$ is the sum of i.i.d. random variables a CLT gives

$$\sqrt{N}(\hat{\theta} - \theta_0) \overset{D}{\rightarrow} \mathcal{N}\left( 0, 4 \Gamma_0^{-1} \Sigma_1 \Gamma_0^{-1} \right). \tag{17}$$

The variance expression, equation (14), indicates that inference based upon the limit distribution (17) would ignore higher order variance terms included in (16). In practice, as has been shown in other contexts, an approach to inference which incorporates estimates of these higher order variance terms may result in inference with better size properties (e.g., Graham et al., 2014; Cattaneo et al., 2014; Graham et al., 2019). In practice I suggest using the normal reference distribution, but with a variance estimated by (16), which includes asymptotically negligible terms which may nevertheless be large in real world samples.
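The recommendation above amounts to a Wald-type interval built from the normal reference distribution. A minimal sketch follows, assuming a single coefficient and assuming that `var_sqrt_n` is an estimate of the variance of $\sqrt{N}(\hat{\theta} - \theta_0)$, so that it must be divided by $N$ before taking the square root. The function name and the numbers are hypothetical.

```python
import math


def wald_interval(theta_hat, var_sqrt_n, n, critical_value=1.96):
    """Wald interval for one coefficient from a variance on the sqrt(N) scale."""
    standard_error = math.sqrt(var_sqrt_n / n)
    half_width = critical_value * standard_error
    return theta_hat - half_width, theta_hat + half_width


# Illustrative numbers only: estimate 1.0, scaled variance 4.0, N = 100 agents.
lower, upper = wald_interval(theta_hat=1.0, var_sqrt_n=4.0, n=100)
```

Here the standard error is $\sqrt{4/100} = 0.2$, giving an interval of roughly $(0.608, 1.392)$. Note that $N$ is the number of agents, not the number of dyads.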

4 Empirical illustration

This section provides an example of a dyadic regression analysis using the dataset constructed by João Santos Silva and Silvana Tenreyro (2006) in their widely-cited paper “The Log of Gravity”. This dataset, which as of the Fall of 2019 was available for download at http://personal.lse.ac.uk/tenreyro/LGW.html, includes information on $N = 136$ countries, corresponding to $N(N-1) = 18{,}360$ directed trading relationships. Here I present a simple specification which includes only the log of exporter and importer GDP, respectively $\ln \mathrm{GDP}_i$ and $\ln \mathrm{GDP}_j$, as well as the log distance ($\ln \mathrm{DIST}_{ij}$) between the two trading countries. Maximizing (7) yields a fitted regression function of

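Standard errors which cluster on dyads, but ignore dependence across dyads sharing a single agent in common, are reported in parentheses below the coefficient estimates. Specifically these standard errors coincide with square roots of the diagonal elements of

$$\frac{1}{N} \hat{\Gamma}^{-1} \left( \frac{2}{N-1} \widehat{\Sigma_2 + \Sigma_3} \right) \hat{\Gamma}^{-1}, \tag{18}$$

where $\hat{\Gamma}$ is the estimated Hessian and $\widehat{\Sigma_2 + \Sigma_3}$ is the average outer product of the symmetrized dyad-level scores.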

The coefficient estimates and reported standard errors are unremarkable in the context of the empirical trade literature. I refer the reader to Santos Silva & Tenreyro (2006) or Head & Mayer (2014) for additional context.

If, instead, the Fafchamps & Gubert (2007) dyadic robust variance-covariance estimator is used to construct standard errors (see (16) earlier), I get

Standard errors which account for dependence across dyads sharing an agent in common are approximately twice those which ignore such dependence.

Monte Carlo experiment

Next I report on a small Monte Carlo experiment to illustrate the properties of inference methods based on the different variance-covariance estimates described above. I set and generate outcome data for all ordered pairs of agents according to the outcome model:

Here , for , is a sequence of i.i.d. log normal random variables, each with mean 1 and scale parameter ; for with is also a sequence of i.i.d. log normal random variables, each with mean 1 and scale parameter .

Each agent is uniformly at random assigned a location on the unit square, (), equals the distance between agents and on that square; is a standard uniform random variable. I set , and . I set and . This generates moderate, but meaningful, dependence across any two dyads sharing at least one agent in common.

i.i.d.    dyadic clustered
0.789     0.950
0.520     0.942
0.556     0.941

Notes: Actual coverage of nominal 0.95 confidence intervals. The data generating process is as described in the text. Coverage estimates are based upon 1,000 simulations. Intervals are Wald-type, constructed by taking the coefficient point estimate and adding and subtracting 1.96 times a standard error estimate. For the “i.i.d.” column this standard error is based upon the assumption of independence across dyads (see equation (18)). In the “dyadic clustered” column standard errors which account for dependence across pairs of dyads sharing an agent in common are used (see equation (16)).

Table 1: Coverage of different confidence intervals with dyadic data

Table 1 reports Monte Carlo estimates of confidence interval coverage (the nominal coverage of the intervals should be 0.95). These estimates are based upon 1,000 simulated datasets. The coverage properties of two intervals are evaluated. The first is a Wald-based interval which uses standard errors constructed from (18). This corresponds to assuming independence across dyads or “clustering on dyads”. Confidence intervals constructed in this way are routinely reported in, for example, the trade literature. The coverage of these intervals is presented in the first column of Table 1. The second interval is based on the Fafchamps-Gubert variance estimate (see (16) above). The coverage of these intervals, which do take into account dependence across pairs of dyads sharing an agent in common, is reported in column two of the table.

In the experiment, the intervals which do not appropriately account for dyadic clustering drastically undercover the truth, whereas those based on the variance estimator outlined above have actual coverage very close to 0.95. While there is no doubt additional work to be done on variance estimation and inference in the dyadic context, a preliminary suggestion is to report standard errors and confidence intervals based upon equation (16) of the previous section. These intervals perform well in the simulation experiment, while those which ignore dyadic dependence are not recommended.
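The qualitative finding of the experiment can be reproduced in a much simpler setting. The sketch below is not the design used above: it assumes a hypothetical additive model $Y_{ij} = A_i + A_j + V_{ij}$ with standard normal components, estimates the mean outcome (true value zero) rather than Poisson regression coefficients, and compares the coverage of Wald intervals built from a standard error which ignores shared agents with one built from a dyadic-robust standard error. All names and settings are illustrative.

```python
import random


def simulate_scores(n, rng):
    """Symmetrized outcomes (Y_ij + Y_ji) / 2 with Y_ij = A_i + A_j + V_ij."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            v_ij, v_ji = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            t[i][j] = t[j][i] = a[i] + a[j] + 0.5 * (v_ij + v_ji)
    return t


def standard_errors(t):
    """Return (mean, s.e. ignoring shared agents, dyadic-robust s.e.)."""
    n = len(t)
    n_dyads = n * (n - 1) / 2
    mean = sum(t[i][j] for i in range(n) for j in range(i + 1, n)) / n_dyads
    own = 0.0     # sum of squared deviations over dyads
    cross = 0.0   # sum of cross products over ordered pairs sharing one agent
    for i in range(n):
        row = [t[i][j] - mean for j in range(n) if j != i]
        row_sum = sum(row)
        row_squares = sum(r * r for r in row)
        own += row_squares / 2.0              # each dyad appears in two rows
        cross += row_sum ** 2 - row_squares
    var_naive = own / n_dyads ** 2
    var_dyadic = max(own + cross, 0.0) / n_dyads ** 2
    return mean, var_naive ** 0.5, var_dyadic ** 0.5


def coverage(n=30, replications=400, seed=0):
    """Share of replications in which each 95% Wald interval covers zero."""
    rng = random.Random(seed)
    hits_naive = hits_dyadic = 0
    for _ in range(replications):
        mean, se_naive, se_dyadic = standard_errors(simulate_scores(n, rng))
        hits_naive += abs(mean) <= 1.96 * se_naive
        hits_dyadic += abs(mean) <= 1.96 * se_dyadic
    return hits_naive / replications, hits_dyadic / replications


cover_naive, cover_dyadic = coverage()
```

In this design the agent-level effects dominate, so the interval which ignores shared agents should cover the truth far less often than its nominal 95% rate, while the dyadic-robust interval should come much closer. The exact figures depend on the seed and on the assumed design and are not comparable to those in Table 1.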

5 Further reading

Although the use of gravity models by economists dates back to Tinbergen (1962), discussions of how to account for cross dyad dependence when conducting inference have been rare. Kolaczyk (2009, Chapter 7), in his widely cited monograph on network statistics, discusses logistic regression with dyadic data. He notes that standard inference procedures are inappropriate due to the presence of dyadic dependence, but is unable to offer a solution due to the lack of formal results in the literature (available at that time).

Fafchamps & Gubert (2007) proposed a variance-covariance estimator which allows for dyadic-dependence. Their estimator coincides with the bias-corrected one discussed in Graham (TBD) and is the one recommended here. Additional versions (and analyses) of this estimator are provided by Cameron & Miller (2014) and Aronow et al. (2017). A special case of the Fafchamps & Gubert (2007) variance estimator actually appears in Holland & Leinhardt (1976) in the context of an analysis of subgraph estimation. Snijders & Borgatti (1999) suggested using the Jackknife for variance estimation of network statistics. Results in, for example, Callaert & Veraverbeke (1981) and the references therein, suggest that this estimate is (almost) numerically equivalent to $\hat{\Sigma}_1$ defined above.

Aldous’ (1981) representation result evidently inspired some work on LLNs and CLTs for so called dissociated random variables and exchangeable random arrays (e.g., Eagleson & Weber, 1978). The influence of this work on empirical practice appears to have been minimal. Bickel et al. (2011), evidently inspired by the variance calculations of Picard et al. (2008), but perhaps more accurately picking up where Holland & Leinhardt (1976) stopped (albeit inadvertently), present asymptotic normality results for subgraph counts. Network density, which corresponds to the mean of $Y_{ij}$ when $Y_{ij}$ is binary, is the simplest example they consider and also prototypical for understanding regression. The limit theory sketched here was novel at the time of drafting, but substantially related results, independently derived, appear in Menzel (2017) and Davezies et al. (2019). Both of these papers also present bootstrap procedures appropriate for network data. The Menzel (2017) paper focuses on the important problem of graphon degeneracy. This occurs when the graphon only weakly varies in $U_i$ and $U_j$; degeneracy affects rates of convergence and limit distributions. Graham et al. (2019) present results on kernel density estimation with dyadic data.

Tabord-Meehan (2018) showed asymptotic normality of dyadic linear regression coefficients using a rather different approach.

Appendix A Derivations

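Expression (11) of the main text is an implication of calculations like

$$\mathbb{C}(s_{12}, s_{13}) = \mathbb{E}[s_{12} s_{13}'] - \mathbb{E}[s_{12}] \mathbb{E}[s_{13}]' = \mathbb{E}[s_{12} s_{13}'] = \mathbb{E}\left[ \mathbb{E}[s_{12} \mid X_1, U_1] \, \mathbb{E}[s_{13} \mid X_1, U_1]' \right] = \mathbb{E}\left[ \bar{s}_1^e(X_1, U_1) \bar{s}_1^e(X_1, U_1)' \right].$$

The second equality immediately above follows because $\mathbb{E}[s_{12}] = 0$, the third by independence of $s_{12}$ and $s_{13}$ conditional on $(X_1, U_1)$, and the fourth by iterated expectations.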

References

  • Aldous (1981) Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4), 581 – 598.
  • Apicella et al. (2012) Apicella, C. L., Marlowe, F. W., Fowler, J. H., & Christakis, N. A. (2012). Social networks and cooperation in hunter-gatherers. Nature, 481(7382), 497 – 501.
  • Aronow et al. (2017) Aronow, P. M., Samii, C., & Assenova, V. A. (2017). Cluster-robust variance estimation for dyadic data. Political Analysis, 23(4), 564 – 577.
  • Baldwin & Taglioni (2007) Baldwin, R. & Taglioni, D. (2007). Trade effects of the euro: a comparison of estimators. Journal of Economic Integration, 22(4), 780 – 818.
  • Bellio & Varin (2005) Bellio, R. & Varin, C. (2005). A pairwise likelihood approach to generalized linear models with crossed random effects. Statistical Modelling, 5(3), 217 – 227.
  • Bickel & Chen (2009) Bickel, P. J. & Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 21068 – 21073.
  • Bickel et al. (2011) Bickel, P. J., Chen, A., & Levina, E. (2011). The method of moments and degree distributions for network models. Annals of Statistics, 39(5), 2280 – 2301.
  • Callaert & Veraverbeke (1981) Callaert, H. & Veraverbeke, N. (1981). The order of the normal approximation for a studentized u-statistic. Annals of Statistics, 9(1), 194 – 200.
  • Cameron & Miller (2014) Cameron, A. C. & Miller, D. L. (2014). Robust inference for dyadic data. Technical report, University of California - Davis.
  • Cattaneo et al. (2014) Cattaneo, M., Crump, R., & Jansson, M. (2014). Small bandwidth asymptotics for density-weighted average derivatives. Econometric Theory, 30(1), 176 – 200.
  • Chamberlain (1987) Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics, 34(3), 305 – 334.
  • Chandrasekhar (2015) Chandrasekhar, A. (2015). Econometrics of network formation. In Y. Bramoullé, A. Galeotti, & B. Rogers (Eds.), Oxford Handbook on the Economics of Networks. Oxford University Press.
  • Cox & Reid (2004) Cox, D. R. & Reid, N. (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika, 91(3), 729 – 737.
  • Crane & Towsner (2018) Crane, H. & Towsner, H. (2018). Relatively exchangeable structures. Journal of Symbolic Logic, 83(2), 416 – 442.
  • Davezies et al. (2019) Davezies, L., d’Haultfoeuille, X., & Guyonvarch, Y. (2019). Empirical process results for exchangeable arrays. Technical report, CREST-ENSAE.
  • de Finetti (1931) de Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti della R. Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Mathematice e Naturale, 4, 251 – 299.
  • Diaconis & Janson (2008) Diaconis, P. & Janson, S. (2008). Graph limits and exchangeable random graphs. Rendiconti di Matematica, 28(1), 33 – 61.
  • Eagleson & Weber (1978) Eagleson, G. K. & Weber, N. C. (1978). Limit theorems for weakly exchangeable arrays. Mathematical Proceedings of the Cambridge Philosophical Society, 84(1), 123 – 130.
  • Fafchamps & Gubert (2007) Fafchamps, M. & Gubert, F. (2007). The formation of risk sharing networks. Journal of Development Economics, 83(2), 326 – 350.
  • Ferguson (2005) Ferguson, T. S. (2005). U-statistics. University of California - Los Angeles.
  • Gourieroux et al. (1984) Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo maximum likelihood methods: applications to Poisson models. Econometrica, 52(3), 701 – 720.
  • Graham (2017) Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica, 85(4), 1033 – 1063.
  • Graham (TBD) Graham, B. S. (TBD). Handbook of Econometrics, volume 7, chapter The econometric analysis of networks. North-Holland: Amsterdam.
  • Graham et al. (2014) Graham, B. S., Imbens, G. W., & Ridder, G. (2014). Complementarity and aggregate implications of assortative matching: a nonparametric analysis. Quantitative Economics, 5(1), 29 – 66.
  • Graham et al. (2019) Graham, B. S., Niu, F., & Powell, J. L. (2019). Kernel density estimation for undirected dyadic data. Technical report, University of California - Berkeley.
  • Head & Mayer (2014) Head, K. & Mayer, T. (2014). Handbook of International Economics, volume 4, chapter Gravity equations: workhorse, toolkit, and cookbook, (pp. 131 – 191). North-Holland: Amsterdam.
  • Hoeffding (1948) Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19(3), 293 – 325.
  • Holland & Leinhardt (1976) Holland, P. W. & Leinhardt, S. (1976). Local structure in social networks. Sociological Methodology, 7, 1 – 45.
  • Hoover (1979) Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables. Technical report, Institute for Advanced Study, Princeton, NJ.
  • Huber (1967) Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 221 – 233.
  • Kolaczyk (2009) Kolaczyk, E. D. (2009). Statistical Analysis of Network Data. New York: Springer.
  • Lindsey (1988) Lindsey, B. G. (1988). Composite likelihood. Contemporary Mathematics, 80, 221 – 239.
  • Menzel (2017) Menzel, K. (2017). Bootstrap with clustering in two or more dimensions. Technical Report 1703.03043v2, arXiv.
  • Oneal & Russett (1999) Oneal, J. R. & Russett, B. (1999). The kantian peace: the pacific benefits of democracy, interdependence, and international organizations. World Politics, 52(1), 1 – 37.
  • Picard et al. (2008) Picard, F., Daudin, J. J., Koskas, M., Schbath, S., & Robin, S. (2008). Assessing the exceptionality of network motifs. Journal of Computational Biology, 15(1), 1 – 20.
  • Portes & Rey (2005) Portes, R. & Rey, H. (2005). The determinants of cross-border equity flows. Journal of International Economics, 65(2), 269 – 296.
  • Rose (2004) Rose, A. K. (2004). Do we really know that the WTO increases trade? American Economic Review, 94(1), 98 – 114.
  • Santos Silva & Tenreyro (2006) Santos Silva, J. & Tenreyro, S. (2006). The log of gravity. Review of Economics and Statistics, 88(4), 641 – 658.
  • Santos Silva & Tenreyro (2010) Santos Silva, J. & Tenreyro, S. (2010). Currency unions in prospect and retrospect. Annual Review of Economics, 2, 51 – 74.
  • Shalizi (2016) Shalizi, C. R. (2016). Lecture 1: Conditionally-independent dyad models. Lecture note, Carnegie Mellon University.
  • Snijders & Borgatti (1999) Snijders, T. A. B. & Borgatti, S. P. (1999). Non-parametric standard errors and tests for network statistics. Connections, 22(2), 61 – 70.
  • Tabord-Meehan (2018) Tabord-Meehan, M. (2018). Inference with dyadic data: Asymptotic behavior of the dyadic-robust t-statistic. Journal of Business and Economic Statistics.
  • Tinbergen (1962) Tinbergen, J. (1962). Shaping the World Economy: Suggestions for an International Economic Policy. New York: Twentieth Century Fund.
  • van der Vaart (2000) van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge: Cambridge University Press.