Many important social and economic variables are naturally defined for pairs of agents (or dyads). Examples include trade between pairs of countries (e.g., Tinbergen, 1962), input purchases and sales between pairs of firms (e.g., Atalay et al., 2011), research and development (R&D) partnerships across firms (e.g., König et al., 2019) and friendships between individuals (e.g., Christakis et al., 2010). Dyadic data arise frequently in the analysis of social and economic networks; in economics, the analysis of international trade flows is a leading example. See Graham (TBD) for many other examples and references.
While the statistical analysis of network data began almost a century ago, rigorously justified methods of inference for network statistics are only now emerging (cf., Goldenberg et al., 2009). In this paper we study nonparametric estimation of the density function of a (continuously-valued) dyadic random variable. Examples include the density of migration across states, trade across nations, liabilities across banks, or minutes of telephone conversation among individuals. While nonparametric density estimation using independent and identically distributed random samples, henceforth “monadic” data, is well-understood, its dyadic counterpart has, to our knowledge, not yet been studied.
Holland & Leinhardt (1976)
derived the sampling variance of the link frequency in a simple network (and of other low order subgraph counts). A general asymptotic distribution theory for subgraph counts, exploiting recent ideas from the probability literature on dense graph limits (e.g., Diaconis & Janson, 2008; Lovász, 2012), was presented in Bickel et al. (2011). (See Nowicki (1991) for a summary of earlier research in this area.) Menzel (2017) presents bootstrap procedures for inference on the mean of a dyadic random variable. Our focus on nonparametric density estimation appears to be novel. Density estimation is, of course, a topic of intrinsic interest to econometricians and statisticians, but it also provides a relatively simple and canonical starting point for understanding nonparametric estimation more generally. In the conclusion of this paper we discuss ongoing work on other non- and semi-parametric estimation problems using dyadic data.
We show that an (obvious) adaptation of the Rosenblatt (1956) and Parzen (1962) kernel density estimator is applicable to dyadic data. While our dyadic density estimator is straightforward to define, its rate of convergence and asymptotic sampling properties depart significantly from those of its monadic counterpart. Let N be the number of sampled agents and n = N(N − 1)/2 the corresponding number of dyads. Estimation is based upon the n dyadic outcomes. Due to dependence across dyads sharing an agent in common, the rate of convergence of our density estimate is (generally) much slower than it would be with n i.i.d. outcomes. This rate of convergence is also invariant across a wide range of bandwidth sequences. This property is familiar from the econometric literature on semiparametric estimation (e.g., Powell, 1994). Indeed, from a certain perspective, our nonparametric dyadic density estimate can be viewed as a semiparametric estimator (in the sense that it can be thought of as an average of nonparametrically estimated densities). We also explore the impact of “degeneracy” – which arises when dependence across dyads vanishes – on our sampling theory; such degeneracy features prominently in Menzel’s (2017) innovative analysis of inference on dyadic means. We expect that many of our findings generalize to other non- and semi-parametric network estimation problems.
In the next section we present our maintained data/network generating process and proposed kernel density estimator. Section 3 explores the mean square error properties of this estimator, while Section 4 outlines asymptotic distribution theory. Section 5
presents a consistent variance estimator, which can be used to construct Wald statistics and Wald-based confidence intervals. We summarize the results of a small simulation study in Section 6. In Section 7 we discuss various extensions and ongoing work. Calculations not presented in the main text are collected in Appendix A.
In what follows we interchangeably use unit, node, vertex, agent and individual to refer to the vertices of the sampled network or graph. We denote random variables by capital Roman letters, specific realizations by lower case Roman letters and their support by blackboard bold Roman letters. That is, Y, y and 𝕐 respectively denote a generic random draw of, a specific value of, and the support of, Y. For the dyadic outcome, or weighted edge, associated with agents i and j, we write Y_ij, and we use Y = [Y_ij] to denote the adjacency matrix of all such outcomes/edges. Additional notation is defined in the sections which follow.
2 Model and estimator
Let i = 1, …, N index a simple random sample of agents from some large (infinite) network of interest. A pair of agents constitutes a dyad. For each of the n = N(N − 1)/2 sampled dyads, that is for i = 1, …, N − 1 and j = i + 1, …, N, we observe the (scalar) random variable Y_ij, generated according to

    Y_ij = W(X_i, X_j, V_ij),   (1)

where X_i is a node-specific random vector of attributes (of arbitrary dimension, not necessarily observable), and V_ij = V_ji is an unobservable scalar random variable which is continuously distributed with density function f_V. (In words, we observe the weighted subgraph induced by the N randomly sampled agents.) Observe that the function W is symmetric in its first two arguments, ensuring that Y_ij = Y_ji, i.e., that the network is undirected.
In what follows we directly maintain (1); however, it is also a consequence of assuming that the infinite graph being sampled from is jointly exchangeable (Aldous, 1981; Hoover, 1979). Joint exchangeability of the sampled graph implies that
[Y_ij] and [Y_{σ(i)σ(j)}] are identically distributed for every σ, where σ(·) is a permutation of the node indices. Put differently, when node labels have no meaning we have that the “likelihood” of any simultaneous row and column permutation of the weighted adjacency matrix Y is the same as that of Y itself. (Equivalently, for the weighted adjacency matrix Y and any conformable permutation matrix P, the matrices PYP′ and Y are identically distributed.)
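The simultaneous row-and-column permutation operation can be illustrated concretely. The snippet below (purely illustrative; the Gaussian weights are an assumption, not part of the model) checks that conjugating a symmetric adjacency matrix by a permutation matrix P is exactly a relabeling of the nodes:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# A symmetric weighted adjacency matrix of dyadic outcomes (diagonal unused).
A = rng.normal(size=(N, N))
Y = np.triu(A, 1) + np.triu(A, 1).T

sigma = rng.permutation(N)           # a permutation of the node indices
P = np.eye(N)[sigma]                 # the corresponding permutation matrix

Y_relabel = Y[np.ix_(sigma, sigma)]  # the matrix [Y_{sigma(i) sigma(j)}]
print(np.allclose(P @ Y @ P.T, Y_relabel))   # True
```

Joint exchangeability asserts that these two (identical) relabelings of Y have the same distribution as Y itself.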
Our target object of estimation is the marginal density function of Y_ij, defined as the derivative of the cumulative distribution function (c.d.f.) of Y_ij: f_Y(y) ≡ dF_Y(y)/dy. To ensure this density function is well-defined on the support of Y_ij, we assume that the unknown function W is strictly increasing and continuously differentiable in its third argument V_ij, and we also assume that X_i and X_j are statistically independent of the “error term” V_ij for all i and j. Under these assumptions, by the usual change-of-variables formula, the conditional density of Y_ij given X_i = x_i and X_j = x_j takes the form

    f_{Y|XX}(y | x_i, x_j) = f_V(W⁻¹(x_i, x_j; y)) · ∂W⁻¹(x_i, x_j; y)/∂y.
In the derivations below we will assume this conditional density function is bounded and twice continuously differentiable at y with bounded second derivative for all x_i and x_j; this will follow from similar smoothness conditions imposed on the primitives W⁻¹ and f_V.
To derive the marginal density of Y_ij note that, by random sampling, the sequence {X_i} is independently and identically distributed (i.i.d.), as is the sequence {V_ij}. Under these conditions, we can define the conditional density of Y_ij given X_i = x (or, by symmetry, X_j = x) alone as

    f_{Y|X}(y | x) ≡ E[ f_{Y|XX}(y | x, X_j) ]

and, averaging, the marginal density of interest as

    f_Y(y) = E[ f_{Y|X}(y | X_i) ] = E[ f_{Y|XX}(y | X_i, X_j) ].
Let i, j, k and l index distinct agents. The assumption that the X_i and V_ij are i.i.d. implies that while Y_ij varies independently of Y_kl (since the {i, j} and {k, l} dyads share no agents in common), Y_ij will not vary independently of Y_ik, as both vary with X_i (since the {i, j} and {i, k} dyads both include agent i). This type of dependence structure is sometimes referred to as “dyadic clustering” in empirical social science research (cf., Fafchamps & Gubert, 2007; Cameron & Miller, 2014; Aronow et al., 2017). The implications of this dependence structure for density estimation and – especially – inference are a key area of focus in what follows.
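This dependence pattern is easy to see by simulation. The sketch below uses a hypothetical data-generating process Y_ij = X_i + X_j + V_ij (an illustrative assumption, not taken from the model above): dyads sharing agent i are correlated, while disjoint dyads are not.

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 20000

# Four distinct agents i, j, k, l per replication; illustrative model
# Y_ij = X_i + X_j + V_ij with standard normal attributes and shocks.
X = rng.normal(size=(reps, 4))      # agent attributes X_i, X_j, X_k, X_l
V = rng.normal(size=(reps, 3))      # idiosyncratic dyad shocks

Y_ij = X[:, 0] + X[:, 1] + V[:, 0]  # dyad {i, j}
Y_ik = X[:, 0] + X[:, 2] + V[:, 1]  # dyad {i, k}: shares agent i with {i, j}
Y_kl = X[:, 2] + X[:, 3] + V[:, 2]  # dyad {k, l}: shares no agent with {i, j}

print(np.corrcoef(Y_ij, Y_ik)[0, 1])   # population value 1/3 in this DGP
print(np.corrcoef(Y_ij, Y_kl)[0, 1])   # population value 0
```

In this design Cov(Y_ij, Y_ik) = V(X_i) = 1 while V(Y_ij) = 3, so dyads sharing one agent have correlation 1/3, whereas dyads sharing no agent are exactly independent.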
Given this construction of the marginal density f_Y(y) of Y_ij, it can be estimated using an immediate extension of the kernel density estimator for monadic data first proposed by Rosenblatt (1956) and Parzen (1962):

    f̂_Y(y) ≡ (1/n) Σ_{i<j} (1/h) K((y − Y_ij)/h).

Here K(·) is a kernel function assumed to be (i) bounded (K(u) ≤ K̄ < ∞ for all u), (ii) symmetric (K(u) = K(−u)), (iii) zero outside a bounded interval (K(u) = 0 if |u| > 1), and (iv) to integrate to one (∫ K(u) du = 1). The bandwidth h = h_N is assumed to be a positive, deterministic sequence (indexed by the number of nodes N) that tends to zero as N → ∞ and will satisfy other conditions imposed below. A discussion of the motivation for the kernel estimator and its statistical properties under random sampling (of monadic variables) can be found in Silverman (1986, Chapters 2 & 3).
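A minimal implementation of this estimator can be sketched as follows. The Epanechnikov kernel satisfies conditions (i)–(iv); the data-generating process Y_ij = X_i + X_j + V_ij and the n^(−1/5) bandwidth are illustrative assumptions for the demonstration, not prescriptions from the text.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, symmetric kernel, zero outside [-1, 1], integrating to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def dyadic_kde(Y, y, h):
    """Kernel density estimate at y from an N x N symmetric adjacency matrix
    Y, averaging h^{-1} K((y - Y_ij)/h) over the n = N(N-1)/2 dyads i < j."""
    iu = np.triu_indices_from(Y, k=1)        # dyads with i < j
    return np.mean(epanechnikov((y - Y[iu]) / h)) / h

# Illustrative DGP: Y_ij = X_i + X_j + V_ij, so Y_ij ~ N(0, 3) marginally.
rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=N)
V = rng.normal(size=(N, N))
Y = X[:, None] + X[None, :] + np.triu(V, 1) + np.triu(V, 1).T

h = (N * (N - 1) / 2) ** (-0.2)              # an n^{-1/5} bandwidth
fhat = dyadic_kde(Y, 0.0, h)
print(fhat)   # estimate of f_Y(0); true value is about 0.230 in this DGP
```

The only change relative to the monadic Rosenblatt–Parzen estimator is that the average runs over the upper triangle of the adjacency matrix rather than over an i.i.d. sample.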
3 Rate of convergence analysis
To formulate conditions for consistency of f̂_Y(y) we will evaluate its expectation and variance, which will yield conditions on the bandwidth sequence h_N for its mean squared error to converge to zero.
A standard calculation yields a bias of f̂_Y(y) equal to (see Appendix A)

    E[f̂_Y(y)] − f_Y(y) = (h²/2) f_Y″(y) ∫ t² K(t) dt + o(h²).   (3)

Equation (3) coincides with the bias of the kernel density estimate based upon a random (“monadic”) sample.
The expression for the variance of f̂_Y(y), in contrast to that for its bias, does differ from the monadic (i.i.d.) case due to the (possibly) nonzero covariance between K_h(y − Y_ij) ≡ h⁻¹K((y − Y_ij)/h) and K_h(y − Y_ik) for k ≠ j:

    V(f̂_Y(y)) = (1/n²) Σ_{i<j} Σ_{k<l} Cov( K_h(y − Y_ij), K_h(y − Y_kl) )
              = (1/n) V( K_h(y − Y_ij) ) + (2(N − 2)/n) Cov( K_h(y − Y_ij), K_h(y − Y_ik) ).

The second equality uses the fact that, in the double summation on the first line, there are n terms with {k, l} = {i, j} and 2n(N − 2) terms with exactly one index in common; as noted earlier, when Y_ij and Y_kl have no indices in common they are independent (and thus uncorrelated).
To calculate the dependence of this variance on the number of nodes N, we analyze the own-variance term V(K_h(y − Y_ij)) and the covariance term Cov(K_h(y − Y_ij), K_h(y − Y_ik)), writing K_h(u) for h⁻¹K(u/h). Beginning with the former,

    V( K_h(y − Y_ij) ) = (1/h) f_Y(y) ∫ K(u)² du + O(1).

Like the expected value, this own-variance term is of the same order of magnitude as in the monadic case, O(h⁻¹).
However, the covariance term Cov(K_h(y − Y_ij), K_h(y − Y_ik)), which would be absent for i.i.d. monadic data, is generally nonzero. Since

    E[ K_h(y − Y_ij) K_h(y − Y_ik) ] = E[ ∫∫ K(u) K(v) f_{Y|X}(y − hu | X_i) f_{Y|X}(y − hv | X_i) du dv ]
                                     = E[ f_{Y|X}(y | X_i)² ] + O(h²)

(where the first equality uses the changes of variables u = (y − Y_ij)/h and v = (y − Y_ik)/h and the mutual independence of V_ij and V_ik given X_i, X_j and X_k), it follows that

    Cov( K_h(y − Y_ij), K_h(y − Y_ik) ) = E[ f_{Y|X}(y | X_i)² ] − f_Y(y)² + O(h²) = O(1).
Provided that Ω₁(y) ≡ E[f_{Y|X}(y | X_i)²] − f_Y(y)² is positive and the bandwidth sequence h = h_N is chosen such that h → 0 and nh → ∞ as N → ∞, we get that

    V(f̂_Y(y)) = (4/N) Ω₁(y) + (1/(nh)) f_Y(y) ∫ K(u)² du + o(N⁻¹)

and hence that

    MSE(f̂_Y(y)) = [ (h²/2) f_Y″(y) ∫ t²K(t) dt ]² + (4/N) Ω₁(y) + (1/(nh)) f_Y(y) ∫ K(u)² du + o(h⁴ + N⁻¹).   (5)

In fact, the rate of convergence of f̂_Y(y) to f_Y(y) will be N^(−1/2) as long as

    h ∝ N^(−α) for some α ∈ [1/4, 1]   (6)

as N → ∞, although the mean-squared error will include an additional bias or variance term of order N⁻¹ if either (Nh⁴)⁻¹ or Nh does not diverge to infinity.
To derive the MSE-optimal bandwidth sequence we minimize (5) with respect to its first and third terms; this yields an optimal bandwidth sequence of

    h*(y) = ( f_Y(y) ∫ K(u)² du / ( f_Y″(y) ∫ t²K(t) dt )² )^(1/5) · n^(−1/5).

This sequence satisfies condition (6) above.
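The first-order condition behind this minimization can be sketched in LaTeX as follows, writing B(y) for the leading bias coefficient (1/2) f_Y″(y) ∫ t²K(t) dt and C(y) for f_Y(y) ∫ K(u)² du (a sketch under the three-term mean-squared-error expansion described above; the middle, h-free, variance term drops out of the derivative):

```latex
\frac{d}{dh}\left\{ h^{4}B(y)^{2} + \frac{C(y)}{nh} \right\}
  = 4h^{3}B(y)^{2} - \frac{C(y)}{nh^{2}} = 0
\quad\Longrightarrow\quad
h^{*} = \left( \frac{C(y)}{4B(y)^{2}} \right)^{1/5} n^{-1/5}.
```

Because the dyadic-covariance variance term does not involve h, the optimal bandwidth shrinks at the usual n^(−1/5) rate in the number of dyads, i.e., at rate N^(−2/5) in the number of agents.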
Interestingly, the rate of convergence of f̂_Y(y) to f_Y(y) under condition (6) is the same as the rate of convergence of the sample mean

    Ȳ ≡ (1/n) Σ_{i<j} Y_ij

to its expectation, provided Cov(Y_ij, Y_ik) is non-degenerate (nonzero), yielding

    Ȳ − E[Y_ij] = O_p(N^(−1/2)).
Thus, in contrast to the case of i.i.d. monadic data, there is no convergence-rate “cost” associated with nonparametric estimation of f_Y(y). The presence of dyadic dependence, due to its impact on estimation variance, does, however, slow down the feasible rate of convergence substantially. With i.i.d. data the relevant rate for density estimation would be n^(2/5) when the MSE-optimal bandwidth sequence is used. Recalling that n = N(N − 1)/2, the N^(1/2) rate we find here corresponds to an n^(1/4) rate. The slowdown from n^(2/5) to n^(1/4) captures the rate-of-convergence cost of dyadic dependence on the variance of our density estimate.
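A small Monte Carlo check of this variance behavior can be sketched as follows (the data-generating process Y_ij = X_i + X_j + 0.2·V_ij, the fixed bandwidth, and the Epanechnikov kernel are all illustrative assumptions). With the bandwidth held fixed, doubling N should roughly halve the variance of the estimator, consistent with an O(1/N) leading variance term rather than the smaller O(1/(nh)) term that would dominate under independence across dyads.

```python
import numpy as np

def dyadic_kde_at(y, X, V, h):
    """Dyadic kernel density estimate at y under the illustrative model
    Y_ij = X_i + X_j + 0.2 * V_ij, using an Epanechnikov kernel."""
    N = X.shape[0]
    iu = np.triu_indices(N, k=1)
    Y = (X[:, None] + X[None, :] + 0.2 * V)[iu]
    u = (y - Y) / h
    return np.mean(np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)) / h

rng = np.random.default_rng(0)
h, reps = 0.3, 500

var = {}
for N in (100, 200):
    draws = [dyadic_kde_at(0.0, rng.normal(size=N),
                           rng.normal(size=(N, N)), h)
             for _ in range(reps)]
    var[N] = np.var(draws)

ratio = var[100] / var[200]
print(ratio)   # roughly 2, as implied by the O(1/N) leading variance term
```

Had the n dyadic outcomes been independent, quadrupling the number of dyads (from N = 100 to N = 200) with h fixed would instead have cut the variance by roughly a factor of four.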
The lack of dependence of the convergence rate of f̂_Y(y) to f_Y(y) on the precise bandwidth sequence chosen is analogous to that for semiparametric estimators defined as averages over nonparametrically-estimated components (e.g., Newey, 1994; Powell, 1994). Defining

    f̂_Y(y | X_i) ≡ (1/(N − 1)) Σ_{j≠i} (1/h) K((y − Y_ij)/h),

the estimator can be expressed as

    f̂_Y(y) = (1/N) Σ_{i=1}^{N} f̂_Y(y | X_i).

Holding X_i fixed, the estimator f̂_Y(y | X_i) can be shown to converge to f_{Y|X}(y | X_i) at the nonparametric rate (Nh)^(1/2), but the average of this nonparametric estimator over the X_i converges at the faster (“parametric”) rate N^(1/2). In comparison, while the analogous conditional mean estimate Ȳ_i ≡ (1/(N − 1)) Σ_{j≠i} Y_ij converges to E[Y_ij | X_i] at the parametric rate, the additional averaging to obtain Ȳ = (1/N) Σ_i Ȳ_i does not improve upon that rate.
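The algebraic identity underlying this representation – the dyadic estimator is exactly the average over agents of the N conditional estimates based on each agent's N − 1 edges – can be checked numerically. The Gaussian data below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, h, y = 30, 0.5, 0.0

# Symmetric matrix of dyadic outcomes (diagonal unused).
A = rng.normal(size=(N, N))
Y = np.triu(A, 1) + np.triu(A, 1).T

K = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov

# Pooled estimator over the n = N(N-1)/2 dyads i < j.
iu = np.triu_indices(N, k=1)
f_pooled = np.mean(K((y - Y[iu]) / h)) / h

# Average over i of the conditional estimates using agent i's N-1 edges.
f_cond = np.array([np.mean(K((y - np.delete(Y[i], i)) / h)) / h
                   for i in range(N)])

print(np.isclose(f_pooled, f_cond.mean()))   # True: identical by symmetry of Y
```

Each edge {i, j} appears in exactly two of the N conditional averages, so the double average reproduces the pooled average over the upper triangle.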
4 Asymptotic distribution theory
To derive conditions under which f̂_Y(y) is approximately normally distributed, it is helpful to decompose the difference between f̂_Y(y) and f_Y(y) into four terms:

    f̂_Y(y) − f_Y(y) = (1/n) Σ_{i<j} { K_h(y − Y_ij) − E[ K_h(y − Y_ij) | X_i, X_j ] }   (9)
                      + U_N(y) − Û_N(y)   (10)
                      + Û_N(y) − E[ f̂_Y(y) ]   (11)
                      + E[ f̂_Y(y) ] − f_Y(y),   (12)

where U_N(y) ≡ (1/n) Σ_{i<j} E[ K_h(y − Y_ij) | X_i, X_j ] and Û_N(y) denotes its Hájek projection, described below. To understand this decomposition observe that the projection of f̂_Y(y) onto {X_i}_{i=1}^{N} equals, by the independence assumptions imposed on X_i and V_ij, the U-statistic U_N(y). This U-statistic is defined in terms of the latent i.i.d. random variables {X_i}_{i=1}^{N}.

The first term in this expression, line (9), is f̂_Y(y) minus the projection/U-statistic described above. Each term in this summation has conditional expectation zero given the remaining terms (i.e., the terms form a martingale difference sequence).
The second term in the decomposition, line (10), is the difference between the second-order U-statistic and its Hájek projection (e.g., van der Vaart, 2000), that is, its projection onto the linear subspace consisting of all functions of the form Σ_{i=1}^{N} g_i(X_i). The third term, line (11), is a centered version of that Hájek projection, and the final term, line (12), is the bias of f̂_Y(y). A similar “double projection” argument was used by Graham (2017) to analyze the large sample properties of the Tetrad Logit estimator.
If the bandwidth sequence satisfies the conditions Nh → ∞ and Nh⁴ → 0, the calculations in the previous section can be used to show that the first, second, and fourth terms of this decomposition (i.e., lines (9), (10) and (12)) will all converge to zero when normalized by N^(1/2). In this case line (11), which is an average of i.i.d. random variables, will be the leading term asymptotically, such that

    N^(1/2) ( f̂_Y(y) − f_Y(y) ) →d N( 0, 4Ω₁(y) ),

where Ω₁(y) ≡ E[ f_{Y|X}(y | X_i)² ] − f_Y(y)² is the limiting covariance between kernel terms for dyads sharing a single agent.
If, however, the bandwidth sequence has h ∝ N⁻¹ (a “knife-edge” undersmoothing condition similar to one considered by Cattaneo et al. (2014) in a different context), then both the first and third terms, lines (9) and (11), will be asymptotically normal when normalized by N^(1/2). To accommodate both of these cases in a single result, we will show that a standardized version of the sum of terms (9) and (11) will have a standard normal limit distribution, although the first term, (9), may be degenerate in the limit.
In Appendix A we show that both (10) and (12) will be asymptotically negligible when normalized by the convergence rate of this sum, such that the asymptotic distribution of f̂_Y(y) − f_Y(y) will only depend on terms (9) and (11).
We start by rewriting the sum of terms (9) and (11) as

    (1/n) Σ_{i<j} { K_h(y − Y_ij) − E[ K_h(y − Y_ij) | X_i, X_j ] }
      + (2/N) Σ_{i=1}^{N} { E[ K_h(y − Y_ij) | X_i ] − E[ K_h(y − Y_ij) ] } = Σ_{t=1}^{N+n} X_{N,t},

where the triangular array {X_{N,t}} is defined as follows. That is, {X_{N,t}} is the collection of terms of the form

    X_{N,t} = (2/N) { E[ K_h(y − Y_ij) | X_i ] − E[ K_h(y − Y_ij) ] }

for t = 1, …, N (with t = i) and

    X_{N,t} = (1/n) { K_h(y − Y_ij) − E[ K_h(y − Y_ij) | X_i, X_j ] }

for t = N + 1, …, N + n (with t indexing the dyads {i, j}, i < j). Using the independence assumptions on X_i and V_ij, as well as iterated expectations, it is tedious but straightforward to verify that

    E[ X_{N,t} | X_{N,1}, …, X_{N,t−1} ] = 0,

that is, {X_{N,t}} is a martingale difference sequence (MDS).
Defining the variance of this MDS as

    v_N(y) ≡ V( Σ_{t=1}^{N+n} X_{N,t} ) = Σ_{t=1}^{N+n} E[ X_{N,t}² ],

we can demonstrate asymptotic normality of its standardized sum,

    Σ_{t=1}^{N+n} X_{N,t} / v_N(y)^(1/2),

by a central limit theorem for martingale difference triangular arrays (see, for example, Hall & Heyde (1980), Theorem 3.2 and Corollary 3.1, and White (2001), Theorem 5.24 and Corollary 5.26). Specifically, if the Lyapunov condition

    Σ_{t=1}^{N+n} E[ |X_{N,t}|^(2(1+δ)) ] / v_N(y)^(1+δ) → 0   (13)

holds for some δ > 0, and the stability condition

    Σ_{t=1}^{N+n} E[ X_{N,t}² | X_{N,1}, …, X_{N,t−1} ] / v_N(y) →p 1   (14)

also holds, then the standardized sum converges in distribution to a standard normal random variable.
From the calculations used in the MSE analysis of Section 3 we have that

    v_N(y) = (4/N) Ω₁(y) + (1/(nh)) f_Y(y) ∫ K(u)² du + o( N⁻¹ + (nh)⁻¹ ),

so, taking the leading term,

    v_N(y) = (4/N) Ω₁(y) (1 + o(1)),

assuming Ω₁(y) > 0 and Nh → ∞. In the degenerate case, where Ω₁(y) = 0 (i.e., the covariance across dyads sharing an agent vanishes), we will still have v_N(y) of exact order N⁻¹ as long as the “knife-edge” undersmoothing bandwidth sequence h ∝ N⁻¹ is chosen, since then nh ∝ N.
To verify the Lyapunov condition (13), note that the boundedness of K implies

    E[ |X_{N,t}|^(2(1+δ)) ] = O( N^(−2(1+δ)) )

for each of the N agent-level terms and

    E[ |X_{N,t}|^(2(1+δ)) ] = O( n^(−2(1+δ)) h^(−(1+2δ)) )

for each of the n dyad-level terms. Putting things together we get that

    Σ_{t=1}^{N+n} E[ |X_{N,t}|^(2(1+δ)) ] = O( N^(−(1+2δ)) ) + O( (nh)^(−(1+2δ)) ) = O( N^(−(1+2δ)) )

when h ∝ N^(−α) for all α ∈ [1/4, 1]. Therefore the Lyapunov condition (13) is satisfied for any δ > 0, since

    Σ_{t=1}^{N+n} E[ |X_{N,t}|^(2(1+δ)) ] / v_N(y)^(1+δ) = O( N^(−(1+2δ)) / N^(−(1+δ)) ) = O( N^(−δ) ) → 0.
To verify the stability condition (14), we first rewrite that condition as

    Σ_{t=1}^{N+n} { E[ X_{N,t}² | X_{N,1}, …, X_{N,t−1} ] − E[ X_{N,t}² ] } / v_N(y) →p 0.

Since v_N(y) = O(N⁻¹), the stability condition (14) will hold if the sums of the centered conditional variances of the agent-level terms and of the dyad-level terms, each normalized by v_N(y), both converge to zero in probability.