## 1 Introduction

Many important social and economic variables are naturally defined for pairs of agents (or dyads). Examples include trade between pairs of countries (e.g., Tinbergen, 1962), input purchases and sales between pairs of firms (e.g., Atalay et al., 2011), research and development (R&D) partnerships across firms (e.g., König et al., 2019) and friendships between individuals (e.g., Christakis et al., 2010). Dyadic data arise frequently in the analysis of social and economic networks; in economics such data feature prominently in, for example, the analysis of international trade flows. See Graham (TBD) for many other examples and references.

While the statistical analysis of network data began almost a century ago, rigorously justified methods of inference for network statistics are only now emerging (cf., Goldenberg et al., 2009). In this paper we study nonparametric estimation of the density function of a (continuously-valued) dyadic random variable. Examples include the density of migration across states, trade across nations, liabilities across banks, or minutes of telephone conversation among individuals. While nonparametric density estimation using independent and identically distributed random samples, henceforth “monadic” data, is well understood, its dyadic counterpart has, to our knowledge, not yet been studied.

Holland & Leinhardt (1976) derived the sampling variance of the link frequency in a simple network (and of other low-order subgraph counts). A general asymptotic distribution theory for subgraph counts, exploiting recent ideas from the probability literature on dense graph limits (e.g., Diaconis & Janson, 2008; Lovász, 2012), was presented in Bickel et al. (2011).^{2}

^{2} See Nowicki (1991) for a summary of earlier research in this area. Menzel (2017) presents bootstrap procedures for inference on the mean of a dyadic random variable. Our focus on nonparametric density estimation appears to be novel. Density estimation is, of course, a topic of intrinsic interest to econometricians and statisticians, but it also provides a relatively simple and canonical starting point for understanding nonparametric estimation more generally. In the conclusion of this paper we discuss ongoing work on other non- and semi-parametric estimation problems using dyadic data.

We show that an (obvious) adaptation of the Rosenblatt (1956) and Parzen (1962) kernel density estimator is applicable to dyadic data. While our dyadic density estimator is straightforward to define, its rate of convergence and asymptotic sampling properties depart significantly from those of its monadic counterpart. Let $N$ be the number of sampled agents and $n = \binom{N}{2}$ the corresponding number of dyads. Estimation is based upon the $n$ dyadic outcomes $\{Y_{ij}\}_{i<j}$. Due to dependence across dyads sharing an agent in common, the rate of convergence of our density estimate is (generally) much *slower* than it would be with $n$ i.i.d. outcomes. This rate of convergence is also invariant across a wide range of bandwidth sequences, a property familiar from the econometric literature on semiparametric estimation (e.g., Powell, 1994). Indeed, from a certain perspective, our nonparametric dyadic density estimate can be viewed as a semiparametric estimator (in the sense that it can be thought of as an average of nonparametrically estimated densities). We also explore the impact of “degeneracy” – which arises when dependence across dyads vanishes – on our sampling theory; such degeneracy features prominently in Menzel’s (2017) innovative analysis of inference on dyadic means. We expect that many of our findings generalize to other non- and semi-parametric network estimation problems.

In the next section we present our maintained data/network generating process and proposed kernel density estimator. Section 3 explores the mean squared error properties of this estimator, while Section 4 outlines asymptotic distribution theory. Section 5 presents a consistent variance estimator, which can be used to construct Wald statistics and Wald-based confidence intervals. We summarize the results of a small simulation study in Section 6. In Section 7 we discuss various extensions and ongoing work. Calculations not presented in the main text are collected in Appendix A.

In what follows we interchangeably use unit, node, vertex, agent and individual all to refer to the vertices of the sampled network or graph. We denote random variables by capital Roman letters, specific realizations by lower case Roman letters and their support by blackboard bold Roman letters. That is, $Y$, $y$ and $\mathbb{Y}$ respectively denote a generic random draw of, a specific value of, and the support of, the random variable of interest. For a dyadic outcome, or weighted edge, $Y_{ij}$ associated with agents $i$ and $j$, we use $\mathbf{Y}$ to denote the adjacency matrix of all such outcomes/edges. Additional notation is defined in the sections which follow.

## 2 Model and estimator

### Model

Let $i = 1, \ldots, N$ index a simple random sample of agents from
some large (infinite) network of interest. A pair of agents constitutes
a *dyad*. For each of the $n = \binom{N}{2}$ sampled dyads, that
is for $i = 1, \ldots, N-1$ and $j = i+1, \ldots, N$, we observe the (scalar)
random variable $Y_{ij}$, generated according to

$$Y_{ij} = W\left(X_i, X_j, V_{ij}\right), \qquad (1)$$

where $X_i$ is a node-specific random vector of attributes (of arbitrary dimension, not necessarily observable), and $V_{ij}$ is an unobservable scalar random variable which is continuously distributed on $\mathbb{R}$ with density function $f_V(v)$.^{3}

^{3} In words, we observe the weighted subgraph induced by the $N$ randomly sampled agents. Observe that the function $W$ is symmetric in its first two arguments, ensuring that $Y_{ij} = Y_{ji}$ is undirected.

In what follows we directly maintain (1); however, it is also a consequence of assuming that the infinite graph sampled from is jointly exchangeable (Aldous, 1981; Hoover, 1979). Joint exchangeability of the sampled graph implies that

$$\left(Y_{ij}\right) \overset{d}{=} \left(Y_{\pi(i)\pi(j)}\right) \qquad (2)$$

for every $\pi$, where $\pi$ is a permutation of the node indices. Put differently, when node labels
have no meaning we have that the “likelihood” of any simultaneous
row and column permutation of $\mathbf{Y}$ is the same as that of $\mathbf{Y}$
itself.^{4}

^{4} For $\mathbf{Y}$ the weighted adjacency matrix of outcomes and $\mathbf{P}$ any conformable permutation matrix, (2) states that $\mathbf{P}\mathbf{Y}\mathbf{P}'$ and $\mathbf{Y}$ are identically distributed.
Our target object of estimation is the marginal density function of $Y_{ij}$, defined as the derivative of its cumulative distribution function (c.d.f.):

$$f_Y(y) \equiv \frac{d}{dy} \Pr\left(Y_{ij} \le y\right).$$

To ensure this density function is well-defined on the support of $Y_{ij}$, we assume that the unknown function $W(x_1, x_2, v)$ is strictly increasing and continuously differentiable in its third argument $v$, and we also assume that $X_i$ and $X_j$ are statistically independent of the “error term” $V_{ij}$ for all $i$ and $j$. Under these assumptions, by the usual change-of-variables formula, the conditional density of $Y_{ij}$ given $X_i = x_1$ and $X_j = x_2$ takes the form

$$f_{Y|XX}\left(y|x_1, x_2\right) = \frac{f_V\left(W^{-1}\left(x_1, x_2, y\right)\right)}{W_3\left(x_1, x_2, W^{-1}\left(x_1, x_2, y\right)\right)},$$

where $W^{-1}(x_1, x_2, \cdot)$ denotes the inverse of $W(x_1, x_2, \cdot)$ and $W_3$ its partial derivative with respect to the third argument.

In the derivations below we will assume this density function is bounded and twice continuously differentiable at $y$ with bounded second derivative, for all $x_1$ and $x_2$; this will follow from similar smoothness conditions imposed on the primitives $W$ and $f_V$.

To derive the marginal density of $Y_{ij}$, note that, by random sampling, the sequence $\{X_i\}$ is independently and identically distributed (i.i.d.), as is the sequence $\{V_{ij}\}$. Under these conditions, we can define the conditional density of $Y_{ij}$ given $X_i = x_1$ (or, symmetrically, given $X_j$) alone as

$$f_{Y|X}\left(y|x_1\right) \equiv E\left[f_{Y|XX}\left(y|x_1, X_j\right)\right]$$

and, averaging, the marginal density of interest as

$$f_Y(y) = E\left[f_{Y|XX}\left(y|X_i, X_j\right)\right] = E\left[f_{Y|X}\left(y|X_i\right)\right].$$

Let $i$, $j$, $k$ and $l$ index distinct agents. The assumption that $\{X_i\}$ and $\{V_{ij}\}$ are i.i.d. implies that while $Y_{ij}$ varies independently of $Y_{kl}$ (since the $(i,j)$ and $(k,l)$ dyads share no agents in common), $Y_{ij}$ will not vary independently of $Y_{ik}$, as both vary with $X_i$ (since the $(i,j)$ and $(i,k)$ dyads both include agent $i$). This type of dependence structure is sometimes referred to as “dyadic clustering” in empirical social science research (cf., Fafchamps & Gubert, 2007; Cameron & Miller, 2014; Aronow et al., 2017). The implications of this dependence structure for density estimation and – especially – inference are a key area of focus in what follows.
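This dependence pattern is easy to see in a small simulation. The design below, $Y_{ij} = X_i + X_j + V_{ij}$ with standard normal $X$ and $V$, is purely illustrative and not taken from the text: dyads sharing agent $i$ are positively correlated through $X_i$, while dyads with no agent in common are uncorrelated.

```python
import numpy as np

# Illustrative design (assumed, not from the paper): Y_ij = X_i + X_j + V_ij.
# Dyads (i,j) and (i,k) share the component X_i, inducing "dyadic clustering";
# dyads (i,j) and (k,l) with four distinct agents are independent.
rng = np.random.default_rng(0)
reps = 20_000
X = rng.normal(size=(reps, 4))          # agents i, j, k, l
V = rng.normal(size=(reps, 3))          # idiosyncratic dyad-level shocks

Y_ij = X[:, 0] + X[:, 1] + V[:, 0]
Y_ik = X[:, 0] + X[:, 2] + V[:, 1]      # shares agent i with Y_ij
Y_kl = X[:, 2] + X[:, 3] + V[:, 2]      # shares no agent with Y_ij

corr_shared = np.corrcoef(Y_ij, Y_ik)[0, 1]     # theory: 1/3
corr_disjoint = np.corrcoef(Y_ij, Y_kl)[0, 1]   # theory: 0
print(corr_shared, corr_disjoint)
```

In this design each outcome has variance 3 and dyads sharing one agent have covariance 1, so the shared-agent correlation is close to 1/3 while the disjoint correlation is near zero.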

### Estimator

Given this construction of the marginal density $f_Y(y)$ of $Y_{ij}$, it can be estimated using an immediate extension of the kernel density estimator for monadic data first proposed by Rosenblatt (1956) and Parzen (1962):

$$\hat{f}_Y(y) \equiv \frac{1}{n} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} K_h\left(y - Y_{ij}\right),$$

where

$$K_h(s) \equiv \frac{1}{h} K\left(\frac{s}{h}\right).$$

Here $K(\cdot)$ is a kernel function assumed to be (i) bounded ($K(u) \le \bar{K} < \infty$ for all $u$), (ii) symmetric ($K(u) = K(-u)$), and (iii) zero outside a bounded interval ($K(u) = 0$ if $|u| > \bar{u}$); we also require that it (iv) integrates to one ($\int K(u)\, du = 1$). The bandwidth $h = h_N$ is assumed to be a positive, deterministic sequence (indexed by the number of nodes $N$) that tends to zero as $N \to \infty$ and will satisfy other conditions imposed below. A discussion of the motivation for the kernel estimator and its statistical properties under random sampling (of monadic variables) can be found in Silverman (1986, Chapters 2 & 3).
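As a concrete sketch, the estimator can be implemented in a few lines. The Epanechnikov kernel and the simulated design $Y_{ij} = X_i + X_j + V_{ij}$ below are illustrative choices, not prescribed by the text; the kernel satisfies conditions (i)–(iv).

```python
import numpy as np

def dyadic_kde(Y, y, h):
    """Dyadic kernel density estimate of f_Y at the point y.

    Y is the N x N symmetric adjacency matrix of dyadic outcomes; only
    the n = N(N-1)/2 entries with i < j are used.  The Epanechnikov
    kernel is bounded, symmetric, compactly supported and integrates
    to one (conditions (i)-(iv) in the text).
    """
    iu = np.triu_indices(Y.shape[0], k=1)          # dyads with i < j
    u = (y - Y[iu]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return K.mean() / h                            # (1/n) sum_ij K_h(y - Y_ij)

# Simulated example (design assumed for illustration): with
# Y_ij = X_i + X_j + V_ij and standard normal X and V, the marginal
# distribution of Y_ij is N(0, 3), so f_Y(0) = 1/sqrt(6*pi) ≈ 0.2303.
rng = np.random.default_rng(42)
N = 200
X = rng.normal(size=N)
V = rng.normal(size=(N, N))
Y = X[:, None] + X[None, :] + np.triu(V, 1) + np.triu(V, 1).T

n = N * (N - 1) // 2
f_hat = dyadic_kde(Y, y=0.0, h=n ** (-1 / 5))
print(f_hat)
```

With the MSE-optimal-order bandwidth $h \propto n^{-1/5}$, the estimate at $y=0$ lands close to the true value $\approx 0.2303$ for this design.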

## 3 Rate of convergence analysis

To formulate conditions for consistency of $\hat{f}_Y(y)$, we will evaluate its expectation and variance, which will yield conditions on the bandwidth sequence $h_N$ for its mean squared error to converge to zero.

A standard calculation yields a bias of $\hat{f}_Y(y)$ equal to (see Appendix A)

$$E\left[\hat{f}_Y(y)\right] - f_Y(y) = h^2 B(y) + o\left(h^2\right), \qquad (3)$$

with

$$B(y) \equiv \frac{1}{2} \frac{\partial^2 f_Y(y)}{\partial y^2} \int u^2 K(u)\, du.$$

Equation (3) coincides with the bias of the kernel density estimate based upon a random (“monadic”) sample.

The expression for the variance of $\hat{f}_Y(y)$, in contrast to that for its bias, does differ from the monadic (i.i.d.) case, due to the (possibly) nonzero covariance between $K_h(y - Y_{ij})$ and $K_h(y - Y_{ik})$ for $j \ne k$:

$$
\begin{aligned}
V\left(\hat{f}_Y(y)\right) &= \frac{1}{n^2} \sum_{i<j} \sum_{k<l} C\left(K_h\left(y - Y_{ij}\right), K_h\left(y - Y_{kl}\right)\right) \\
&= \frac{1}{n}\, V\left(K_h\left(y - Y_{ij}\right)\right) + \frac{2(N-2)}{n}\, C\left(K_h\left(y - Y_{ij}\right), K_h\left(y - Y_{ik}\right)\right).
\end{aligned}
$$

The second line of this expression uses the fact that, in the double summation of the first line, there are $n$ terms with $\{i,j\} = \{k,l\}$ and $2n(N-2)$ terms with exactly one subscript in common; as noted earlier, when $Y_{ij}$ and $Y_{kl}$ have no subscripts in common they are independent (and thus uncorrelated).
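The counting argument used here – $n$ identical pairs of dyads and $2n(N-2)$ pairs sharing exactly one agent – can be verified by brute force for a small network (a sanity check on the combinatorics, not part of the derivation):

```python
from itertools import combinations

# Brute-force check of the term count in the variance double sum: of the
# n^2 ordered pairs of dyads, n are identical, 2n(N-2) share exactly one
# agent, and the remainder share no agent (and are hence uncorrelated).
N = 7
dyads = list(combinations(range(N), 2))
n = len(dyads)                                   # n = N(N-1)/2 = 21

same = sum(1 for a in dyads for b in dyads if a == b)
share_one = sum(1 for a in dyads for b in dyads if len(set(a) & set(b)) == 1)
disjoint = sum(1 for a in dyads for b in dyads if not set(a) & set(b))

print(same, share_one, disjoint)                 # partition of the n^2 pairs
```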

To calculate the dependence of this variance on the number of nodes $N$, we analyze $V(K_h(y - Y_{ij}))$ and $C(K_h(y - Y_{ij}), K_h(y - Y_{ik}))$. Beginning with the former,

$$
\begin{aligned}
V\left(K_h\left(y - Y_{ij}\right)\right) &= E\left[K_h\left(y - Y_{ij}\right)^2\right] - E\left[K_h\left(y - Y_{ij}\right)\right]^2 \\
&= \frac{1}{h}\left(\Omega_2(y) + O(h)\right),
\end{aligned}
$$

where

$$\Omega_2(y) \equiv f_Y(y) \int K(u)^2\, du.$$

Like the expected value, this own-variance term is of the same order of magnitude as in the monadic case.

However, the covariance term $C(K_h(y - Y_{ij}), K_h(y - Y_{ik}))$, which would be absent for i.i.d. monadic data, is generally nonzero. Since

$$
\begin{aligned}
E\left[K_h\left(y - Y_{ij}\right) K_h\left(y - Y_{ik}\right)\right]
&= E\left[\int \int K(u) K(v)\, f_{Y|X}\left(y - hu | X_i\right) f_{Y|X}\left(y - hv | X_i\right) du\, dv\right] \\
&= E\left[f_{Y|X}\left(y | X_i\right)^2\right] + O\left(h^2\right)
\end{aligned}
$$

(where the first line uses the change of variables $u = (y - Y_{ij})/h$ and $v = (y - Y_{ik})/h$ and the mutual independence of $Y_{ij}$ and $Y_{ik}$ given $X_i$), it follows that

$$C\left(K_h\left(y - Y_{ij}\right), K_h\left(y - Y_{ik}\right)\right) = \Omega_1(y) + O\left(h^2\right),$$

with

$$\Omega_1(y) \equiv E\left[f_{Y|X}\left(y | X_i\right)^2\right] - f_Y(y)^2 = V\left(f_{Y|X}\left(y | X_i\right)\right).$$

(4) | ||||

and the mean-squared error of is, using (3) and (4),

(5) | ||||

Provided that $\Omega_1(y) > 0$ and the bandwidth sequence $h = h_N$ is chosen such that

$$N h \to \infty \quad \text{and} \quad N h^4 \to 0 \qquad (6)$$

as $N \to \infty$, we get that

$$\mathrm{MSE}\left(\hat{f}_Y(y)\right) = O\left(\frac{1}{N}\right)$$

and hence that

$$\hat{f}_Y(y) - f_Y(y) = O_p\left(N^{-1/2}\right).$$

In fact, the rate of convergence of $\hat{f}_Y(y)$ to $f_Y(y)$ will be $O_p(N^{-1/2})$ as long as $h_N \propto N^{-\gamma}$ for some $\gamma \in [1/4, 1]$ as $N \to \infty$, although the mean squared error will include an additional bias term of order $O(N^{-1})$ if $N h_N^4$ does not converge to zero, or an additional variance term of the same order if $N h_N$ does not diverge to infinity.

To derive the MSE-optimal bandwidth sequence we minimize (5) with respect to its first and third terms; this yields an optimal bandwidth sequence of

$$h^*(y) = \left(\frac{\Omega_2(y)}{4\, B(y)^2}\right)^{1/5} n^{-1/5}. \qquad (7)$$

This sequence satisfies condition (6) above.
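The claim that the MSE-optimal sequence satisfies condition (6) amounts to simple arithmetic: with $h \propto n^{-1/5}$ and $n = N(N-1)/2 \sim N^2/2$, we get $h \sim N^{-2/5}$, so $Nh \sim N^{3/5} \to \infty$ while $Nh^4 \sim N^{-3/5} \to 0$. A quick numerical check:

```python
# Arithmetic check that a bandwidth proportional to n^(-1/5), with
# n = N(N-1)/2, satisfies condition (6): N*h grows without bound while
# N*h^4 shrinks to zero as N increases.
def bandwidth_conditions(N):
    n = N * (N - 1) // 2
    h = n ** (-1 / 5)              # MSE-optimal order, constant set to 1
    return N * h, N * h ** 4

Nh_small, Nh4_small = bandwidth_conditions(100)
Nh_large, Nh4_large = bandwidth_conditions(1_000_000)
print(Nh_small, Nh4_small)
print(Nh_large, Nh4_large)
```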

Interestingly, the rate of convergence of $\hat{f}_Y(y)$ to $f_Y(y)$ under condition (6) is the same as the rate of convergence of the sample mean

$$\bar{Y} \equiv \frac{1}{n} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} Y_{ij} \qquad (8)$$

to its expectation $\mu \equiv E[Y_{ij}]$ when $E[Y_{ij}^2] < \infty$. Variance calculations similar to those for $\hat{f}_Y(y)$ yield (see also Holland & Leinhardt (1976) and Menzel (2017))

$$V\left(\bar{Y}\right) = \frac{2(N-2)}{n}\, V\left(E\left[Y_{ij} | X_i\right]\right) + O\left(\frac{1}{n}\right),$$

provided $E[Y_{ij}|X_i]$ is non-degenerate, yielding

$$\bar{Y} - \mu = O_p\left(N^{-1/2}\right).$$

Thus, in contrast to the case of i.i.d. monadic data, there is no convergence-rate “cost” associated with the nonparametric estimation of $f_Y(y)$. The presence of dyadic dependence, due to its impact on estimation variance, does, however, substantially slow the feasible rate of convergence. With $n$ i.i.d. outcomes, the relevant rate for density estimation would be $n^{-2/5}$ when the MSE-optimal bandwidth sequence is used. Recalling that $n = O(N^2)$, the $N^{-1/2}$ rate we find here corresponds to an $n^{-1/4}$ rate. The slowdown from $n^{-2/5}$ to $n^{-1/4}$ captures the cost of dyadic dependence for the variance of our density estimate.
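The rate comparison can also be checked numerically: with $n = N(N-1)/2$ dyads, $N^{-1/2}$ agrees with $n^{-1/4}$ up to a constant, and both are slower than the monadic benchmark $n^{-2/5}$.

```python
# Numerical illustration of the rate comparison: with n = N(N-1)/2 dyads,
# the dyadic rate N^(-1/2) coincides with n^(-1/4) up to a constant and is
# slower (i.e., larger) than the n^(-2/5) rate attainable with n i.i.d. draws.
N = 10_000
n = N * (N - 1) // 2
dyadic_rate = N ** -0.5       # rate obtained under dyadic dependence
quartic_rate = n ** -0.25     # the same rate expressed in terms of n
iid_rate = n ** -0.4          # monadic benchmark with n observations
print(dyadic_rate, quartic_rate, iid_rate)
```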

The lack of dependence of the convergence rate of $\hat{f}_Y(y)$ to $f_Y(y)$ on the precise bandwidth sequence chosen is analogous to that of semiparametric estimators defined as averages over nonparametrically estimated components (e.g., Newey, 1994; Powell, 1994). Defining

$$\hat{f}_{Y|X_i}(y) \equiv \frac{1}{N-1} \sum_{j \ne i} K_h\left(y - Y_{ij}\right),$$

the estimator can be expressed as

$$\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}_{Y|X_i}(y).$$

Holding $X_i$ fixed, the estimator $\hat{f}_{Y|X_i}(y)$ can be shown to converge to $f_{Y|X}(y|X_i)$ at the nonparametric rate $(Nh)^{-1/2}$, but the average of this nonparametric estimator over the $X_i$ converges at the faster (“parametric”) rate $N^{-1/2}$. In comparison, while

$$\bar{Y}_i \equiv \frac{1}{N-1} \sum_{j \ne i} Y_{ij} \overset{p}{\to} E\left[Y_{ij} | X_i\right]$$

for each $i$, the latter converges at the parametric rate already, and the additional averaging to obtain $\bar{Y}$ does not improve upon that rate.
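The “average of nonparametric estimates” representation can be confirmed numerically: averaging the $N$ hold-one-agent-fixed conditional density estimates reproduces the dyadic estimator exactly. The design and variable names below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Check (illustrative design assumed, not from the paper) that the dyadic
# estimator equals the average over agents i of the N conditional estimates
# f_hat_i(y) = (1/(N-1)) * sum_{j != i} K_h(y - Y_ij).
rng = np.random.default_rng(1)
N, h, y = 30, 0.3, 0.0
X = rng.normal(size=N)
V = rng.normal(size=(N, N))
Y = X[:, None] + X[None, :] + np.triu(V, 1) + np.triu(V, 1).T

def K_h(s):                        # Epanechnikov kernel, scaled by 1/h
    u = s / h
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h

mask = ~np.eye(N, dtype=bool)      # drop the (meaningless) diagonal
f_i = np.array([K_h(y - Y[i][mask[i]]).mean() for i in range(N)])
f_avg = f_i.mean()                 # average of the N conditional estimates

iu = np.triu_indices(N, k=1)
f_dyadic = K_h(y - Y[iu]).mean()   # direct dyadic estimator over i < j
print(f_avg, f_dyadic)
```

The equality is exact (up to floating-point summation order) because each dyad $(i,j)$ appears in exactly two of the conditional estimates and the kernel term is symmetric in $i$ and $j$.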

## 4 Asymptotic distribution theory

To derive conditions under which $\hat{f}_Y(y)$ is approximately normally distributed, it is helpful to decompose the difference between $\hat{f}_Y(y)$ and $f_Y(y)$ into four terms. Writing $\bar{K}_{ij} \equiv E[K_h(y - Y_{ij}) | X_i, X_j]$, $\bar{K}_i \equiv E[K_h(y - Y_{ij}) | X_i]$ and $\bar{K} \equiv E[K_h(y - Y_{ij})]$, we have

$$\hat{f}_Y(y) - f_Y(y) = \frac{1}{n} \sum_{i<j} \left(K_h\left(y - Y_{ij}\right) - \bar{K}_{ij}\right) \qquad (9)$$

$$\qquad + \frac{1}{n} \sum_{i<j} \left(\bar{K}_{ij} - \bar{K}_i - \bar{K}_j + \bar{K}\right) \qquad (10)$$

$$\qquad + \frac{2}{N} \sum_{i=1}^{N} \left(\bar{K}_i - \bar{K}\right) \qquad (11)$$

$$\qquad + \left(\bar{K} - f_Y(y)\right). \qquad (12)$$

To understand this decomposition, observe that the projection of $\hat{f}_Y(y)$ onto $\{X_i\}_{i=1}^{N}$ equals, by the independence assumptions imposed on $X_i$ and $V_{ij}$, the U-statistic $\frac{1}{n} \sum_{i<j} \bar{K}_{ij}$. This U-statistic is defined in terms of the *latent* i.i.d. random variables $X_1, \ldots, X_N$.

The first term in this expression, line (9), is $\hat{f}_Y(y)$ minus the projection/U-statistic described above. Each term in this summation has conditional expectation zero given the remaining terms (i.e., the terms form a martingale difference sequence).

The second term in the decomposition, line (10), is the difference between the second-order U-statistic $\frac{1}{n}\sum_{i<j}\bar{K}_{ij}$ and its Hájek projection (e.g., van der Vaart, 2000),^{5} the third term, line (11), is a centered version of that Hájek projection, and the final term, line (12), is the bias of $\hat{f}_Y(y)$. A similar “double projection” argument was used by Graham (2017) to analyze the large sample properties of the Tetrad Logit estimator.

^{5} That is, the projection of $\frac{1}{n}\sum_{i<j}\bar{K}_{ij}$ onto the linear subspace consisting of all functions of the form $\sum_{i=1}^{N} g_i(X_i)$.

If the bandwidth sequence satisfies the conditions $Nh \to \infty$ and $Nh^4 \to 0$, the calculations in the previous section can be used to show that the first, second, and fourth terms of this decomposition (i.e., (9), (10) and (12)) will all converge to zero when normalized by $N^{1/2}$. In this case term (11), which is an average of i.i.d. random variables, will be the leading term asymptotically, such that

$$\sqrt{N}\left(\hat{f}_Y(y) - f_Y(y)\right) \overset{d}{\to} N\left(0,\, 4\,\Omega_1(y)\right),$$

assuming $\Omega_1(y) > 0$.

If, however, the bandwidth sequence satisfies $Nh \to C$ for some constant $C > 0$ (a “knife-edge” undersmoothing condition similar to one considered by Cattaneo et al. (2014) in a different context), then both terms (9) and (11) will be asymptotically normal when normalized by $N^{1/2}$. To accommodate both of these cases in a single result, we will show that a standardized version of the sum of terms (9) and (11) will have a standard normal limit distribution, although the first, (9), term may be degenerate in the limit.
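A small Monte Carlo is consistent with the non-degenerate case. The design ($Y_{ij} = X_i + X_j + V_{ij}$ with standard normal $X$ and $V$, so $Y_{ij} \sim N(0,3)$) and all tuning choices below are illustrative assumptions. For this design the limiting standard deviation of $\sqrt{N}(\hat{f}_Y(0) - f_Y(0))$ is about $0.11$; at $N = 100$ the own-variance term still contributes, so the simulated standard deviation comes out somewhat larger.

```python
import numpy as np

# Monte Carlo sketch (design assumed, not from the paper): the statistic
# sqrt(N) * (f_hat(0) - f_Y(0)) should be roughly centered at zero with a
# stable spread, consistent with root-N asymptotic normality.
rng = np.random.default_rng(7)
N, reps = 100, 300
n = N * (N - 1) // 2
h = n ** (-1 / 5)                        # satisfies condition (6)
f0 = 1.0 / np.sqrt(2 * np.pi * 3.0)      # true f_Y(0) since Y_ij ~ N(0, 3)
iu = np.triu_indices(N, k=1)

stats = []
for _ in range(reps):
    X = rng.normal(size=N)
    V = rng.normal(size=(N, N))
    Y = X[:, None] + X[None, :] + np.triu(V, 1) + np.triu(V, 1).T
    u = (0.0 - Y[iu]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov
    stats.append(np.sqrt(N) * (K.mean() / h - f0))

stats = np.asarray(stats)
print(stats.mean(), stats.std())
```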

In Appendix A we show that both (10) and (12) will be asymptotically negligible when normalized by the convergence rate of $\hat{f}_Y(y)$, such that the asymptotic distribution of $\hat{f}_Y(y) - f_Y(y)$ will only depend on the (9) and (11) terms.

We start by rewriting the sum of terms (9) and (11) as

$$\frac{1}{n} \sum_{i<j} \left(K_h\left(y - Y_{ij}\right) - \bar{K}_{ij}\right) + \frac{2}{N} \sum_{i=1}^{N} \left(\bar{K}_i - \bar{K}\right) = \sum_{t=1}^{N+n} D_{N,t},$$

where the triangular array $\{D_{N,t}\}_{t=1}^{N+n}$ is defined as follows. That is, $\{D_{N,t}\}$ is the collection of terms of the form

$$D_{N,t} = \frac{2}{N}\left(\bar{K}_t - \bar{K}\right)$$

for $t = 1, \ldots, N$ (with $t$ indexing agents), and

$$D_{N,t} = \frac{1}{n}\left(K_h\left(y - Y_{ij}\right) - \bar{K}_{ij}\right)$$

for $t = N+1, \ldots, N+n$ (with $t$ indexing, in some fixed order, the dyads $(i,j)$, $i < j$). Using the independence assumptions on $X_i$ and $V_{ij}$, as well as iterated expectations, it is tedious but straightforward to verify that

$$E\left[D_{N,t} \mid D_{N,1}, \ldots, D_{N,t-1}\right] = 0,$$

that is, $\{D_{N,t}\}$ is a martingale difference sequence (MDS).

Defining the variance of this MDS as

$$\Sigma_N \equiv V\left(\sum_{t=1}^{N+n} D_{N,t}\right) = \sum_{t=1}^{N+n} E\left[D_{N,t}^2\right],$$

we can demonstrate asymptotic normality of its standardized sum,

$$\Sigma_N^{-1/2} \sum_{t=1}^{N+n} D_{N,t},$$

by a central limit theorem for martingale difference triangular arrays (see, for example, Hall & Heyde (1980), Theorem 3.2 and Corollary 3.1, and White (2001), Theorem 5.24 and Corollary 5.26). Specifically, if the Lyapunov condition

$$\sum_{t=1}^{N+n} E\left[\left|\Sigma_N^{-1/2} D_{N,t}\right|^{2+\delta}\right] \to 0 \qquad (13)$$

holds for some $\delta > 0$, and also the stability condition

$$\Sigma_N^{-1} \sum_{t=1}^{N+n} D_{N,t}^2 \overset{p}{\to} 1 \qquad (14)$$

holds, then

$$\Sigma_N^{-1/2} \sum_{t=1}^{N+n} D_{N,t} \overset{d}{\to} N(0, 1). \qquad (15)$$

From the calculations used in the MSE analysis of Section 3 we have that

$$\Sigma_N = \frac{2(N-2)}{n}\,\Omega_1(y) + \frac{1}{nh}\,\Omega_2(y) + o\left(\frac{1}{N}\right) + o\left(\frac{1}{nh}\right),$$

so, taking $h \to 0$,

$$N\,\Sigma_N \to 4\,\Omega_1(y),$$

assuming $\Omega_1(y) > 0$ and $Nh \to \infty$. In the degenerate case, where $\Omega_1(y) = 0$, we will still have $\Sigma_N > 0$, now of order $(nh)^{-1}$, as long as the “knife-edge” undersmoothing bandwidth sequence with $Nh \to C > 0$ is chosen.

To verify the Lyapunov condition (13), note that

$$E\left[\left|\frac{2}{N}\left(\bar{K}_t - \bar{K}\right)\right|^4\right] = O\left(\frac{1}{N^4}\right) \qquad (16)$$

and

$$E\left[\left|\frac{1}{n}\left(K_h\left(y - Y_{ij}\right) - \bar{K}_{ij}\right)\right|^4\right] = O\left(\frac{1}{n^4 h^3}\right). \qquad (17)$$

Putting things together, we get that

$$\sum_{t=1}^{N+n} E\left[D_{N,t}^4\right] = O\left(\frac{1}{N^3}\right) + O\left(\frac{1}{n^3 h^3}\right)$$

when $\Omega_1(y) > 0$, so that $\Sigma_N^{-2} = O(N^2)$. Therefore the Lyapunov condition (13) is satisfied for $\delta = 2$, since

$$\Sigma_N^{-2} \sum_{t=1}^{N+n} E\left[D_{N,t}^4\right] = O\left(\frac{1}{N}\right) + O\left(\frac{1}{N (Nh)^3}\right) \to 0.$$

To verify the stability condition (14), we first rewrite that condition as

$$\Sigma_N^{-1} \sum_{t=1}^{N+n} \left(D_{N,t}^2 - E\left[D_{N,t}^2\right]\right) = A_N + B_N \overset{p}{\to} 0,$$

where

$$A_N \equiv \Sigma_N^{-1} \sum_{t=1}^{N} \left(D_{N,t}^2 - E\left[D_{N,t}^2\right]\right)$$

and

$$B_N \equiv \Sigma_N^{-1} \sum_{t=N+1}^{N+n} \left(D_{N,t}^2 - E\left[D_{N,t}^2\right]\right).$$

Since $\Sigma_N = O(N^{-1})$, the stability condition (14) will hold if $A_N$ and $B_N$ both converge to zero in probability.