
# Kernel Density Estimation for Undirected Dyadic Data

We study nonparametric estimation of density functions for undirected dyadic random variables (i.e., random variables defined for all $n \equiv \binom{N}{2}$ unordered pairs of agents/nodes in a weighted network of order $N$). These random variables satisfy a local dependence property: any random variables in the network that share one or two indices may be dependent, while those sharing no indices in common are independent. In this setting, we show that density functions may be estimated by an application of the kernel estimation method of Rosenblatt (1956) and Parzen (1962). We suggest an estimate of their asymptotic variances inspired by a combination of (i) Newey's (1994) method of variance estimation for kernel estimators in the "monadic" setting and (ii) a variance estimator for the (estimated) density of a simple network first suggested by Holland and Leinhardt (1976). More unusual are the rates of convergence and asymptotic (normal) distributions of our dyadic density estimates. Specifically, we show that they converge at the same rate as the (unconditional) dyadic sample mean: the square root of the number, $N$, of nodes. This differs from the results for nonparametric estimation of densities and regression functions for monadic data, which generally have a slower rate of convergence than their corresponding sample mean.


## 1 Introduction

Many important social and economic variables are naturally defined for pairs of agents (or dyads). Examples include trade between pairs of countries (e.g., Tinbergen, 1962), input purchases and sales between pairs of firms (e.g., Atalay et al., 2011), research and development (R&D) partnerships across firms (e.g., König et al., 2019) and friendships between individuals (e.g., Christakis et al., 2010). Dyadic data arises frequently in the analysis of social and economic networks. In economics such analyses are predominant in, for example, the analysis of international trade flows. See Graham (TBD) for many other examples and references.

While the statistical analysis of network data began almost a century ago, rigorously justified methods of inference for network statistics are only now emerging (cf., Goldenberg et al., 2009). In this paper we study nonparametric estimation of the density function of a (continuously-valued) dyadic random variable. Examples include the density of migration across states, trade across nations, liabilities across banks, or minutes of telephone conversation among individuals. While nonparametric density estimation using independent and identically distributed random samples, henceforth “monadic” data, is well-understood, its dyadic counterpart has, to our knowledge, not yet been studied.

Holland & Leinhardt (1976) derived the sampling variance of the link frequency in a simple network (and of other low order subgraph counts). A general asymptotic distribution theory for subgraph counts, exploiting recent ideas from the probability literature on dense graph limits (e.g., Diaconis & Janson, 2008; Lovász, 2012), was presented in Bickel et al. (2011) (see Nowicki (1991) for a summary of earlier research in this area). Menzel (2017) presents bootstrap procedures for inference on the mean of a dyadic random variable. Our focus on nonparametric density estimation appears to be novel. Density estimation is, of course, a topic of intrinsic interest to econometricians and statisticians, but it also provides a relatively simple and canonical starting point for understanding nonparametric estimation more generally. In the conclusion of this paper we discuss ongoing work on other non- and semi-parametric estimation problems using dyadic data.

We show that an (obvious) adaptation of the Rosenblatt (1956) and Parzen (1962) kernel density estimator is applicable to dyadic data. While our dyadic density estimator is straightforward to define, its rate-of-convergence and asymptotic sampling properties depart significantly from its monadic counterpart. Let $N$ be the number of sampled agents and $n \equiv \binom{N}{2}$ the corresponding number of dyads. Estimation is based upon the $n$ dyadic outcomes $\{W_{ij}\}_{i<j}$. Due to dependence across dyads sharing an agent in common, the rate of convergence of our density estimate is (generally) much slower than it would be with $n$ i.i.d. outcomes. This rate-of-convergence is also invariant across a wide range of bandwidth sequences. This property is familiar from the econometric literature on semiparametric estimation (e.g., Powell, 1994). Indeed, from a certain perspective, our nonparametric dyadic density estimate can be viewed as a semiparametric estimator (in the sense that it can be thought of as an average of nonparametrically estimated densities). We also explore the impact of “degeneracy” – which arises when dependence across dyads vanishes – on our sampling theory; such degeneracy features prominently in Menzel’s (2017) innovative analysis of inference on dyadic means. We expect that many of our findings generalize to other non- and semi-parametric network estimation problems.

In the next section we present our maintained data/network generating process and proposed kernel density estimator. Section 3 explores the mean square error properties of this estimator, while Section 4 outlines asymptotic distribution theory. Section 5 presents a consistent variance estimator, which can be used to construct Wald statistics and Wald-based confidence intervals. We summarize the results of a small simulation study in Section 6. In Section 7 we discuss various extensions and ongoing work. Calculations not presented in the main text are collected in Appendix A.

In what follows we interchangeably use unit, node, vertex, agent and individual all to refer to the vertices of the sampled network or graph. We denote random variables by capital Roman letters, specific realizations by lower case Roman letters and their support by blackboard bold Roman letters. That is, $W$, $w$ and $\mathbb{W}$ respectively denote a generic random draw of, a specific value of, and the support of, $W$. For $W_{ij}$, a dyadic outcome, or weighted edge, associated with agents $i$ and $j$, we use the notation $\mathbf{W}$ to denote the $N \times N$ adjacency matrix of all such outcomes/edges. Additional notation is defined in the sections which follow.

## 2 Model and estimator

### Model

Let $i = 1, \dots, N$ index a simple random sample of agents from some large (infinite) network of interest. A pair of agents constitutes a dyad. For each of the $n \equiv \binom{N}{2}$ sampled dyads, that is for $i = 1, \dots, N-1$ and $j = i+1, \dots, N$, we observe the (scalar) random variable $W_{ij}$, generated according to

$$W_{ij} = W(A_i, A_j, V_{ij}) = W(A_j, A_i, V_{ij}), \qquad (1)$$

where $A_i$ is a node-specific random vector of attributes (of arbitrary dimension, not necessarily observable), and $V_{ij}$ is an unobservable scalar random variable which is continuously distributed on $\mathbb{R}$ with density function $f_V(v)$. In words, we observe the weighted subgraph induced by the $N$ randomly sampled agents. Observe that the function $W(a_1, a_2, v)$ is symmetric in its first two arguments, ensuring that $W_{ij} = W_{ji}$ is undirected.

In what follows we directly maintain (1); however, it is also a consequence of assuming that the infinite graph sampled from is jointly exchangeable (Aldous, 1981; Hoover, 1979). Joint exchangeability of the sampled graph implies that

$$[W_{ij}] \overset{D}{=} [W_{\pi(i)\pi(j)}] \qquad (2)$$

for every $\pi$, where $\pi : \{1, \dots, N\} \to \{1, \dots, N\}$ is a permutation of the node indices. Put differently, when node labels have no meaning we have that the “likelihood” of any simultaneous row and column permutation of $\mathbf{W}$ is the same as that of $\mathbf{W}$ itself. Equivalently, for the weighted adjacency matrix $\mathbf{W}$, $P \mathbf{W} P' \overset{D}{=} \mathbf{W}$ for any conformable permutation matrix $P$. See Menzel (2017) for a related discussion.

Our target object of estimation is the marginal density function of $W_{ij}$, defined as the derivative of the cumulative distribution function (c.d.f.) of $W_{ij}$:

$$\Pr\{W_{ij} \leq w\} \equiv F_W(w) = \int_{-\infty}^{w} f_W(u)\,du.$$

To ensure this density function is well-defined on the support of $W_{ij}$ we assume that the unknown function $W(a_1, a_2, v)$ is strictly increasing and continuously differentiable in its third argument $v$, and we also assume that $A_i$ and $A_j$ are statistically independent of the “error term” $V_{ij}$ for all $i$ and $j$. Under these assumptions, by the usual change-of-variables formula, the conditional density of $W_{ij}$ given $A_i = a_1$ and $A_j = a_2$ takes the form

$$f_{W|AA}(w \mid a_1, a_2) = f_V\big(W^{-1}(a_1, a_2, w)\big) \cdot \left| \frac{\partial W\big(a_1, a_2, W^{-1}(a_1, a_2, w)\big)}{\partial v} \right|^{-1}.$$

In the derivations below we will assume this conditional density function is bounded and twice continuously differentiable at $w$ with bounded second derivative for all $a_1$ and $a_2$; this will follow from similar smoothness conditions imposed on the primitives $W(\cdot, \cdot, \cdot)$ and $f_V(\cdot)$.

To derive the marginal density of $W_{ij}$, note that, by random sampling, the sequence $\{A_i\}$ is independently and identically distributed (i.i.d.), as is the sequence $\{V_{ij}\}$. Under these conditions, we can define the conditional densities of $W_{ij}$ given $A_i$ or $A_j$ alone as

$$f_{W|A}(w \mid a) \equiv E\big[f_{W|AA}(w \mid a, A_j)\big] = E\big[f_{W|AA}(w \mid A_i, a)\big],$$

and, averaging, the marginal density of interest as

$$f_W(w) \equiv E\big[f_{W|AA}(w \mid A_i, A_j)\big] = E\big[f_{W|A}(w \mid A_i)\big].$$

Let $i$, $j$, $k$ and $l$ index distinct agents. The assumption that the $A_i$ and $V_{ij}$ are i.i.d. implies that while $W_{ij}$ varies independently of $W_{kl}$ (since the $\{i,j\}$ and $\{k,l\}$ dyads share no agents in common), $W_{ij}$ will not vary independently of $W_{ik}$, as both vary with $A_i$ (since the $\{i,j\}$ and $\{i,k\}$ dyads both include agent $i$). This type of dependence structure is sometimes referred to as “dyadic clustering” in empirical social science research (cf., Fafchamps & Gubert, 2007; Cameron & Miller, 2014; Aronow et al., 2017). The implications of this dependence structure for density estimation and – especially – inference are a key area of focus in what follows.
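This dependence pattern is easy to see in simulation. The sketch below uses a toy design of our own choosing (not one from the paper): $W_{ij} = A_i A_j + V_{ij}$ with $A_i$ i.i.d. $N(1,1)$ and $V_{ij}$ i.i.d. $N(0,1)$. Dyads sharing a node are correlated; dyads with no node in common are not.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_W(N, rng):
    """Simulate a symmetric dyadic design W_ij = A_i * A_j + V_ij.
    This functional form is purely illustrative (an assumption)."""
    A = rng.normal(loc=1.0, scale=1.0, size=N)   # node attributes A_i
    V = rng.normal(size=(N, N))
    V = np.triu(V, 1) + np.triu(V, 1).T          # symmetric shocks V_ij = V_ji
    return np.outer(A, A) + V

# Repeatedly sample the dyads {1,2}, {1,3} and {3,4} from small networks.
reps = 20000
w12, w13, w34 = (np.empty(reps) for _ in range(3))
for r in range(reps):
    W = simulate_W(4, rng)
    w12[r], w13[r], w34[r] = W[0, 1], W[0, 2], W[2, 3]

corr_shared = np.corrcoef(w12, w13)[0, 1]    # dyads share node 1: correlated
corr_disjoint = np.corrcoef(w12, w34)[0, 1]  # no common node: independent
```

For this particular design a direct calculation gives a shared-node correlation of $1/4$, while the disjoint-dyad correlation is exactly zero; the simulated values should be close to these.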

### Estimator

Given this construction, the marginal density of $W_{ij}$ can be estimated using an immediate extension of the kernel density estimator for monadic data first proposed by Rosenblatt (1956) and Parzen (1962):

$$\hat{f}_W(w) = \binom{N}{2}^{-1} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{1}{h} K\left(\frac{w - W_{ij}}{h}\right) \equiv \frac{1}{n} \sum_{i<j} K_{ij},$$

where

$$K_{ij} \equiv \frac{1}{h} K\left(\frac{w - W_{ij}}{h}\right).$$

Here $K(\cdot)$ is a kernel function assumed to be (i) bounded ($K(u) \leq \bar{K}$ for all $u$), (ii) symmetric ($K(u) = K(-u)$), (iii) zero outside a bounded interval ($K(u) = 0$ if $|u| > \bar{u}$); we also require that it (iv) integrates to one ($\int K(u)\,du = 1$). The bandwidth $h = h_N$ is assumed to be a positive, deterministic sequence (indexed by the number of nodes $N$) that tends to zero as $N \to \infty$ and will satisfy other conditions imposed below. A discussion of the motivation for the kernel estimator and its statistical properties under random sampling (of monadic variables) can be found in Silverman (1986, Chapters 2 & 3).
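A minimal implementation of $\hat{f}_W(w)$ is sketched below, assuming an Epanechnikov kernel (which satisfies conditions (i)–(iv)); the simulated design $W_{ij} = (A_i + A_j + V_{ij})/\sqrt{3}$ is again our own illustrative choice, built so that $W_{ij}$ is standard normal marginally.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, symmetric kernel with compact support that integrates to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def dyadic_kde(W, w, h):
    """f_hat_W(w) = (1/n) * sum_{i<j} (1/h) K((w - W_ij)/h), where W is an
    N x N symmetric matrix of dyadic outcomes and n = N(N-1)/2."""
    Wij = W[np.triu_indices_from(W, k=1)]     # the n unordered dyads
    return np.mean(epanechnikov((w - Wij) / h)) / h

# Demo: W_ij = (A_i + A_j + V_ij)/sqrt(3) is N(0,1) marginally, so the
# estimate at w = 0 should be near the standard normal density 0.3989.
rng = np.random.default_rng(1)
N = 200
A = rng.normal(size=N)
V = rng.normal(size=(N, N))
V = np.triu(V, 1) + np.triu(V, 1).T
W = (A[:, None] + A[None, :] + V) / np.sqrt(3)
est = dyadic_kde(W, w=0.0, h=0.3)
```

Note that the estimator only ever touches the $n$ upper-triangular entries, matching the summation over $i < j$ in the display above.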

## 3 Rate of convergence analysis

To formulate conditions for consistency of $\hat{f}_W(w)$ we will evaluate its expectation and variance, which will yield conditions on the bandwidth sequence $h_N$ for its mean squared error to converge to zero.

A standard calculation yields a bias of $\hat{f}_W(w)$ equal to (see Appendix A)

$$E[\hat{f}_W(w)] - f_W(w) = h^2 B(w) + o(h^2) = O(h_N^2), \qquad (3)$$

with

$$B(w) \equiv \frac{1}{2} \frac{\partial^2 f_W(w)}{\partial w^2} \int u^2 K(u)\,du.$$

Equation (3) coincides with the bias of the kernel density estimate based upon a random (“monadic”) sample.

The expression for the variance of $\hat{f}_W(w)$, in contrast to that for bias, does differ from the monadic (i.i.d.) case due to the (possibly) nonzero covariance between $K_{ij}$ and $K_{ik}$ for $j \neq k$:

$$
\begin{aligned}
V(\hat{f}_W(w)) &= V\left(\frac{1}{n} \sum_{i<j} K_{ij}\right) \\
&= \frac{1}{n^2} \sum_{i<j} \sum_{k<l} C(K_{ij}, K_{kl}) \\
&= \frac{1}{n} V(K_{12}) + \frac{2(N-2)}{n} C(K_{12}, K_{13}).
\end{aligned}
$$

The third line of this expression uses the fact that, in the summation in the second line, there are $n$ terms with $\{i,j\} = \{k,l\}$ and $2n(N-2)$ terms with exactly one subscript in common; as noted earlier, when $K_{ij}$ and $K_{kl}$ have no subscripts in common they are independent (and thus uncorrelated).

To calculate the dependence of this variance on the number of nodes $N$, we analyze $V(K_{12})$ and $C(K_{12}, K_{13})$. Beginning with the former,

$$
\begin{aligned}
V(K_{12}) &= E[(K_{12})^2] - (E[\hat{f}_W(w)])^2 \\
&= \frac{1}{h^2} \int \left[K\left(\frac{w-s}{h}\right)\right]^2 f_W(s)\,ds + O(1) \\
&= \frac{1}{h} \int [K(u)]^2 f_W(w - hu)\,du + O(1) \\
&= \frac{f_W(w)}{h} \cdot \int [K(u)]^2\,du + O(1) \\
&\equiv \frac{1}{h_N} \Omega_2(w) + O(1),
\end{aligned}
$$

where

$$\Omega_2(w) \equiv f_W(w) \cdot \int [K(u)]^2\,du.$$

Like the expected value, this own variance term is of the same order of magnitude as in the monadic case,

$$V(K_{12}) = O\left(\frac{1}{h}\right).$$

However, the covariance term $C(K_{ij}, K_{ik})$, which would be absent for i.i.d. monadic data, is generally nonzero. Since

$$
\begin{aligned}
E[K_{ij} \cdot K_{ik}] &= E\left[\int\!\!\int \frac{1}{h^2} K\left(\frac{w-s_1}{h}\right) K\left(\frac{w-s_2}{h}\right) f_{W|AA}(s_1 \mid A_1, A_2)\, f_{W|AA}(s_2 \mid A_1, A_3)\, ds_1\, ds_2\right] \\
&= E\left[\int K(u_1)\, f_{W|A}(w - h u_1 \mid A_1)\,du_1 \cdot \int K(u_2)\, f_{W|A}(w - h u_2 \mid A_1)\,du_2\right] \\
&= E\left[f_{W|A}(w \mid A_1)^2\right] + o(1)
\end{aligned}
$$

(where the second line uses the change of variables $u_1 = (w - s_1)/h$ and $u_2 = (w - s_2)/h$ and the mutual independence of $A_1$, $A_2$ and $A_3$), it follows that

$$
\begin{aligned}
C(K_{ij}, K_{ik}) &= E[K_{ij} \cdot K_{ik}] - (E[\hat{f}_W(w)])^2 \\
&= \left[E\left[f_{W|A}(w \mid A_1)^2\right] - f_W(w)^2\right] + O(h^2) \\
&= V\left(f_{W|A}(w \mid A_1)\right) + o(1) \\
&\equiv \Omega_1(w) + o(1),
\end{aligned}
$$

with

$$\Omega_1(w) \equiv V\left(f_{W|A}(w \mid A_1)\right).$$

Therefore,

$$
\begin{aligned}
V(\hat{f}_W(w)) &= \frac{1}{n}\left[2(N-2) \cdot C(K_{12}, K_{13}) + V(K_{12})\right] \\
&= \frac{4}{N} \Omega_1(w) + \left(\frac{1}{nh} \Omega_2(w) - \frac{2}{n} \Omega_1(w)\right) + o\left(\frac{1}{N}\right) \qquad (4) \\
&= O\left(\frac{4\,\Omega_1(w)}{N}\right) + O\left(\frac{\Omega_2(w)}{nh}\right),
\end{aligned}
$$

and the mean-squared error of $\hat{f}_W(w)$ is, using (3) and (4),

$$
\begin{aligned}
\mathrm{MSE}(\hat{f}_W(w)) &= \left(E[\hat{f}_W(w)] - f_W(w)\right)^2 + V(\hat{f}_W(w)) \\
&= h^4 B(w)^2 + \frac{4}{N} \Omega_1(w) + \left(\frac{1}{nh} \Omega_2(w) - \frac{2}{n} \Omega_1(w)\right) + o(h^4) + o\left(\frac{1}{N}\right) \qquad (5) \\
&= O(h^4) + O\left(\frac{4\,\Omega_1(w)}{N}\right) + O\left(\frac{\Omega_2(w)}{nh}\right).
\end{aligned}
$$

Provided that $\Omega_1(w) > 0$ and the bandwidth sequence $h = h_N$ is chosen such that

$$Nh \to \infty, \qquad Nh^4 \to 0 \qquad (6)$$

as $N \to \infty$, we get that

$$\mathrm{MSE}(\hat{f}_W(w)) = o\left(\frac{1}{N}\right) + O\left(\frac{1}{N}\right) + o\left(\frac{1}{N}\right) = O\left(\frac{1}{N}\right),$$

and hence that

$$\sqrt{N}\left(\hat{f}_W(w) - f_W(w)\right) = O_p(1).$$

In fact, the rate of convergence of $\hat{f}_W(w)$ to $f_W(w)$ will be $\sqrt{N}$ as long as $h_N \propto N^{-\alpha}$ for some $\alpha \in [1/4, 1)$ as $N \to \infty$, although the mean-squared error will include an additional $O(1/N)$ bias or variance term if either $Nh^4$ does not converge to zero or $Nh$ does not diverge to infinity.

To derive the MSE-optimal bandwidth sequence we minimize (5) with respect to $h$, balancing its first and third terms; this yields an optimal bandwidth sequence of

$$h_N^*(w) = \left[\frac{1}{4} \frac{\Omega_2(w)}{B(w)^2} \frac{1}{n}\right]^{1/5} = O\left(N^{-2/5}\right). \qquad (7)$$

This sequence satisfies condition (6) above.
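As a sketch, a plug-in version of (7) can be computed once pilot values for $f_W(w)$ and its second derivative are available. The example below assumes (for illustration only) an Epanechnikov kernel, for which $\int K(u)^2\,du = 3/5$ and $\int u^2 K(u)\,du = 1/5$, and a standard normal pilot density.

```python
import numpy as np

def h_star(n, w, f, f_dd, kernel_l2=0.6, kernel_var=0.2):
    """MSE-optimal bandwidth h*(w) = [ (1/4) * Omega2(w) / B(w)^2 * (1/n) ]^(1/5),
    with Omega2(w) = f(w) * int K^2 and B(w) = 0.5 * f''(w) * int u^2 K(u) du.
    Defaults are the Epanechnikov kernel constants 3/5 and 1/5."""
    omega2 = f(w) * kernel_l2
    B = 0.5 * f_dd(w) * kernel_var
    return (0.25 * omega2 / B**2 / n) ** 0.2

# Standard normal pilot density and its second derivative (an assumption).
phi = lambda w: np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)
phi_dd = lambda w: (w**2 - 1.0) * phi(w)

N = 100
n = N * (N - 1) // 2
h = h_star(n, 0.0, phi, phi_dd)   # h* = O(n^{-1/5}) = O(N^{-2/5})
```

Since $h^* \propto n^{-1/5}$, multiplying the number of dyads by $32$ exactly halves the bandwidth, which is a convenient sanity check on any implementation.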

Interestingly, the rate of convergence of $\hat{f}_W(w)$ to $f_W(w)$ under condition (6) is the same as the rate of convergence of the sample mean

$$\bar{W} \equiv \frac{1}{n} \sum_{i<j} W_{ij}$$

to its expectation $\mu \equiv E[W_{ij}]$ when $E[W_{ij}^2] < \infty$. Similar variance calculations to those for $\hat{f}_W(w)$ yield (see also Holland & Leinhardt (1976) and Menzel (2017))

$$V(\bar{W}) = O\left(\frac{1}{N}\right),$$

provided $V(E[W_{ij} \mid A_i])$ is non-degenerate, yielding

$$\sqrt{N}\left(\bar{W} - \mu\right) = O_p(1).$$

Thus, in contrast to the case of i.i.d. monadic data, there is no convergence-rate “cost” associated with nonparametric estimation of $f_W(w)$. The presence of dyadic dependence, due to its impact on estimation variance, does slow down the feasible rate of convergence substantially. With i.i.d. data the relevant rate for density estimation would be $n^{2/5}$ when the MSE-optimal bandwidth sequence is used. Recalling that $n = O(N^2)$, the $\sqrt{N}$ rate we find here corresponds to an $n^{1/4}$ rate. The slowdown from $n^{2/5}$ to $n^{1/4}$ captures the rate-of-convergence cost of dyadic dependence on the variance of our density estimate.
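A small Monte Carlo can illustrate the $\sqrt{N}$ (i.e., $n^{1/4}$) rate. Under the toy design below (our own choice, not the paper's), doubling $N$ roughly halves the variance of $\hat{f}_W(w)$, whereas i.i.d. sampling of the $n$ dyads would, at a fixed bandwidth, shrink the variance by roughly the factor of four implied by quadrupling $n$.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_W(N, rng):
    """Toy symmetric design (an assumption): W_ij = (A_i + A_j + 0.5*V_ij)/1.5,
    so W_ij is N(0,1) marginally and dyads sharing a node are dependent."""
    A = rng.normal(size=N)
    V = rng.normal(size=(N, N))
    V = np.triu(V, 1) + np.triu(V, 1).T
    return (A[:, None] + A[None, :] + 0.5 * V) / 1.5

def fhat(W, w=0.0, h=0.35):
    """Dyadic kernel density estimate with an Epanechnikov kernel."""
    u = (w - W[np.triu_indices_from(W, k=1)]) / h
    return np.mean(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)) / h

reps = 500
var = {N: np.var([fhat(simulate_W(N, rng)) for _ in range(reps)])
       for N in (50, 100)}

ratio = var[50] / var[100]
# The dominant variance term is 4*Omega_1(w)/N, so the ratio should sit near 2
# (the remaining Omega_2/(n h) term pushes it somewhat above 2 at these sizes),
# well below the factor of 4 that i.i.d. dyads at fixed h would deliver.
```

The exact ratio is noisy across Monte Carlo runs; the point is only its order of magnitude, which separates the $1/N$ variance scaling from the i.i.d. $1/(nh)$ scaling.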

The lack of dependence of the convergence rate of $\hat{f}_W(w)$ to $f_W(w)$ on the precise bandwidth sequence chosen is analogous to that for semiparametric estimators defined as averages over nonparametrically-estimated components (e.g., Newey, 1994; Powell, 1994). Defining $\hat{f}_{W|A}(w \mid A_i)$ as the leave-own-out conditional density estimate given below, the estimator can be expressed as

$$\hat{f}_W(w) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}_{W|A}(w \mid A_i),$$

where

$$\hat{f}_{W|A}(w \mid A_i) \equiv \frac{1}{N-1} \sum_{j \neq i,\, j=1}^{N} K_{ij}.$$

Holding $A_i$ fixed, the estimator $\hat{f}_{W|A}(w \mid A_i)$ can be shown to converge to $f_{W|A}(w \mid A_i)$ at the nonparametric rate $\sqrt{Nh}$, but the average of this nonparametric estimator over the $A_i$ converges at the faster (“parametric”) rate $\sqrt{N}$. In comparison, while

$$\bar{W} = \frac{1}{N} \sum_{i=1}^{N} \hat{E}\left[W_{ij} \mid A_i\right]$$

for

$$\hat{E}\left[W_{ij} \mid A_i\right] \equiv \frac{1}{N-1} \sum_{j \neq i,\, j=1}^{N} W_{ij},$$

the latter converges at the parametric rate $\sqrt{N}$, and the additional averaging to obtain $\bar{W}$ does not improve upon that rate.
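The algebraic identity behind this representation is easy to verify numerically: with symmetric $K_{ij}$, averaging the leave-own-out estimates $\hat{f}_{W|A}(w \mid A_i)$ over $i$ reproduces $\hat{f}_W(w)$ exactly. The simulated design is again our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, w, h = 30, 0.0, 0.4

# Simulated symmetric dyadic outcomes (illustrative design).
A = rng.normal(size=N)
V = rng.normal(size=(N, N))
V = np.triu(V, 1) + np.triu(V, 1).T
W = (A[:, None] + A[None, :] + V) / np.sqrt(3)

# K_ij = (1/h) K((w - W_ij)/h) with an Epanechnikov kernel; no self-dyads.
U = (w - W) / h
K = np.where(np.abs(U) <= 1.0, 0.75 * (1.0 - U**2), 0.0) / h
np.fill_diagonal(K, 0.0)

f_cond = K.sum(axis=1) / (N - 1)             # f_hat_{W|A}(w | A_i), leave-own-out
f_avg = f_cond.mean()                        # (1/N) * sum_i f_hat_{W|A}(w | A_i)
f_dyadic = K[np.triu_indices(N, 1)].mean()   # (1/n) * sum_{i<j} K_ij

assert np.isclose(f_avg, f_dyadic)           # the two representations coincide
```

The identity holds because $K_{ij} = K_{ji}$, so summing over all ordered pairs and dividing by $N(N-1)$ equals summing over unordered pairs and dividing by $n$.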

## 4 Asymptotic distribution theory

To derive conditions under which $\hat{f}_W(w)$ is approximately normally distributed, it is helpful to decompose the difference between $\hat{f}_W(w)$ and $f_W(w)$ into four terms:

$$
\begin{aligned}
\hat{f}_W(w) - f_W(w) &= \frac{1}{n} \sum_{i<j} \left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right) \qquad &(9) \\
&\quad + \frac{1}{n} \sum_{i<j} \left(E[K_{ij} \mid A_i, A_j] - E[K_{ij} \mid A_i] - E[K_{ij} \mid A_j] + E[K_{ij}]\right) \qquad &(10) \\
&\quad + \frac{2}{N} \sum_{i=1}^{N} \left(E[K_{ij} \mid A_i] - E[K_{ij}]\right) \qquad &(11) \\
&\quad + E[K_{ij}] - f_W(w) \qquad &(12) \\
&\equiv T_1 + T_2 + T_3 + T_4.
\end{aligned}
$$

To understand this decomposition, observe that the projection of $\hat{f}_W(w)$ onto $\{A_i\}_{i=1}^{N}$ equals, by the independence assumptions imposed on the $A_i$ and $V_{ij}$, the U-statistic $\frac{1}{n} \sum_{i<j} E[K_{ij} \mid A_i, A_j]$. This U-statistic is defined in terms of the latent i.i.d. random variables $A_1, \dots, A_N$.

The first term in this expression, line (9), is $\hat{f}_W(w)$ minus the projection/U-statistic described above. Each term in this summation has conditional expectation zero given the remaining terms (i.e., the summands form a martingale difference sequence).

The second term in the decomposition, line (10), is the difference between the second-order U-statistic and its Hájek projection (e.g., van der Vaart, 2000), that is, its projection onto the linear subspace consisting of all functions of the form $\sum_{i=1}^{N} g_i(A_i)$; the third term, line (11), is a centered version of that Hájek projection; and the final term, line (12), is the bias of $\hat{f}_W(w)$. A similar “double projection” argument was used by Graham (2017) to analyze the large sample properties of the Tetrad Logit estimator.

If the bandwidth sequence satisfies the conditions $Nh \to \infty$ and $Nh^4 \to 0$, the calculations in the previous section can be used to show that the first, second, and fourth terms of this decomposition (i.e., $T_1$, $T_2$ and $T_4$) will all converge to zero when normalized by $\sqrt{N}$. In this case, $T_3$, which is an average of i.i.d. random variables, will be the leading term asymptotically, such that

$$\sqrt{N}\left(\hat{f}_W(w) - f_W(w)\right) \overset{D}{\to} \mathcal{N}\left(0,\, 4\Omega_1(w)\right),$$

assuming $\Omega_1(w) > 0$.

If, however, the bandwidth sequence has $h \propto N^{-1}$ (a “knife-edge” undersmoothing condition similar to one considered by Cattaneo et al. (2014) in a different context), then both $T_1$ and $T_3$ will be asymptotically normal when normalized by $\sqrt{N}$. To accommodate both of these cases in a single result, we will show that a standardized version of the sum $T_1 + T_3$ will have a standard normal limit distribution, although the first, $T_1$, term may be degenerate in the limit.

In Appendix A we show that both $T_2$ and $T_4$ will be asymptotically negligible when normalized by the convergence rate of $T_1 + T_3$, such that the asymptotic distribution of $\hat{f}_W(w) - f_W(w)$ will only depend on the $T_1$ and $T_3$ terms.

We start by rewriting the sum of terms $T_1$ and $T_3$ as

$$T_1 + T_3 = \frac{1}{n} \sum_{i<j} \left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right) + \frac{2}{N} \sum_{i=1}^{N} \left(E[K_{ij} \mid A_i] - E[K_{ij}]\right) \equiv \sum_{t=1}^{T(N)} X_{Nt},$$

where

$$T(N) \equiv N + n$$

and the triangular array $\{X_{Nt}\}$ is defined as

$$
\begin{aligned}
X_{N1} &= \frac{2}{N}\left(E[K_{12} \mid A_1] - E[K_{12}]\right), \\
X_{N2} &= \frac{2}{N}\left(E[K_{23} \mid A_2] - E[K_{23}]\right), \\
&\ \,\vdots \\
X_{NN} &= \frac{2}{N}\left(E[K_{N,1} \mid A_N] - E[K_{N,1}]\right), \\
X_{N,N+1} &= \frac{1}{n}\left(K_{12} - E[K_{12} \mid A_1, A_2]\right), \\
X_{N,N+2} &= \frac{1}{n}\left(K_{13} - E[K_{13} \mid A_1, A_3]\right), \\
&\ \,\vdots \\
X_{N,N+N-1} &= \frac{1}{n}\left(K_{1N} - E[K_{1N} \mid A_1, A_N]\right), \\
&\ \,\vdots \\
X_{N,N+n} &= \frac{1}{n}\left(K_{N-1,N} - E[K_{N-1,N} \mid A_{N-1}, A_N]\right).
\end{aligned}
$$

That is, $\{X_{Nt}\}_{t=1}^{T(N)}$ is the collection of terms of the form

$$\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)$$

for $t = i \in \{1, \dots, N\}$ (with $j \neq i$), and

$$\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)$$

for $t = N+1, \dots, N+n$, with the dyads ordered by $i = 1, \dots, N-1$ and $j = i+1, \dots, N$. Using the independence assumptions on the $A_i$ and $V_{ij}$, as well as iterated expectations, it is tedious but straightforward to verify that

$$E\left[X_{Nt} \mid \{X_{Ns},\, s \neq t\}\right] = 0,$$

that is, $\{X_{Nt}\}$ is a martingale difference sequence (MDS).

Defining the variance of this MDS as

$$\sigma_N^2 \equiv E\left[\left(\sum_{t=1}^{T(N)} X_{Nt}\right)^2\right] = \sum_{t=1}^{T(N)} V(X_{Nt}),$$

we can demonstrate asymptotic normality of its standardized sum – $\sigma_N^{-1} \sum_{t=1}^{T(N)} X_{Nt}$ – by a central limit theorem for martingale difference triangular arrays (see, for example, Hall & Heyde (1980), Theorem 3.2 and Corollary 3.1, and White (2001), Theorem 5.24 and Corollary 5.26). Specifically, if the Lyapunov condition

$$\sum_{t=1}^{T(N)} E\left|\frac{X_{Nt}}{\sigma_N}\right|^r \to 0 \qquad (13)$$

holds for some $r > 2$, and also the stability condition

$$\sum_{t=1}^{T(N)} \left(\frac{X_{Nt}}{\sigma_N}\right)^2 \overset{p}{\to} 1, \qquad (14)$$

holds, then

$$\sum_{t=1}^{T(N)} \frac{X_{Nt}}{\sigma_N} = \frac{1}{\sigma_N}\left(T_1 + T_3\right) \overset{D}{\to} \mathcal{N}(0, 1). \qquad (15)$$

From the calculations used in the MSE analysis of Section 3 we have that

$$
\begin{aligned}
\sigma_N^2 &= V(T_1) + V(T_3) \\
&= \frac{E[K_{ij}^2]}{n} + \frac{4\,V(E[K_{ij} \mid A_i])}{N} + O\left(\frac{1}{n}\right) \\
&= \frac{\Omega_2(w)}{nh} + \frac{4\,\Omega_1(w)}{N} + O\left(\frac{1}{n}\right) + O\left(\frac{h^2}{N}\right),
\end{aligned}
$$

so, taking $h \to 0$ with $Nh \to \infty$,

$$\frac{1}{\sigma_N^2} = O(N)$$

assuming $\Omega_1(w) > 0$. In the degenerate case, where $\Omega_1(w) = 0$, we will still have $\sigma_N^{-2} = O(N)$ as long as the “knife-edge” undersmoothing bandwidth sequence $h \propto N^{-1}$ is chosen.

To verify the Lyapunov condition (13), note that

$$
\begin{aligned}
E\left|\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right|^3 &\leq 8\, E\left|\frac{K_{ij}}{n}\right|^3 \\
&= \frac{8}{n^3} \frac{1}{h^3} \int \left[K\left(\frac{w-s}{h}\right)\right]^3 f_W(s)\,ds \\
&= \frac{8}{n^3 h^2} \int [K(u)]^3 f_W(w - hu)\,du \\
&= O\left(\frac{1}{n^3 h^2}\right) \qquad (16)
\end{aligned}
$$

and

$$
\begin{aligned}
E\left|\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right|^3 &\leq \frac{8 \cdot 2^3}{N^3}\, E\left|E[K_{ij} \mid A_i]\right|^3 \\
&= \frac{8 \cdot 2^3}{N^3}\, E\left|\int K(u)\, f_{W|A}(w - hu \mid A_i)\,du\right|^3 \\
&= O\left(\frac{1}{N^3}\right). \qquad (17)
\end{aligned}
$$

Putting things together we get that

$$
\begin{aligned}
\sum_{t=1}^{T(N)} E|X_{Nt}|^3 &= n\, E\left|\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right|^3 + N\, E\left|\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right|^3 \\
&= O\left(\frac{1}{(nh)^2}\right) + O\left(\frac{1}{N^2}\right) \\
&= O\left(\frac{1}{N^2}\right)
\end{aligned}
$$

when $Nh$ is bounded away from zero for all $N$. Therefore the Lyapunov condition (13) is satisfied for $r = 3$, since

$$\sum_{t=1}^{T(N)} E\left|\frac{X_{Nt}}{\sigma_N}\right|^3 = O\left(N^{3/2}\right) \cdot O\left(\frac{1}{N^2}\right) = O\left(\frac{1}{\sqrt{N}}\right) = o(1).$$

To verify the stability condition (14), we first rewrite that condition as

$$0 = \lim_{N \to \infty} \frac{1}{N \sigma_N^2}\left(R_1 + R_2\right),$$

where

$$
\begin{aligned}
R_1 \equiv\ & N \sum_{i=1}^{N} \left[\left(\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right)^2 - E\left[\left(\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right)^2\right]\right] \\
&+ N \sum_{i<j} \left[\left(\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right)^2 - E\left[\left(\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right)^2\right]\right]
\end{aligned}
$$

collects the centered squared terms of the array and

$$R_2 \equiv N \sum_{s \neq t} X_{Ns} X_{Nt}$$

collects the cross-product terms. Since $\frac{1}{N \sigma_N^2} = O(1)$, the stability condition (14) will hold if $R_1$ and $R_2$ both converge to zero in probability.

By the independence restrictions on the $A_i$ and $V_{ij}$, the (mean zero) summands in $R_1$ are mutually uncorrelated, so

$$
\begin{aligned}
E[R_1^2] =\ & N^2 \sum_{i=1}^{N} E\left[\left(\left(\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right)^2 - E\left[\left(\frac{2}{N}\left(E[K_{ij} \mid A_i] - E[K_{ij}]\right)\right)^2\right]\right)^2\right] \\
&+ N^2 \sum_{i<j} E\left[\left(\left(\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right)^2 - E\left[\left(\frac{1}{n}\left(K_{ij} - E[K_{ij} \mid A_i, A_j]\right)\right)^2\right]\right)^2\right] \\
=\ & O\left(\frac{E\left[(E[K_{ij} \mid A_i])^4\right]}{N}\right) + O\left(\frac{N^2\, E[K_{ij}^4]}{n^3}\right).
\end{aligned}
$$

But, using analogous arguments to (16) and (17),

$$E\left[(E[K_{ij} \mid A_i])^4\right] = O(1)$$

and

$$E[K_{ij}^4] = O\left(\frac{1}{h^3}\right),$$

so

$$E[R_1^2] = O\left(\frac{1}{N}\right) + O\left(\frac{N^2}{(nh)^3}\right) = O\left(\frac{1}{N}\right)$$