1 Introduction
Data for unsupervised learning is increasingly available in the form of graphs or networks. For example, we may analyze gene networks, social networks, or general co-occurrence graphs (e.g., built from purchasing patterns). While classical unsupervised tasks such as density estimation or clustering are naturally formulated for data in vector spaces, these tasks have analogous problems over graphs, such as centrality and community detection. We provide a step towards unifying unsupervised learning by recovering the underlying density and metric directly from graphs.
We consider “unweighted directed geometric graphs” that are assumed to have been built from underlying (unobserved) points $x_1, x_2, \ldots$ in $\mathbb{R}^d$. In particular, we assume that graphs are formed by drawing an arc from each vertex $x_i$ to its neighbors within distance $r(x_i)$. Note that the graphs are typically not symmetric, since the distance (the ball radius) may vary from point to point. By allowing $r$ to be stochastic, e.g., to depend on the set of points, the construction also subsumes typical $k$-nearest neighbor graphs. Arguably, graphs built from top friends/products, or co-association graphs, may also be approximated in this manner.
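As a concrete illustration of this construction (a minimal sketch in our own notation, not the paper's code), the directed ball graph and its $k$-nearest-neighbor special case can be built as follows:

```python
import numpy as np

def directed_ball_graph(X, radii):
    """Adjacency A[i, j] = 1 iff an arc x_i -> x_j exists, i.e. x_j lies
    within distance radii[i] of x_i.  Radii may vary per point, so the
    resulting graph is in general asymmetric."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D < radii[:, None]).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def knn_graph(X, k):
    """Directed k-NN graph as a special case: the radius at x_i is the
    distance to its k-th nearest neighbour (a stochastic radius)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    r = np.sort(D, axis=1)[:, k - 1]  # k-th neighbour distance per point
    return (D <= r[:, None]).astype(float)
```

With a constant radius the ball graph is symmetric; the $k$-NN graph typically is not, which is exactly the asymmetry discussed above.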
The key property of our family of geometric graphs is that their structure is completely characterized by two functions over the latent space: the local density $p$ and the local scale $r$. Indeed, global properties such as the distances between points can be recovered by integrating these quantities. We show that the asymptotic behavior of random walks on the directed graphs relates to the density and metric. In particular, we show that random walks on such graphs with sufficiently fast-growing minimal degree can be completely characterized in terms of $p$ and $r$ using drift-diffusion processes. This enables us to recover both the density and distance given only the observed graph and the (hypothesized) underlying dimension $d$.
The fact that we may recover the density (up to isometry) is surprising. For example, in $k$-nearest neighbor graphs, each vertex has out-degree exactly $k$. There is no immediate local information about the density, i.e., whether the corresponding point lies in a high-density region with small ball radii, or in a low-density region with large ball radii. The key insight of this paper is that random walks over such graphs naturally drift toward higher density regions, allowing for density recovery.
While the paper is primarily focused on the theoretical aspects of recovering the metric and density, we believe our results offer useful strategies for analyzing real-world networks. For example, we analyzed the Amazon co-purchasing graph, where an edge is drawn from an item $a$ to an item $b$ if $b$ is among the top co-purchased items with $a$. These Amazon products may be co-purchased if they are similar enough to be complementary, but not so similar that they are redundant. We extend our model to deal with connectivity rules shaped like an annulus, and demonstrate that our estimator can simultaneously recover product similarities, product categories, and central products by metric embedding.
1.1 Relation to prior work
The density estimation problem addressed by this paper was proposed and partially solved by von Luxburg-Alamgir in [14] using integration of local density gradients over shortest paths. This estimator has since been used for drawing graphs with ordinal constraints and for graph downsampling in [1]. However, the recovery algorithm applies only to $k$-nearest neighbor graphs, and only under restrictions on the dimension and on the growth of $k$. Our paper provides an estimator that works in all dimensions, applies to a more general class of graphs, and strongly outperforms that of von Luxburg-Alamgir in practice.
On a technical level, our work has similarities to the analysis of convergence of graph Laplacians and random walks on manifolds in [16, 6]. For example, in [13], Ting-Huang-Jordan used infinitesimal generators to capture the convergence of a discrete Laplacian to its continuous equivalent on nearest neighbor graphs. However, their analysis was restricted to the Laplacian and did not consider the latent recovery problem. In addition, our approach proves convergence of the entire random walk trajectory and allows us to analyze the stationary distribution directly.
2 Main results and proof outline
2.1 Problem setup
Let $x_1, x_2, \ldots$ be an infinite sequence of latent coordinate points drawn independently from a distribution with probability density $p$ in $\mathbb{R}^d$. Let $r_n$ be a radius function, which may depend on the draw of the points. In this paper, we fix a single draw and analyze the quenched setting. Let $G_n$ be the unweighted directed neighborhood graph with vertex set $\{x_1, \ldots, x_n\}$ and with a directed edge from $x_i$ to $x_j$ if and only if $|x_i - x_j| < r_n(x_i)$. Fix now a large $n$. We consider the random directed graph model given by observing the single graph $G_n$. The model is completely specified by the latent density $p$ and the possibly stochastic $r_n$. Under the conditions (A) to be specified below, we solve the following problem:

Given only $G_n$ and $d$, form consistent estimates of $p$ and $r$ up to proportionality constants.
The conditions we impose on $p$, $r_n$, and the stationary density $\pi_n$ of the simple random walk on $G_n$ are the following, which we refer to as (A). We assume (A) holds throughout the paper.

The density $p$ is differentiable with bounded gradient on a path-connected compact domain $D$ with smooth boundary $\partial D$.

There are a deterministic continuous function $r$ on $D$ and scaling constants $c_n$ such that, a.s. in the draw of the points, the rescaled radii $c_n^{-1} r_n$ converge uniformly to $r$.

The rescaled stationary densities $\pi_n$ are a.s. uniformly equicontinuous.
Remark.
We conjecture that the last condition in (A) holds for any $p$ and $r_n$ satisfying the other conditions in (A) (see Sconj:holder in the supplement).
Let $N(x_i)$ denote the set of out-neighbors of $x_i$, so that $x_j$ is in $N(x_i)$ if there is a directed edge from $x_i$ to $x_j$. The second condition in (A) implies for all $x_i$ that
(1) $\quad |N(x_i)| \;=\; (1 + o(1))\, c_d\, n\, c_n^d\, p(x_i)\, r(x_i)^d \quad$ a.s., uniformly in $i$ (with $c_d$ the volume of the unit ball).
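The out-degree scaling above can be sanity-checked by Monte Carlo: the number of sample points in a ball of radius $R$ around $x$ concentrates around $c_d\, n\, R^d\, p(x)$. A sketch with a hypothetical uniform density (all parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200_000, 2
X = rng.random((n, d))              # p = uniform density on [0, 1]^2
x0, R = np.array([0.5, 0.5]), 0.05  # illustrative query point and radius
deg = np.sum(np.linalg.norm(X - x0, axis=1) < R)
expected = np.pi * n * R**2 * 1.0   # c_2 * n * R^d * p(x0), with c_2 = pi
print(deg, round(expected))         # the two agree up to sampling noise
```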
2.2 Statement of results
Our approach is based on the simple random walk on the graph $G_n$. Let $\pi_n$ denote the stationary density of this walk. We first show that, when appropriately renormalized, $\pi_n$ converges to an explicit function of $p$ and $r$.
Theorem 2.1.
Given (A), a.s. in the draw of the points, we have
(2) $\quad \displaystyle \lim_{n \to \infty} \pi_n(x) \;=\; \frac{1}{Z}\, \frac{p(x)^2}{r(x)^2}$
for the normalization constant $Z = \int_D p(y)^2 / r(y)^2 \, dy$.
Combining this result with an estimate on the outdegree of points in gives our general result on recovery of density and scale. Let be the volume of the unit ball.
Corollary 2.2.
Assuming (A), we have a.s. in the draw of the points that
$p(x_i) \;\propto\; \lim_{n\to\infty} \big(\pi_n(x_i)^d\, |N(x_i)|^2\big)^{1/(2d+2)}, \qquad r(x_i) \;\propto\; \lim_{n\to\infty} \big(|N(x_i)|^2 / \pi_n(x_i)\big)^{1/(2d+2)},$
where $|N(x_i)|$ is the out-degree of $x_i$.
Proof.
Immediate from the out-degree estimate (1) and Theorem 2.1. ∎
Remark.
If $r_n$ is constant, every edge is bidirectional, so $\pi_n(x_i)$ is proportional to the degree of $x_i$, and we recover the standard ball density estimator.
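In code, the estimator can be sketched as follows. This is a minimal sketch assuming the identities take the form $\pi \propto p^2/r^2$ and $\deg \propto n\, p\, r^d$; the exponents below follow from solving that system, and the power iteration assumes a strongly connected, aperiodic graph:

```python
import numpy as np

def stationary_density(A, iters=500):
    """Stationary distribution of the simple random walk on the directed
    graph with adjacency A, by power iteration on the transition matrix.
    Assumes every vertex has out-degree >= 1 and the chain is ergodic."""
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    pi = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

def recover_density_and_scale(A, d, iters=500):
    """Invert pi ~ p^2 / r^2 and deg ~ n p r^d (an assumed form of the
    identities in Section 2) to get p and r up to proportionality."""
    pi = stationary_density(A, iters)
    deg = A.sum(axis=1)
    p_hat = (pi**d * deg**2) ** (1.0 / (2 * d + 2))
    r_hat = (deg**2 / pi) ** (1.0 / (2 * d + 2))
    return p_hat, r_hat
```

On a symmetric graph with constant out-degree this collapses, as in the remark above, to a constant rescaling of the degree.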
Our estimator for the density closely resembles the PageRank algorithm without damping [10]. In particular, for the $k$-nearest neighbor graph it induces the same rank ordering as PageRank, and it coincides with PageRank in the no-damping limit.
When specializing to the $k$-nearest neighbor density estimation problem posed by von Luxburg-Alamgir in [14], we obtain the following.
Corollary 2.3.
If $r_n$ is selected via the $k$-nearest neighbors procedure (with $k$ growing at a sufficient rate) and $p$ satisfies the first and last conditions in (A), we have a.s. in the draw of the points that
$p(x_i) \;\propto\; \lim_{n\to\infty} \pi_n(x_i)^{d/(2d+2)}.$
Proof.
By [4], the empirical radius function induced by the $k$-nearest neighbors procedure satisfies the second condition of (A) with $r(x) = p(x)^{-1/d}$ and $c_n = (k / (c_d n))^{1/d}$; the out-degree is constant, so the claim follows from Corollary 2.2. ∎
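For $k$-NN graphs the out-degree is constant, so the estimate reduces to a power of the stationary distribution alone. A small one-dimensional illustration, treating the exponent $d/(2d+2)$ (here $1/4$) as the assumed form above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 50
x = np.sort(rng.normal(size=n))  # latent 1-D points, p = N(0, 1)

# directed k-NN adjacency: arc to the k nearest points
D = np.abs(x[:, None] - x[None, :])
np.fill_diagonal(D, np.inf)
r = np.sort(D, axis=1)[:, k - 1]
A = (D <= r[:, None]).astype(float)

# stationary distribution of the simple random walk by power iteration
P = A / k  # every out-degree is exactly k
pi = np.full(n, 1.0 / n)
for _ in range(2000):
    pi = pi @ P

p_hat = pi ** (1 / 4)         # assumed exponent d/(2d+2) with d = 1
p_true = np.exp(-x**2 / 2)    # unnormalized true density
corr = np.corrcoef(p_hat, p_true)[0, 1]
print(round(corr, 2))
```

With a fixed seed the estimate correlates strongly with the true density away from the boundary of the sample.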
2.3 Outline of approach
Our proof proceeds via the following steps.

As $n \to \infty$, the simple random walks on $G_n$ converge weakly to an Itô process $X_t$, yielding weak convergence between the stationary measures. (Theorem 3.4)

The stationary density of the limit is explicitly determined via the Fokker-Planck equation. (Lemma 4.1)

Uniform equicontinuity of $\pi_n$ yields convergence in density after rescaling. (Theorem 2.1)
An intuitive explanation for our results is as follows. For large $n$, the simple random walk on $G_n$, when considered with its original metric embedding, closely approximates the behavior of a drift-diffusion process. Both the process and the approximating walk move preferentially toward regions where the density is large, and diffuse more slowly out of regions where the local scale is small. Occupation times therefore carry information about $p$ and $r$, which allows us to recover them.
Formally, the convergence of the simple random walk to $X_t$ follows by verifying the conditions of the Stroock-Varadhan criterion (Theorem 3.1) for convergence of discrete-time Markov processes to Itô processes [12]. This criterion states that if the variance, expected value, and higher-order moments of a jump are continuous and well-controlled in the limit, then the process converges to an Itô process under mild technical conditions. By using the Fokker-Planck equation, we can express the stationary density of this Itô process solely in terms of $p$ and the out-degree. This allows us to estimate the density using only the unweighted graph. Let $D$ and $\partial D$ be the closure and boundary of the support of $p$. Let $B(x, R)$ be the ball of radius $R$ centered at $x$. Let $t_n$ be the time rescaling necessary for the walk on $G_n$ to have timescale equal to that of $X_t$.
3 Convergence of the simple random walk to an Itô process
We will verify the regularity conditions of the Stroock-Varadhan criterion (see [12, Section 6]).
Theorem 3.1 (Stroock-Varadhan).
Let $X^n_s$ be discrete-time Markov processes defined over a domain $D$ with boundary $\partial D$. Define the discrete-time drift and diffusion coefficients by
$\mu_n(x) = t_n^{-1}\, \mathbb{E}\big[X^n_{s+1} - X^n_s \mid X^n_s = x\big], \qquad \sigma^2_n(x) = t_n^{-1}\, \mathbb{E}\big[(X^n_{s+1} - X^n_s)(X^n_{s+1} - X^n_s)^\top \mid X^n_s = x\big].$
If $\mu_n \to \mu$ and $\sigma^2_n \to \sigma^2$ uniformly, the higher-order jump moments vanish, and regularity conditions ensure reflection at $\partial D$ (Sthm:tightness and Sthm:stroock in the supplement), then the time-rescaled stochastic processes converge weakly in Skorokhod space to an Itô process with reflecting boundary condition
$dX_t = \mu(X_t)\, dt + \sigma(X_t)\, dW_t,$
with $W_t$ a standard $d$-dimensional Brownian motion and $X_t$ reflected at $\partial D$.
Remark.
The original result of Stroock-Varadhan was stated on $[0, T]$ for all finite $T$; our version on $[0, \infty)$ is equivalent by [15, Theorem 2.8].
The technical conditions of Theorem 3.1 enforcing reflecting boundary conditions are checked in Sthm:C to Sthm:B in the supplement. We focus here on convergence of the drift and diffusion coefficients.
Lemma 3.2 (Strong LLN for local moments).
For a bounded measurable function $f$, given (A) we have, uniformly over the points $x_i$, that
$\displaystyle \frac{1}{|N(x_i)|} \sum_{x_j \in N(x_i)} f(x_j) \;\longrightarrow\; \frac{\int_{B(x_i, r_n(x_i))} f(y)\, p(y)\, dy}{\int_{B(x_i, r_n(x_i))} p(y)\, dy} \quad \text{a.s.}$
Proof.
Denote the claimed value of the limit by $\bar{F}(x_i)$. For convergence in expectation, we condition on $x_i$ and apply iterated expectation: conditioned on $x_i$, the out-neighbors are i.i.d. draws from $p$ restricted to $B(x_i, r_n(x_i))$, so the neighborhood average has expectation $\bar{F}(x_i)$. Since $f$ is bounded, Hoeffding’s inequality yields
(3) $\quad \displaystyle \Pr\Big[\Big|\tfrac{1}{|N(x_i)|} \textstyle\sum_{x_j \in N(x_i)} f(x_j) - \bar{F}(x_i)\Big| > \varepsilon\Big] \;\le\; 2 \exp\!\Big(-\tfrac{\varepsilon^2 |N(x_i)|}{2 \|f\|_\infty^2}\Big),$
which is summable in $n$ since $|N(x_i)|$ grows faster than $\log n$ by (1). Borel-Cantelli then yields a.s. convergence. ∎
Remark.
This limit holds even for stochastic $r_n$ as long as $c_n^{-1} r_n$ a.s. converges uniformly to a deterministic continuous $r$. All statements up to the bound (3) hold regardless of the stochasticity of $r_n$, and the overall bound only requires convergence of the rescaled radii. An example of such a graph is the $k$-nearest neighbors graph.
We now compute the drift and diffusion coefficients in terms of $p$ and $r$.
Theorem 3.3 (Drift diffusion coefficients).
Almost surely in the draw of the points, as $n \to \infty$, we have
$\mu_n(x) \;\to\; \frac{r(x)^2}{d+2}\, \nabla \log p(x), \qquad \big(\sigma^2_n\big)_{jk}(x) \;\to\; \frac{r(x)^2}{d+2}\, \delta_{jk},$
where $\delta_{jk}$ is the Kronecker delta function.
Proof.
By Lemma 3.2, $\mu_n$, $\sigma^2_n$, and the higher-order moments converge a.s. to their expectations, so it suffices to verify that the corresponding integrals in Lemma 3.2 have the claimed limits. Because $p$ is differentiable on $D$, for any $x$ we have the Taylor expansion $p(y) = p(x) + \nabla p(x) \cdot (y - x) + o(|y - x|)$ of $p$ at $x$, where the convergence is uniform on compact sets. For $n$ large enough that $B(x, r_n(x))$ lies completely inside $D$, substituting this expansion into the definitions of $\mu_n$ and $\sigma^2_n$ and integrating over spheres yields the result. Full details are in Sthm:coefs in the supplement. ∎
Theorem 3.4.
Under (A), as $n \to \infty$, a.s. in the draw of the points, the simple random walk converges in Skorokhod space to the isotropic $\mathbb{R}^d$-valued Itô process with reflecting boundary condition defined by
(4) $\quad \displaystyle dX_t \;=\; \frac{r(X_t)^2}{d+2}\, \nabla \log p(X_t)\, dt \;+\; \frac{r(X_t)}{\sqrt{d+2}}\, dW_t.$
4 Convergence and computation of the stationary distribution
4.1 Graphs satisfying condition (A)
The Itô process $X_t$ is an isotropic drift-diffusion process, so the Fokker-Planck equation [11] implies its density $q(x, t)$ at time $t$ satisfies
(5) $\quad \partial_t\, q(x,t) \;=\; -\nabla \cdot \big(\mu(x)\, q(x,t)\big) \;+\; \tfrac{1}{2}\, \nabla^2 \big(\sigma^2(x)\, q(x,t)\big),$
where $\mu$ and $\sigma^2$ are given by Theorem 3.4: $\mu(x) = \frac{r(x)^2}{d+2} \nabla \log p(x)$ and $\sigma^2(x) = \frac{r(x)^2}{d+2}$.
Lemma 4.1.
The stationary density of the process $X_t$ is $\pi(x) = \frac{1}{Z}\, p(x)^2 / r(x)^2$ with normalization $Z = \int_D p(y)^2 / r(y)^2\, dy$.
Proof.
By (5), to check that $\pi$ is stationary it suffices to show that the probability flux vanishes,
$\mu(x)\, \pi(x) \;-\; \tfrac{1}{2}\, \nabla \big(\sigma^2(x)\, \pi(x)\big) \;=\; 0,$
which follows by substituting the coefficients above. ∎
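With the isotropic coefficients $\mu = \sigma^2 \nabla \log p$ and $\sigma^2(x) \propto r(x)^2$, the zero-flux computation is a one-line check; taking the candidate $\pi \propto p^2/\sigma^2$ (shown here under those assumed coefficient forms):

```latex
J(x) \;=\; \mu(x)\,\pi(x) \;-\; \tfrac{1}{2}\,\nabla\!\big(\sigma^2(x)\,\pi(x)\big)
      \;=\; \sigma^2\,\frac{\nabla p}{p}\cdot\frac{p^2}{\sigma^2}
            \;-\; \tfrac{1}{2}\,\nabla\!\big(p^2\big)
      \;=\; p\,\nabla p \;-\; p\,\nabla p \;=\; 0.
```

The $\sigma^2$ factors cancel in the first term, which is why only the ratio $p^2/\sigma^2$ survives in the stationary density.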
We now prove Theorem 2.1 by showing that a rescaling of $\pi_n$ converges to $\pi$.
Proof of Theorem 2.1.
The a.s. convergence of processes in Theorem 3.4 implies, by Ethier-Kurtz [5, Theorem 4.9.12], that the empirical stationary measures of the walks converge weakly to the stationary measure of $X_t$. For any $x$ and $\varepsilon > 0$, weak convergence against the indicator of the ball $B(x, \varepsilon)$ yields convergence of the ball averages of $\pi_n$ to those of $\pi$. By uniform equicontinuity of the $\pi_n$, for any $\delta > 0$ there is $\varepsilon$ small enough so that for all $n$ the ball average of $\pi_n$ differs from $\pi_n(x)$ by at most $\delta$, which implies the pointwise convergence $\pi_n(x) \to \pi(x)$. Combining with Lemma 4.1 yields the desired result. ∎
4.2 Extension to isotropic graphs
To obtain the stationary distribution in Theorem 2.1, we require only convergence to some Itô process via the Stroock-Varadhan criterion, and we can achieve this under substantially more general conditions. We define a class of neighborhood graphs, termed isotropic, over which we obtain consistent metric recovery without knowledge of the graph construction method.
Definition 1 (Isotropic).
A graph edge connection procedure on the latent points is isotropic if it satisfies:
Distance kernel:
The probability of placing a directed edge from $x_i$ to $x_j$ is given by a kernel function $K$ applied to the locally scaled distance, $\Pr[x_i \to x_j] = K\big(|x_i - x_j| / r_n(x_i)\big)$, with $r_n$ obeying (A).
Nonzero mass:
The kernel function has nonzero integral, $\int_0^\infty K(t)\, t^{d-1}\, dt > 0$.
Bounded tails:
There is a $t_0$ such that $K(t) = 0$ for all $t > t_0$.
Continuity:
The rescaled stationary densities are uniformly equicontinuous.
This class of graphs preserves the property that the random graph is entirely determined by the underlying density $p$ and local scale $r$; this allows the same tractable form for the stationary distribution.
Both constant-radius and $k$-nearest neighbor graphs are isotropic upon assumption of uniform equicontinuity. Another interesting class allowed by this generalization is truncated Gaussian kernels, where the connection probability decreases exponentially. Note that $K$ need not be monotonic or continuous in the scaled distance; one surprising example is $K(t) = \mathbf{1}[1 \le t \le 2]$, which deterministically connects points in an annulus.
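A generic sampler for such isotropic constructions (a sketch; the kernel signature is our own convention, not a fixed API):

```python
import numpy as np

def kernel_graph(X, radii, kernel, rng):
    """Directed graph where an arc i -> j is placed with probability
    kernel(|x_i - x_j| / radii[i]).  The kernel need not be monotone:
    e.g. lambda t: float(1 <= t <= 2) deterministically connects an
    annulus, and lambda t: np.exp(-t**2) * (t <= 3) is a truncated
    Gaussian kernel."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    probs = np.vectorize(kernel)(D / radii[:, None])
    np.fill_diagonal(probs, 0.0)  # no self-loops
    return (rng.random(D.shape) < probs).astype(float)
```

The constant kernel `lambda t: float(t <= 1)` recovers the ball graph of Section 2 as a special case.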
Corollary 4.2 (Generalization).
The conclusion of Theorem 2.1 holds, up to kernel-dependent proportionality constants, for any isotropic graph construction.
Proof.
We check the Stroock-Varadhan conditions stated in Theorem 3.1. For this, we use a version of Lemma 3.2 for isotropic graphs, which requires that the neighborhood radius vanishes and that the neighborhood size scales as in the ball-graph case.
The vanishing neighborhood radius follows because the bounded tails, and the fact that the kernel is evaluated on locally scaled distances, ensure the isotropic graph is a subgraph of a ball graph. Kolmogorov’s strong law implies that the stochastic out-degree concentrates around its expectation, and it has the correct scaling because the argument of $K$ is scaled by $r_n(x_i)$. See Sthm:generaldegree in the supplement for details. Thus the analogue of Lemma 3.2 holds.
We then check that the limiting local moments for isotropic graphs are proportional to those of ball graphs in Slem:polyint. All but one of the conditions for the Stroock-Varadhan criterion follow from this; the last (Sthm:f4) follows from the bounded-ball structure of the connectivity kernel.
To check that we obtain the same limiting process and stationary measure, note that the ratios of integrals in Theorem 3.3 are unchanged in the isotropic setting. See Slem:polyint for details. Recovering the stationary distribution, density, and local scale is then done in the same manner as in the ball setting. ∎
5 Distance recovery via paths
Our results in Theorem 2.1 give a consistent estimator for the density $p$ and the local scale $r$. These two quantities specify, up to isometry, the latent metric embedding of the points.
In order to reconstruct distances between non-neighboring points, we weight the edges of $G_n$ by estimated local scales and find shortest paths over this weighted graph. The results of Alamgir-von Luxburg [2, Section 4.1] show that in the $k$-nearest neighbor graph case, weighting each edge out of $x_i$ by the estimate of $r(x_i)$ results in consistent recovery of pairwise distances.
In Sthm:dist in the supplement, we give a straightforward extension of this approach to show that, given any uniformly convergent estimator of $r$, the shortest path on the weighted graph converges to the geodesic distance. Applying standard metric multidimensional scaling then allows us to embed these distances and recover the latent space up to isometry.
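The reconstruction pipeline above can be sketched as follows (using SciPy's shortest-path routine; weighting each edge by the scale at its source vertex is one natural reading of the scheme):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def recover_distances(A, r_hat):
    """Weight each arc i -> j by the estimated local scale r_hat[i] and
    take shortest paths; path length then approximates the integral of
    the local metric along the geodesic."""
    W = A * r_hat[:, None]  # edge weight = scale at the source vertex
    # for dense input, scipy treats zero entries as non-edges
    return shortest_path(W, method="D", directed=True)

def classical_mds(D, dim):
    """Embed a distance matrix by classical multidimensional scaling."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J  # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]  # top eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```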
6 Empirical results
We demonstrate extremely good finite-sample performance of our estimator on simulated density reconstruction problems and two real-world datasets. Some details, such as exact graph degrees and distribution parameters, are in the supplementary code, which reproduces all figures in this paper. Standard graph statistics such as centrality and the Jaccard index are calculated via the igraph package [3].
$k$-nearest neighbor graphs
We compared our random-walk-based estimator and the path-integral-based estimator of von Luxburg-Alamgir [14] to the metric $k$-nearest neighbor density estimator. The number of samples $n$ was varied along with the sparsity level $k$ (Figure 2).
While our theoretical results suggest that both our algorithm and the path-integral estimator of von Luxburg-Alamgir [14] might fail to converge at the lowest sparsity levels, in practice our estimator performs nearly perfectly at both of the low sparsity levels tested.
For constant degree $k$, we achieve near-perfect performance for all choices of $n$, while the path-integral estimator fails to converge in this regime.
Some specific examples of our density estimator are shown in Figure 2. The examples are a mixture of uniforms (left), a mixture of Gaussians (center), and a third test density (right). As predicted, our estimator tracks extremely closely with the metric nearest neighbor estimator (red and blue), as well as with the true density (black). The path-integral estimator has high variance at points of high density and fails to cope with the two mixture densities.
Varying the dimension $d$ for an isotropic multivariate normal, we find that a large number of points is required to maintain high accuracy as $d$ grows large (red and blue lines in Figure 4). However, this is due to a global ‘flattening’ of the density. Measuring the correlation between the true and estimated log probabilities shows that, up to a global concentration parameter, the estimator maintains high accuracy across a large number of dimensions (black lines).
Kernel graphs
We validate the nonparametric estimator in Corollary 4.2 by constructing three drastically different kernel graphs. In all cases, we sampled 5000 points and connected them according to the given kernel. We varied the neighborhood structure in three ways: a constant kernel, a $k$-nearest neighbor kernel, and a spatially varying kernel.
In Figure 4, we find that our nonparametric estimator (black) always matches the ground truth (red). This example also shows that both the degree and the stationary distribution can be valid density estimators under certain assumptions, but only our estimator handles arbitrary isotropic graph construction methods without such assumptions.
Metric recovery on real data
As an example of metric reconstruction, we take the first 2000 examples in the U.S. Postal Service (USPS) digits dataset [7] and construct an unweighted $k$-nearest neighbor graph. We use our method to reconstruct the metric and perform similarity queries; the Jaccard index was used to tie-break direct neighbors.
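The tie-breaking step uses the Jaccard index between out-neighborhoods; from a binary adjacency matrix it can be computed for all pairs in one shot (a sketch):

```python
import numpy as np

def jaccard(A):
    """Jaccard index between out-neighbourhoods for all vertex pairs:
    |N(i) & N(j)| / |N(i) | N(j)|, from a binary adjacency matrix."""
    inter = A @ A.T                               # pairwise intersections
    deg = A.sum(axis=1)
    union = deg[:, None] + deg[None, :] - inter   # inclusion-exclusion
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)
```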
The USPS digits dataset is known to have a high-density cluster of ones digits (orange). Results in Figure 7 show that we are able to successfully recover the density structure of the data (top). Inter-point distances estimated by our method (Figure 7, vertical axis) show nearly linear agreement with the true metric (horizontal axis) at short distances, and high similarity globally.
Performing a similarity query on the data (Figure 7) shows that our reconstructed distances (bottom row) give a more coherent set of similar digits than the Jaccard index (top row) [8]. The behavior of the unweighted Jaccard similarity is due to a known problem with shortest paths in $k$-nearest neighbor graphs preferring low-density regions [14].
Amazon copurchasing data
Classics | Literature | Classical music | Philosophy
The Prince | The Stranger | Beethoven: Symphonien Nos. 5 & | The Practice of Everyday Life
The Communist Manifesto | The Myth of Sisyphus | Mozart: Symphonies Nos. 35-41 | The Society of the Spectacle
The Republic | The Metamorphosis | Mozart: Violin Concertos | The Production of Space
Wealth of Nations | Heart of Darkness | Tchaikovsky: Concerto No. 1/Rac | Illuminations
On War | The Fall | Beethoven: Symphonies Nos. 3 & | Space and Place: The Perspectiv
Finally, we recover density and metric on a real network dataset with no ground truth. We analyzed the largest connected component of the Amazon co-purchasing network dataset [9]. Each vertex is a product on amazon.com along with its category and sales rank, and each directed edge represents a co-purchasing recommendation of the form “people who bought $a$ also bought $b$.” This dataset naturally fulfills our assumption of asymmetric edges that represent a notion of similarity in some space.
Items that lie in regions of highest density should be archetypal products for a category, and therefore more popular. We show that density estimates using our method exhibit a strong positive association between density and sales (Figure 9). We found that this effect persisted regardless of the choice of dimension $d$. Other popular measures of network centrality, such as betweenness and closeness, fail to display this effect.
We then attempted metric recovery using our random-walk-based reconstruction (Figure 9). For visualization purposes, we used multidimensional scaling on the recovered metric to embed points belonging to categories with at least two hundred items. The embedding shows that our method captures the separation across different product categories. Notably, nonfiction and history have substantial overlap, as expected, while classical music CDs and computer science books have little overlap with the other clusters.
Analyzing the modes of the density estimate by assigning each point to its local mode, we find coherent clusters whose top items serve as archetypes for the cluster (Table 1). This suggests a close connection between clustering in a metric space and community detection in network data. The overall performance of our method on density estimation and metric recovery for the Amazon dataset suggests that, when a metric assumption is appropriate, our random-walk-based metric quantities can be used directly for centrality and cluster estimates on a network.
7 Conclusions
We have presented a simple explicit identity linking the stationary distribution of a random walk on a neighborhood graph to the density and neighborhood size.
The density estimator constructed by inverting this identity converges rapidly to the metric $k$-nearest neighbor density estimator across a range of sample sizes, sparsity levels, and distribution types (Figure 2). We also generalized the theorem to a large class of graph construction techniques and demonstrated that the choice of construction technique matters little for accuracy (Figure 4).
Our estimator performed well on real-world data, recovering underlying metric information in test data (Figure 7) and predicting popular Amazon products through density estimates (Figure 9).
There are several open questions left unanswered by our work. Our results required graph degrees growing faster than the $\Theta(\log n)$ rate required for connectivity. Our simulation results suggest that even near the connectivity regime our estimator performs nearly perfectly, suggesting that the true degree lower bound may be much lower.
The close connection of our density estimate to PageRank suggests that combining the latent spatial map with vector space estimates may lead to highly effective and theoretically principled network algorithms.
References
[1] M. Alamgir, G. Lugosi, and U. von Luxburg. Density-preserving quantization with application to graph downsampling. In COLT, 2014.
[2] M. Alamgir and U. von Luxburg. Shortest path distance in random k-nearest neighbor graphs. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1031–1038, 2012.
[3] G. Csardi and T. Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.
 [4] L. P. Devroye and T. Wagner. The strong uniform consistency of nearest neighbor density estimates. The Annals of Statistics, pages 536–540, 1977.
 [5] S. N. Ethier and T. G. Kurtz. Markov processes: characterization and convergence. John Wiley & Sons, 1986.
[6] M. Hein, J.-Y. Audibert, and U. von Luxburg. Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research, 8:1325–1368, 2007.
 [7] J. J. Hull. A database for handwritten text recognition research. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(5):550–554, 1994.
[8] P. Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:547–579, 1901.
 [9] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.
 [10] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1999.
 [11] H. Risken. FokkerPlanck Equation. Springer, 1984.
 [12] D. Stroock and S. Varadhan. Diffusion processes with boundary conditions. Communications on Pure and Applied Mathematics, 24:147–225, 1971.
 [13] D. Ting, L. Huang, and M. I. Jordan. An analysis of the convergence of graph Laplacians. In Proceedings of the 27th International Conference on Machine Learning (ICML10), pages 1079–1086, 2010.
[14] U. von Luxburg and M. Alamgir. Density estimation from unweighted k-nearest neighbor graphs: a roadmap. In Advances in Neural Information Processing Systems, pages 225–233, 2013.
 [15] W. Whitt. Some useful functions for functional limit theorems. Math. Oper. Res., 5(1):67–85, 1980.
 [16] W. Woess. Random walks on infinite graphs and groups  a survey on selected topics. Bull. London Math. Soc, 26:1–60, 1994.