Many natural and man-made systems can be represented as graphs: sets of objects (called nodes) and pairwise relations between these objects (called edges). Examples include the brain, which contains neurons (nodes) that exchange signals through chemical pathways (edges), and the Internet, which contains websites (nodes) that are connected via hyperlinks (edges). To study graphs, researchers in diverse domains have used Personalized PageRank (PPR). Informally, PPR assigns to each node $s$ a vector $\pi_s$, where $\pi_s(v)$ describes the importance of $v$ from the perspective of $s$. PPR has proven useful in many practical and graph theoretic applications. Examples include recommending who a user should follow on Twitter (user $s$ may wish to follow user $v$ if $\pi_s(v)$ is large), and partitioning graphs locally around a seed node (the set of nodes $v$ with large $\pi_s(v)$ can be viewed as a community surrounding $s$). Unfortunately, computing all $n$ PPR vectors (where $n$ is the number of nodes) is infeasible for the massive graphs encountered in practice.
In this work, we argue that all $n$ PPR vectors can be accurately estimated by computing only a vanishing fraction of the $n^2$ vector elements, with high probability and for a certain class of random graphs. This arises as a consequence of our main (structural) result, which shows that the dimensionality of the set of PPR vectors scales sublinearly in $n$ with high probability, for the same class of random graphs and for a notion of dimensionality somewhat similar to matrix rank. We note that the estimation scheme considered was first proposed by Jeh and Widom without a formal analysis, so another contribution of our paper is to address this lacuna.
We begin by defining the main ingredients of the paper. Most notation is standard or defined as needed, but we note the following is often used: for and , , satisfies (where is the indicator function), and .
2.1 Directed configuration model (DCM)
We consider a random graph model called the directed configuration model (DCM). For the DCM, we are given realizations of random sequences and satisfying and (we assume for simplicity). [Footnote 1: More specifically, we would like i.i.d. and i.i.d. for given distributions and , but this does not guarantee . For this reason, the authors of  provide a method to generate these sequences such that , and , , where i.i.d., i.i.d., and denotes convergence in distribution.] Our goal is to construct a directed graph , such that has in- and out-degree and , respectively. For this, we first assign incoming half-edges and outgoing half-edges to each ; we call these half-edges instubs and outstubs, respectively. We then randomly pair half-edges in a breadth-first search fashion that proceeds as follows:
1. Choose an initial node uniformly at random. For each of the outstubs assigned to the initial node, sample an instub uniformly from the set of all instubs (resampling if the sampled instub has already been paired), and pair the outstub and instub to form a directed edge out of the initial node.
2. For each node (other than the initial node) that received an edge in Step 1, pair the outstubs assigned to that node using the method by which the initial node’s outstubs were paired in Step 1.
3. Continue iteratively until all half-edges have been paired. Namely, during the $l$-th iteration we pair the outstubs of all nodes at distance $l-1$ from the initial node (those for which a path of length $l-1$ from the initial node exists, but no shorter path exists).
We define this procedure formally in Appendix A.2. For now, the important points to remember are that the initial node is chosen uniformly at random from the set of all nodes, and that, at the end of the $l$-th iteration, the $l$-step neighborhood out of the initial node has been constructed. We emphasize the resulting graph will be a multi-graph in general, i.e. it may contain self-loops (edges from a node to itself) and multi-edges (more than one edge from one node to another). In , the authors provide conditions under which a simple graph results with positive probability as $n \to \infty$, but these are stronger than the conditions we require to prove our main result. Hence, we assume the graph is a multi-graph.
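The pairing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the formal procedure of Appendix A.2: the function name `build_dcm` and its arguments are hypothetical, the uniform draw among unpaired instubs is done by swap-and-pop instead of resampling (the two give the same distribution), and restarting the search from a fresh node when the current search runs out of nodes is an assumption made here so that every half-edge is paired.

```python
import random
from collections import deque


def build_dcm(in_deg, out_deg, seed=0):
    """Pair outstubs with instubs in breadth-first order.

    in_deg[v] and out_deg[v] are the prescribed in- and out-degree of node v.
    Returns a list of directed edges (u, v); self-loops and multi-edges may occur.
    """
    if sum(in_deg) != sum(out_deg):
        raise ValueError("total in-degree must equal total out-degree")
    rng = random.Random(seed)
    n = len(in_deg)
    # One entry per unpaired instub, labelled by the node that owns it.
    instubs = [v for v in range(n) for _ in range(in_deg[v])]
    edges = []
    visited = [False] * n
    order = list(range(n))
    rng.shuffle(order)  # order[0] is a uniformly chosen initial node
    for start in order:
        if visited[start]:
            continue  # assumption: restart only from nodes not yet reached
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for _ in range(out_deg[u]):
                # Uniform draw among the instubs that are still unpaired.
                i = rng.randrange(len(instubs))
                instubs[i], instubs[-1] = instubs[-1], instubs[i]
                v = instubs.pop()
                edges.append((u, v))
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
    return edges
```

Because nodes are dequeued in order of their distance from the starting node, the outstubs of all nodes at distance $l-1$ are paired before those of any node at distance $l$, which mirrors the iteration structure described above.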
2.2 Personalized PageRank (PPR)
To define PPR, we require some notation. First, let $A$ denote the adjacency matrix for some realization of the DCM, i.e. $A(u,v)$ is the number of directed edges from $u$ to $v$. Next, let $P$ be the row stochastic matrix with

$$P(u,v) = \frac{A(u,v)}{\sum_{w} A(u,w)}.$$

Finally, let $\alpha_n \in (0,1)$, and let $1_n$ denote the length-$n$ vector of ones. We then have the following.

Definition 1. For each node $s$, the PPR row vector $\pi_s$ is the stationary distribution of the Markov chain with transition matrix

$$P_s = (1-\alpha_n) P + \alpha_n 1_n e_s^{\mathsf{T}},$$

where $e_s$ is the indicator (column) vector of $s$.

Note that the quantities defined above all depend on $n$. However, to avoid cumbersome notation, we do not explicitly denote this, and the dependence on $n$ will be clear from context.
The Markov chain described in Definition 1 has the following dynamics: follow a uniform random walk with probability $1-\alpha_n$, and jump to $s$ with probability $\alpha_n$. This motivates an interpretation of PPR as a centrality measure of the nodes from the perspective of $s$. To see this, let denote the Markov chain with transition matrix . Then one can show (see Appendix B.1.1)
where , and where denotes expectation with some realization of the DCM held fixed. Hence, is large when is frequently visited (a notion of centrality) on -length walks beginning at (a notion of ’s perspective).
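The random-walk interpretation can be checked numerically. The sketch below assumes the convention that the walker jumps back to the source with probability `alpha` at each step, so that the walk length $L$ is geometric with $P[L = j] = \alpha(1-\alpha)^j$ and the PPR vector equals $\alpha \sum_{j \geq 0} (1-\alpha)^j e_s^{\mathsf{T}} P^j$. The function name `ppr_vector`, the dense list-of-lists representation of the row stochastic matrix, and the truncation tolerance are choices made for this illustration only.

```python
def ppr_vector(P, s, alpha, tol=1e-12):
    """PPR row vector for source s, where alpha is the jump probability.

    P is a row stochastic matrix given as a list of rows. The series
    alpha * sum_j (1 - alpha)^j e_s P^j is truncated once the neglected
    mass (1 - alpha)^j falls below tol.
    """
    n = len(P)
    walk = [0.0] * n  # distribution of the plain random walk after j steps
    walk[s] = 1.0
    pi = [0.0] * n
    remaining = 1.0  # (1 - alpha)^j, the probability that L >= j
    while remaining > tol:
        for v in range(n):
            pi[v] += alpha * remaining * walk[v]  # walk ends at step j
        walk = [sum(walk[u] * P[u][v] for u in range(n)) for v in range(n)]
        remaining *= 1.0 - alpha
    return pi


# Small example: a 3-node graph with edges 0->1, 0->2, 1->2, 2->0.
P_example = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
pi_0 = ppr_vector(P_example, 0, 0.2)
```

The returned vector sums to one up to the tolerance and is stationary for the chain that mixes the random walk with jumps back to the source, which is one way to sanity-check an implementation.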
We note the typical definition of PPR assumes is constant; in contrast, we take . We argue in Section 4.2 that this is appropriate when considering the asymptotic behavior of PPR on the DCM. Specifically, we argue that the size of the set of nodes that are important to grows with the graph, but grows slowly enough that a notion of ’s perspective remains, when . (In contrast, this set has constant size when is constant.) Additionally, the spectral gap of is lower bounded by , so as results in this lower bound vanishing asymptotically. We note a line of work by Boldi et al. [10, 11] analyzed the limit of PPR as for a fixed graph ; in contrast, we fix a value of for each .
Finally, we emphasize the distinction between PPR and the more commonly known notion of PageRank, which we refer to as global PageRank. In short, global PageRank is the average of all PPR vectors, i.e. $\frac{1}{n} \sum_{s} \pi_s$. Hence, global PageRank is a centrality measure from the perspective of a uniform node. More generally, given a distribution $\sigma$ on the set of nodes, the PPR corresponding to $\sigma$ is $\sum_{s} \sigma(s) \pi_s = E[\pi_S]$, where the random variable $S$ has $\sigma$ as its distribution.
2.3 PPR dimensionality and algorithmic implications
Our main goal is to investigate the dimensionality of the set of PPR vectors, $\{\pi_s\}$. A standard measure of the dimension of such a set is the size of its largest linearly independent subset. However, $\{\pi_s\}$ is a linearly independent set itself [Footnote 2: To see why, first suppose $I - (1-\alpha_n) P$ is not invertible. Then $(I - (1-\alpha_n) P) x = 0$ for some $x \neq 0$, so $P x = x / (1-\alpha_n)$. But, by the Perron-Frobenius theorem, $P$ cannot have eigenvalue $1/(1-\alpha_n) > 1$, since it is row stochastic. Hence, $I - (1-\alpha_n) P$ is invertible, so by (4), the matrix with rows $\pi_s$ is invertible as well.], so we will instead consider a different notion of dimensionality. This notion is motivated by the following observation: given vectors , the size of a linearly independent subset of can be bounded by , where and . We will relax this slightly, by only including in those that are not “close” to a linear combination of . In particular, given , our notion of dimensionality is , where
Note we can also interpret (2) algorithmically: if is known, can be accurately estimated by computing , when and fails. Hence, (2) is the number of vectors that must be computed to ensure all vectors are accurately estimated (see Section 5). We note is included in (3) because it is a known component of ; indeed, by Definition 1,
For ease of analysis, we will upper bound by choosing solely based on the degree sequence. Specifically, let , define , and let . For , we then define
where the subscript indicates that the right side depends on through . Our main result, Theorem 1, shows that scales sublinearly in with high probability, under certain assumptions on the degree sequence and for a particular choice of . In other words, though is a linearly independent set (for every finite ), our notion of dimensionality suggests the effective dimension is (asymptotically) much smaller.
We note that, in addition to bounding by , we will later bound by choosing a specific , which is not necessarily the solution of the optimization problem in (3). Hence, the exact solution of (2) remains an open question. Furthermore, in light of the preceding algorithmic interpretation of (2), another open problem is to solve (2) while ensuring can be efficiently computed when and fails.
Finally, recall is a random sequence; hence, with fixed, is a random sequence as well. Towards proving our main result, intermediate results will be established with held fixed, after which conditional expectation with respect to will be taken. This motivates the following definitions: .
3 Related work
In , Jeh and Widom propose a scheme for estimating all PPR vectors, . The scheme relies crucially on the Hubs Theorem in , which states that the PPR vector , can be written as a linear combination of and another vector. The Hubs Theorem is central to our results as well; an alternative formulation appears as Lemma 2 here. We discuss the algorithm of Jeh and Widom in more detail in Section 5.
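A decomposition of this kind can be illustrated numerically. The sketch below is not the formulation of Lemma 2 or of Jeh and Widom; it is one standard way to obtain such a decomposition, assuming the jump-probability convention in which the walk length is geometric. Splitting the walk from $s$ at its first visit to a hub and using the memorylessness of the geometric length gives $\pi_s = r_s + \sum_k w_s(k) \pi_k$, where $r_s$ accounts for walks that end before reaching any hub. The names `hub_decomposition`, `partial`, and `weights` are hypothetical.

```python
def ppr_vector(P, s, alpha, tol=1e-13):
    """PPR row vector via the truncated series alpha * sum_j (1-alpha)^j e_s P^j."""
    n = len(P)
    walk = [0.0] * n
    walk[s] = 1.0
    pi = [0.0] * n
    remaining = 1.0
    while remaining > tol:
        for v in range(n):
            pi[v] += alpha * remaining * walk[v]
        walk = [sum(walk[u] * P[u][v] for u in range(n)) for v in range(n)]
        remaining *= 1.0 - alpha
    return pi


def hub_decomposition(P, s, hubs, alpha, tol=1e-13):
    """Split the geometric-length walk from s at its first visit to a hub.

    Returns (partial, weights): partial[v] is the probability that the walk
    ends at v before reaching any hub, and weights[k] is the probability that
    the walk reaches hub k first, before it ends. Assumes s is not a hub.
    """
    n = len(P)
    hubs = set(hubs)
    walk = [0.0] * n  # mass of walks that have not yet reached a hub
    walk[s] = 1.0
    partial = [0.0] * n
    weights = {k: 0.0 for k in hubs}
    remaining = 1.0  # probability that the walk length is at least j
    while remaining > tol:
        for v in range(n):
            partial[v] += alpha * remaining * walk[v]
        nxt = [sum(walk[u] * P[u][v] for u in range(n)) for v in range(n)]
        remaining *= 1.0 - alpha
        for k in hubs:
            weights[k] += remaining * nxt[k]  # first hub visit at this step
            nxt[k] = 0.0                      # stop tracking after the visit
        walk = nxt
    return partial, weights


# 4-node example with node 2 as the only hub (hypothetical data).
P_ex = [[0.0, 0.5, 0.5, 0.0], [0.0, 0.0, 0.5, 0.5],
        [1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
partial, weights = hub_decomposition(P_ex, 0, [2], 0.2)
pi_hub = ppr_vector(P_ex, 2, 0.2)
recombined = [partial[v] + weights[2] * pi_hub[v] for v in range(4)]
```

In this sketch, the total mass of `partial` is the probability that the walk ends before reaching a hub, so it also equals the $\ell_1$ error incurred by dropping `partial` and keeping only the hub terms.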
Unfortunately, the authors of  present no analysis of their scheme. Hence, it is unclear how should be chosen and how large it must be to guarantee accurate estimation. Our work addresses this shortcoming. Specifically, as discussed briefly in the introduction and in more detail in Section 5, our dimensionality measure (5) relates to the complexity of this scheme.
In , Chen, Litvak, and Olvera-Cravioto consider the limiting value of as weakly converges to probability distribution. Specifically, they show that the PPR value of a uniformly chosen node is given by the solution of a recursive distributional equation (RDE). They also show (roughly) that PPR values follow a power law when in-degrees follow a power law, establishing the so-called “power law hypothesis.” Similar results were later established for a family of inhomogeneous directed graphs in . On the other hand,  was preceded by , where the power law hypothesis was established for global PageRank; further back, the hypothesis was studied under more restrictive assumptions in [29, 36, 37].
While [15, 16, 27, 29, 36, 37] share a goal of understanding the power law behavior of PPR on random graphs, our goal is to instead understand structural properties of the PPR vectors collectively, with the focus of this paper being dimensionality. Since dimensionality carries with it algorithmic implications, our work is perhaps more useful from a practical perspective when compared to this body of work. However, the analytical approaches of these works will be extremely useful to us. Specifically, the proof of our main result follows an approach similar to , and we use a modified version of Lemma 5.4 from , which appears as Lemma 5 here.
In short, our work can be seen as an attempt to combine the strengths of , which is entirely algorithmic, and , which is entirely analytical. Specifically, we leverage the analytical approach from  to obtain guarantees on the algorithm from .
More broadly, references for PageRank and PPR include , in which PageRank and PPR were first proposed, and , an early study of PPR (there called “topic-sensitive” PageRank). Beyond , many other works have proposed efficient computation and estimation algorithms for PPR; a small sample includes those using linear algebraic techniques [33, 34], those using dynamic programming [2, 3], and those using randomized schemes [6, 30]. In addition to the body of work on the power law hypothesis, analysis of PPR on random graphs includes . Here it is shown that, for undirected random graphs with a certain expansion property, can be well approximated (in the total variation norm) as a convex combination of and the degree distribution.
The DCM was proposed and analyzed in  as an extension of the (undirected) configuration model, the development of which began in [8, 13, 38]. The configuration model (and variants) have been studied in detail; for example,  considers graph diameter in this model, while  studies the emergence of a giant component.
4 Dimensionality analysis
In this section, we present our dimensionality analysis. We begin by defining our assumptions and proposing a specific choice of . We then state the result and comment on our assumptions.
4.1 Assumptions on degree sequence
To prove our main result, we require Assumption 1, which states that certain empirical moments of the sequence exist with high probability, and furthermore, converge to limits at a uniform rate. Since we follow the analytical approach of , this assumption is similar to the main assumption in that work. We offer more specific comments shortly.
We have for some , where and for some constants and ,
Furthermore, we have , and we define .
The constants and will appear in our main result, and both have simple interpretations: letting satisfy , it is straightforward to show and , i.e. and give the limiting expected out-degree and the limiting probability of belonging to , respectively. (The other constants in Assumption 1 will not appear in our main result, but they have similar interpretations.) We also remark that is not necessary to establish our results but, given this interpretation, is the more interesting case.
4.2 Choice of $\alpha_n$
Let be a constant, and let uniformly. For , let denote the -step neighborhood out of , i.e. . If for some , let . Then
If instead is a constant, let . Then
See Appendix C.1.
Loosely speaking, Claim 1 states that, for both choices of $\alpha_n$, all but $\epsilon$ of $s$’s PPR concentrates on a small neighborhood surrounding $s$, for any $\epsilon > 0$. The difference is the size of this neighborhood: when $\alpha_n \to 0$, the neighborhood grows with the graph; when $\alpha_n$ is constant, the neighborhood has constant size. From the PPR interpretation of Section 2.2, this suggests that the number of nodes that are important to $s$ grows in the former case but remains fixed in the latter case. We believe the former case is more appropriate. Additionally, the growth of the important set of nodes remains sublinear in $n$ in the former case; intuitively, this says that a vanishing fraction of all nodes are important to $s$, i.e. a notion of $s$’s perspective remains. Finally, Claim 1 suggests that is necessarily linear when $\alpha_n$ is constant: since PPR vectors are supported on constant size sets in this case, we expect must be linear to cover a linear number of these sets.
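The contrast between the two regimes can be made concrete with a short calculation. It does not reproduce the statement of Claim 1; it only uses the elementary bound that, under the jump-probability convention with geometric walk length $L$, a walk of length at most $l$ never leaves the $l$-step neighborhood out of $s$, so that neighborhood carries PPR mass at least $P[L \leq l] = 1 - (1-\alpha)^{l+1}$. The function name `radius_for_mass` and the decay rate $\alpha_n = 1/\log n$ used in the example are assumptions made for illustration.

```python
import math


def radius_for_mass(alpha, eps):
    """Smallest l such that (1 - alpha)^(l + 1) <= eps.

    With jump probability alpha, the l-step neighborhood out of the source
    then carries at least 1 - eps of the source's PPR mass.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("alpha and eps must lie strictly between 0 and 1")
    l = 0
    tail = 1.0 - alpha  # P[L > l] for l = 0
    while tail > eps:
        l += 1
        tail *= 1.0 - alpha
    return l


# Constant alpha: the radius does not depend on the graph size.
constant_case = [radius_for_mass(0.15, 0.1) for n in (10**3, 10**6)]
# Vanishing alpha (hypothetical rate 1 / log n): the radius grows with n.
vanishing_case = [radius_for_mass(1.0 / math.log(n), 0.1) for n in (10**3, 10**6)]
```

Since the required radius is roughly $\log(1/\epsilon)/\alpha$ for small $\alpha$, a slowly vanishing $\alpha_n$ yields a slowly growing radius, in line with the discussion above.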
4.3 Main result
We now turn to our main result, which relies on the following key lemma.
1. Show that, for a certain choice of , the error term in can be bounded by only examining the -step neighborhood out of .
2. Argue that, conditioned on certain events not occurring during the first steps of the graph construction, this bound follows the same distribution as a quantity defined on a tree.
3. Bound the probability of these events occurring during the first iterations.
4. Bound conditioned on the events not occurring by analyzing the tree quantity.
Before proceeding, we pause to state the choice of from Step 1, which will be used in Section 5. First, for any realization of the DCM and for , we define
where is defined in Section 2.2. Note is the transition matrix of a Markov chain similar to that in Definition 1; however, upon reaching , the random walker jumps back to with probability 1. Letting denote the stationary distribution of this chain, one can show (see Appendix A.1)
and we take as in (12) in Step 1. We also note this provides another interpretation of Lemma 1. Informally, since , (12) implies for large , so is nearly a convex combination. Hence, when fails, is close to the convex hull of , a small subset of the -dimensional simplex to which belongs.
We now turn to the main result. First, note Lemma 1 will allow us to show the second summand in (5) is bounded (in expectation) by , which is sublinear. Hence, to ensure (5) is sublinear, it only remains to choose such that is sublinear as well. On the other hand, in Assumption 1 requires to contain a constant fraction of all instubs, suggesting we should choose to be nodes with high in-degree. Together, these observations motivate our choice of : for we define as the function that chooses the nodes of highest in-degree as . Formally, is the function that maps to with , where is such that .
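The choice of hubs by in-degree is simple to state in code. The sketch below selects the nodes of highest in-degree and reports the fraction of instubs they hold; the function names, the tie-breaking rule (lower node index first), and the example degree sequence are assumptions made here, not details taken from the paper.

```python
def top_in_degree_nodes(in_deg, k):
    """Return the k nodes of highest in-degree (ties broken by node index)."""
    order = sorted(range(len(in_deg)), key=lambda v: (-in_deg[v], v))
    return order[:k]


def instub_fraction(in_deg, hubs):
    """Fraction of all instubs that belong to the given set of nodes."""
    total = sum(in_deg)
    if total == 0:
        raise ValueError("the degree sequence has no instubs")
    return sum(in_deg[v] for v in hubs) / total


# Hypothetical in-degree sequence for a 4-node graph.
example_in_deg = [5, 1, 3, 1]
example_hubs = top_in_degree_nodes(example_in_deg, 2)
example_fraction = instub_fraction(example_in_deg, example_hubs)
```

In the example, two of the four nodes hold 8 of the 10 instubs, which is the kind of concentration that heavy-tailed in-degree sequences make possible with a small hub set.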
With this in place, we present Theorem 1. Together with Assumption 1, it states the following: when certain moments of the degree sequence exist, and when a sublinear number of nodes contains a constant fraction of instubs, the dimension of the set of PPR vectors scales sublinearly.
See Appendix C.2.
4.4 Comments on assumptions
We begin with comments on in Assumption 1. First, note that, given and , implicitly requires to converge to a specific limit: indeed, assuming it converges,
With sublinear in Theorem 1, , so we require .
We next argue is not restrictive (at least in its own right). In fact, it is essentially implied by sublinearity of in Theorem 1 and , since then the fraction in satisfies
Next, we note are similar to assumptions found in  and are fairly standard given our approach, which leverages the fact that the random graph is asymptotically locally treelike . In fact, is a weaker assumption than that required in , which is why (as mentioned in Section 3) we use a modified version of one of their lemmas. See Appendix A.3 for details.
Finally, requires to converge to with sublinear in Theorem 1. We offer empirical evidence that this occurs for certain graphs of interest. Specifically, in Figure 0(a), remains constant and strictly less than 1 as grows, for a variety of sublinear choices. For this plot, in-degrees were sampled from a power law distribution with exponent , i.e. . This in-degree distribution is commonly seen in real graphs and has been studied extensively, e.g. [7, 18]. As an example, Figure 0(b) compares the histogram of these in-degrees with the in-degrees of the Twitter graph (available at  from WebGraph ). The histograms are similar for most values of ; both are roughly linear with slopes over . In short, a common model of in-degree distributions empirically satisfies with sublinear.
5 Algorithms and experiments
5.1 Algorithm to estimate
The basic idea behind this scheme is that, from (11), may be close to ; however, no formal analysis is provided. Here we show that our dimensionality result provides such an analysis.
In other words, (19) shows we can use to compute the estimation error indirectly, i.e. without actually computing . This suggests a new scheme, which proceeds as follows. First, compute (as in the existing scheme). Next, for , compute . If (19) holds, estimate as ; else, compute .
Using this scheme, we either compute exactly, or we obtain an estimate within of (in the norm), . The remaining question is the scheme’s complexity, which we take to be the number of PPR values that are computed. First, for , such values () are computed. Next, for , such values () are computed. Finally, an additional such values () are computed for s.t. (19) fails; by definition, this occurs for such when is chosen by . Hence, the number of PPR values computed is
which is sub-quadratic with high probability when Theorem 1 applies. (We have assumed the computation of is no more costly than the computation of PPR values on the original graph; this is because are computed on a sparser graph.) Hence, all PPR vectors can be accurately estimated by computing a vanishing fraction of the vector elements.
Finally, we remark that this scheme can also be viewed as approximating , the matrix with -th row . To see this, let be the estimate of from the scheme, i.e. if and (19) holds, otherwise. Then, by (19), , so (where is the norm of the matrix ). Hence, the scheme approximates with bounded error in the norm.
5.2 Empirical results
We now demonstrate the performance of this algorithm using two datasets from the Stanford Network Analysis Platform (SNAP) : soc-Pokec, a social network, and web-Google, a partial web graph (see Appendix D.1 for details). For both graphs, we choose the top nodes by in-degree as (i.e. ), set , and, , compute a bound on the error using a power iteration scheme described in Appendix D.2. Figure 1(a) shows histograms of the error bound, while Figure 1(b) shows our dimensionality measure. Note (as proven in Appendix D.2), error is zero when , where
(In words, the error is zero when no outgoing neighbors of belong to .) As a result, the spikes at in Figure 1(a) have height , and in Figure 1(b). Additionally, we show in Appendix D.2 that error is bounded by ; hence, the spikes at right in Figure 1(a), and the “dips” at right in Figure 1(b), occur at . Between these spikes, the soc-Pokec histogram quickly decays beyond ; this corresponds to the dimensionality being nearly flat beyond in Figure 1(b). (For web-Google, similar behavior occurs, though it is less pronounced.) Finally, we highlight two points on Figure 1(b), for soc-Pokec and for web-Google. The soc-Pokec point, for example, shows that computing of PPR vectors guarantees the estimation error for other PPR vectors is below (i.e. the worst-case error is reduced by a factor of 3). See Appendix D.3 for further empirical results for these datasets.
Figure 1(b) also highlights another aspect of . Specifically, the discussion at the end of Section 5.1 and the steep decay in Figure 1(b) suggest that most of the “energy” of is contained in a small number of dimensions, in the norm. Hence, is roughly analogous to stable rank, a more common dimensionality measure that instead measures energy using singular values (namely, stable rank is $\sum_i \sigma_i^2 / \sigma_1^2$, where $\sigma_1 \geq \sigma_2 \geq \cdots$ are the ordered singular values).
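For comparison, stable rank itself is easy to compute. The sketch below illustrates only the standard definition, the squared Frobenius norm divided by the squared largest singular value; it does not compute the dimensionality measure studied in this paper. To stay within the Python standard library it estimates the largest singular value by power iteration on $A^{\mathsf{T}} A$ rather than by a full singular value decomposition; the function name, the starting vector, and the iteration count are assumptions made for this example.

```python
import math


def stable_rank(A, iters=200):
    """Stable rank of a matrix given as a list of rows.

    Computes ||A||_F^2 / sigma_1^2, where sigma_1^2 is estimated by power
    iteration on A^T A. The estimate approaches sigma_1^2 from below and
    assumes the starting vector is not orthogonal to the top singular vector.
    """
    frob_sq = sum(x * x for row in A for x in row)
    if frob_sq == 0.0:
        raise ValueError("stable rank is undefined for the zero matrix")
    rows, cols = len(A), len(A[0])
    x = [1.0 + 0.01 * j for j in range(cols)]  # fixed, generic starting vector
    top_sq = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]  # A x
        z = [sum(A[i][j] * y[i] for i in range(rows)) for j in range(cols)]  # A^T A x
        norm_x = math.sqrt(sum(v * v for v in x))
        norm_z = math.sqrt(sum(v * v for v in z))
        if norm_z == 0.0:
            break
        top_sq = norm_z / norm_x  # current estimate of sigma_1^2
        x = [v / norm_z for v in z]
    if top_sq == 0.0:
        raise ValueError("power iteration failed; try a different starting vector")
    return frob_sq / top_sq
```

A matrix whose energy is spread evenly across directions (such as the identity) has stable rank equal to its rank, while a matrix dominated by one direction has stable rank close to one.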
In Appendix D.2, we also describe how the power iteration scheme allows us to compute a bound on the average error indirectly (i.e., without actually computing the error for each ). Hence, we show the average error bound for a wider variety of SNAP datasets in Figure 2(a). Interestingly, the two social networks soc-LiveJournal1 and soc-Pokec have similar behavior, as do the two web graphs web-BerkStan and web-Stanford (web-Google is somewhat of an outlier; we believe its average error is lowest in part because its is largest). Finally, in Figure 2(b), we show the average error bound computed on a DCM with power law in-degrees. As suggested by Lemma 1, average error shrinks as grows (despite shrinking as well); this is in part because, from Figure 0(a), the fraction of instubs belonging to is constant.
In this work, we argued (analytically for the DCM and empirically for other graphs) that the dimensionality of scales sublinearly in . We also used our analysis to bound the complexity of the algorithm from . Our analysis suggests several avenues for future work. First, the proof of Lemma 1 can be modified to analyze the tail of the error (this would essentially involve replacing Lemma 6 with a tail bound on a maximum instead of a sum). Hence, bounding absolute error for the estimate of for any is a straightforward extension; a more useful but less immediate analysis would involve bounding relative error. Second, examining PPR dimensionality for other random graph models may be of interest. For example, several papers have analyzed PPR on preferential attachment models [5, 21]; we suspect a dimensionality analysis for such graphs would yield a message similar to our work ( should contain nodes with highest in-degree). A more interesting class of graphs would be the stochastic block model; here it may be more beneficial to choose such that each community contains a nonempty subset of .
Note on the organization of appendices: Appendix A outlines the key ideas and intuition behind the proof of Lemma 1, which contains the bulk of our technical analysis and itself requires five lemmas. The proofs of these lemmas are found in subsections of Appendix B, in the order their statements appear in Appendix A. Shorter proofs (those of Claim 1 and Theorem 1) are found in Appendix C. Finally, Appendix D contains details on the experiments of Section 5.
Appendix A Lemma 1 proof outline
In this appendix, we outline the proof of Lemma 1. Our approach follows the outline described in Section 4.3. Specifically, we consider Steps 1-4 of the outline in Appendices A.1-A.4, respectively. In Appendix A.5, we combine the results to prove the lemma.
A.1 Error bound in -step neighborhood (Step 1)
Our first goal is to bound the error term