# The impossibility of low rank representations for triangle-rich complex networks

The study of complex networks is a significant development in modern science, and has enriched the social sciences, biology, physics, and computer science. Models and algorithms for such networks are pervasive in our society, and impact human behavior via social networks, search engines, and recommender systems, to name a few. A widely used algorithmic technique for modeling such complex networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to the common view, we argue that such graph embeddings do not capture salient properties of complex networks. The two properties we focus on are low degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. We mathematically prove that any embedding (that uses dot products to measure similarity) that can successfully create these two properties must have rank nearly linear in the number of vertices. Among other implications, this establishes that popular embedding techniques such as Singular Value Decomposition and node2vec fail to capture significant structural aspects of real-world complex networks. Furthermore, we empirically study a number of different embedding techniques based on dot product, and show that they all fail to capture the triangle structure.


## 1 Introduction

Complex networks (or graphs) are a fundamental object of study in modern science, across domains as diverse as the social sciences, biology, physics, computer science, and engineering [WF94, New03, EK10]. Designing good models for these networks is a crucial area of research, and also affects society at large, given the role of online social networks in modern human interaction [BA99, WS98, CF06]. Complex networks are massive, high-dimensional, discrete objects, and are challenging to work with in a modeling context. A common method of dealing with this challenge is to construct a low-dimensional Euclidean embedding that tries to capture the structure of the network (see [HYL17] for a recent survey). Formally, we think of the vertices as vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^d$, where $d$ is typically constant (or very slowly growing in $n$). The likelihood of an edge $(i,j)$ is proportional to $\vec{v}_i \cdot \vec{v}_j$ (usually a non-negative monotone function in $\vec{v}_i \cdot \vec{v}_j$) [ASN13, CLX16]. This gives a graph distribution that the observed network is assumed to be generated from.

The most important method to get such embeddings is the Singular Value Decomposition (SVD) or other matrix factorizations of the adjacency matrix [ASN13]. Recently, there has also been an explosion of interest in using methods from deep neural networks to learn such graph embeddings [PARS14, TQW15, CLX16, GL16] (refer to [HYL17] for more references). Regardless of the specific method, a key goal in building an embedding is to keep the dimension $d$ small (while trying to preserve the network structure), as the embeddings are used in a variety of downstream modeling tasks such as graph clustering, nearest neighbor search, and link prediction [Twi18]. Yet a fundamental question remains unanswered: to what extent do such low dimensional embeddings actually capture the structure of a complex network?

These models are often justified by treating the (few) dimensions as “interests” of individuals, and using similarity of interests (dot product) to form edges. Contrary to the dominant view, we argue that low-dimensional embeddings are not good representations of complex networks. We demonstrate mathematically and empirically that they lose local structure, one of the hallmarks of complex networks. This runs counter to the ubiquitous use of SVD in data analysis. The weaknesses of SVD have been empirically observed in recommendation tasks [BCG10, GGL13, KUK17], and our result provides a mathematical validation of these findings.

Let us define the setting formally. Consider a set of vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^d$ (denoted by the matrix $V$) used to represent the vertices in a network. Let $G(V)$ denote the following distribution of graphs over the vertex set $[n]$. For each index pair $(i,j)$, independently insert (undirected) edge $(i,j)$ with probability $\min(\vec{v}_i \cdot \vec{v}_j, 1)$. (If $\vec{v}_i \cdot \vec{v}_j$ is negative, $(i,j)$ is never inserted. If $\vec{v}_i \cdot \vec{v}_j \geq 1$, $(i,j)$ is always inserted.) We will refer to this model as the "embedding" of a graph $G$, and focus on this formulation in our theoretical results. This is a standard model in the literature, and subsumes the classic Stochastic Block Model [HLL83] and Random Dot Product Model [YS07, AFL18]. There are alternate models that use different functions of the dot product for the edge probability, which are discussed in Section 1.2. Matrix factorization is a popular method to obtain such a vector representation: the original adjacency matrix is "factorized" as $V^T V$, where the columns of $V$ are $\vec{v}_1, \ldots, \vec{v}_n$.
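As a concrete illustration, sampling a graph from this distribution takes only a few lines (a minimal sketch assuming numpy; `sample_graph` is our own hypothetical helper, not code from the paper):

```python
import numpy as np

def sample_graph(V, rng=None):
    """Sample an undirected graph from the embedding model G(V).

    V: d x n matrix whose columns are the vertex vectors.
    Edge (i, j) appears independently with probability
    min(max(v_i . v_j, 0), 1), the truncated dot product.
    """
    rng = np.random.default_rng(rng)
    P = np.clip(V.T @ V, 0.0, 1.0)   # pairwise edge probabilities
    n = P.shape[0]
    A = rng.random((n, n)) < P       # independent Bernoulli trials
    A = np.triu(A, k=1)              # keep each unordered pair once, no self-loops
    return A | A.T                   # symmetric 0/1 adjacency matrix
```

Vectors with pairwise dot products above 1 deterministically produce a clique, and non-positive dot products produce an empty graph, matching the truncation described above.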

Two hallmarks of real-world graphs are: (i) Sparsity: The average degree is typically constant with respect to $n$, and (ii) Triangle density: there are many triangles incident to low degree vertices [WS98, SCW10, SKP12, DPKS12]. The large number of triangles is considered a local manifestation of community structure. Triangle counts have a rich history in the analysis and algorithmics of complex networks. Concretely, we measure these properties simultaneously as follows.

###### Definition 1.1.

For parameters $c$ and $\Delta$, a graph $G$ with $n$ vertices has a $(c,\Delta)$-triangle foundation if there are at least $\Delta n$ triangles contained among vertices of degree at most $c$. Formally, let $S_c$ be the set of vertices of degree at most $c$. Then, the number of triangles in the graph induced by $S_c$ is at least $\Delta n$.

Typically, we think of both $c$ and $\Delta$ as constants. We emphasize that $n$ is the total number of vertices in $G$, not the number of vertices in $S_c$ (as defined above). Refer to real-world graphs in Table 4. In Figure 1, we plot the value of $\Delta$ vs $c$. (Specifically, the $y$-axis is the number of triangles divided by $n$.) This is obtained by simply counting the number of triangles contained in the set of vertices of degree at most $c$. Observe that for all graphs, even at modest values of $c$, we get a value of $\Delta \geq 1$ (in many cases considerably larger).
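The quantity $\Delta$ for a given degree bound $c$ can be computed directly from an adjacency matrix (a short sketch assuming numpy; the function name is ours):

```python
import numpy as np

def triangle_foundation(A, c):
    """Return Delta such that the graph has a (c, Delta)-triangle foundation.

    Per Definition 1.1: (# triangles among vertices of degree <= c) / n,
    where n is the TOTAL number of vertices.
    A: symmetric 0/1 numpy adjacency matrix with no self-loops.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    keep = deg <= c                       # vertices of degree at most c
    B = A[np.ix_(keep, keep)]             # induced subgraph on those vertices
    triangles = np.trace(B @ B @ B) / 6   # each triangle is counted 6 times
    return triangles / n
```

Sweeping `c` over the degree range of a graph produces exactly the $\Delta$ vs $c$ curves plotted in Figure 1.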

Our main result is that any embedding that generates graphs with $(c,\Delta)$-triangle foundations, with constant $c$ and $\Delta$, must have near linear rank. This contradicts the belief that low-dimensional embeddings capture the structure of real-world complex networks.

###### Theorem 1.2.

Fix constants $c \geq 1$ and $\Delta > 0$. Suppose the expected number of triangles in $G \sim G(V)$ that only involve vertices of expected degree at most $c$ is at least $\Delta n$. Then, the rank of $V$ is at least $(\alpha\Delta^4/c^9)(n/\lg^2 n)$, for an absolute constant $\alpha > 0$.

Equivalently, graphs generated from low-dimensional embeddings cannot contain many triangles only on low-degree vertices. We point out an important implication of this theorem for Stochastic Block Models. In this model, each vertex $i$ is modeled as a vector $\vec{v}_i \in [0,1]^d$, where the $k$th entry indicates the likelihood of vertex $i$ being in the $k$th community. The probability of an edge $(i,j)$ is exactly the dot product $\vec{v}_i \cdot \vec{v}_j$. In community detection applications, $d$ is thought of as a constant, or at least much smaller than $n$. On the contrary, Theorem 1.2 implies that $d$ must be $\Omega(n/\lg^2 n)$ to accurately model the low-degree triangle behavior.

### 1.1 Empirical validation

We empirically validate the theory on a collection of complex networks detailed in Table 4. For each real-world graph, we compute a 100-dimensional embedding through SVD (basically, the top 100 singular vectors of the adjacency matrix). We generate samples of graphs from these embeddings, and compute their $\Delta$ vs $c$ plot. This is plotted against the true $\Delta$ vs $c$ plot. (To account for statistical variation, we plot the maximum value of $\Delta$ observed in the samples, over all generated graphs. The variation observed was negligible.) Figure 1 shows such a plot for a physics coauthorship network. More results are given in Section 4.

Note that this plot is significantly off the mark at low degrees for the embedding. Around the lowest degrees, the value of $\Delta$ (for the graphs generated by the embedding) is two to three orders of magnitude smaller than the original value. This demonstrates that the local triangle structure is destroyed around low degree vertices. Interestingly, the total number of triangles is preserved well, as shown towards the right side of each plot. Thus, a nuanced view of the triangle distribution, as given in Definition 1.1, is required to see the shortcomings of low dimensional embeddings.

### 1.2 Alternate models

We note that several other functions of the dot product have been proposed in the literature, such as the softmax function [PARS14, GL16] and linear models of the dot product [HYL17]. Theorem 1.2 does not have direct implications for such models, but our empirical validation holds for them as well. The embedding in Theorem 1.2 uses the truncated dot product (TDP) function $\min(\max(\vec{u}\cdot\vec{v}, 0), 1)$ to model edge probabilities. We construct other embeddings that compute edge probabilities using machine learning models with the dot product and Hadamard product as features. This subsumes linear models as given in [HYL17]. Indeed, the TDP can be smoothly approximated by a logistic function. We also consider (scaled) softmax functions, as in [PARS14], and standard machine learning models (LRDP, LRHP). (Details about these models are given in Section 4.2.)

For each of these models (softmax, LRDP, LRHP), we perform the same experiment described above. Figure 1 also shows the plots for these other models. Observe that none of them capture the low-degree triangle structure, and their $\Delta$ values are all two to three orders of magnitude lower than the original.

In addition (to the extent possible), we compute vector embeddings from a recent deep learning based method (node2vec [GL16]). We again use all the edge probability models discussed above, and perform an identical experiment (in Figure 1, these are denoted by "n2v"). Again, we observe that the low-degree triangle behavior is not captured by these deep learned embeddings.

The use of geometric embeddings for graph analysis has a rich history, arguably going back to spectral clustering [Fie73]. In recent years, the Stochastic Block Model has become quite popular in the statistics and algorithms communities [HLL83], and the Random Dot Product Graph model is a generalization of this notion (refer to recent surveys [Abb18, AFL18]). As mentioned earlier, Theorem 1.2 brings into question the standard uses of these methods to model social networks. The use of vectors to represent vertices is sometimes referred to as latent space models, where geometric proximity models the likelihood of an edge. Although dot products are widely used, we note that some classic latent space approaches use Euclidean distance (as opposed to dot product) to model edge probabilities [HRH02], and this may avoid the lower bound of Theorem 1.2. Beyond graph analysis, the method of Latent Semantic Indexing (LSI) also falls in the setting of Theorem 1.2, wherein we have a low-dimensional embedding of "objects" (like documents) and similarity is measured by dot product [lsi19].

## 2 High-level description of the proof

In this section, we sketch the proof of Theorem 1.2. The sketch provides sufficient detail for a reader who wants to understand the reasoning behind our result, but is not concerned with technical details. We will make the simplifying assumption that all $\vec{v}_i$s have the same length $L$. We note that this setting is interesting in its own right, since it is often the case in practice that all vectors are non-negative and normalized. In this case, we get a stronger rank lower bound that is linear in $n$. Section 2.1 provides intuition on how we can remove this assumption. The full details of the proof are given in Section 3.

First, we lower bound $L$. By Cauchy-Schwartz, $\vec{v}_j \cdot \vec{v}_k \leq \|\vec{v}_j\|_2\|\vec{v}_k\|_2 = L^2$. Let $X_{i,j}$ be the indicator random variable for the edge $(i,j)$ being present. Observe that all $X_{i,j}$s are independent and $\mathbf{E}[X_{i,j}] \leq \min(\vec{v}_i\cdot\vec{v}_j, 1) \leq L^2$.

The expected number of triangles in $G \sim G(V)$ is:

$$\mathbf{E}\Big[\sum_{i\neq j\neq k}X_{i,j}X_{j,k}X_{i,k}\Big] \;\leq\; \sum_i\sum_{j,k}\mathbf{E}[X_{j,k}]\,\mathbf{E}[X_{i,j}]\,\mathbf{E}[X_{i,k}] \;\leq\; L^2\sum_i\sum_{j,k}\mathbf{E}[X_{i,j}]\,\mathbf{E}[X_{i,k}] \;=\; L^2\sum_i\Big(\sum_j\mathbf{E}[X_{i,j}]\Big)^2$$

Note that $\sum_j \mathbf{E}[X_{i,j}]$ is at most the expected degree of $i$, which is at most $c$. (Technically, the $j = i$ term creates a self-loop, so the correct upper bound is $c+1$. For the sake of cleaner expressions, we omit the additive $1$ in this sketch.)

The expected number of triangles is at least $\Delta n$. Plugging these bounds in:

$$\Delta n \leq L^2 c^2 n \implies L \geq \sqrt{\Delta}/c \tag{4}$$

Thus, the vectors have length at least $\sqrt{\Delta}/c$. Now, we lower bound the rank of $V$. It will be convenient to deal with the Gram matrix $M = V^TV$, which has the same rank as $V$. Observe that $M_{i,j} = \vec{v}_i\cdot\vec{v}_j$. We will use the following lemma, as stated by Swanepoel; it has appeared in numerous forms previously [Swa14].

###### Lemma 2.1 (Rank lemma).

Consider any square matrix $M$. Then

$$\mathrm{rank}(M) \geq \frac{\big|\sum_i M_{i,i}\big|^2}{\sum_i\sum_j |M_{i,j}|^2}$$

Note that $M_{i,i} = L^2$, so the numerator is $(nL^2)^2 = n^2L^4$. The denominator requires more work. We split it into two terms, depending on whether $\vec{v}_i\cdot\vec{v}_j$ is at most $1$ or larger.

$$\sum_{\substack{i,j\\ \vec{v}_i\cdot\vec{v}_j \leq 1}} (\vec{v}_i\cdot\vec{v}_j)^2 \;\leq \sum_{\substack{i,j\\ \vec{v}_i\cdot\vec{v}_j \leq 1}} \vec{v}_i\cdot\vec{v}_j \;\leq\; cn \tag{5}$$

If $\vec{v}_i\cdot\vec{v}_j > 1$ for $i \neq j$, then $(i,j)$ is an edge with probability $1$. Thus, there can be at most $cn$ such pairs (ignoring the $n$ diagonal terms, as before). Each such term $(\vec{v}_i\cdot\vec{v}_j)^2$ is at most $L^4$ by Cauchy-Schwartz, so these pairs contribute at most $cnL^4$. Overall, we upper bound the denominator in the rank lemma by $cn(L^4+1)$.

We plug these bounds into the rank lemma. We use the fact that $x/(x+1)$ is increasing for positive $x$, and that $L^4 \geq \Delta^2/c^4$.

$$\mathrm{rank}(M) \geq \frac{n^2L^4}{cn(L^4+1)} \geq \frac{n}{c}\cdot\frac{\Delta^2/c^4}{\Delta^2/c^4+1} = \frac{\Delta^2}{c(\Delta^2+c^4)}\cdot n$$
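The rank lemma itself is easy to sanity-check numerically on random low-rank Gram matrices (a quick sketch assuming numpy, independent of the proof):

```python
import numpy as np

rng = np.random.default_rng(42)
for _ in range(50):
    d, n = 5, 30
    V = rng.standard_normal((d, n))
    M = V.T @ V                          # Gram matrix, rank at most d
    lhs = abs(np.trace(M)) ** 2          # |sum_i M_ii|^2
    rhs = np.linalg.matrix_rank(M) * np.sum(M ** 2)
    # rank(M) >= (sum_i M_ii)^2 / (sum_ij M_ij^2), the rank lemma
    assert lhs <= rhs * (1 + 1e-9)
```

The lemma holds for any square matrix; the Gram case above is the one used throughout the proof.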

### 2.1 Dealing with varying lengths

The math behind Eqn. (4) still holds with the right approximations. Intuitively, the existence of at least $\Delta n$ triangles implies that a sufficiently large number of vectors have length at least $\sqrt{\Delta}/c$. On the other hand, these long vectors need to be "sufficiently far away" from each other to ensure that the vertex degrees remain low. There are many such long vectors, and they can only be far away when their dimension/rank is sufficiently high.

The rank lemma is the main technical tool that formalizes this intuition. When vectors are of varying length, the primary obstacle is the presence of extremely long vectors that create triangles. The numerator in the rank lemma sums the diagonal terms $M_{i,i} = \|\vec{v}_i\|_2^2$, the squared lengths of the vectors. A small set of extremely long vectors could dominate the sum, increasing the numerator. In that case, we do not get a meaningful rank bound.

But, because the vectors inhabit a low-dimensional space, the long vectors cannot avoid interacting with each other. We prove a "packing" lemma (Lemma 3.5) showing that there must be many large positive dot products among a set of extremely long vectors. Thus, many of the corresponding vertices have large degree, and triangles incident to these vertices do not contribute to low degree triangles. Operationally, the main proof uses the packing lemma to show that there are few long vectors. These can be removed without affecting the low degree structure. One can then perform a binning (or "rounding") of the lengths of the remaining vectors, to implement the proof described in the above section.

## 3 Proof of Theorem 1.2

For convenience, we restate the setting. Consider a set of vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^d$ that represent the vertices of a social network. We will also use the matrix $V$ for these vectors, where each column is one of the $\vec{v}_i$s. Abusing notation, we will use $V$ to represent both the set of vectors as well as the matrix. We will refer to the vertices by the index in $[n]$.

Let $G(V)$ denote the following distribution of graphs over the vertex set $[n]$. For each index pair $(i,j)$, independently insert (undirected) edge $(i,j)$ with probability $\max(0, \min(\vec{v}_i\cdot\vec{v}_j, 1))$.

### 3.1 The basic tools

We now state some results that will be used in the final proof.

###### Lemma 3.1.

[Rank lemma [Swa14]] Consider any square matrix $A$. Then

$$\Big|\sum_i A_{i,i}\Big|^2 \leq \mathrm{rank}(A)\cdot\Big(\sum_i\sum_j |A_{i,j}|^2\Big)$$
###### Lemma 3.2.

Consider a set of vectors $\vec{w}_1, \ldots, \vec{w}_s$ in $\mathbb{R}^d$.

$$\sum_{\substack{(i,j)\in[s]\times[s]\\ \vec{w}_i\cdot\vec{w}_j<0}} |\vec{w}_i\cdot\vec{w}_j| \;\leq \sum_{\substack{(i,j)\in[s]\times[s]\\ \vec{w}_i\cdot\vec{w}_j>0}} |\vec{w}_i\cdot\vec{w}_j|$$
###### Proof.

Note that $\big\|\sum_{i\leq s}\vec{w}_i\big\|_2^2 \geq 0$. Expand and rearrange to complete the proof. ∎

Recall that an independent set is a collection of vertices that induce no edge.

###### Lemma 3.3.

Any graph with $m$ vertices and maximum degree $d$ has an independent set of size at least $m/(d+1)$.

###### Proof.

Intuitively, one can incrementally build an independent set, by adding one vertex to the set, and removing at most $d+1$ vertices from the graph. This process can be done at least $m/(d+1)$ times.

Formally, we prove by induction on $m$. First we show the base case. If $m \leq d+1$, then the statement is trivially true. (There is always an independent set of size $1$.) For the induction step, let us construct an independent set of the desired size. Pick an arbitrary vertex $v$ and add it to the independent set. Remove $v$ and all of its neighbors; this removes at most $d+1$ vertices. By the induction hypothesis, the remaining graph has an independent set of size at least $m/(d+1) - 1$, which together with $v$ gives an independent set of size at least $m/(d+1)$. ∎
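The inductive argument corresponds to the following greedy procedure (a sketch in plain Python; the function name and adjacency representation are ours):

```python
def greedy_independent_set(adj):
    """Greedy independent set, mirroring the induction in Lemma 3.3.

    adj: dict mapping each vertex to the set of its neighbours.
    Returns a set S with no internal edges; if the maximum degree is d
    and there are m vertices, |S| >= m / (d + 1).
    """
    remaining = set(adj)
    S = set()
    while remaining:
        v = next(iter(remaining))     # pick any surviving vertex
        S.add(v)
        remaining -= {v} | adj[v]     # remove v and all its neighbours
    return S
```

Each iteration removes at most $d+1$ vertices, so the loop runs (and the set grows) at least $m/(d+1)$ times, exactly as in the proof.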

###### Claim 3.4.

Consider the distribution $G(V)$. Let $D_i$ denote the degree of vertex $i$. Then $\mathbf{E}[D_i^2] \leq \mathbf{E}[D_i] + \mathbf{E}[D_i]^2$.

###### Proof.

(of Claim 3.4) Fix any vertex $i$. Observe that $D_i = \sum_{j\neq i}X_j$, where $X_j$ is the indicator random variable for edge $(i,j)$ being present. Furthermore, all the $X_j$s are independent.

$$\mathbf{E}[D_i^2] = \mathbf{E}\Big[\Big(\sum_{j\neq i}X_j\Big)^2\Big] = \mathbf{E}\Big[\sum_{j\neq i}X_j^2 + 2\sum_{j\neq j'}X_jX_{j'}\Big] = \mathbf{E}\Big[\sum_{j\neq i}X_j\Big] + 2\sum_{j\neq j'}\mathbf{E}[X_j]\,\mathbf{E}[X_{j'}] \leq \mathbf{E}[D_i] + \Big(\sum_{j\neq i}\mathbf{E}[X_j]\Big)^2 = \mathbf{E}[D_i] + \mathbf{E}[D_i]^2$$

(We used $X_j^2 = X_j$ for indicator variables and the independence of distinct $X_j, X_{j'}$.) ∎
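The second-moment computation above can be checked numerically for any vector of edge probabilities (a quick sketch assuming numpy; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(50)                # p[j] = E[X_j], edge probabilities at vertex i
ED = p.sum()                      # E[D_i], by linearity
# Exact second moment of a sum of independent Bernoullis, as in the proof:
# E[D_i^2] = sum_j p_j + 2 * sum_{j < j'} p_j p_j'
ED2 = p.sum() + (p.sum() ** 2 - (p ** 2).sum())
assert ED2 <= ED + ED ** 2        # Claim 3.4; the slack is exactly sum_j p_j^2
```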

A key component of dealing with arbitrary length vectors is the following dot product lemma. This is inspired by results of Alon [Alo03] and Tao [Tao13], who get a stronger lower bound for the absolute values of the dot products.

###### Lemma 3.5.

Consider any set of $4d$ unit vectors $\vec{u}_1, \ldots, \vec{u}_{4d}$ in $\mathbb{R}^d$. There exists some pair $i \neq j$ such that $\vec{u}_i\cdot\vec{u}_j \geq 1/4d$.

###### Proof.

(of Lemma 3.5) We prove by contradiction, so assume $\vec{u}_i\cdot\vec{u}_j < 1/4d$ for all $i \neq j$. We partition the index pairs into $P = \{(i,j) : \vec{u}_i\cdot\vec{u}_j > 0\}$ and $N = \{(i,j) : \vec{u}_i\cdot\vec{u}_j < 0\}$ (note that $P$ contains the diagonal pairs). The proof goes by providing (inconsistent) upper and lower bounds for $\sum_{(i,j)\in N}|\vec{u}_i\cdot\vec{u}_j|^2$. First, we upper bound it by:

$$\sum_{(i,j)\in N}|\vec{u}_i\cdot\vec{u}_j|^2 \;\leq\; \sum_{(i,j)\in N}|\vec{u}_i\cdot\vec{u}_j| \quad (\vec{u}_i\text{s are unit vectors}) \;\leq\; \sum_{i\leq 4d}\|\vec{u}_i\|_2^2 + \sum_{\substack{(i,j)\in P\\ 1\leq i\neq j\leq 4d}}|\vec{u}_i\cdot\vec{u}_j| \quad (\text{Lemma 3.2}) \;<\; 4d + 16d^2/4d = 8d \quad (\text{since } \vec{u}_i\cdot\vec{u}_j < 1/4d) \tag{6}$$

For the lower bound, we invoke the rank bound of Lemma 3.1 on the Gram matrix $A$ of the $\vec{u}_i$s. Note that $A_{i,i} = 1$, $\sum_i A_{i,i} = 4d$, and $\mathrm{rank}(A) \leq d$. By Lemma 3.1, $\sum_{i,j}|A_{i,j}|^2 \geq (4d)^2/d = 16d$. We bound

$$\sum_{(i,j)\in P}|\vec{u}_i\cdot\vec{u}_j|^2 = \sum_{i\leq 4d}\|\vec{u}_i\|_2^2 + \sum_{\substack{(i,j)\in P\\ i\neq j}}|\vec{u}_i\cdot\vec{u}_j|^2 \leq 4d + (4d)^2/(4d)^2 \leq 5d \tag{7}$$

Thus, $\sum_{(i,j)\in N}|\vec{u}_i\cdot\vec{u}_j|^2 \geq 16d - 5d = 11d$. This contradicts the bound of Eqn. (6). ∎

### 3.2 The main argument

We prove by contradiction. We assume that the expected number of triangles contained in the set of vertices of expected degree at most $c$ is at least $\Delta n$. We remind the reader that $n$ is the total number of vertices. For convenience, we simply remove the vectors corresponding to vertices with expected degree more than $c$. Let $V'$ be the matrix of the remaining vectors, and we focus on $G(V')$. The expected number of triangles in $G(V')$ is at least $\Delta n$.

The overall proof can be thought of in three parts.

Part 1, remove extremely long vectors: Our final aim is to use the rank lemma (Lemma 3.1) to lower bound the rank of $V'$. The first problem we encounter is that extremely long vectors can dominate the expressions in the rank lemma, and we do not get useful bounds. We show that the number of such long vectors is extremely small, and they can be removed without affecting too many triangles. In addition, we can also remove extremely short vectors, since they cannot participate in many triangles.

Part 2, find a "core" of sufficiently long vectors that contains enough triangles: The previous step gets a "cleaned" set of vectors. Now, we bucket these vectors by length. We show that there is a large bucket, with vectors that are sufficiently long, such that there are enough triangles contained in this bucket.

Part 3, apply the rank lemma to the “core”: We now focus on this core of vectors, where the rank lemma can be applied. At this stage, the mathematics shown in Section 2 can be carried out almost directly.

Now for the formal proof. For the sake of contradiction, we assume that $\mathrm{rank}(V) \leq \alpha(\Delta^4/c^9)(n/\lg^2 n)$ (for some sufficiently small constant $\alpha$). Without loss of generality, by restricting to the span of the vectors, we take the ambient dimension to be $d = \mathrm{rank}(V)$.

Part 1: Removing extremely long (and extremely short) vectors

We begin by showing that there cannot be many long vectors in $V'$.

###### Lemma 3.6.

There are at most $4d(c+1)$ vectors in $V'$ of length at least $2\sqrt{n}$.

###### Proof.

Let $S$ be the set of "long" vectors, those with length at least $2\sqrt{n}$. Let us prove by contradiction, so assume $|S| > 4d(c+1)$. Consider the graph $H$ on vertex set $S$, where vectors $\vec{u}, \vec{v}$ ($\vec{u} \neq \vec{v}$) are connected by an edge if $\widehat{u}\cdot\widehat{v} \geq 1/4d$, where $\widehat{u}$ denotes the normalization $\vec{u}/\|\vec{u}\|_2$. We choose the bound to ensure that all edges in $H$ are edges in $G(V')$ with probability $1$.

Formally, for any edge $(\vec{u},\vec{v})$ in $H$, $\vec{u}\cdot\vec{v} \geq \|\vec{u}\|_2\|\vec{v}\|_2/4d \geq 4n/4d \geq 1$. So $(\vec{u},\vec{v})$ is an edge with probability $1$ in $G(V')$. The degree of any vertex in $H$ is thus at most $c$, since expected degrees in $G(V')$ are at most $c$. By Lemma 3.3, $H$ contains an independent set of size at least $|S|/(c+1) > 4d$. Consider an arbitrary sequence of $4d$ (normalized) vectors in this independent set. Applying Lemma 3.5 to this sequence, we deduce the existence of $\vec{u}, \vec{v}$ in the independent set ($\vec{u} \neq \vec{v}$) such that $\widehat{u}\cdot\widehat{v} \geq 1/4d$. Then, the edge $(\vec{u},\vec{v})$ should be present in $H$, contradicting the fact that it is an independent set. ∎

Denote by $V''$ the set of all vectors in $V'$ with length in the range $[n^{-2}, 2\sqrt{n}]$.

###### Claim 3.7.

The expected degree of every vertex in $G(V'')$ is at most $c$, and the expected number of triangles in $G(V'')$ is at least $\Delta n/2$.

###### Proof.

Since removal of vectors can only decrease the degree, the expected degree of every vertex in $G(V'')$ is naturally at most $c$. It remains to bound the expected number of triangles in $G(V'')$. By removing vectors in $V'\setminus V''$, we potentially lose some triangles. Let us categorize them into those that involve at least one "long" vector (length more than $2\sqrt{n}$) and those that involve at least one "short" vector (length less than $n^{-2}$) but no long vector.

We start with the first type. By Lemma 3.6, there are at most $4d(c+1)$ long vectors. For any vertex, the expected number of triangles incident to that vertex is at most the expected square of its degree, which by Claim 3.4 is at most $c+c^2 \leq 2c^2$. Thus, the expected total number of triangles of the first type is at most $4d(c+1)\cdot 2c^2 \leq 16dc^3$. Note that $\Delta \leq 2c^2/3$, since $3\Delta n$ is at most the sum over vertices of the expected number of incident triangles, which is at most $2c^2 n$. Since $d \leq \alpha(\Delta^4/c^9)(n/\lg^2 n)$, the first type accounts for at most $\Delta n/4$ triangles, for a sufficiently small constant $\alpha$.

Consider any triple of vectors $(\vec{u}, \vec{v}, \vec{w})$ where $\vec{u}$ is short and neither of the others is long. The probability that this triple forms a triangle is at most

$$\min(\vec{u}\cdot\vec{v}, 1)\cdot\min(\vec{u}\cdot\vec{w}, 1) \;\leq\; \min(\|\vec{u}\|_2\|\vec{v}\|_2, 1)\cdot\min(\|\vec{u}\|_2\|\vec{w}\|_2, 1) \;\leq\; (n^{-2}\cdot 2\sqrt{n})^2 \leq 4n^{-3}$$

Summing over all (at most $n^3$) such triples, the expected number of such triangles is at most $4$.

Thus, the expected number of triangles in $G(V'')$ is at least $\Delta n - \Delta n/4 - 4 \geq \Delta n/2$. ∎

Part 2: Finding core of sufficiently long vectors with enough triangles

For any integer $r$, let $V_r$ be the set of vectors $\vec{v} \in V''$ with $\|\vec{v}\|_2 \in [2^r, 2^{r+1})$. Observe that the $V_r$s form a partition of $V''$. Since all lengths in $V''$ are in the range $[n^{-2}, 2\sqrt{n}]$, there are at most $5\lg n$ non-empty $V_r$s. Let $R$ be the set of indices $r$ such that $|V_r| \geq (\Delta/60c^2)(n/\lg n)$. Furthermore, let $U$ be $\bigcup_{r\in R}V_r$.

###### Claim 3.8.

The expected number of triangles in $G(U)$ is at least $\Delta n/3$.

###### Proof.

The total number of vectors in $V''\setminus U$ is at most $5\lg n \cdot (\Delta/60c^2)(n/\lg n) = \Delta n/12c^2$. By Claim 3.4 and linearity of expectation, the expected sum of squares of degrees of all vectors in $V''\setminus U$ is at most $(c+c^2)\cdot\Delta n/12c^2 \leq \Delta n/6$. Since the expected number of triangles in $G(V'')$ is at least $\Delta n/2$ (Claim 3.7) and the expected number of triangles incident to vectors in $V''\setminus U$ is at most this sum, $\Delta n/6$, the expected number of triangles in $G(U)$ is at least $\Delta n/3$. ∎

We now come to an important claim. Because the expected number of triangles in $G(U)$ is large, we can prove that $U$ must contain vectors of at least constant length.

###### Claim 3.9.

Let $L$ denote $\max_{r\in R}2^r$. Then $L \geq \sqrt{\Delta}/(5c)$.

###### Proof.

Suppose not. Then every vector in $U$ has length less than $2L < 2\sqrt{\Delta}/(5c)$. By Cauchy-Schwartz, for every pair $\vec{u}, \vec{v} \in U$, $\vec{u}\cdot\vec{v} < 4\Delta/25c^2$. Let $I$ denote the set of vector indices in $U$ (this corresponds to the vertices in $G(U)$). For any two vertices $j, k \in I$, let $X_{j,k}$ be the indicator random variable for edge $(j,k)$ being present. The expected number of triangles incident to vertex $i$ in $G(U)$ is

$$\mathbf{E}\Big[\sum_{j\neq k\in I}X_{i,j}X_{i,k}X_{j,k}\Big] = \sum_{j\neq k\in I}\mathbf{E}[X_{i,j}X_{i,k}]\,\mathbf{E}[X_{j,k}]$$

Observe that $\mathbf{E}[X_{j,k}]$ is at most $4\Delta/25c^2$. Furthermore, $\sum_{j\neq k\in I}\mathbf{E}[X_{i,j}X_{i,k}] \leq \mathbf{E}[D_i^2]$ (recall that $D_i$ is the degree of vertex $i$). By Claim 3.4, this is at most $c+c^2 \leq 2c^2$. The expected number of triangles in $G(U)$ is then at most $n\cdot(4\Delta/25c^2)\cdot 2c^2 = 8\Delta n/25 < \Delta n/3$. This contradicts Claim 3.8. ∎

Part 3: Applying the rank lemma to the core

We are ready to apply the rank bound of Lemma 3.1 to prove the final result. The following lemma contradicts our initial bound on the rank of $V$, completing the proof. We will omit some details in the following proof, and provide a full proof in the SI.

###### Lemma 3.10.

$\mathrm{rank}(V'') \geq (\alpha_0\Delta^4/c^9)(n/\lg^2 n)$, for an absolute constant $\alpha_0 > 0$.

###### Proof.

It is convenient to denote the index set of $U$ by $I$. Let $M$ be the Gram matrix of the vectors in $U$, so for $i, j \in I$, $M_{i,j} = \vec{v}_i\cdot\vec{v}_j$. By Lemma 3.1, $\mathrm{rank}(M) \geq |\sum_{i\in I}M_{i,i}|^2/\sum_{i,j\in I}|M_{i,j}|^2$. Note that $\sum_{i\in I}M_{i,i}$ is $\sum_{i\in I}\|\vec{v}_i\|_2^2$, which is at least $2^{2r}|V_r|$ for every $r \in R$. Recall that $L$ denotes $\max_{r\in R}2^r$ (Claim 3.9), so all vectors in $U$ have length less than $2L$. By Cauchy-Schwartz, all entries in $M$ are at most $4L^2$.

We lower bound the numerator.

$$\Big(\sum_{i\in I}\|\vec{v}_i\|_2^2\Big)^2 \geq \Big(\sum_{r\in R}2^{2r}|V_r|\Big)^2 \geq \Big(\max_{r\in R}2^{2r}\cdot(\Delta/60c^2)(n/\lg n)\Big)^2 = L^4(\Delta^2/3600c^4)(n^2/\lg^2 n)$$

Now for the denominator. We split the sum into four parts and bound each separately.

$$\sum_{i,j\in I}|M_{i,j}|^2 = \sum_{i\in I}|M_{i,i}|^2 + \sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}\in[0,1]}}|M_{i,j}|^2 + \sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}>1}}|M_{i,j}|^2 + \sum_{\substack{i,j\in I\\ M_{i,j}<0}}|M_{i,j}|^2$$

Since $|M_{i,i}| \leq 4L^2$, the first term is at most $16nL^4$. For $i \neq j$ and $M_{i,j} \in [0,1]$, the probability that edge $(i,j)$ is present is precisely $M_{i,j}$. Thus, for the second term,

$$\sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}\in[0,1]}}|M_{i,j}|^2 \leq \sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}\in[0,1]}}M_{i,j} \leq 2cn \tag{8}$$

For the third term, we observe that when $M_{i,j} > 1$ (for $i \neq j$), then $(i,j)$ is an edge with probability $1$. There can be at most $2cn$ (ordered) pairs $i, j$, $i \neq j$, such that $M_{i,j} > 1$. Since each such $|M_{i,j}|^2$ is at most $16L^4$, the third term is at most $32cnL^4$.

Now for the fourth term. Note that $M$ is a Gram matrix, so we can invoke Lemma 3.2 on its entries.

$$\sum_{\substack{i,j\in I\\ M_{i,j}<0}}|M_{i,j}|^2 \;\leq\; 4L^2\sum_{\substack{i,j\in I\\ M_{i,j}<0}}|M_{i,j}| \;\leq\; 4L^2\Big(\sum_{i\in I}|M_{i,i}| + \sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}>0}}|M_{i,j}|\Big) \;\leq\; 16nL^4 + 4L^2\sum_{\substack{i,j\in I\\ i\neq j,\, M_{i,j}\in[0,1]}}|M_{i,j}| + 16L^4\sum_{\substack{i,j\in I\\ M_{i,j}>1}}1 \;\leq\; 16nL^4 + 8cnL^2 + 32cnL^4 \tag{9}$$

Putting all the bounds together, we get that $\sum_{i,j\in I}|M_{i,j}|^2 \leq 32nL^4 + 2cn + 64cnL^4 + 8cnL^2$. If $L \geq 1$, we can upper bound $L^2$ by $L^4$. If $L < 1$, we can upper bound $L^2$ by $1$. In either case, $128cn(1+16L^4)$ is a valid upper bound.

Crucially, by Claim 3.9, $L \geq \sqrt{\Delta}/(5c)$, so $1 \leq 625c^4L^4/\Delta^2$. Moreover, $\Delta \leq 2c^2/3 \leq c^2$ (each vertex of expected degree at most $c$ is incident to at most $c+c^2 \leq 2c^2$ expected triangles), so $16L^4 \leq 16c^4L^4/\Delta^2$. Thus, $1+16L^4 \leq 641c^4L^4/\Delta^2$. Combining all the bounds (and setting $\alpha_0 = 1/(3600\cdot 128\cdot 641)$),

$$\mathrm{rank}(V'') \;\geq\; \frac{L^4(\Delta^2/3600c^4)(n^2/\lg^2 n)}{128cn(1+16L^4)} \;\geq\; \frac{L^4(\Delta^2/3600c^4)(n/\lg^2 n)}{128c\cdot 641c^4L^4/\Delta^2} \;=\; (\alpha_0\Delta^4/c^9)(n/\lg^2 n)$$

## 4 Details of empirical results

Data Availability: The datasets used are summarized in Table 4. We present here four publicly available datasets from different domains: ca-HepPh is a co-authorship network, Facebook is a social network, and cit-HepPh is a citation network, all obtained from the SNAP graph database [SNA19]. The String_hs dataset is a protein-protein interaction network obtained from [str19]. (The citations provide the links to obtain the corresponding datasets.)

We first describe the primary experiment, used to validate Theorem 1.2 on the SVD embedding. We generated a $d$-dimensional embedding for various values of $d$ using the SVD. Let $G$ be a graph with the (symmetric) adjacency matrix $A$, with eigendecomposition $A = U\Sigma U^T$. Let $\Sigma_d$ be the $d\times d$ diagonal matrix with the $d$ largest magnitude eigenvalues of $A$ along the diagonal. Let $U_d$ be the $n\times d$ matrix with the corresponding eigenvectors as columns. We compute the matrix $V = |\Sigma_d|^{1/2}U_d^T$ and refer to this as the spectral embedding of $G$. This is the standard PCA approach.
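The construction just described can be sketched in a few lines (assuming numpy; this is our own sketch of the described procedure, not the authors' code):

```python
import numpy as np

def spectral_embedding(A, d):
    """d-dimensional spectral embedding of a symmetric adjacency matrix A.

    Keeps the d largest-magnitude eigenvalues; negative eigenvalues are
    represented only up to sign, since we scale by |lambda|^(1/2).
    """
    w, U = np.linalg.eigh(A)                 # eigendecomposition of symmetric A
    top = np.argsort(-np.abs(w))[:d]         # indices of d largest |eigenvalues|
    return np.sqrt(np.abs(w[top]))[:, None] * U[:, top].T   # d x n matrix V
```

For a positive semidefinite matrix, taking $d = n$ recovers $V^TV = A$ exactly; with negative eigenvalues the reconstruction is only exact up to their sign, which is the caveat noted for the eigendecomposition-based experiments.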

From the spectral embedding, we generate a graph from $G(V)$ by considering every pair of vertices and generating a uniformly random value in $[0,1]$. If the corresponding entry of $V^TV$ is greater than the random value generated, the edge is added to the graph. Otherwise the edge is not present. This is the same as taking $V^TV$, setting all negative values to $0$ and all values greater than $1$ to $1$, and performing independent Bernoulli trials for each edge with the resulting probabilities. In all the figures, this is referred to as the "SVD TDP" (truncated dot product) embedding.

### 4.1 Triangle distributions

To generate Figure 1 and Figure 4, we calculated the number of triangles incident to vertices of different degrees in both the original graphs and the graphs generated from the embeddings. Each of the plots shows the (scaled) number of triangles on the vertical axis and the degree bound on the horizontal axis. Each curve corresponds to some graph, and a point $(c, \Delta)$ on a curve shows that the graph contains $\Delta n$ triangles if we remove all vertices with degree greater than $c$. We then generate 100 random samples from the 100-dimensional embedding, as given by SVD (described above). For each value of $c$, we plot the maximum value of $\Delta$ over all the samples. This is to ensure that our results are not affected by statistical variation (which was quite minimal).

### 4.2 Alternate graph models

We consider three other functions of the dot product, to construct graph distributions from the vector embeddings. Details on parameter settings and the procedure used for the optimization are given in the SI.

Logistic Regression on the Dot Product (LRDP): We consider the probability of an edge $(i,j)$ to be the logistic function $\big(1+\exp(-(a\,\vec{v}_i\cdot\vec{v}_j + b))\big)^{-1}$, where $a, b$ are parameters. Observe that the range of this function is $(0,1)$, and hence it can be interpreted as a probability. We tune these parameters to fit the expected number of edges to the true number of edges. Then, we proceed as in the TDP experiment. We note that the TDP can be approximated by a logistic function, and thus the LRDP embedding is a "closer fit" to the graph than the TDP embedding.
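The LRDP edge probability is a one-liner (a sketch assuming numpy; the parameter names `a` and `b` are ours):

```python
import numpy as np

def lrdp_prob(u, v, a=1.0, b=0.0):
    """Logistic edge probability on the dot product: sigma(a * <u, v> + b)."""
    return 1.0 / (1.0 + np.exp(-(a * np.dot(u, v) + b)))
```

In the experiments, `a` and `b` would be fit so the sum of these probabilities over all pairs matches the true edge count; for large `a` and `b = -a`, this smoothly approximates the truncated dot product.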

Logistic Regression on the Hadamard Product (LRHP): This is inspired by linear models used on low-dimensional embeddings [HYL17]. Define the Hadamard product $\vec{v}_i \odot \vec{v}_j$ to be the $d$-dimensional vector whose $k$th coordinate is the product of the $k$th coordinates of $\vec{v}_i$ and $\vec{v}_j$. We now fit a logistic function over linear functions of (the coordinates of) $\vec{v}_i \odot \vec{v}_j$. This is a significantly richer model than the previous one, which uses a fixed linear function (the sum). Again, we tune parameters to match the number of edges. The fitting of LRDP and LRHP was done using the Matlab function glmfit (Generalized Linear Model Regression Fit) [mat18]. The distribution parameter was set to "binomial", since the total number of edges is distributed as a weighted binomial.

Softmax: This is inspired by low-dimensional embeddings for random walk matrices [PARS14, GL16]. The idea is to make the probability of edge $(i,j)$ proportional to the softmax value $\exp(\vec{v}_i\cdot\vec{v}_j)$. This tends to push edge formation even for slightly higher dot products, and one might imagine this helps triangle formation. We set the proportionality constant separately for each vertex to ensure that the expected degree is the true degree. The resulting probability matrix is technically asymmetric, so we symmetrize the matrix.
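A sketch of this construction (assuming numpy; the function and variable names are ours, and after the final clipping the expected degrees only approximately match the targets):

```python
import numpy as np

def softmax_edge_probs(V, degrees):
    """Per-vertex softmax edge probabilities, then symmetrized (a sketch).

    Row i is proportional to exp(v_i . v_j), scaled so that the expected
    degree of vertex i is degrees[i] before symmetrization.
    """
    D = V.T @ V
    np.fill_diagonal(D, -np.inf)                   # exclude self-loops
    E = np.exp(D - D.max(axis=1, keepdims=True))   # numerically stable rows
    P = E / E.sum(axis=1, keepdims=True)           # each row sums to 1
    P = P * np.asarray(degrees, dtype=float)[:, None]  # scale to expected degree
    P = (P + P.T) / 2                              # symmetrize the matrix
    return np.clip(P, 0.0, 1.0)                    # keep valid probabilities
```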

node2vec experiments: We also applied node2vec, a recent deep learning based graph embedding method [GL16], to generate vector representations of the vertices. We used the optimized C++ implementation [n2v18a] for node2vec, which is equivalent to the original implementation provided by the authors [n2v18b]. For all our experiments, we use the default settings of walk length of 80, 10 walks per node, p=1 and q=1. The node2vec algorithm tries to model the random walk matrix associated with a graph, not the raw adjacency matrix. The dot products between the output vectors are used to model the random walk probability of going from one vertex to another, rather than the presence of an edge. It does not make sense to apply the TDP function on these dot products, since this would generate (in expectation) only $n$ edges (one for each vertex). We apply the LRDP or LRHP functions, which use the node2vec vectors as inputs to a machine learning model that predicts edges.

In Figures 1 and 4, we show results for all the datasets. We note that for all datasets and all embeddings, the models fail to capture the low-degree triangle behavior.

### 4.3 Degree distributions

We observe that the low-dimensional embeddings obtained from SVD and the truncated dot product can capture the degree distribution accurately. In Figure 4, we plot the degree distribution (in loglog scale) of the original graph with the expected degree distribution of the embedding. For each vertex $i$, we can compute its expected degree as the sum $\sum_j p_{i,j}$, where $p_{i,j}$ is the probability of the edge $(i,j)$ under the model. In all cases, the expected degree distributions are close to the true degree distributions, even for lower degree vertices. The embedding successfully captures the "first order" connections (degrees), but not the higher order connections (triangles). We believe that this reinforces the need to look at the triangle structure to discover the weaknesses of low-dimensional embeddings.

### 4.4 Detailed relationship between rank and triangle structure

For the smallest Facebook graph, we were able to compute the entire set of eigenvalues. This allows us to determine how large a rank is required to recreate the low-degree triangle structure. In Figure 4.4, for varying ranks of the embedding, we plot the corresponding triangle distribution. In this plot, we use the embedding given by the eigendecomposition (rather than the SVD), since it is guaranteed to converge to the correct triangle distribution for an n-dimensional embedding (where n is the number of vertices). The SVD and eigendecomposition are mostly identical for large singular/eigenvalues, but differ (up to a sign) for negative eigenvalues.
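The distinction between SVD and eigendecomposition matters here because a rank-k eigendecomposition retains the signs of negative eigenvalues, so at k = n it reproduces the adjacency matrix exactly. A sketch of such a rank-k approximation, assuming numpy and a symmetric adjacency matrix:

```python
import numpy as np

def rank_k_eigen_approximation(A, k):
    """Rank-k approximation of a symmetric matrix A from its
    eigendecomposition, keeping the k eigenvalues of largest magnitude.
    Unlike a truncated SVD, negative eigenvalues keep their sign,
    so k = n recovers A exactly."""
    vals, vecs = np.linalg.eigh(A)          # A must be symmetric
    top = np.argsort(-np.abs(vals))[:k]     # top-k by |eigenvalue|
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T
```

The approximation error decreases monotonically as k grows, which is what the figure tracks via the triangle distribution at each rank.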

We observe that even a 1000-dimensional embedding does not capture the triangle-count versus degree plots at low degrees. Even the rank-2000 embedding deviates from the true values, though it is correct to within an order of magnitude. This strongly corroborates our main theorem, which says that near-linear rank is needed to match the low-degree triangle structure.

## References

• [Abb18] Emmanuel Abbe. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18:1–86, 2018.
• [AFL18] Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, and Carey E. Priebe. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18:1–92, 2018.
• [Alo03] N. Alon. Problems and results in extremal combinatorics, Part I. Discrete Mathematics, 273:31–53, 2003.
• [Arn19] Citation network dataset. https://aminer.org/citation, 2019.
• [ASN13] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. Distributed large-scale natural graph factorization. In Conference on World Wide Web, pages 37–48, 2013.
• [BA99] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
• [BCG10] Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. PVLDB, 4(3):173–184, 2010.
• [CF06] D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Computing Surveys, 38(1), 2006.
• [CLX16] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In AAAI Conference on Artificial Intelligence, pages 1145–1152, 2016.
• [DPKS12] N. Durak, A. Pinar, T. G. Kolda, and C. Seshadhri. Degree relations of triangles in real-world networks and graph models. In Conference on Information and Knowledge Management (CIKM), 2012.
• [EK10] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010.
• [Fie73] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.
• [GGL13] Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. WTF: The who to follow service at Twitter. In Conference on World Wide Web, pages 505–514, 2013.
• [GL16] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.
• [HLL83] P. W. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5:109–137, 1983.
• [HRH02] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97:1090–1098, 2002.
• [HYL17] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Neural Information Processing Systems, NIPS’17, pages 1025–1035, USA, 2017. Curran Associates Inc.
• [KUK17] Isabel M. Kloumann, Johan Ugander, and Jon Kleinberg. Block models and personalized pagerank. Proceedings of the National Academy of Sciences (PNAS), 114(1):33–38, 2017.
• [lsi19] Latent semantic analysis.
• [mat18] Matlab glmfit function.
• [n2v18a] Node2vec c++ code.
• [n2v18b] Node2vec code.
• [New03] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
• [PARS14] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In SIGKDD Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
• [SCW10] A. Sala, L. Cao, C. Wilson, R. Zablit, H. Zheng, and B. Y. Zhao. Measurement-calibrated graph models for social network experiments. In Conference on World Wide Web, pages 861–870, 2010.
• [SKP12] C. Seshadhri, Tamara G. Kolda, and Ali Pinar. Community structure and scale-free collections of Erdős–Rényi graphs. Physical Review E, 85(5):056109, May 2012.
• [SNA19] SNAP. Stanford Network Analysis Project, 2019. Available at http://snap.stanford.edu/.
• [SSSG20] C. Seshadhri, Aneesh Sharma, Andrew Stolman, and Ashish Goel. The impossibility of low-rank representations for triangle-rich complex networks. Proceedings of the National Academy of Sciences (PNAS), 117(11):5631–5637, 2020.
• [str19] String database.
• [Swa14] K. Swanepoel. The rank lemma.
• [Tao13] T. Tao. A cheap version of the Kabatjanskii–Levenstein bound for almost orthogonal vectors.
• [TQW15] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In Conference on World Wide Web, pages 1067–1077, 2015.