1 Introduction
Technological developments in the 21st century have given rise to large-scale social networks, such as the graphs defined by Facebook friendship relationships or followers on Twitter. Such networks arguably provide the most important new application domain for graph analysis in well over a decade.
1.1 Social Networks Have Special Structure
There is wide consensus that social networks have predictable structure and features, and accordingly are not well modeled by arbitrary graphs. From a structural viewpoint, the most well studied and empirically validated properties of social networks are:

A heavy-tailed degree distribution, such as a power-law distribution.

Triadic closure, meaning that pairs of vertices with a common neighbor tend to be directly connected—that friends of friends tend to be friends in their own right.

The presence of “community-like structures,” meaning subgraphs that are much more richly connected internally than externally.

The small-world property, meaning that it’s possible to travel from any vertex to any other vertex using remarkably few hops.
These properties are not generally possessed by Erdős-Rényi random graphs (in which each edge is present independently with some probability p); a new model is needed to capture them.
From an algorithmic standpoint, empirical results indicate that optimization problems are often easier to solve in social networks than in worstcase graphs. For example, lightweight heuristics are unreasonably effective in practice for finding the maximum clique or recovering dense subgraphs of a large social network.
The literature on models that capture the special structure of social networks is almost entirely driven by the quest for generative (i.e., probabilistic) models that replicate some or all of the four properties listed above. Dozens of generative models have been proposed, and there is little consensus about which is the “right” one. The plethora of models poses a challenge to meaningful theoretical work on social networks—which of the models, if any, is to be believed? How can we be sure that a given algorithmic or structural result is not an artifact of the model chosen?
This chapter surveys recent research on more robust models of large-scale social networks, which assume deterministic combinatorial properties rather than a specific generative model. Structural and algorithmic results that rely only on these deterministic properties automatically carry over to any generative model that produces graphs possessing these properties (with high probability). Such results effectively apply “in the worst case over all plausible generative models.” This hybrid of worst-case (over input distributions) and average-case (with respect to the distribution) analysis resembles several of the semi-random models discussed elsewhere in the book, such as in the preceding chapters on pseudorandom data (Chapter 26) and prior-independent auctions (Chapter 27).
2 Cliques of c-Closed Graphs
2.1 Triadic Closure
Triadic closure is the property that, when two members of a social network have a friend in common, they are likely to be friends themselves. In graph-theoretic terminology, two-hop paths tend to induce triangles.
Triadic closure has been studied for decades in the social sciences and there is compelling intuition for why social networks should exhibit strong triadic closure properties. Two people with a common friend are much more likely to meet than two arbitrary people, and are likely to share common interests. They might also feel pressure to be friends to avoid imposing stress on their relationships with their common friend.
The data support this intuition. Numerous largescale studies on online social networks provide overwhelming empirical evidence for triadic closure. The plot in Figure 1, derived from the network of email communications at the disgraced energy company Enron, is representative. Other social networks exhibit similar triadic closure properties.
2.2 Closed Graphs
The most extreme version of triadic closure would assert that whenever two vertices have a common neighbor, they are themselves neighbors: whenever (u, v) and (v, w) are in the edge set E, so is (u, w). The class of graphs satisfying this property is not very interesting—it is precisely the (vertex-)disjoint unions of cliques—but it forms a natural base case for more interesting parameterized definitions.^1

^1 Recall that a clique of a graph is a subset S of vertices that are fully connected, meaning that (u, w) ∈ E for every pair u, w of distinct vertices of S.
Our first definition of a class of graphs with strong triadic closure properties is that of closed graphs.
Definition 2.1 (Fox et al. (2020)).
For a positive integer c, a graph G = (V, E) is c-closed if, whenever two distinct vertices u, v ∈ V have at least c common neighbors, (u, v) ∈ E.
For a fixed number n of vertices, the parameter c interpolates between the unions of cliques (when c = 1) and all graphs (when c = n − 1). The class of 2-closed graphs—the graphs that do not contain a square (i.e., a 4-cycle) or a diamond (i.e., K4 minus an edge) as an induced subgraph—is already nontrivial. The c-closed condition is a coarse proxy for the empirical closure rates observed in social networks (like in Figure 1), asserting that the closure rate jumps to 100% for vertex pairs with c or more common neighbors.
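To make the definition concrete, here is a small Python sketch (our illustration, not code from the chapter; the name `closure_number` is ours) that computes the smallest c for which a graph, given as a dict of adjacency sets, is c-closed:

```python
from itertools import combinations

def closure_number(adj):
    """Smallest c for which the graph is c-closed: one more than the largest
    number of common neighbors over all non-adjacent vertex pairs."""
    worst = 0
    for u, v in combinations(adj, 2):
        if v not in adj[u]:  # only non-adjacent pairs can violate c-closure
            worst = max(worst, len(adj[u] & adj[v]))
    return worst + 1

# A 4-cycle: the two diagonal pairs are non-adjacent but share 2 common
# neighbors, so the graph is 3-closed but not 2-closed.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(closure_number(square))  # 3
```

This quadratic-per-pair computation is only meant to mirror the definition; computing closure parameters on large networks requires more care.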
Next is a less stringent version of the definition, which is sufficient for the main algorithmic result of this section.
Definition 2.2 (Fox et al. (2020)).
For a positive integer c, a vertex v of a graph is c-good if whenever v has at least c common neighbors with another vertex u, (u, v) ∈ E. The graph is weakly c-closed if every induced subgraph has at least one c-good vertex.
A c-closed graph is also weakly c-closed, as each of its vertices is c-good in each of its induced subgraphs. The converse is false; for example, a path on three vertices is not 1-closed, but it is weakly 1-closed (its middle vertex is 1-good, as is every vertex of each smaller induced subgraph). Equivalent to Definition 2.2 is the condition that the graph has an elimination ordering of good vertices, meaning the vertices can be ordered v1, v2, . . . , vn such that, for every i, the vertex vi is c-good in the subgraph induced by {vi, vi+1, . . . , vn} (Exercise 1). Are real-world social networks c-closed or weakly c-closed for reasonable values of c? The next table summarizes some representative numbers.
Network          n       m        c     weak c
email-Enron      36692   183831   161   34
p2p-Gnutella04   10876   39994    24    8
wiki-Vote        7115    103689   420   42
ca-GrQc          5242    14496    41    9

(Here n and m denote the number of vertices and edges, and c and weak c denote the smallest values for which the network is c-closed and weakly c-closed, respectively.)
These social networks are c-closed for much smaller values of c than the trivial bound of n − 1, and are weakly c-closed for quite modest values of c.
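The elimination-ordering characterization (Exercise 1) suggests a simple greedy test for weak c-closure, sketched below under the assumption that the graph is given as adjacency sets; the helper names `is_good` and `good_elimination_ordering` are ours:

```python
def is_good(adj, v, c):
    """v is c-good if every vertex sharing >= c common neighbors with v is
    adjacent to v."""
    return all(u in adj[v] or len(adj[u] & adj[v]) < c
               for u in adj if u != v)

def good_elimination_ordering(adj, c):
    """Repeatedly remove a c-good vertex from (a copy of) the graph.
    Returns an elimination ordering, or None if some intermediate induced
    subgraph has no c-good vertex (i.e., the graph is not weakly c-closed)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = next((u for u in adj if is_good(adj, u, c)), None)
        if v is None:
            return None
        order.append(v)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return order

# A path on three vertices: the middle vertex is 1-good, so a 1-good
# elimination ordering exists even though the path is not 1-closed.
p3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(good_elimination_ordering(p3, 1))
```

Because every induced subgraph of a weakly c-closed graph contains a c-good vertex, removing any c-good vertex first is safe, so this greedy procedure succeeds exactly on weakly c-closed graphs.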
2.3 Computing a Maximum Clique: A Backtracking Algorithm
Once a class of graphs has been defined, such as c-closed graphs, a natural agenda is to investigate fundamental optimization problems with graphs restricted to the class. We single out the problem of finding the maximum-size clique of a graph, primarily because it is one of the most central problems in social network analysis. In a social network, cliques can be interpreted as the most extreme form of a community.
The problem of computing the maximum clique of a graph reduces to the problem of enumerating the graph’s maximal cliques^2—the maximum clique is also maximal, so it appears as the largest of the cliques in the enumeration.

^2 A maximal clique is a clique that is not a strict subset of another clique.
How does the c-closed condition help with the efficient computation of a maximum clique? We next observe that the problem of reporting all maximal cliques is polynomial-time solvable in c-closed graphs when c is a fixed constant. The algorithm is based on backtracking. For convenience, we give a procedure that, for any vertex v, identifies all maximal cliques that contain v. (The full procedure loops over all vertices.)

Maintain a history H, initially empty.

Let C denote the vertex set comprising v, the vertices of H, and all vertices that are adjacent to both v and all vertices in H.

If C is a clique, report the clique and return.

Otherwise, recurse on each vertex w ∈ C \ ({v} ∪ H) with history H ∪ {w}.
This subroutine reports all maximal cliques that contain v, whether the graph is c-closed or not (Exercise 2). In a c-closed graph, the maximum depth of the recursion is c—once |H| ≥ c − 1, every pair of vertices in C has at least c common neighbors (namely, the vertices of H ∪ {v}) and hence C must be a clique. The running time of the backtracking algorithm is therefore n^{O(c)} in c-closed graphs.
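The backtracking procedure above can be transcribed directly into Python (an unoptimized sketch; the function name is ours):

```python
from itertools import combinations

def maximal_cliques_containing(adj, v):
    """Backtracking enumeration of all maximal cliques that contain v,
    following the procedure in the text. Duplicates are deduplicated by
    collecting frozensets."""
    found = set()

    def is_clique(s):
        return all(b in adj[a] for a, b in combinations(s, 2))

    def recurse(history):
        # C: v, the history, and all vertices adjacent to v and to all of H.
        c = {v} | history | {u for u in adj[v] if history <= adj[u]}
        if is_clique(c):
            found.add(frozenset(c))
            return
        for w in c - history - {v}:
            recurse(history | {w})

    recurse(set())
    return found

# On a 5-cycle, the maximal cliques containing vertex 0 are its two edges.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(maximal_cliques_containing(c5, 0))
```

Looping this subroutine over all vertices (and deduplicating) enumerates every maximal clique of the graph.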
This simplistic backtracking algorithm is extremely slow except for very small values of c. Can we do better?
2.4 Computing a Maximum Clique: FixedParameter Tractability
There is a simple but clever algorithm that, for an arbitrary graph, enumerates all of the maximal cliques while using only polynomial time per clique.
Theorem 2.1 (Tsukiyama et al. (1977)).
There is an algorithm that, given any input graph with n vertices and m edges, outputs all of the maximal cliques of the graph in O(mn) time per maximal clique.
Theorem 2.1 reduces the problem of enumerating all maximal cliques in polynomial time to the combinatorial task of proving a polynomial upper bound on the number of maximal cliques.
Computing a maximum clique of an arbitrary graph is an NP-hard problem, so presumably there exist graphs with an exponential number of maximal cliques. The Moon-Moser graphs are a simple and famous example. For n a multiple of 3, the Moon-Moser graph with n vertices is the perfectly balanced complete (n/3)-partite graph, meaning the vertices are partitioned into n/3 groups of 3, and every vertex is connected to every other vertex except for the 2 vertices in the same group (Figure 2).
Choosing one vertex from each group induces a maximal clique, for a total of 3^{n/3} maximal cliques, and these are all of the maximal cliques of the graph. More generally, a basic result in graph theory asserts that no n-vertex graph can have more than 3^{n/3} maximal cliques.
Theorem 2.2 (Moon and Moser (1965)).
Every n-vertex graph has at most 3^{n/3} maximal cliques.
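As a sanity check on the Moon-Moser count, the following brute-force sketch (ours; only sensible for tiny graphs) builds the Moon-Moser graph and counts its maximal cliques:

```python
from itertools import combinations

def moon_moser(n):
    """Moon-Moser graph: n/3 groups of 3 vertices, each vertex adjacent to
    every vertex outside its own group."""
    assert n % 3 == 0
    return {v: {u for u in range(n) if u // 3 != v // 3} for v in range(n)}

def count_maximal_cliques(adj):
    """Brute force over all vertex subsets (exponential; tiny graphs only)."""
    vs = list(adj)
    cliques = [set(s) for r in range(1, len(vs) + 1)
               for s in combinations(vs, r)
               if all(b in adj[a] for a, b in combinations(s, 2))]
    return sum(1 for c in cliques if not any(c < d for d in cliques))

print(count_maximal_cliques(moon_moser(6)))  # 9, i.e., 3**(6//3)
```

Each maximal clique picks exactly one vertex per group, matching the 3^{n/3} count.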
A Moon-Moser graph on n vertices is not c-closed even for c = n − 3, so there remains hope for a positive result for c-closed graphs with small c. The Moon-Moser graphs do show that the number of maximal cliques of a c-closed graph can be exponential in c (since a Moon-Moser graph on c + 2 vertices is trivially c-closed). Thus the best-case scenario for enumerating the maximal cliques of a c-closed graph is a fixed-parameter tractability result (with respect to the parameter c), stating that, for some function f and constant d (independent of c), the number of maximal cliques in an n-vertex c-closed graph is at most f(c) · n^d. The next theorem shows that this is indeed the case, even for weakly c-closed graphs.
Theorem 2.3 (Fox et al. (2020)).
Every weakly c-closed graph with n vertices has at most

3^{(c−1)/3} · n²

maximal cliques.
Corollary 2.3.1.
The maximum clique problem is polynomial-time solvable in weakly c-closed n-vertex graphs with c = O(log n).
2.5 Proof of Theorem 2.3
The proof of Theorem 2.3 proceeds by induction on the number of vertices n. (One of the factors of n in the bound comes from the n steps of this induction.) Let G be an n-vertex weakly c-closed graph. Assume that n ≥ 2; otherwise, the bound is trivial.
By assumption, G has a c-good vertex v. By induction, G − v has at most 3^{(c−1)/3} · (n − 1)² maximal cliques. (An induced subgraph of a weakly c-closed graph is again weakly c-closed.) Every maximal clique K of G − v gives rise to a unique maximal clique of G (namely K or K ∪ {v}, depending on whether the latter is a clique). It remains to bound the number of uncounted maximal cliques of G, meaning the maximal cliques K of G for which K \ {v} is not maximal in G − v.
An uncounted maximal clique K must include v, with K contained in v’s neighborhood (i.e., in the subgraph induced by v and the vertices adjacent to it). Also, there must be a vertex w such that (K \ {v}) ∪ {w} is a clique in G − v; we say that w is a witness for K, as it certifies the non-maximality of K \ {v} in G − v. Such a witness must be connected to every vertex of K \ {v}. It cannot be a neighbor of v, as otherwise K ∪ {w} would be a clique in G, contradicting K’s maximality.
Choose an arbitrary witness for each uncounted clique of G and bucket these cliques according to their witness; recall that all witnesses are non-neighbors of v. For every uncounted clique K with witness w, all vertices of K \ {v} are connected to both v and w. Moreover, because K is a maximal clique of G, K \ {v} is a maximal clique of the subgraph G_{vw} induced by the common neighbors of v and w.
How big can such a subgraph G_{vw} be? This is the step of the proof where the weakly c-closed condition is important: because w is a non-neighbor of v and v is c-good, v and w have at most c − 1 common neighbors, and hence G_{vw} has at most c − 1 vertices (Figure 3). By the Moon-Moser theorem (Theorem 2.2), each subgraph G_{vw} has at most 3^{(c−1)/3} maximal cliques. Adding up over the at most n choices for the witness w, the number of uncounted cliques is at most n · 3^{(c−1)/3}; this sum over possible witnesses is the source of the second factor of n in Theorem 2.3. Combining this bound on the uncounted cliques with the inductive bound on the remaining maximal cliques of G yields the desired upper bound of 3^{(c−1)/3} · (n − 1)² + 3^{(c−1)/3} · n ≤ 3^{(c−1)/3} · n².
3 The Structure of Triangle-Dense Graphs
3.1 Triangle-Dense Graphs
Our second graph class inspired by the strong triadic closure properties of social and information networks is the class of triangle-dense graphs. These are graphs in which a constant fraction of the vertex pairs with at least one common neighbor are directly connected by an edge. Equivalently, a constant fraction of the wedges (i.e., two-hop paths) of the graph belong to a triangle.
Definition 3.1 (Gupta et al. (2016)).
The triangle density of an undirected graph G is τ(G) := 3t(G)/w(G), where t(G) and w(G) denote the number of triangles and wedges of G, respectively. (We define τ(G) = 1 if w(G) = 0.) For ε ∈ (0, 1], the class of ε-triangle-dense graphs consists of the graphs G with τ(G) ≥ ε.
(In the social networks literature, this quantity is also called the transitivity or the global clustering coefficient.) Because every triangle of a graph contains 3 wedges, and no two triangles share a wedge, the triangle density of a graph is between 0 and 1—it is the fraction of wedges that belong to a triangle. Triangle density is another coarse proxy for the empirical closure rates observed in social networks (like in Figure 1(a)).
The 1-triangle-dense graphs are precisely the disjoint unions of cliques, while the graphs with triangle density 0 are precisely the triangle-free graphs. The triangle density of an Erdős-Rényi graph with edge probability p is concentrated around p (cf. Figure 1(b)). For an Erdős-Rényi graph to have constant triangle density, one would need to set p = Ω(1). This would imply that the graph is dense, quite unlike social networks. For example, in the year 2011 the triangle density of the Facebook graph was five orders of magnitude larger than that of a random graph with the same number of vertices (roughly 1 billion at the time) and edges (roughly 100 billion).
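Triangle density is straightforward to compute by enumerating the wedges around each center vertex; in the sketch below (ours), each triangle is found once per center, i.e., three times in total, so the returned ratio is exactly 3t/w:

```python
from itertools import combinations

def triangle_density(adj):
    """tau(G) = 3t/w. Each wedge is examined at its center vertex; a wedge is
    'closed' when its endpoints are adjacent, i.e., it lies in a triangle."""
    wedges = closed = 0
    for v in adj:
        for a, b in combinations(adj[v], 2):  # wedges centered at v
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 1.0  # tau = 1 when there are no wedges
```

On a clique the density is 1, on a 4-cycle it is 0, matching the extremes discussed above.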
3.2 Visualizing TriangleDense Graphs
What do triangle-dense graphs look like? Can we make any structural assertions about them, akin to separator theorems for planar graphs (allowing them to be viewed as “approximate grids”) or the regularity lemma for dense graphs (allowing them to be viewed as approximate unions of random bipartite graphs)?
Given that 1-triangle-dense graphs are unions of cliques, a first guess might be that triangle-dense graphs look like the approximate union of approximate cliques (as in Figure 4(a)). Such graphs certainly have high triangle density; could there be an “inverse theorem,” stating that these are in some sense the only graphs with this property?
In its simplest form, the answer to this question is “no,” as triangle-dense graphs become quite diverse once ε is bounded below 1. For example, adding a clique on (say) √n vertices to an arbitrary bounded-degree n-vertex graph produces a graph with triangle density tending to 1 as n → ∞ (see Figure 4(b)).
Nonetheless, an inverse theorem does hold if we redefine what it means to approximate a graph by a collection of approximate cliques. Instead of trying to capture most of the vertices or edges (which is impossible, as the previous example shows), we consider the goal of capturing a constant fraction of the triangles of a graph by a collection of dense subgraphs.
3.3 An Inverse Theorem
To state an inverse theorem for triangle-dense graphs, we require a preliminary definition.
Definition 3.2 (Tightly Knit Family).
Let ρ ∈ (0, 1]. A collection V1, . . . , Vk of disjoint sets of vertices of a graph G forms a ρ-tightly-knit family if:

For each i, the subgraph induced by Vi has at least ρ · |Vi|² edges and ρ · |Vi|³ triangles. (That is, a ρ fraction of the maximum possible edges and triangles, up to constant factors.)

For each i, the subgraph induced by Vi has radius at most 2.
In Definition 3.2, the vertex sets V1, . . . , Vk are disjoint but need not cover all of the vertices of G; in particular, the empty collection is technically a tightly-knit family.
The following inverse theorem states that every triangle-dense graph contains a tightly-knit family that captures most of the “meaningful social structure”—a constant fraction of the graph’s triangles.
Theorem 3.1 (Gupta et al. (2016)).
There is a function f(ε) = ε^{O(1)} such that for every ε-triangle-dense graph G, there exists an f(ε)-tightly-knit family that contains an f(ε) fraction of the triangles of G.
3.4 Proof Sketch of Theorem 3.1
The proof of Theorem 3.1 is constructive, and interleaves two subroutines. To state the first, define the Jaccard similarity J_{uv} of an edge (u, v) of a graph as the fraction of the neighbors of u and v that are neighbors of both:

J_{uv} := |N(u) ∩ N(v)| / (|N(u) ∪ N(v)| − 2),

where N(x) denotes the neighbors of a vertex x and the “−2” is to avoid counting u and v themselves. The first subroutine, called the cleaner, is given a parameter δ as input and repeatedly deletes edges with Jaccard similarity less than δ until none remain. Removing edges from the graph is worrisome because it removes triangles, and Theorem 3.1 promises that the final tightly-knit family captures a constant fraction of the original graph’s triangles. But removing an edge with low Jaccard similarity destroys many more wedges than triangles, and the number of triangles in the graph is at least a constant fraction of the number of wedges (because the graph is ε-triangle-dense). A charging argument along these lines shows that, provided δ is at most a sufficiently small constant multiple of ε, the cleaner cannot destroy more than a constant fraction of the graph’s triangles.
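The Jaccard similarity and the cleaner can be sketched as follows (an unoptimized illustration, ours, assuming comparable integer vertex labels; a serious implementation would update similarities incrementally rather than rescanning):

```python
def jaccard(adj, u, v):
    """Jaccard similarity of an edge (u, v): |N(u) & N(v)| / (|N(u) | N(v)| - 2)."""
    denom = len(adj[u] | adj[v]) - 2  # the "-2" discounts u and v themselves
    return len(adj[u] & adj[v]) / denom if denom else 1.0

def cleaner(adj, delta):
    """Repeatedly delete edges with Jaccard similarity below delta until none
    remain; returns a cleaned copy of the graph."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if u < v and jaccard(adj, u, v) < delta:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return adj
```

Deleting one edge can lower the similarity of nearby edges, which is why the loop repeats until a full pass makes no change.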
The second subroutine, called the extractor, is responsible for extracting one of the clusters of the tightly-knit family from a graph in which all edges have Jaccard similarity at least δ. (Isolated vertices can be discarded from further consideration.) How is this Jaccard similarity condition helpful? One easy observation is that, post-cleaning, the graph is “approximately locally regular,” meaning that the endpoints of any edge have degrees within a factor of roughly 1/δ of each other. Starting from this fact, easy algebra shows that every one-hop neighborhood of the graph (i.e., the subgraph induced by a vertex and its neighbors) has constant (depending on δ) density in both edges and triangles, as required by Theorem 3.1. The bad news is that extracting a one-hop neighborhood can destroy almost all of a graph’s triangles (Exercise 4). The good news is that supplementing a one-hop neighborhood with a judiciously chosen subset of the corresponding two-hop neighborhood (i.e., neighbors of neighbors) fixes the problem. Precisely, the extractor subroutine is given a graph in which every edge has Jaccard similarity at least δ and proceeds as follows:

Let v be a vertex of the graph with the maximum degree. Let d denote v’s degree and N(v) its set of neighbors.

Calculate a score s(u) for every vertex u outside {v} ∪ N(v), equal to the number of triangles that include u and two vertices of N(v). In other words, s(u) is the number of triangles that would be saved by supplementing the one-hop neighborhood with u. (On the flip side, this would also destroy the triangles that contain u and vertices outside the extracted set.)

Return the union of {v}, N(v), and up to d vertices outside {v} ∪ N(v) with the largest nonzero scores.
It is clear that the extractor outputs a set of vertices that induces a subgraph with radius at most 2. As with one-hop neighborhoods, easy algebra shows that, because every edge has Jaccard similarity at least δ, this subgraph is dense in both edges and triangles. The important non-obvious fact, whose proof is omitted here, is that the number of triangles saved by the extractor (i.e., triangles with all three vertices in its output) is at least a constant fraction of the number of triangles it destroys (i.e., triangles with one or two vertices in its output). It follows that alternating between cleaning and extracting (until no edges remain) produces a tightly-knit family meeting the promises of Theorem 3.1.
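The extractor's three steps translate directly into the following sketch (ours; it returns the extracted vertex set but leaves removal from the graph to the caller):

```python
from itertools import combinations

def extractor(adj):
    """Take a maximum-degree vertex v and its one-hop neighborhood, then add
    up to deg(v) outside vertices whose scores -- triangles closed with two
    neighbors of v -- are largest and nonzero."""
    v = max(adj, key=lambda u: len(adj[u]))
    nbrs, d = adj[v], len(adj[v])
    score = {}
    for u in adj:
        if u == v or u in nbrs:
            continue
        s = sum(1 for a, b in combinations(adj[u] & nbrs, 2) if b in adj[a])
        if s > 0:
            score[u] = s
    extra = sorted(score, key=score.get, reverse=True)[:d]
    return {v} | nbrs | set(extra)
```

Every returned vertex is within distance 2 of v, so the radius bound of Definition 3.2 is immediate.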
4 Power-Law Bounded Networks
Arguably the most famous property of social and information networks, even more so than triadic closure, is a power-law degree distribution, also referred to as a heavy-tailed or scale-free degree distribution.
4.1 Power-Law Degree Distributions and Their Properties
Consider a simple graph G with n vertices. For each positive integer k, let n_k denote the number of vertices of G with degree k. The sequence (n_1, n_2, n_3, . . .) is called the degree distribution of G. Informally, a degree distribution is said to be a power law with exponent γ > 0 if n_k scales as n/k^γ.
There is some controversy about how best to fit power-law distributions to data, and whether such distributions are the “right” fit for the degree distributions in real-world social networks (as opposed to, say, lognormal distributions). Nevertheless, several of the consequences of a power-law degree distribution assumption are uncontroversial for social networks, and so a power-law distribution is a reasonable starting point for mathematical analysis.
This section studies the algorithmic benefits of assuming that a graph has an (approximately) power-law degree distribution, in the form of fast algorithms for fundamental graph problems. To develop our intuition about such graphs, let’s do some rough calculations under the assumption that n_k = C · n/k^γ (for some constant C > 0) for every k up to the maximum degree Δ; think of Δ as n^β for some constant β > 0.
First, we have the implication

(1)  n = Σ_{k=1}^{Δ} n_k = C · n · Σ_{k=1}^{Δ} k^{−γ}  ⟹  Σ_{k=1}^{Δ} k^{−γ} = 1/C.

When γ ≤ 1, Σ_k k^{−γ} is a divergent series (as Δ grows). In this case, we cannot satisfy the right-hand side of (1) with a constant C. For this reason, results on power-law degree distributions typically assume that γ > 1.
Next, the number of edges m is exactly

(2)  m = (1/2) · Σ_{k=1}^{Δ} k · n_k = (C · n/2) · Σ_{k=1}^{Δ} k^{1−γ}.

Thus, up to constant factors, Σ_k k^{1−γ} is the average degree. For γ > 2, Σ_k k^{1−γ} is a convergent series, and the graph has constant average degree. For this reason, much of the early literature on graphs with power-law degree distributions focused on the regime where γ > 2. When γ = 2, the average degree scales with log Δ, and for γ ∈ (1, 2), it scales with Δ^{2−γ}, which is polynomial in n.
One of the primary implications of a power-law degree distribution is an upper bound on the number of high-degree vertices. Specifically, under our assumption that n_k = C · n/k^γ, the number of vertices of degree at least k can be bounded by

(3)  Σ_{j≥k} C · n/j^γ = O(n/k^{γ−1}).
4.2 PLB Graphs
The key definition in this section is a more plausible and robust version of the assumption that n_k = C · n/k^γ, for which the conclusions of calculations like those in Section 4.1 remain valid. The definition allows individual values of n_k to deviate from a true power law, while requiring (essentially) that the average value of n_k in sufficiently large intervals of k does follow a power law.
Definition 4.1 (Berry et al. (2015); Brach et al. (2016)).
A graph with degree distribution (n_1, n_2, . . .) is a power-law bounded (PLB) graph with exponent γ > 1 if there is a constant C > 0 such that

n_{2^i} + n_{2^i + 1} + · · · + n_{2^{i+1} − 1} ≤ C · n · Σ_{k=2^i}^{2^{i+1} − 1} k^{−γ}

for all i ≥ 0.
Many real-world social networks satisfy a mild generalization of this definition, in which n_k is allowed to scale with n/(k + t)^γ for a “shift” t ≥ 0; see the Notes for details. For simplicity, we continue to assume in this section that t = 0.
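The bucketed condition of Definition 4.1 is easy to test against an observed degree sequence; in the sketch below (ours), the exponent `gamma` and the constant `C` are supplied by the caller:

```python
def is_plb(degrees, gamma, C):
    """Test the bucketed PLB condition on a degree sequence: in every dyadic
    range [2^i, 2^(i+1)), the number of vertices with degree in that range
    must be at most C * n * sum_{k in range} k^(-gamma)."""
    n = len(degrees)
    i = 0
    while 2 ** i <= max(degrees):
        lo, hi = 2 ** i, 2 ** (i + 1)
        count = sum(1 for d in degrees if lo <= d < hi)
        bound = C * n * sum(k ** -gamma for k in range(lo, hi))
        if count > bound:
            return False
        i += 1
    return True
```

In practice one would sweep over C (and a shift t) to find the best-fitting parameters rather than fixing them in advance.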
Definition 4.1 has several of the same implications as a pure power law assumption, including the following lemma (cf. (2)).
Lemma 4.1.
Suppose G is a PLB graph with exponent γ > 1. For every constant c ≥ 1 and natural number D,

Σ_{v : deg(v) ≤ D} deg(v)^c = O(n · (1 + Σ_{k=1}^{D} k^{c−γ})).
The proof of Lemma 4.1 is technical but not overly difficult; we do not discuss the details here.
The first part of the next lemma provides control over the number of high-degree vertices and is the primary reason why many graph problems are more easily solved on PLB graphs than on general graphs. The second part of the lemma bounds the number of wedges of the graph when γ > 2.
Lemma 4.2.
Suppose G is a PLB graph with exponent γ > 2. Then:

(a) For every k ≥ 1, the number of vertices with degree at least k is O(n/k^{γ−1}).

(b) Let W denote the number of wedges (i.e., two-hop paths) of G. If γ ∈ (2, 3), then W = O(n^{4−γ}). If γ > 3, then W = O(n).
4.3 Counting Triangles
Many graph problems appear to be easier in PLB graphs than in general graphs. To illustrate this point, we single out the problem of triangle counting, which is one of the most canonical problems in social network analysis. For this section, we assume that our algorithms can determine in constant time if there is an edge between a given pair of vertices; these lookups can be avoided with a careful implementation (Exercise 9), but such details distract from the main analysis.
As a warm-up, consider the following trivial algorithm to count (three times) the number of triangles of a given graph G (“Algorithm 1”):

For every vertex v of G:

For every pair u, w of v’s neighbors, check if u, v, and w form a triangle.

Note that the running time of Algorithm 1 is proportional to the number of wedges in the graph G. The following running time bound for triangle counting in PLB graphs is an immediate corollary of Lemma 4.2(b), applied to Algorithm 1.
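Algorithm 1 in Python (our transcription; set-membership lookups stand in for the constant-time edge queries assumed in the text):

```python
from itertools import combinations

def count_triangles_wedges(adj):
    """Algorithm 1: examine every wedge; each triangle is discovered three
    times, once per vertex, so the total is divided by 3."""
    found = 0
    for v in adj:
        for u, w in combinations(adj[v], 2):
            if w in adj[u]:
                found += 1
    return found // 3
```

The double loop visits exactly the wedges of the graph, making the wedge bound of Lemma 4.2(b) the operative running time.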
Corollary 4.0.1.
Triangle counting in n-vertex PLB graphs with exponent γ ∈ (2, 3) can be carried out in O(n^{4−γ}) time. If the exponent γ is strictly greater than 3, it can be carried out in O(n) time.
Now consider an optimization of Algorithm 1 (“Algorithm 2”):

Direct each edge of G from its lower-degree endpoint to its higher-degree endpoint (breaking ties lexicographically) to obtain a directed graph D.

For every vertex v of D:

For every pair u, w of v’s out-neighbors, check if u, v, and w form a triangle in G.

Each triangle is counted exactly once by Algorithm 2, in the iteration in which the lowest-degree of its three vertices (with ties broken lexicographically) plays the role of v. Remarkably, this simple idea leads to massive time savings in practice.
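Algorithm 2 in Python (our transcription; degree ties are broken by vertex label):

```python
from itertools import combinations

def count_triangles_oriented(adj):
    """Algorithm 2: orient each edge toward its higher-degree endpoint (ties
    broken by vertex label), then test pairs of out-neighbors. Each triangle
    is counted exactly once, at its lowest-ranked vertex."""
    def rank(v):
        return (len(adj[v]), v)

    out = {v: {u for u in adj[v] if rank(u) > rank(v)} for v in adj}
    return sum(1 for v in adj
               for u, w in combinations(out[v], 2)
               if w in adj[u])
```

The orientation keeps out-degrees small even when some in-degrees are huge, which is exactly what the degeneracy- and PLB-based analyses below exploit.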
A classical way to capture this running time improvement mathematically is to parameterize the input graph by its degeneracy, which can be thought of as a refinement of the maximum degree. The degeneracy κ of a graph can be computed by iteratively removing a minimum-degree vertex (updating the vertex degrees after each iteration) until no vertices remain; κ is then the largest degree of a vertex at the time of its removal. (For example, every tree has degeneracy equal to 1.) We have the following guarantee for Algorithm 2, parameterized by a graph’s degeneracy:
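The degeneracy-computing procedure just described can be sketched as follows (ours; a linear-time bucket-queue implementation exists, but a repeated minimum scan keeps the sketch short):

```python
def degeneracy(adj):
    """Repeatedly delete a minimum-degree vertex; the degeneracy is the
    largest degree observed at the moment of deletion."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    kappa = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        kappa = max(kappa, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return kappa
```

On any tree this returns 1, matching the parenthetical example in the text.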
Theorem 4.1 (Chiba and Nishizeki (1985)).
For every graph G with m edges and degeneracy κ, the running time of Algorithm 2 is O(m · κ).
Every PLB graph with exponent γ > 1 has degeneracy κ = O(n^{1/γ}); see Exercise 11. For PLB graphs with γ > 2, we can apply Lemma 4.1 with c = 1 to obtain m = O(n), and hence the running time of Algorithm 2 is O(n^{1 + 1/γ}).
Our final result for PLB graphs improves this running time bound, for all γ ∈ (2, 3), through a more refined analysis.^3

^3 The running time bound in Theorem 4.2 actually holds for all γ > 1, but it is an improvement over the degeneracy-based bound only when γ > 2.
Theorem 4.2 (Brach et al. (2016)).
In PLB graphs with exponent γ, Algorithm 2 runs in O(n^{3/γ}) time.
Proof.
Let G denote an n-vertex PLB graph with exponent γ. Denote the degree of a vertex v in G by deg(v) and its out-degree in the directed graph D by outdeg(v). The running time of Algorithm 2 is proportional to Σ_v outdeg(v)², so the analysis boils down to bounding the out-degrees in D. One trivial upper bound is outdeg(v) ≤ deg(v) for every v. Because every edge is directed from its lower-degree endpoint to its higher-degree endpoint, outdeg(v) is also at most the number of vertices with degree at least deg(v); by Lemma 4.2(a), this second bound is O(n/deg(v)^{γ−1}). The second bound is better than the first roughly when deg(v)^γ ≥ n, or equivalently when deg(v) ≥ n^{1/γ}.
Let S_k denote the set of degree-k vertices of G. We split the sum over vertices according to how their degrees compare to n^{1/γ}, using the first bound for low-degree vertices and the second bound for high-degree vertices:

Σ_v outdeg(v)² ≤ Σ_{k ≤ n^{1/γ}} |S_k| · k² + Σ_{k > n^{1/γ}} |S_k| · O(n/k^{γ−1})².
Applying Lemma 4.1 (with c = 2) to the sum over low-degree vertices, and using the fact that for γ < 3 the sum Σ_k k^{2−γ} is divergent, we derive

Σ_{k ≤ n^{1/γ}} |S_k| · k² = O(n · Σ_{k ≤ n^{1/γ}} k^{2−γ}) = O(n · (n^{1/γ})^{3−γ}) = O(n^{3/γ}).

For the high-degree vertices, Lemma 4.2(a) implies that there are only O(n^{1/γ}) of them, each with out-degree O(n^{1/γ}), so their contribution is also O(n^{3/γ}).
The same reasoning shows that Algorithm 2 runs in O(n log n) time in n-vertex PLB graphs with exponent γ = 3, and in O(n) time in PLB graphs with γ > 3 (Exercise 14).
4.4 Discussion
Beyond triangle counting, which computational problems should we expect to be easier on PLB graphs than on general graphs? A good starting point is problems that are relatively easy on bounded-degree graphs. In many cases, fast algorithms for bounded-degree graphs remain fast for graphs with bounded degeneracy. In these cases, the degeneracy bound for PLB graphs (Exercise 11) already leads to fast algorithms for such graphs. For example, this approach can be used to show that all of the cliques of a PLB graph with exponent γ > 1 can be enumerated in subexponential time (see Exercise 15). In some cases, as in Theorem 4.2, one can beat the bound from the degeneracy-based analysis through more refined arguments.
5 The BCT Model
This section gives an impressionistic overview of another set of deterministic conditions meant to capture properties of “typical networks,” proposed by Borassi et al. (2017) and hereafter called the BCT model. The precise model is technical with a number of parameters; we give only a highlevel description that ignores several complications.
To illustrate the main ideas, consider the problem of computing the diameter max_{u,v} d(u, v) of an undirected and unweighted n-vertex graph G, where d(u, v) denotes the shortest-path distance between u and v in G. Define the eccentricity of a vertex u by ecc(u) := max_v d(u, v), so that the diameter is the maximum eccentricity. The eccentricity of a single vertex can be computed in linear time using breadth-first search, which gives a quadratic-time algorithm for computing the diameter. Despite much effort, no subquadratic algorithm that approximates the diameter to a factor better than 3/2 is known for general graphs. Yet there are many heuristics that perform well in real-world networks. Most of these heuristics compute the eccentricities of a carefully chosen subset of vertices. An extreme example is the TwoSweep algorithm:

Pick an arbitrary vertex u, and perform breadth-first search from u to compute a vertex w maximizing d(u, w).

Use breadth-first search again to compute ecc(w), and return the result.
This heuristic always produces a lower bound on a graph’s diameter, and in practice usually achieves a close approximation. What properties of “real-world” graphs might explain this empirical performance?
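The TwoSweep heuristic is only a few lines of Python (our transcription):

```python
from collections import deque

def bfs_distances(adj, s):
    """Standard BFS returning shortest-path distances from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def two_sweep(adj, start):
    """BFS from start to find a farthest vertex w, then return ecc(w):
    a lower bound on, and in practice a good estimate of, the diameter."""
    d1 = bfs_distances(adj, start)
    w = max(d1, key=d1.get)
    return max(bfs_distances(adj, w).values())
```

Two BFS passes cost O(m + n) time in total, versus the quadratic cost of computing every eccentricity.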
The BCT model is largely inspired by the metric properties of random graphs. To explain, for a vertex v and a natural number k, let τ_v(k) denote the smallest length ℓ such that there are at least k vertices at distance (exactly) ℓ from v. Ignoring the specifics of the random graph model, the ℓ-step neighborhoods (i.e., vertices at distance exactly ℓ) of a vertex in a random graph resemble uniform random sets, of size increasing with ℓ. We next use this property to derive a heuristic upper bound on d(u, v). Define ℓ_u := τ_u(√n) and ℓ_v := τ_v(√n). Since the ℓ_u-step neighborhood of u and the ℓ_v-step neighborhood of v act like random sets of size √n, a birthday paradox argument implies that they intersect with nontrivial probability. If they do intersect, then τ_u(√n) + τ_v(√n) is an upper bound on d(u, v). In any event, we can adopt this inequality as a deterministic graph property, which can be tested against real network data.^4

^4 The actual BCT model applies τ to a somewhat larger argument than √n, to ensure intersection with high enough probability.
Property 5.1.
For all u, v: d(u, v) ≤ τ_u(√n) + τ_v(√n).
One would expect this distance upper bound to be tight for pairs of vertices that are far away from each other, and in a reasonably random graph, this will be true for most of the vertex pairs. This leads us to the next property.^5

^5 We omit the exact definition of this property in the BCT model, which is quite involved.
Property 5.2.
For all u: for “most” v, d(u, v) = τ_u(√n) + τ_v(√n).
The third property posits a distribution on the τ_v(√n) values. Let τ̄ denote the average of τ_v(√n) over all vertices v.
Property 5.3.
There are constants α > 1 and C > 0 such that, for each k ≥ 1, the fraction of vertices v satisfying τ_v(√n) ≥ τ̄ + k is roughly C · α^{−k}.
A consequence of this property is that the largest value of τ_v(√n) is τ̄ + O(log n).
As we discuss below, these properties imply that simple heuristics work well for computing the diameter of a graph. On the other hand, these properties do not hold exactly in real-world graphs. The actual BCT model has a nuanced version of these properties, parameterized by vertex degrees. In addition, the BCT model imposes an approximate power-law degree distribution, in the spirit of power-law bounded graphs (Definition 4.1 in Section 4). This nuanced list of properties can be empirically verified on a large set of real-world graphs.
Nonetheless, for understanding the connection of metric properties to diameter computation, it suffices to look at Properties 5.1–5.3. We can now bound the eccentricities of vertices. The properties imply that

ecc(u) = max_v d(u, v) ≤ τ_u(√n) + max_v τ_v(√n).

Fix u and imagine varying v to estimate ecc(u). For “most” vertices v, d(u, v) = τ_u(√n) + τ_v(√n). By Property 5.3, one of the vertices v satisfying this equality will also satisfy τ_v(√n) ≥ max_w τ_w(√n) − O(1). Combining, we can bound the eccentricity by

(4)  ecc(u) ≈ τ_u(√n) + max_v τ_v(√n).
The bound (4) is significant because it reduces maximizing $\mathrm{ecc}(u)$ over $u$ to maximizing $\ell(u)$.
Pick an arbitrary vertex $s$ and consider a vertex $u$ that maximizes $d(s,u)$. By an argument similar to the one above (and because most vertices are far away from $s$), we expect that $d(s,u) \approx \ell(s) + \ell(u)$. Thus, a vertex maximizing $d(s,\cdot)$ is almost the same as a vertex maximizing $\ell(\cdot)$, which by (4) is almost the same as a vertex maximizing $\mathrm{ecc}(\cdot)$. This gives an explanation of why the two-sweep algorithm performs so well. Its first use of breadth-first search identifies a vertex $u$ that (almost) maximizes $\mathrm{ecc}(u)$. The second pass of breadth-first search (from $u$) then computes a close approximation of the diameter.
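For concreteness, the two-sweep heuristic is only a few lines. A minimal pure-Python sketch (the adjacency-dict representation and the path-graph example are illustrative choices):

```python
from collections import deque

def bfs_farthest(adj, s):
    """BFS from s; returns (dist, far), where far is a vertex at
    maximum distance from s."""
    dist = {s: 0}
    far = s
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] > dist[far]:
                    far = w
                queue.append(w)
    return dist, far

def two_sweep(adj, s):
    """Lower-bound the diameter with two breadth-first searches: the
    first finds a vertex u of (nearly) maximum eccentricity, the
    second computes ecc(u)."""
    _, u = bfs_farthest(adj, s)
    dist_u, w = bfs_farthest(adj, u)
    return dist_u[w]

# On a 10-vertex path, the estimate is exact from any starting vertex.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
print(two_sweep(path, 5))  # → 9 (the true diameter)
```

The returned value is always a valid lower bound on the diameter; the analysis above explains why, on graphs satisfying Properties 5.1–5.3, it is also close to an upper bound.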
The analysis in this section is heuristic, but it captures much of the spirit of algorithm analysis in the BCT model. These results for the two-sweep algorithm can be extended to other heuristics that choose a set of vertices through a random process to lower bound the diameter. In general, the key insight is that most distances $d(u,v)$ in the BCT model can be closely approximated as a sum of quantities that depend only on either $u$ or $v$.
6 Discussion
Let’s take a bird’s-eye view of this chapter. The big challenge in the line of research described in this chapter is the formulation of graph classes and properties that both reflect real-world graphs and lead to a satisfying theory. It seems unlikely that any one class of graphs will simultaneously capture all the relevant properties of (say) social networks. Accordingly, this chapter described several graph classes that target specific empirically observed graph properties, each with its own algorithmic lessons:

Triadic closure aids the computation of dense subgraphs.

Power-law degree distributions aid subgraph counting.

Hop-neighborhood structure influences the structure of shortest paths.
These lessons suggest that, when defining a graph class to capture “real-world” graphs, it may be important to keep a target algorithmic application in mind.
Different graph classes differ in how closely the definitions are tied to domain knowledge and empirically observed statistics. The $c$-closed and triangle-dense graph classes are in the spirit of classical families of graphs (e.g., planar or bounded-treewidth graphs), and they sacrifice precision in the service of generality, cleaner definitions, and arguably more elegant theory. The PLB and BCT frameworks take the opposite view: the graph properties are quite technical and involve many parameters, and in exchange tightly capture the properties of “real-world” graphs. These additional details can add fidelity to theoretical explanations for the surprising effectiveness of simple heuristics.
A big advantage of combinatorially defined graph classes—a hallmark of graph-theoretic work in theoretical computer science—is the ability to empirically validate them on real data. The standard statistical viewpoint taken in network science has led to dozens of competing generative models, and it is nearly impossible to validate the details of such a model from network data. The deterministic graph classes defined in this chapter give a much more satisfying foundation for algorithmics on real-world graphs.
Complex algorithms for real-world problems can be useful, but practical algorithms for graph analysis are typically based on simple ideas like backtracking or greedy algorithms. An ideal theory would reflect this reality, offering compelling explanations for why relatively simple algorithms have such surprising efficacy in practice.
We conclude this section with some open problems.

Theorem 2.3 gives, for constant $c$, a polynomial bound on the number of maximal cliques in a $c$-closed graph. Fox et al. (2020) also prove a sharper bound, which is asymptotically tight for some values of $c$. Is it tight for all values of $c$? Additionally, parameterizing by the number of edges $m$ rather than the number of vertices $n$: is the number of maximal cliques in a $c$-closed graph with $m$ edges bounded by an analogous function of $m$? Could there be a linear-time algorithm for maximal clique enumeration in $c$-closed graphs with constant $c$?

Theorem 3.1 guarantees the capture, by a tightly-knit family, of a fraction of the triangles of a triangle-dense graph that is polynomial in the graph’s triangle density. What is the best-possible constant in the exponent of this polynomial? Can the upper bound be improved, perhaps under additional assumptions (e.g., about the distribution of the clustering coefficients of the graph, rather than merely about their average)?

Is there a compelling algorithmic application for graphs that can be approximated by tightly-knit families?

Benson et al. (2016) and Tsourakakis et al. (2017) defined the triangle conductance of a graph, where cuts are measured in terms of the number of triangles cut (rather than the number of edges). Empirical evidence suggests that cuts with low triangle conductance give more meaningful communities (i.e., denser subgraphs) than cuts with low (edge) conductance. Is there a plausible theoretical explanation for this observation?

A more openended goal is to use the theoretical insights described in this chapter to develop new and practical algorithms for fundamental graph problems.
7 Notes
The book by Easley and Kleinberg (2010) is a good introduction to social network analysis, including discussions of heavy-tailed degree distributions and triadic closure. A good if somewhat outdated review of generative models for social and information networks is Chakrabarti and Faloutsos (2006). The Enron email network was first studied by Klimt and Yang (2004).
The definitions of $c$-closed and weakly closed graphs (Definitions 2.1–2.2) are from Fox et al. (2020), as is the fixed-parameter tractability result for the maximum clique problem (Theorem 2.3). Eppstein et al. (2010) proved an analogous result with respect to a different parameter, the degeneracy of the input graph. The reduction from efficiently enumerating maximal cliques to bounding the number of maximal cliques (Theorem 2.1) is from Tsukiyama et al. (1977). Moon–Moser graphs and the Moon–Moser bound on the maximum number of maximal cliques of a graph are from Moon and Moser (1965).
The definition of triangle-dense graphs (Definition 3.1) and the inverse theorem for them (Theorem 3.1) are from Gupta et al. (2016). The computation of the triangle density of the Facebook graph is detailed by Ugander et al. (2011).
The definition of power-law bounded graphs (Definition 4.1) first appeared in Berry et al. (2015) in the context of triangle counting, but it was formalized and applied to many different problems by Brach et al. (2016), including triangle counting (Theorem 4.2), clique enumeration (Exercise 15), and linear-algebraic problems for matrices with a pattern of nonzeroes that induces a PLB graph. Brach et al. (2016) also performed a detailed empirical analysis, validating Definition 4.1 (with small shift parameters) on real data. The degeneracy-parameterized bound for counting triangles is essentially due to Chiba and Nishizeki (1985).
Acknowledgments
The authors thank Michele Borassi, Shweta Jain, Piotr Sankowski, and Inbal Talgam-Cohen for their comments on earlier drafts of this chapter.
References
Benson, A., D. F. Gleich, and J. Leskovec (2016). Higher-order organization of complex networks. Science 353(6295), 163–166.
Berry, J. W., L. A. Fostvedt, D. J. Nordman, C. A. Phillips, C. Seshadhri, and A. G. Wilson (2015). Why do simple algorithms for triangle enumeration work in the real world? Internet Mathematics 11(6), 555–571.
Borassi, M., P. Crescenzi, and L. Trevisan (2017). An axiomatic and an average-case analysis of algorithms and heuristics for metric properties of graphs. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 920–939.
Brach, P., M. Cygan, J. Lacki, and P. Sankowski (2016). Algorithmic complexity of power law networks. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1306–1325.
Chakrabarti, D. and C. Faloutsos (2006). Graph mining: Laws, generators, and algorithms. ACM Computing Surveys 38(1).
Chiba, N. and T. Nishizeki (1985). Arboricity and subgraph listing algorithms. SIAM Journal on Computing 14(1), 210–223.
Easley, D. and J. Kleinberg (2010). Networks, Crowds, and Markets. Cambridge University Press.
Eppstein, D., M. Löffler, and D. Strash (2010). Listing all maximal cliques in sparse graphs in near-optimal time. In Proceedings of the 21st International Symposium on Algorithms and Computation (ISAAC), pp. 403–414.
Fox, J., T. Roughgarden, C. Seshadhri, F. Wei, and N. Wein (2020). Finding cliques in social networks: A new distribution-free model. SIAM Journal on Computing 49(2), 448–464.
Gupta, R., T. Roughgarden, and C. Seshadhri (2016). Decompositions of triangle-dense graphs. SIAM Journal on Computing 45(2), 197–215.
Klimt, B. and Y. Yang (2004). The Enron corpus: A new dataset for email classification research. In Proceedings of the 15th European Conference on Machine Learning (ECML), pp. 217–226.
Moon, J. and L. Moser (1965). On cliques in graphs. Israel Journal of Mathematics 3, 23–28.
Roughgarden, T. (Ed.) (2020). Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press.
Tsourakakis, C. E., J. W. Pachocki, and M. Mitzenmacher (2017). Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web (WWW), pp. 1451–1460.
Tsukiyama, S., M. Ide, H. Ariyoshi, and I. Shirakawa (1977). A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing 6(3), 505–517.
Ugander, J., L. Backstrom, and J. Kleinberg (2013). Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proceedings of the 22nd International World Wide Web Conference (WWW), pp. 1307–1318.
Ugander, J., B. Karrer, L. Backstrom, and C. Marlow (2011). The anatomy of the Facebook social graph. arXiv:1111.4503.
Exercises

Prove that a graph is weakly closed in the sense of Definition 2.2 if and only if its vertices can be ordered $v_1, v_2, \ldots, v_n$ such that, for every $i$, the vertex $v_i$ is good in the subgraph induced by $\{v_i, v_{i+1}, \ldots, v_n\}$.

Prove that the backtracking algorithm in Section 2.3 enumerates all of the maximal cliques of a graph.

Prove that a graph has triangle density $1$ if and only if it is a disjoint union of cliques.

Consider the complete regular tripartite graph with $3n$ vertices: three vertex sets of size $n$ each, with each vertex connected to every vertex of the other two groups and to none of the vertices within its own group.

What is the triangle density of the graph?

What is the output of the cleaner (Section 3.4) when applied to this graph? What, then, is the output of the extractor?

Prove that this graph admits no tightly-knit family that contains a constant fraction (as $n \to \infty$) of the graph’s triangles and uses only radius-1 clusters.

Prove Claim 4.2.
[Hint: To prove (a), break up the sum over degrees into subsums over degrees between consecutive powers of 2. Apply Definition 4.1 to each subsum.]

Implement Algorithm 2 from Section 4.3 to run in $O\big(\sum_{v} d^{+}(v)^{2}\big)$ time, where $d^{+}(v)$ is the number of out-neighbors of $v$ in the directed version of $G$, assuming that the input is represented using only adjacency lists.
[Hint: you may need to store both the in- and out-neighbor lists of each vertex.]

Prove that every graph with $m$ edges has degeneracy at most $\sqrt{2m}$. Exhibit a family of graphs showing that this bound is tight (up to lower-order terms).
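As a numerical sanity check on the $\sqrt{2m}$ degeneracy bound for a graph with $m$ edges, degeneracy can be computed by repeatedly deleting a minimum-degree vertex. A short illustrative sketch (quadratic-time, written for clarity rather than speed; the $K_5$ example is an assumption of this sketch):

```python
def degeneracy(adj):
    """Degeneracy = the maximum, over the min-degree peeling order,
    of the degree of the removed vertex at removal time."""
    deg = {v: len(ns) for v, ns in adj.items()}
    best = 0
    while deg:
        v = min(deg, key=deg.get)      # a minimum-degree remaining vertex
        best = max(best, deg.pop(v))   # record its degree, remove it
        for w in adj[v]:
            if w in deg:
                deg[w] -= 1
    return best

# K_5 has m = 10 edges and degeneracy 4, within the bound sqrt(2m) ~ 4.47.
k5 = {v: [u for u in range(5) if u != v] for v in range(5)}
m = sum(len(ns) for ns in k5.values()) // 2
print(degeneracy(k5), (2 * m) ** 0.5)
```

Cliques are the natural candidates for tightness: $K_d$ has degeneracy $d - 1$ and $m = \binom{d}{2}$ edges, so $\sqrt{2m} \approx d$.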

Suppose $G$ is a PLB graph with power-law exponent $\gamma > 2$.

Prove that the maximum degree of $G$ is $O\big(n^{1/(\gamma-1)}\big)$.

Prove that the degeneracy of $G$ is $O\big(n^{1/\gamma}\big)$.

Prove that Algorithm 2 in Section 4.3 runs in $O(n)$ time in vertex PLB graphs with exponent $\gamma > 3$, and bound its running time as a function of $\gamma$ for exponents $\gamma \in (2, 3]$.

Prove that all of the cliques of a graph with degeneracy $d$ can be enumerated in $2^{O(d)} \cdot \mathrm{poly}(n)$ time. (By Exercise 11(b), this immediately gives a subexponential-time algorithm for enumerating the cliques of a PLB graph.)