1.1 The Study of Large-Scale Graphs
In the last few decades, there has been significant interest in the study of large-scale graphs, which arise from modelling social networks, web graphs, biological networks, etc. Many of these graphs have millions or even billions of vertices or edges. At this scale, conventional algorithms become inefficient or impractical, so understanding these graphs is a challenging task.
The emergence of large-scale graphs poses several fundamental questions and urges us to revisit important concepts and graph algorithms. Examples of computational goals on large-scale graphs include calculating the number of vertices, the number of connected components, the distance between two given vertices, the sparseness or denseness, the degree distribution, the central vertices, and other global or local properties. There is a large body of work in mathematics and computer science that studies these and related questions.
To answer questions of these types, researchers started to devise graph preprocessing methods and graph simplifications, which transform an input graph into a smaller graph or a concise data structure, such that computations on the input graph can be replaced by more efficient computations on the simplified structure. Different computational goals call for different graph preprocessing methods; for example, exact distance queries can be answered more efficiently by adding so-called 2-hop labels to the vertices [cohen-03], while the computation of a minimum cut can be sped up by removing certain edges to obtain a cut sparsifier [benczur-15].
Within the field of graph preprocessing, a prominent class of methods involves partitioning the vertices to aid visualisation or computation. These methods go by the names of community detection [fortunato-10], graph partitioning [bichot-13] and graph clustering [kannan-04]. Preprocessing of graphs can also be viewed from the angle of data compression. Like clustering, graph compression is based on grouping vertices, but it additionally stores the information needed to recreate the original graph [besta-18-survey, besta-19-slim, navlakha-08, shin-19-sweg, toivonen-11].
Among the various types of computations on graphs, the most fundamental is the distance query, which asks for the length of a shortest path between two vertices. The distance query is important because it underlies many other query types in fields such as transportation planning [bast-16], network design [miller-13], operational research [goldman-71, hakimi-64, slater-82], and graph databases [graph-databases]. When the graph is large, computing distances directly becomes impractical, so there is a need for preprocessing methods that approximate distances efficiently while remaining reasonably accurate.
In this paper, when a graph $G$ is preprocessed into another graph $H$, we use $d_G(u,v)$ to denote the distance between two vertices $u$ and $v$ in $G$, and we use $d_H(u,v)$ to denote their distance in $H$.
There are numerous preprocessing methods to handle approximate distance queries, and one of them is constructing spanners [elkin-04, peleg-89]. On a connected graph , a spanner is a spanning subgraph with integer parameters and , such that for all vertices and ,
In the case of spanners, one deletes edges to obtain a spanning subgraph. Alternatively, one can simplify a graph by edge-contraction. When $G$ is transformed into $H$ by contracting edges, there is a natural mapping from the vertices of $G$ to those of $H$. Given real-valued constants $\alpha$ and $\beta$, the authors of [bernstein-19] studied the optimisation problem of finding a minimal set of edges to contract, such that Inequality (1) is satisfied for all vertices $u$ and $v$ in $G$.
In this paper, we continue this line of research on approximate distance preservation and introduce quasi-isometries to the active field of graph simplification. The goal is to provide a general, formal mathematical framework for understanding large-scale graphs and answering the types of questions listed at the start of this section.
We base our framework on the notion of large-scale geometry introduced by Gromov [gromov-81, gromov-96]. The concept of large-scale geometry turned out to be crucial in the study of growth rates of finitely generated infinite algebraic objects such as groups and their Cayley graphs. Later, quasi-isometries were applied to infinite trees [kroen-08] as well as infinite strings [khou-17].
[Gromov [gromov-81]] Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $A, B, C$ be non-negative integers with $A \ge 1$. Then a function $f: X \to Y$ is called an $(A, B, C)$-quasi-isometry if the following two properties are satisfied:
(i) for all $x_1, x_2 \in X$: $\frac{1}{A}\, d_X(x_1, x_2) - B \le d_Y(f(x_1), f(x_2)) \le A\, d_X(x_1, x_2) + B$;
(ii) for all $y \in Y$ there exists $x \in X$ such that $d_Y(y, f(x)) \le C$.
The first property is called the quasi-isometric inequality, and the second property is called the density property. The constants $A, B, C$ are called the quasi-isometry constants (of the function $f$). The constant $A$ is called the stretch factor, and $B$ is the additive distortion.
Thus, quasi-isometries can be viewed as bi-Lipschitz maps with a finite additive distortion. We have the following simple observations: (a) $(1,0,0)$-quasi-isometries are isometries; (b) $(A,B,0)$-quasi-isometries are surjective; (c) $(A,0,C)$-quasi-isometries are injective.
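For small finite graphs, both defining properties can be checked by brute force. The sketch below is illustrative only: it assumes graphs are given as adjacency dictionaries, assumes the quasi-isometric inequality has the form $\frac{1}{A} d_X - B \le d_Y \le A\,d_X + B$ together with the density bound $C$, and uses hypothetical helper names. It is meant for experimenting with small examples, not as an efficient test.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_quasi_isometry(adj_x, adj_y, f, A, B, C):
    """Brute-force check that the vertex map f: X -> Y is an (A, B, C)-quasi-isometry."""
    dx = {v: bfs_distances(adj_x, v) for v in adj_x}
    dy = {v: bfs_distances(adj_y, v) for v in adj_y}
    # Quasi-isometric inequality for every ordered pair of vertices of X.
    for x1, x2 in product(adj_x, repeat=2):
        d, e = dx[x1][x2], dy[f[x1]][f[x2]]
        if not (d / A - B <= e <= A * d + B):
            return False
    # Density: every vertex of Y lies within distance C of the image of f.
    image = set(f.values())
    return all(min(dy[y][z] for z in image) <= C for y in adj_y)


# Collapsing the path 0-1-2-3 onto the path 0-1 by pairing neighbours.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path2 = {0: [1], 1: [0]}
pairing = {0: 0, 1: 0, 2: 1, 3: 1}
print(is_quasi_isometry(path4, path2, pairing, 2, 1, 0))  # True
print(is_quasi_isometry(path4, path2, pairing, 1, 0, 0))  # False
```

The second call fails because two adjacent vertices are merged, which a $(1,0,0)$-quasi-isometry cannot do.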
According to Gromov, the large-scale geometry of all finite objects (such as finite graphs) is trivial, as they are all equivalent to the singleton graph. In this paper, we refine Gromov’s idea of large-scale geometry for the setting of large finite graphs.
Quasi-isometries form an equivalence relation; reflexivity and transitivity are obvious. As for symmetry, consider an $(A,B,C)$-quasi-isometry $f: X \to Y$, and define $g: Y \to X$ as follows. For all $y \in Y$, if $y$ is in the image of $f$, then select an $x$ from $f^{-1}(y)$ and set $g(y) = x$. Otherwise, select an $x$ such that $d_Y(y, f(x)) \le C$ and set $g(y) = x$. Then $g$ is a quasi-isometry from $Y$ into $X$. All finite metric spaces are quasi-isometric to each other and form one quasi-isometry equivalence class; namely, every finite metric space is $(1, B, 0)$-quasi-isometric to the singleton space, where $B$ is any integer at least as large as its diameter.
1.3 Basic Questions and the Problem Set-Up
Every connected graph forms a metric space, so quasi-isometries are applicable. Nevertheless, every finite graph is quasi-isometric to the singleton graph, so we refine the concept of quasi-isometry for finite graphs by requiring the quasi-isometry constants to be small. Given a large graph $G$, one of our aims is to find quasi-isometries of $G$ into smaller but non-singleton graphs $H$, such that the quasi-isometries have small constants and, at the same time, the graphs $H$ retain important properties of the original graph $G$.
Thus, the general framework proposed in this work is to find properties of graphs that are preserved under quasi-isometries with small constants. More formally, let $P$ be an abstract property of graphs. This property can be a predicate on the vertices, such as ‘being a central vertex’, ‘belonging to a non-trivial clique’, or ‘having the maximum degree’. The property can also be a global property of the graph, such as ‘being a tree’ or ‘being chordal’. We want to investigate whether $P$ is preserved under quasi-isometries with small constants; below we describe our problem set-up.
Let $\mathcal{C}$ be a class of graphs. Given $G \in \mathcal{C}$ and a property $P$, we want to build another graph $H$ and a quasi-isometry $f: G \to H$ such that the following are satisfied:
Small quasi-isometry constants: One possibility is to require the constants $A$, $B$ and $C$ to lie below a prescribed bound. This controls the distortion between the distance of any two vertices in $G$ and the distance of their images in $H$. It also prevents $G$ from being collapsed into a singleton. Note that Inequality (1) is covered by this condition.
Compression: $|H| \le |G|/c$ for some real number $c > 1$, where $c$ is called the compression constant. This property ensures that $H$ is a meaningful size-reducing simplification.
Preservation: The property $P$ should be well-behaved with respect to $f$. This is open to reasonable interpretation. For instance, one can demand that for all $v \in V(G)$, if $v$ satisfies $P$, then the vertex $f(v)$ should also satisfy $P$.
Efficiency: Building $H$ and $f$ should be efficient in the size of $G$. From an algorithmic viewpoint, this property is natural and is directly linked to the preprocessing of large-scale graphs.
Retention: $H$ should be in the same class $\mathcal{C}$ as $G$. In other words, the quasi-isometry should retain the key algebraic properties of $G$.
This is a semi-formal set-up, and with these goals in mind, we provide our initial findings in this line of research as outlined in the following Section 2.
2 Our Contributions
Below we list our contributions to the area devoted to understanding large-scale graphs, connecting to the list in Section 1.3.
(1) We propose a general formal framework for studying large-scale graphs based on quasi-isometries, and provide several simple algorithms and methods that map large graphs quasi-isometrically to smaller graphs.
(2) In order to build quasi-isometries of graphs, we introduce the notion of partition-graphs in Section 4. These are simplified graphs built from any given graph by grouping vertices. We show that partition-graphs are quasi-isometric to the original graph, with the quasi-isometry constants depending on the diameters of the super-vertices, and the compression depending on the cardinalities of the super-vertices.
(3) We investigate whether the vertices in the centre of a given graph are preserved under quasi-isometries. Among the countless notions of graph centrality, we focus on the two most basic: the centre and the median, both of which are defined in terms of the distance. In order to capture the effect of graph simplifications on the centre, Section 5 introduces the concept of the centre-shift. Given a quasi-isometric graph simplification $f: G \to H$, with $\mathrm{cen}(G)$ and $\mathrm{cen}(H)$ being the respective centres of $G$ and $H$, the centre-shift measures the distance in $H$ between $f(\mathrm{cen}(G))$ and $\mathrm{cen}(H)$.
(4) It turns out that a quasi-isometry alone is not strong enough to bound the centre-shift. As shown by Theorem 5, the centre-shift of a mapping $f: G \to H$ is bounded above by a function of the radius of $G$, provided that $G$ satisfies a special property (uniform eccentricity).
(5) We then turn our focus to trees. Trees already provide interesting counter-examples (Figure 1), which suggest that even on trees, quasi-isometric simplifications need to be constructed with care, depending on the query. Furthermore, although trees are simple objects, they are used in many fields such as mathematical phylogenetics [semple-03] and optimisation [wu-chao-04]. Section 6 shows that the method of outward-contraction produces partition-trees with centre-shift zero, which means that outward-contraction preserves the centres of trees.
(6) Finally, Section 7 shows that preserving the median of trees requires storing extra numerical information. Without this information, there are cases where outward-contraction does not preserve the median of a tree. However, if we store the cardinality of each vertex-subset in the partition and treat the partition-graph as a vertex-weighted graph, then we can locate the median of the original tree from the partition.
Thus, in terms of our problem set-up in Section 1.3, contributions (1)–(2) address small quasi-isometry constants and compression, while (3)–(6) focus on the preservation of the centre and the median (as the property $P$) under specific quasi-isometric simplifications, in particular of trees.
In this paper, all graphs are assumed to be undirected, finite, and without loops or parallel edges. Formally, a graph $G$ is a pair $(V, E)$, where $V$ is a finite set of vertices, and $E$ is a set of edges (2-element subsets of $V$). Two vertices $u$ and $v$ are adjacent, denoted $u \sim v$, when they are joined by an edge; that is, $\{u, v\} \in E$. We sometimes write $G$ to mean the vertex-set $V$ when no ambiguity arises, and $|G|$ denotes the number of vertices in $G$.
In a graph, a path is a sequence of vertices $v_0, v_1, \ldots, v_k$ such that $v_{i-1} = v_i$ or $v_{i-1} \sim v_i$ for all $1 \le i \le k$. A simple path is a sequence of distinct vertices $v_0, v_1, \ldots, v_k$ such that $v_{i-1} \sim v_i$ for all $1 \le i \le k$. For every path, there exists a simple path with the same endpoints. The length of a (simple) path is the number of edges. Two vertices are connected when they are the endpoints of some path. A graph is connected when every pair of vertices is connected. In this paper, all graphs are assumed to be connected.
The path-graph on $n$ vertices, denoted $P_n$, is the graph on vertices $v_1, \ldots, v_n$ such that $v_i \sim v_{i+1}$ for all $1 \le i < n$.
The distance between two vertices $u$ and $v$, denoted $d(u, v)$, is the length of a shortest path between $u$ and $v$. For two vertex-sets $S, T \subseteq V$, the distance is defined to be $d(S, T) = \min_{s \in S,\, t \in T} d(s, t)$, while the distance between a vertex $v$ and a vertex-set $S$ is $d(v, S) = \min_{s \in S} d(v, s)$.
The eccentricity of a vertex $v$ is the maximum distance from $v$ to any other vertex: $\mathrm{ecc}(v) = \max_{u \in V} d(v, u)$. The eccentricity-witnesses (ecc-wits) of a vertex $v$ are the vertices $u$ such that $d(v, u) = \mathrm{ecc}(v)$.
The triangle inequality of the distance function in graphs allows us to establish the following proposition:
For two vertices $u$ and $v$ in a graph, $|\mathrm{ecc}(u) - \mathrm{ecc}(v)| \le d(u, v)$.∎
The centre of a graph $G$, denoted $\mathrm{cen}(G)$, is the set of vertices with the minimum eccentricity.
It is well known that the centre of a tree consists of a single vertex or two adjacent vertices. Also, the centre of a tree can be located by an algorithm called leaf-removal [goldman-71]. At the start, leaf-removal removes all the leaves of the input tree $T$, resulting in a smaller tree $T_1$. Next, leaf-removal removes all the leaves of $T_1$, resulting in $T_2$. This process repeats until only a single vertex or two adjacent vertices remain; these final remaining vertices form the centre of $T$.
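Leaf-removal is easy to express with a degree counter. The sketch below assumes the tree is given as an adjacency dictionary, and the function name is hypothetical; in each round it strips the whole current layer of leaves simultaneously and stops once at most two vertices remain.

```python
def tree_centre_by_leaf_removal(adj):
    """Centre of a tree via repeated removal of all current leaves.

    adj maps each vertex to a list of its neighbours; the graph must be a tree.
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    leaves = [v for v in adj if degree[v] <= 1]
    while len(remaining) > 2:
        next_leaves = []
        for leaf in leaves:  # strip the whole current layer of leaves
            remaining.discard(leaf)
            for w in adj[leaf]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] == 1:  # w is a leaf of the next, smaller tree
                        next_leaves.append(w)
        leaves = next_leaves
    return remaining


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(tree_centre_by_leaf_removal(path5))  # {2}
```

Each vertex is removed once and each edge is inspected a constant number of times, so this runs in linear time in the size of the tree.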
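We also need some more distance-related notions. The radius of $G$ is the minimum eccentricity: $\mathrm{rad}(G) = \min_{v \in V} \mathrm{ecc}(v)$. The diameter of $G$ is the maximum eccentricity: $\mathrm{diam}(G) = \max_{v \in V} \mathrm{ecc}(v)$. A diameter-path is a shortest path whose length equals the diameter. The distance-sum of a vertex $v$ is defined to be $D(v) = \sum_{u \in V} d(v, u)$.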
The median of a graph is the set of vertices with the minimum distance-sum.
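All of the notions above (eccentricity, centre, radius, diameter, distance-sum and median) can be computed with one breadth-first search per vertex, which takes $O(|V|\,(|V|+|E|))$ time on an unweighted connected graph. A minimal sketch, assuming adjacency dictionaries and hypothetical function names; the example tree is one where the centre and the median differ.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def centre_and_median(adj):
    """Return (centre, median, radius, diameter) using one BFS per vertex."""
    ecc, dsum = {}, {}
    for v in adj:
        dist = bfs_distances(adj, v)
        ecc[v] = max(dist.values())   # eccentricity
        dsum[v] = sum(dist.values())  # distance-sum
    radius, diameter = min(ecc.values()), max(ecc.values())
    centre = {v for v in adj if ecc[v] == radius}
    best = min(dsum.values())
    median = {v for v in adj if dsum[v] == best}
    return centre, median, radius, diameter


# A path 0-1-2-3 with three extra leaves 4, 5, 6 attached to vertex 3.
broom = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5, 6], 4: [3], 5: [3], 6: [3]}
print(centre_and_median(broom))  # ({2}, {3}, 2, 4)
```

The three leaves pull the median to vertex 3, while the centre stays at vertex 2, the middle of the diameter-path from 0 to any of the leaves.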
3.1 Examples of Quasi-Isometries through Independent Sets
With quasi-isometries defined in Definition 1.2, here we present natural examples of quasi-isometries on finite connected graphs.
Let $G = (V, E)$ be a graph, and let $I$ be a maximal independent set of $G$; namely, $I$ is a maximal subset of $V$ such that no two vertices in $I$ are adjacent in $G$. On the set $I$ we define the following edge set $E_I$. Now, the mapping $f$ from $G$ into $G_I = (I, E_I)$ is defined as follows. If $v \in I$, then $f(v)$ is simply defined as $v$. Otherwise, $f(v)$ is defined to be any neighbour of $v$ in $I$, which exists by the maximality of $I$.
This mapping $f$ as defined above is a (2,1,1)-quasi-isometry from $G$ to $G_I$. ∎
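A maximal independent set can be obtained greedily, and the mapping $f$ of the lemma then sends each remaining vertex to a neighbour in the set. The sketch below assumes a greedy scan in a fixed vertex order (the text does not prescribe how the independent set is found), uses hypothetical names, and does not construct the edge set on the independent set.

```python
def greedy_maximal_independent_set(adj, order=None):
    """Scan vertices once; keep a vertex if none of its neighbours is kept yet."""
    chosen = set()
    for v in (order if order is not None else adj):
        if not any(w in chosen for w in adj[v]):
            chosen.add(v)
    return chosen


def map_to_independent_set(adj, independent):
    """f(v) = v on the independent set, otherwise the first neighbour found in it."""
    f = {}
    for v in adj:
        if v in independent:
            f[v] = v
        else:
            # Maximality guarantees that v has at least one neighbour in the set.
            f[v] = next(w for w in adj[v] if w in independent)
    return f


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
I = greedy_maximal_independent_set(path5)
print(sorted(I), map_to_independent_set(path5, I))
# [0, 2, 4] {0: 0, 1: 0, 2: 2, 3: 2, 4: 4}
```

Every vertex skipped by the scan already had a kept neighbour, which is exactly why the returned set is maximal and why the `next(...)` lookup never fails.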
Since quasi-isometries are symmetric and transitive (Section 1.2), Lemma 3.1 implies that for all maximal independent sets $I$, the graphs $G_I$ are pairwise quasi-isometric via small quasi-isometry constants. In other words, all these graphs form a quasi-isometry class witnessed by small quasi-isometry constants:
Let $G$ be a graph. Then for all maximal independent sets $I$ and $J$ of $G$, there is a (4,2,4)-quasi-isometry from $G_I$ to $G_J$. ∎
In this section we provide a simple method of building quasi-isometries. Given a graph $G$, we aim to build a smaller graph $H$ such that there is a quasi-isometry from $G$ onto $H$ with small quasi-isometry constants. We start with the following definition, which formalises the idea of grouping vertices.
A partition (or vertex-grouping) of a graph $G$ is a partition of $V(G)$ into connected subsets, which are called super-vertices.
Note that the word ‘partition’ here is slightly different from the set-theoretic use of the same word, as we additionally require the super-vertices to be connected.
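Given a partition $\mathcal{P}$ on $G$, the partition-graph $G_{\mathcal{P}}$ is defined as follows.
The vertices of $G_{\mathcal{P}}$ are the connected subsets in the partition $\mathcal{P}$.
Two distinct super-vertices $S$ and $T$ are adjacent via a super-edge in $G_{\mathcal{P}}$ if and only if there exist $u \in S$ and $v \in T$ such that $u \sim v$ in $G$.
Sometimes we write $H$ instead of $G_{\mathcal{P}}$ if $\mathcal{P}$ is clear from the context. Also, for any vertex $v$, we use $[v]$ to denote the super-vertex in $\mathcal{P}$ that contains $v$. Then we have a natural mapping $f: V(G) \to V(H)$ with $f(v) = [v]$.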
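The definition translates directly into code. In the sketch below, which assumes adjacency dictionaries, super-vertices are labelled by their position in the list `partition` (a hypothetical representation), and the natural mapping is returned alongside the partition-graph. Connectivity of each super-vertex is assumed rather than checked.

```python
def partition_graph(adj, partition):
    """Return (super_adj, block_of): the partition-graph and the natural map v -> [v].

    `partition` is a list of disjoint vertex collections covering all vertices;
    each collection is assumed to induce a connected subgraph.
    """
    block_of = {v: index for index, block in enumerate(partition) for v in block}
    if set(block_of) != set(adj) or sum(len(b) for b in partition) != len(adj):
        raise ValueError("the partition must cover every vertex exactly once")
    super_adj = {index: set() for index in range(len(partition))}
    for u in adj:
        for v in adj[u]:
            if block_of[u] != block_of[v]:  # an edge joining two super-vertices
                super_adj[block_of[u]].add(block_of[v])
    return super_adj, block_of


path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
H, f = partition_graph(path6, [[0, 1], [2, 3, 4], [5]])
print(H)  # {0: {1}, 1: {0, 2}, 2: {1}}
```

Here a path on six vertices collapses to a path on three super-vertices, in line with the observation that partition-graphs of trees are trees.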
As $f$ is a many-to-one mapping, for any path in $G$, its corresponding path in a partition-graph is never longer than the original path, because consecutive vertices may be mapped to the same super-vertex. Moreover, since super-vertices are connected and pairwise disjoint, a cycle of length $k$ in $H$ lifts to a cycle of length at least $k$ in $G$. Therefore, we have Proposition 4, which shows that partition-graphs satisfy the retention property in Section 1.3, at least for trees.
Every partition-graph of a tree is always a tree.∎
It turns out that an extra condition on the partition is required in order for the mapping to be a quasi-isometry, and this condition is an upper bound on the diameter of each super-vertex, which the following definition describes formally.
Given a natural number , a partition is called a -sharp partition when every super-vertex satisfies .
An upper bound on the diameter of every super-vertex is akin to chopping the original graph into bits that are small and ‘sharp’. Next, sharp partitions lead us to the following proposition related to quasi-isometry.
If is a -sharp partition on , then the natural mapping from to is a -quasi-isometry.
Take any two vertices and in , and consider their corresponding super-vertices and . Firstly, vertex-grouping always reduces the distance, so . Next, given , we seek the biggest possible value of . There are at most edges between and , so , which further leads to . Overall,
which is the quasi-isometry inequality with being the first constant and 1 being the second constant. Finally, to obtain the third constant, for every in , simply take its representative vertex in , and then .∎
The sharpness of a partition ensures small quasi-isometry constants, the first goal that we stated in Section 1.3. However, when the sharpness is 1, the partition-graph is identical to the original graph and achieves no meaningful simplification. To achieve a meaningful amount of compression, the second goal in Section 1.3, we need the concept of a partition’s coarseness in addition to its sharpness.
Given a natural number , a partition of a graph is called a -coarse partition when every super-vertex satisfies .
A lower bound on the diameter of every super-vertex implies that every super-vertex contains at least vertices, and it directly follows that the size of the partition-graph is times smaller than the size of the original graph , as stated by the following proposition.
If is a -coarse partition on , then . ∎
In conclusion, a small sharpness value ensures small quasi-isometry constants, and implies higher distance-precision. On the other hand, a large coarseness value ensures sufficient compression. These respectively correspond to the first two goals listed in Section 1.3, so a good partition must achieve a balance between these two antithetical parameters.
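Both quantities can be measured for a concrete partition. The sketch below uses hypothetical names and adjacency dictionaries. It reports three values: the largest super-vertex diameter (assuming, for illustration, that diameters are measured with distances in the original graph), the smallest super-vertex size, and the ratio $|V(G)|/|V(H)|$. If every super-vertex has at least $m$ vertices, this ratio is at least $m$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def partition_statistics(adj, partition):
    """(largest super-vertex diameter, smallest super-vertex size, |V(G)| / |V(H)|)."""
    largest_diameter = 0
    for block in partition:
        for u in block:
            dist = bfs_distances(adj, u)  # distances measured in the original graph
            largest_diameter = max(largest_diameter, max(dist[v] for v in block))
    smallest_size = min(len(block) for block in partition)
    return largest_diameter, smallest_size, len(adj) / len(partition)


path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(partition_statistics(path6, [[0, 1], [2, 3, 4], [5]]))  # (2, 1, 2.0)
```

The singleton `[5]` keeps the smallest size at 1, illustrating how a single small super-vertex weakens the worst-case compression guarantee even when the average ratio is 2.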
On any input graph, choose an unassigned vertex $v$ uniformly at random, assign $v$ and its unassigned neighbours to a new super-vertex, and repeat until no unassigned vertex remains. Since the diameter of every collapsed neighbourhood is at most two, the resulting partition is 2-sharp. However, this can produce super-vertices that contain only one vertex, potentially resulting in a 1-coarse partition, which does not satisfy the compression property listed in Section 1.3.
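A direct sketch of this randomised procedure follows. It assumes adjacency dictionaries with sortable vertex labels; the candidates are sorted so that a seeded `random.Random` gives reproducible choices. The function name is hypothetical.

```python
import random


def collapse_neighbourhoods(adj, rng=random):
    """Repeatedly pick a random unassigned vertex and group it with its
    unassigned neighbours; returns the list of super-vertices."""
    unassigned = set(adj)
    partition = []
    while unassigned:
        v = rng.choice(sorted(unassigned))
        block = {v} | {w for w in adj[v] if w in unassigned}
        unassigned -= block
        partition.append(block)
    return partition


path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(collapse_neighbourhoods(path6, random.Random(0)))
```

On the path 0-1-2, for instance, picking 0 first produces {0, 1} and then forces the singleton {2}, which is exactly the weakness described above.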
To remedy this, we can make a modification. Define an unassigned vertex $v$ to be completely free when all of its neighbours are unassigned. Then the modified method runs as follows:
While there exists some completely free vertex, choose a completely free vertex $v$ uniformly at random, and assign $v$ and its unassigned neighbours to a new super-vertex.
Then we reach a stage where none of the remaining unassigned vertices is completely free, so each unassigned vertex $u$ must have an assigned neighbour. We choose any such assigned neighbour $w$ and place $u$ into the super-vertex containing $w$.
The resulting partition is 4-sharp and 2-coarse. This means that the compression property is satisfied, while the quasi-isometry constants are still small. Therefore, this modified method achieves the goal better overall.
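The modified procedure differs only in which vertices may start a new super-vertex. A sketch under the same assumptions as before (adjacency dictionaries, sortable labels, hypothetical names). Leftover vertices are attached only to super-vertices created in the first phase, which is what keeps every super-vertex within distance two of its starting vertex's neighbourhood.

```python
import random


def collapse_with_free_vertices(adj, rng=random):
    """Two-phase grouping: first around completely free vertices, then attach leftovers."""
    unassigned = set(adj)
    block_of = {}  # records first-phase assignments only
    partition = []

    def completely_free(v):
        return all(w in unassigned for w in adj[v])

    # Phase 1: grow a super-vertex around a random completely free vertex.
    while True:
        free = sorted(v for v in unassigned if completely_free(v))
        if not free:
            break
        v = rng.choice(free)
        block = {v} | set(adj[v])  # all neighbours of v are still unassigned
        unassigned -= block
        for u in block:
            block_of[u] = len(partition)
        partition.append(block)

    # Phase 2: each leftover vertex has a first-phase neighbour; join its block.
    for u in sorted(unassigned):
        w = next(w for w in adj[u] if w in block_of)
        partition[block_of[w]].add(u)
    return partition


path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(collapse_with_free_vertices(path6, random.Random(1)))
```

On a connected graph with at least two vertices, every first-phase super-vertex contains its starting vertex and at least one neighbour, so no singleton can appear.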
Let $G$ be a graph, and consider $\mathrm{cen}(G)$, the centre of $G$. Our goal is to understand whether $\mathrm{cen}(G)$ is preserved under quasi-isometric simplifications. In a practical setting, when $G$ is a graph whose centre is impractical to compute, one might want to simplify $G$ to a smaller graph $H$ with a mapping $f: G \to H$, locate the centre of $H$ (denoted $\mathrm{cen}(H)$), and then infer $\mathrm{cen}(G)$ from $\mathrm{cen}(H)$ and $f$. The first seemingly natural way to infer $\mathrm{cen}(G)$ is to use the preimage $f^{-1}(\mathrm{cen}(H))$.
This set of vertices does not necessarily equal the original centre $\mathrm{cen}(G)$. Therefore, we need some form of metric to measure how far apart $f^{-1}(\mathrm{cen}(H))$ and $\mathrm{cen}(G)$ are. Hence, we introduce the concept of the centre-shift, which quantifies the distance in $H$ between the subsets $f(\mathrm{cen}(G))$ and $\mathrm{cen}(H)$. Since $H$ is the simplified and coarser graph, defining the centre-shift in terms of the distance in $H$ appears more accurate and reasonable.
The centre-shift of $f$ is defined to be $d_H(f(\mathrm{cen}(G)), \mathrm{cen}(H))$.
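The centre-shift is the set-to-set distance in $H$ between the image of the original centre and the centre of $H$. A minimal sketch with hypothetical names, assuming adjacency dictionaries for both graphs and a vertex map `f` given as a dictionary.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def centre(adj):
    """Vertices of minimum eccentricity."""
    ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}
    radius = min(ecc.values())
    return {v for v in adj if ecc[v] == radius}


def centre_shift(adj_g, adj_h, f):
    """Distance in H between the image f(cen(G)) and cen(H)."""
    image = {f[v] for v in centre(adj_g)}
    target = centre(adj_h)
    return min(bfs_distances(adj_h, x)[y] for x in image for y in target)


path5 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
path3 = {0: [1], 1: [0, 2], 2: [1]}
print(centre_shift(path5, path3, {0: 0, 1: 1, 2: 2, 3: 2, 4: 2}))  # 1
print(centre_shift(path5, path3, {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}))  # 0
```

Both maps collapse the same path onto a path of three super-vertices; only the balanced grouping keeps the image of the centre in the centre.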
Before investigating any possible relationship between quasi-isometry and the centre-shift, we first present Lemma 5, which shows that the same quasi-isometric inequality applies not only to the distance but also to the eccentricity. Its proof is in Appendix A.1.
Let $f: G \to H$ be an $(A, B, C)$-quasi-isometry. Then for all $v \in V(G)$ we have:
The next two theorems derive upper bounds on the centre-shift, using the quasi-isometry inequality and an extra condition as the only constraints. We first define this extra condition.
Let $G$ be a graph with centre $\mathrm{cen}(G)$. Then $G$ is said to have uniform eccentricity (uni-ecc) when for all vertices $v$ in $G$ we have $\mathrm{ecc}(v) = \mathrm{rad}(G) + d(v, \mathrm{cen}(G))$.
Earlier, Proposition 3 bounded the difference between the eccentricities of two vertices by their distance; in particular, the eccentricities of adjacent vertices differ by at most one. Placed in the context of Definition 5, Proposition 3 implies that $\mathrm{ecc}(v) \le \mathrm{rad}(G) + d(v, \mathrm{cen}(G))$ for all vertices $v$ in $G$. Therefore, the uni-ecc property, which demands equality, is in fact a strong property, and one can construct graphs that do not satisfy it.
Let $G$ be a graph that satisfies the uni-ecc property and has centre $\mathrm{cen}(G)$, and let $f: G \to H$ be an $(A, B, C)$-quasi-isometry. Then the centre-shift is bounded above by
Vertex-grouping never increases distances, so it satisfies a more specific property than a general quasi-isometry: if $f$ simplifies $G$ to $H$ by grouping vertices, then $d_H(f(u), f(v)) \le d_G(u, v)$ for all $u, v \in V(G)$. This is not the case with edge-removing simplifications such as spanners, where distances can only increase. Such a ‘one-sided quasi-isometry’ yields a slightly more specific bound. The proof of the following theorem is the same as that of the preceding one.
Let be a graph that satisfies the uni-ecc property and has centre , and let be a mapping that satisfies . Then the centre-shift is bounded above by .∎
It is routine to check that trees satisfy the uni-ecc property, so the two preceding theorems already ensure that any quasi-isometric simplification of a tree has a bounded centre-shift. However, as this upper bound is a function of the radius of the tree, we want to investigate further whether the centre-shift can be bounded by a constant under particular quasi-isometric simplifications of trees.
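On the path-graph $P_n$ (defined in Section 3), define a random partition as follows. Starting from $v_1$, we set the size of the super-vertex containing $v_1$ to be two or three with equal probability. That is, the first super-vertex is set to be $\{v_1, v_2\}$ with probability 1/2, or $\{v_1, v_2, v_3\}$ with probability 1/2.
In general, let $i$ be the smallest index such that $v_i$ is unassigned. Provided that $i \le n - 2$, we set the next super-vertex to be $\{v_i, v_{i+1}\}$ or $\{v_i, v_{i+1}, v_{i+2}\}$ with equal probability, and then we move on to the next available $i$. The end of $P_n$ has special cases: if $i = n$, then the last super-vertex is set to be $\{v_n\}$; if $i = n - 1$, then it is set to be $\{v_{n-1}, v_n\}$.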
Let $a$ and $b$ be the numbers of size-two and size-three super-vertices, respectively. Then $n = 2a + 3b + \epsilon$, where $\epsilon \in \{0, 1\}$ accounts for the case in which the last super-vertex has size one. Hence, the size of the simplified path-graph is $a + b + \epsilon$. Since the process is random, the expected size of the simplified path-graph is asymptotically $2n/5$ when $n$ is large. This is because, in a long random sequence, the numbers of size-two and size-three super-vertices tend to be equal. Since the sizes of the super-vertices are uniformly distributed, the probability of having zero centre-shift tends to one.
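The random process is easy to simulate, which gives an empirical way to examine both the size and the centre-shift behaviour for moderate $n$. In this sketch (hypothetical names), the end of the path is handled by assuming that a single leftover vertex forms a size-one super-vertex and two leftover vertices form a size-two super-vertex. The centre-shift is computed directly from vertex positions rather than from any closed formula, so the printed fractions should be read as measurements, not as confirmation of any particular limit.

```python
import random


def random_two_three_composition(n, rng=random):
    """Super-vertex sizes along P_n, left to right (sizes 2 or 3 with equal odds)."""
    sizes, remaining = [], n
    while remaining > 0:
        size = rng.choice((2, 3)) if remaining >= 3 else remaining
        sizes.append(size)
        remaining -= size
    return sizes


def composition_centre_shift(sizes):
    """Centre-shift of the partition-path of P_n encoded by `sizes` (1-based positions)."""
    n, k = sum(sizes), len(sizes)
    centre_vertices = {(n + 1) // 2, (n + 2) // 2}  # one or two middle vertices of P_n
    centre_indices = {(k + 1) // 2, (k + 2) // 2}   # one or two middle super-vertices
    block_of, start = {}, 1
    for index, size in enumerate(sizes, start=1):
        for position in range(start, start + size):
            block_of[position] = index
        start += size
    return min(abs(block_of[v] - i) for v in centre_vertices for i in centre_indices)


def simulate(n, trials=1000, seed=0):
    """Average number of super-vertices and fraction of runs with zero centre-shift."""
    rng = random.Random(seed)
    total_size, zero_shift = 0, 0
    for _ in range(trials):
        sizes = random_two_three_composition(n, rng)
        total_size += len(sizes)
        zero_shift += composition_centre_shift(sizes) == 0
    return total_size / trials, zero_shift / trials


for n in (50, 500, 5000):
    print(n, simulate(n, trials=200))
```

The average number of super-vertices can be compared directly with $2n/5$; the second number in each pair is the observed fraction of runs with zero centre-shift.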
6 Outward-Contraction and the Centres in Trees
This section studies partition-graphs on trees (called partition-trees for short), and presents a method called outward-contraction, which is a specific procedure of generating a partition on any input tree. We then show that outward-contraction always produces partition-trees with centre-shift zero.
The centre-shift of any partition-tree is bounded by an expression that depends on its radius. However, when the partition-tree has a large radius, the centre-shift as a numerical value can still become arbitrarily large.
Figure 1 shows a pattern of partition-trees with arbitrarily large centre-shift. In each of the two trees, the vertices in the centre of the original tree are marked by solid circles, while the super-vertices in the centre of the partition-tree are marked by a distinct symbol. From left to right in Figure 1, the centre-shifts are two and three, respectively. Following this pattern, one can construct bigger trees with partitions whose centre-shift is arbitrarily large. Nevertheless, the radius of the tree increases as the centre-shift increases, so the bound of Theorem 5 still holds.
Although not every partition-tree has a small centre-shift, we now present a method called outward-contraction, which produces specific partition-trees with zero centre-shift.
Let $T$ be a tree with a designated vertex $r$. Then for every vertex $v$, the level of $v$ is defined to be $d(r, v)$; the designated vertex $r$ is on level zero.
For a path $v_0, v_1, \ldots, v_k$ in $T$, a turning-point of the path is a vertex $v_i$ (with $0 < i < k$) such that the level of $v_i$ is smaller than the levels of both $v_{i-1}$ and $v_{i+1}$.
The method of outward-contraction takes a tree $T$ as input, designates an arbitrary vertex $r$, and partitions the vertex-set as follows. For every vertex $v$ on an even level, outward-contraction groups $v$ with its neighbours that have a larger level. Outward-contraction then produces the partition-graph based on these super-vertices.
It follows directly from this definition that outward-contraction always produces a 2-sharp partition. Figure 1(b) shows an example of outward-contraction, where the designated vertex is marked by a square.
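A sketch of outward-contraction, assuming the tree is an adjacency dictionary and using hypothetical names. Levels are computed by breadth-first search from the designated vertex, and each even-level vertex becomes the core of one super-vertex. The helper `centre_shift_after_contraction` builds the partition-tree and measures the centre-shift directly, so the zero centre-shift property of Theorem 6 can be spot-checked on small trees.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (levels) from `source` in an unweighted tree."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def tree_centre(adj):
    ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}
    radius = min(ecc.values())
    return {v for v in adj if ecc[v] == radius}


def outward_contraction(adj, root):
    """Super-vertices: each even-level vertex together with its higher-level neighbours."""
    level = bfs_distances(adj, root)
    return [
        {v} | {w for w in adj[v] if level[w] > level[v]}
        for v in adj
        if level[v] % 2 == 0
    ]


def centre_shift_after_contraction(adj, root):
    """Build the partition-tree and return its centre-shift."""
    blocks = outward_contraction(adj, root)
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    tree_h = {i: set() for i in range(len(blocks))}
    for u in adj:
        for w in adj[u]:
            if block_of[u] != block_of[w]:
                tree_h[block_of[u]].add(block_of[w])
    image = {block_of[v] for v in tree_centre(adj)}
    return min(bfs_distances(tree_h, x)[y] for x in image for y in tree_centre(tree_h))


broom = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5, 6], 4: [3], 5: [3], 6: [3]}
print(outward_contraction(broom, 6))  # [{0}, {1, 2}, {4}, {5}, {3, 6}]
print([centre_shift_after_contraction(broom, r) for r in broom])  # [0, 0, 0, 0, 0, 0, 0]
```

Every odd-level vertex has exactly one even-level parent, so each vertex lands in exactly one super-vertex and the output is indeed a partition.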
The centre of a tree always lies on a diameter-path, and hence is the same as the centre of any diameter-path of the tree [wu-chao-04]. This allows us to reduce the problem of finding the centre of a tree into a simpler problem on a path.
On the path-graph $P_n$, a partition can be expressed as a sequence of natural numbers that represent the sizes of the super-vertices from left to right. The numbers in such a sequence sum to $n$, so such sequences are simply integer compositions. Before stating and proving the main result (Theorem 6), we formally introduce some concepts that help compute the centre-shift of a partition-path when it is represented by an integer composition.
Given $n$, an integer composition of $n$ is a sequence of natural numbers that sum to $n$. An integer composition of length $k$ can itself be viewed as the path-graph $P_k$, and we call $P_k$ a partition-path of $P_n$. Being a tree, $P_k$ has one or two centre-vertices; correspondingly, we can think of the integer composition as having a centre.
Let $(a_1, \ldots, a_k)$ be an integer composition of length $k$. The set of centre-indices is $\{(k+1)/2\}$ when $k$ is odd, and $\{k/2,\ k/2 + 1\}$ when $k$ is even.
The centre-sum is $S_C = \sum_i a_i$, for $i$ in the set of centre-indices.
The left-sum is $S_L = \sum_i a_i$, for $i$ smaller than all the centre-indices.
The right-sum is $S_R = \sum_i a_i$, for $i$ larger than all the centre-indices.
Consider 332231, which represents a partition on $P_{14}$. (In the rest of the paper, we write integer compositions in typewriter font to aid clarity.) The centre-indices of 332231 are 3 and 4, so the centre-sum is $S_C = 2 + 2 = 4$. Its left-sum is $S_L = 3 + 3 = 6$, its right-sum is $S_R = 3 + 1 = 4$, and $|S_L - S_R| = 2$. Now, since $|S_L - S_R| \le S_C$, we can immediately conclude that the centre-shift is zero.
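The three sums can be read off mechanically. A small sketch with hypothetical names, using 1-based centre-indices as in the definition; it reproduces the numbers of the example and performs the comparison between the absolute difference of the left-sum and right-sum and the centre-sum that the example relies on.

```python
def centre_indices(k):
    """1-based centre-indices of an integer composition of length k."""
    return [(k + 1) // 2] if k % 2 == 1 else [k // 2, k // 2 + 1]


def composition_sums(parts):
    """Return (centre-sum, left-sum, right-sum) of an integer composition."""
    indices = centre_indices(len(parts))
    centre_sum = sum(parts[i - 1] for i in indices)
    left_sum = sum(parts[: indices[0] - 1])  # parts before the first centre-index
    right_sum = sum(parts[indices[-1]:])     # parts after the last centre-index
    return centre_sum, left_sum, right_sum


parts = [int(ch) for ch in "332231"]
c, l, r = composition_sums(parts)
print(c, l, r, abs(l - r) <= c)  # 4 6 4 True
```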
In general we have the following result with regards to the centre-shift. Its proof is in Appendix A.2.
Let $H$ be a partition-path of $P_n$, and let $(a_1, \ldots, a_k)$ be the integer composition that represents $H$. Also, let $S_C$, $S_L$ and $S_R$ respectively denote the centre-sum, left-sum and right-sum of this composition. Then the centre-shift is 0 if $|S_L - S_R| \le S_C$, and positive otherwise.
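For small $n$, the criterion can be checked exhaustively by enumerating every integer composition of $n$. The check compares "zero centre-shift" with the condition $|S_L - S_R| \le S_C$, computing the shift directly from vertex positions. This is a brute-force sketch with hypothetical names, not a proof; there are $2^{n-1}$ compositions of $n$, so it is only practical for small $n$.

```python
def compositions(n):
    """Yield every integer composition of n as a list of positive parts."""
    if n == 0:
        yield []
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield [first] + rest


def centre_indices(k):
    return [(k + 1) // 2] if k % 2 == 1 else [k // 2, k // 2 + 1]


def direct_centre_shift(parts):
    """Block distance between the middle vertices of P_n and the middle super-vertices."""
    n = sum(parts)
    block_of = [index for index, size in enumerate(parts, start=1) for _ in range(size)]
    centre_vertices = {(n + 1) // 2, (n + 2) // 2}  # 1-based middle position(s) of P_n
    return min(abs(block_of[v - 1] - i)
               for v in centre_vertices for i in centre_indices(len(parts)))


def criterion_matches(max_n):
    """Check 'shift is zero  <=>  |left-sum - right-sum| <= centre-sum' for n <= max_n."""
    for n in range(1, max_n + 1):
        for parts in compositions(n):
            idx = centre_indices(len(parts))
            centre_sum = sum(parts[i - 1] for i in idx)
            left_sum, right_sum = sum(parts[: idx[0] - 1]), sum(parts[idx[-1]:])
            if (direct_centre_shift(parts) == 0) != (abs(left_sum - right_sum) <= centre_sum):
                return False
    return True


print(criterion_matches(12))  # True
```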
Outward-contraction always produces a partition-tree with centre-shift zero.
Let $T$ be the input tree. Outward-contraction designates an arbitrary vertex and generates a partition $\mathcal{P}$ on $T$.
Consider a diameter-path $Q$ in $T$, and let $\mathcal{P}_Q$ be the restriction of $\mathcal{P}$ to $Q$; that is, $\mathcal{P}_Q = \{S \cap V(Q) : S \in \mathcal{P},\ S \cap V(Q) \neq \emptyset\}$. In the rest of the proof, we focus only on the super-vertices in $\mathcal{P}_Q$, and we write $H_Q$ for the partition-path of $Q$ induced by $\mathcal{P}_Q$. Since $Q$ is a path, we refer to the sizes of its super-vertices as elements of an integer composition, with centre-sum $S_C$, left-sum $S_L$ and right-sum $S_R$.
If a path in a tree had two turning-points, then we could construct a cycle, a contradiction. Therefore, the path $Q$ has at most one turning-point. If $Q$ does have a turning-point, then the super-vertex containing the turning-point has size either one or three, and is represented by a 1 or a 3 in the integer composition that represents $\mathcal{P}_Q$. On the other hand, the endpoints of $Q$ are contained in super-vertices of size one or two, which are represented by a 1 or a 2. All the remaining super-vertices, which contain neither the turning-point nor an endpoint, have size two. We now consider leaf-removal (Note 3) on $Q$, which leads to two possible cases.
[Case 1] Suppose leaf-removal does not encounter the turning-point throughout the execution. This occurs when $Q$ has no turning-point, or when the super-vertex containing the turning-point is in the centre of $H_Q$. The centre of $H_Q$ contains either one or two super-vertices, and at most one of these super-vertices contains the turning-point.
If the centre of $H_Q$ contains only one super-vertex, then either this super-vertex contains the turning-point and has size one or three, or it does not contain a turning-point and has size two. Overall, the centre-sum is one, two or three.
If the centre of $H_Q$ contains two super-vertices, then these super-vertices correspond to the following possible integer compositions: 12, 21, 32, 23 or 22. The first four occur when one of the two centre super-vertices contains the turning-point, while the last composition 22 occurs when neither of them does.
Hence, the possible centre-sum ranges from one to five. There are four further subcases, depending on whether each endpoint of $Q$ is a 1 or a 2. These subcases are listed in Table 1 alongside their respective values of $|S_L - S_R|$. The centre super-vertices are marked in the table, and the dots all stand for 2.
As $|S_L - S_R| \le S_C$ in all possible cases, by Lemma 6 the centre-shift is always zero.
[Case 2] Suppose leaf-removal encounters the turning-point of $Q$ at some point during the execution. Then the turning-point is not in any super-vertex of the centre of $H_Q$, so the possible values of the centre-sum are two (one super-vertex in the centre) and four (two super-vertices in the centre).
Without loss of generality, assume that the super-vertex containing the turning-point lies to the left of the centre of $H_Q$. Depending on whether each endpoint is a 1 or a 2, and whether the super-vertex containing the turning-point is a 1 or a 3, there are eight subcases, listed alongside the corresponding values of $|S_L - S_R|$ in Table 2. Again, the super-vertices in the centre of $H_Q$ are marked, and the dots all stand for 2.
As $|S_L - S_R| \le S_C$ in all cases, by Lemma 6 the centre-shift is always zero. ∎
7 Vertex-Weighted Partition-Trees and Medians
Although outward-contraction preserves the centre of a tree, it does not always preserve the median, as shown by the example in Figure 1(b).
Nevertheless, partition-trees can still preserve the median by taking the sizes of the super-vertices into account. This brings us to define vertex-weighted graphs and the vertex-weighted distance-sum, which were also used in [hakimi-64].
A vertex-weighted graph is a graph $G$ with a vertex-weight function $w$ on $V(G)$. In addition, the weight of a vertex-subset $S$, written as $w(S)$, is defined to be the sum of the weights of all $v \in S$.
In a vertex-weighted graph $G$, the vertex-weighted distance-sum of each vertex $v$ is defined to be $D(v) = \sum_{u \in V(G)} w(u)\, d(u, v)$. Then the median of a vertex-weighted graph is the set of vertices that minimise the vertex-weighted distance-sum.
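With unit weights, the vertex-weighted distance-sum reduces to the ordinary distance-sum. A sketch with hypothetical names and adjacency dictionaries; the second example shows a heavy endpoint pulling the median towards it.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def weighted_distance_sums(adj, weight):
    """D(v) = sum over u of w(u) * d(u, v), for every vertex v."""
    sums = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        sums[v] = sum(weight[u] * dist[u] for u in adj)
    return sums


def weighted_median(adj, weight):
    """Vertices minimising the vertex-weighted distance-sum."""
    sums = weighted_distance_sums(adj, weight)
    best = min(sums.values())
    return {v for v in adj if sums[v] == best}


path3 = {0: [1], 1: [0, 2], 2: [1]}
print(weighted_median(path3, {0: 1, 1: 1, 2: 1}))  # {1}
print(weighted_median(path3, {0: 1, 1: 1, 2: 5}))  # {2}
```

For the weighted example, $D(0) = 11$, $D(1) = 6$ and $D(2) = 3$.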
Let $T$ be a vertex-weighted tree with vertex-weight function $w$ and vertex-weighted distance-sum $D$, and let $u$ and $v$ be adjacent vertices in $T$. Furthermore, let $V_u$ denote the set of vertices whose paths to $v$ pass through $u$; the set $V_v$ is defined symmetrically. Then $D(u) - D(v) = w(V_v) - w(V_u)$.
Let $T$ be a vertex-weighted tree with vertex-weight function $w$, and let $V_u$ and $V_v$ be defined in the same way as in Lemma 7. Then the following statements hold.
If $w(V_u) > w(V_v)$, then $D(x) > D(u)$ for all $x \in V_v$.
If $w(V_u) = w(V_v)$, then $\{u, v\}$ is the median.
The median of a vertex-weighted tree consists of either one vertex or two adjacent vertices.
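Lemma 7 and the corollaries above also suggest how to find the median without evaluating every distance-sum. Moving from $u$ to a neighbour $v$ lowers the vertex-weighted distance-sum exactly when the side of the edge $uv$ containing $v$ carries more than half of the total weight, so one can walk towards the heavier side until no such neighbour exists. A neighbour whose side weighs exactly half the total then ties with the current vertex. The sketch below implements this walk (a weighted-centroid search; the name and the rooted subtree-weight bookkeeping are our own choices) and compares it with the brute-force definition on random trees.

```python
import random
from collections import deque


def tree_median(adj, weight):
    """Vertex-weighted median of a tree via a walk towards the heavier side."""
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for x in order:  # breadth-first order; the list grows while we iterate over it
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
    sub = dict(weight)  # sub[x] becomes the total weight of the subtree rooted at x
    for x in reversed(order):
        if parent[x] is not None:
            sub[parent[x]] += sub[x]
    total = sub[root]

    def side_weight(u, v):
        """Weight of the side of edge uv that contains v."""
        return sub[v] if parent[v] == u else total - sub[u]

    u = root
    while True:
        heavier = [v for v in adj[u] if 2 * side_weight(u, v) > total]
        if not heavier:  # no neighbour has a smaller distance-sum
            break
        u = heavier[0]
    # A neighbour whose side weighs exactly half the total ties with u.
    return {u} | {v for v in adj[u] if 2 * side_weight(u, v) == total}


def brute_force_median(adj, weight):
    """Minimisers of the vertex-weighted distance-sum, by one BFS per vertex."""
    def dsum(v):
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return sum(weight[x] * dist[x] for x in adj)

    sums = {v: dsum(v) for v in adj}
    best = min(sums.values())
    return {v for v, s in sums.items() if s == best}


rng = random.Random(1)
for _ in range(300):
    n = rng.randint(1, 15)
    tree = {i: [] for i in range(n)}
    for i in range(1, n):
        p = rng.randrange(i)
        tree[i].append(p)
        tree[p].append(i)
    weights = {v: rng.randint(1, 4) for v in tree}
    median = tree_median(tree, weights)
    assert median == brute_force_median(tree, weights)
    # The median is one vertex or two adjacent vertices, as the corollary states.
    assert len(median) == 1 or (len(median) == 2 and min(median) in tree[max(median)])
```

Because every step strictly decreases the distance-sum, the walk cannot revisit a vertex, and each step costs only a scan of the current vertex's neighbours.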
With these basics of vertex-weighted graphs in place, we move on to define how vertex-weights are incorporated into the framework of partition-graphs.
Given a partition on a graph , the vertex-weighted partition-graph is defined as follows.
The vertices and edges of are the same as in Definition 4.
The weight of each vertex in is the cardinality of its corresponding subset of .
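Definition 4 is not reproduced in this section, so the sketch below assumes the usual quotient construction: one super-vertex per part, with two super-vertices adjacent whenever some edge of the graph joins their parts. Under that assumption, and with parts given as Python sets (an illustrative choice), the vertex-weighted partition-graph can be built as follows; `weighted_partition_graph` is a hypothetical name.

```python
def weighted_partition_graph(adj, parts):
    """Quotient graph of adj under parts; weight of a super-vertex = size of its part.

    Assumes super-vertices i and j are adjacent iff some edge joins part i to part j.
    """
    owner = {}
    for i, part in enumerate(parts):
        for v in part:
            if v in owner:
                raise ValueError(f"vertex {v!r} lies in two parts")
            owner[v] = i
    if set(owner) != set(adj):
        raise ValueError("parts must cover every vertex exactly once")
    q_adj = {i: set() for i in range(len(parts))}
    for x in adj:
        for y in adj[x]:
            if owner[x] != owner[y]:
                q_adj[owner[x]].add(owner[y])
    q_weight = {i: len(part) for i, part in enumerate(parts)}
    return {i: sorted(nbrs) for i, nbrs in q_adj.items()}, q_weight


# Usage: the path 0-1-2-3-4 split into {0, 1}, {2} and {3, 4}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(weighted_partition_graph(path, [{0, 1}, {2}, {3, 4}]))
# prints ({0: [1], 1: [0, 2], 2: [1]}, {0: 2, 1: 1, 2: 2})
```

Storing each super-vertex's weight as the size of its part is exactly what lets the distance-sums of the partition-graph reflect how many original vertices lie on each side of an edge.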
Now we can state and prove the main theorem of this section. Regarding notation, the super-vertices in are denoted using capital letters, and the distance-sum of a super-vertex in is denoted by . Since there is little chance of ambiguity, we overload the notation for convenience.
Let be a tree, and let denote the vertex-weighted partition-graph induced by any partition on . Then every super-vertex in the median of contains a vertex in the median of .
Since the vertex-weighted is still a tree, its median is either a single super-vertex or two adjacent super-vertices (Corollary 7), so we have two cases.
[Case 1] Let be the only super-vertex in the median of . Then by definition, for every neighbour of , . Let denote the set of super-vertices that have to pass through in order to reach , and let be the analogous counterpart. Then by Lemma 7, .
Let and such that and are adjacent in . Define to be the set of vertices whose paths to pass through , and define analogously. Now observe that and . This means that and hence . By Corollary 7(1), every vertex has a larger distance-sum than . Since this holds for every neighbour of , every vertex not in has a larger distance-sum than some vertex of , so the median-vertices of must lie inside .
[Case 2] Let and be the two adjacent super-vertices in the median of . Let and be the corresponding adjacent vertices in . In addition, define , , and as before. Now implies . This further means that and hence . Finally, using Corollary 7(2), and are the two median-vertices of . ∎
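The theorem can also be probed empirically on small cases. The proof uses the fact that the partition-graph of a tree is again a tree; to guarantee this, the sketch below only generates partitions into connected parts, obtained by deleting a random subset of tree edges (an assumption of the sketch). It then checks that every super-vertex in the vertex-weighted median of the partition-graph contains a median vertex of the original tree, treating the original tree as having unit weights. All function names are hypothetical.

```python
import random
from collections import deque


def median(adj, weight):
    """Minimisers of the vertex-weighted distance-sum (brute force, one BFS per vertex)."""
    def dsum(v):
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return sum(weight[x] * dist[x] for x in adj)

    sums = {v: dsum(v) for v in adj}
    best = min(sums.values())
    return {v for v, s in sums.items() if s == best}


def connected_partition(adj, kept_edges):
    """Parts = connected components of the forest that keeps only kept_edges."""
    forest = {v: [] for v in adj}
    for x, y in kept_edges:
        forest[x].append(y)
        forest[y].append(x)
    parts, seen = [], set()
    for v in adj:
        if v not in seen:
            comp, stack = {v}, [v]
            while stack:
                x = stack.pop()
                for y in forest[x]:
                    if y not in comp:
                        comp.add(y)
                        stack.append(y)
            seen |= comp
            parts.append(comp)
    return parts


def theorem_holds(adj, parts):
    """Does every median super-vertex of the partition-graph contain a median vertex?"""
    owner = {v: i for i, part in enumerate(parts) for v in part}
    q_adj = {i: set() for i in range(len(parts))}
    for x in adj:
        for y in adj[x]:
            if owner[x] != owner[y]:
                q_adj[owner[x]].add(owner[y])
    q_weight = {i: len(part) for i, part in enumerate(parts)}
    tree_median = median(adj, {v: 1 for v in adj})
    return all(parts[s] & tree_median for s in median(q_adj, q_weight))


rng = random.Random(2)
for _ in range(300):
    n = rng.randint(1, 14)
    tree = {i: [] for i in range(n)}
    edges = []
    for i in range(1, n):
        p = rng.randrange(i)
        tree[i].append(p)
        tree[p].append(i)
        edges.append((i, p))
    kept = [e for e in edges if rng.random() < 0.5]
    assert theorem_holds(tree, connected_partition(tree, kept))
```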
We presented methods of graph simplification that address the goals outlined in Section 1.3. With suitable values of sharpness and coarseness, Section 4 showed that partition-graphs satisfy the first two goals: small quasi-isometry constants and compression. We then focused on trees, where partition-graphs satisfy the retention property (Proposition 4). As for the preservation property, Sections 6 and 7 presented methods to simplify trees while preserving the centre and the median, respectively. As future work, one could develop quasi-isometric graph simplifications for more general graph classes such as -trees and chordal graphs. One could also explore the possibility of employing the theory of random graphs in the study of partition-graphs.
Appendix A Proofs of Lemmas and Corollaries
A.1 Proof of Lemma 5
First, let be an ecc-wit of . Then as ,
On the other hand, let such that is an ecc-wit of . Then
Combining these two inequalities completes the proof.∎
A.2 Proof of Lemma 6
Let be a path-graph with centre , and let be a partition-path of with centre .
We picture as in Figure 2(a). Suppose corresponds to Segment C on , and has vertices. Then, let Segments L and R be the two shorter paths after removing Segment C from . Suppose Segments L and R respectively contain and vertices, and assume without loss of generality.
Now we use the leaf-removal algorithm (Note 3) to locate the centre of , and then calculate its distance to Segment C.
On a path, one iteration of leaf-removal is the same as removing both endpoints. Hence, we first carry out iterations, which lead us to Figure 2(b). The centre of this shorter path is exactly the same as the centre of .
From Figure 2(b), there are three possible cases.
[Case 1] When Segments C and R-L have equal length, the centre of is made up of the right-most vertex in Segment C and the left-most vertex in Segment R-L, so the centre-shift is zero.
[Case 2] When Segment C is longer than R-L, the centre of lies in Segment C, so the centre-shift is zero.
These two cases combine to prove the first part of the lemma: the centre-shift is 0 when . In contrast, the final case requires more effort to quantify the non-zero centre-shift.
[Case 3] When Segment R-L is longer than C, the centre of falls in Segment R-L, and the non-zero centre-shift is the distance between and Segment C. This distance is the same as the distance between the right-most vertex in Segment C and the left-most vertex of .
The right-most vertex in Segment C has index . On the other hand, on a path of length , the index of the left-most centre-vertex is , so the left-most vertex of has index . Finally, we can derive the centre-shift by the following subtraction:
and this proves the second part of the lemma.∎
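The path computation in this proof is easy to check by brute force. The sketch below uses its own conventions, which are assumptions rather than the paper's notation: vertices are indexed 1 to n from left to right, and Segments L, C and R have `ell`, `c` and `r` vertices with c ≥ 1 and ell ≤ r. It locates the centre by repeatedly removing both endpoints, as in the leaf-removal step above, measures the distance from the centre to Segment C, and compares this with max(0, ⌈(r − ell − c)/2⌉). That closed form is what the Case 3 subtraction evaluates to under this indexing (right-most vertex of C at index ell + c, left-most centre-vertex at index ⌈n/2⌉); it is our derivation, not a quotation of the lemma.

```python
def path_centre(n):
    """Centre of the path 1-2-...-n, found by repeatedly removing both endpoints."""
    left, right = 1, n
    while right - left + 1 > 2:
        left += 1
        right -= 1
    return {left, right}


def centre_shift(ell, c, r):
    """Distance from the centre of the whole path to Segment C = [ell + 1, ell + c]."""
    first, last = ell + 1, ell + c

    def gap(i):
        if i < first:
            return first - i
        if i > last:
            return i - last
        return 0

    return min(gap(i) for i in path_centre(ell + c + r))


def closed_form(ell, c, r):
    # (x + 1) // 2 is the ceiling of x / 2 for any integer x, including negative x.
    return max(0, (r - ell - c + 1) // 2)


for ell in range(0, 8):
    for r in range(ell, 12):
        for c in range(1, 10):
            assert centre_shift(ell, c, r) == closed_form(ell, c, r)
            if c >= r - ell:
                assert centre_shift(ell, c, r) == 0  # Cases 1 and 2: no shift
print(centre_shift(1, 1, 5))  # prints 2
```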
A.3 Proof of Lemma 7
We begin by deriving :
Then the latter term can be re-arranged:
The exact same argument also yields:
Therefore, after subtracting these two equations and rearranging, we obtain the lemma’s statement.∎
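For readers who want the computation behind Lemma 7 spelled out, here is one way to carry it out in our own notation: $V_u$ and $V_v$ are the two sides of the edge $uv$ (with $u \in V_u$ and $v \in V_v$), $w(S)$ is the total weight of a vertex-subset $S$, and $D$ is the vertex-weighted distance-sum. Every vertex of $V_u$ is one step farther from $v$ than from $u$, and every vertex of $V_v$ is one step closer, so

```latex
\begin{aligned}
D(v) - D(u) &= \sum_{x \in V(T)} w(x)\,\bigl(d(x, v) - d(x, u)\bigr) \\
            &= \sum_{x \in V_u} w(x)\cdot(+1) \;+\; \sum_{x \in V_v} w(x)\cdot(-1) \\
            &= w(V_u) - w(V_v),
\end{aligned}
```

which rearranges to $D(u) - D(v) = w(V_v) - w(V_u)$.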
A.4 Proof of Corollary 7
(1) For every , let be the neighbour of that lies on the path between and .
Define to be the set of vertices whose paths to pass through , and similarly for . Then and . Since all the vertex-weights are positive, these containments imply and .
Due to the premise , Lemma 7 implies that . Hence , so we have . Finally, using routine induction on , we can extend the observation above to the entire , and conclude that for all . ∎
(2) By Lemma 7, is equivalent to . This means that , where is the entire tree. Without loss of generality, consider a vertex such that . With respect to the edge , let be the subtree on the side of , and the subtree on the side of .
Now, as , we have , and therefore . Finally, applying Corollary 7(1) to every such in both and , we conclude that and attain the minimum distance-sum. Therefore is the vertex-weighted median. ∎
A.5 Proof of Corollary 7
Firstly, it is easy to construct examples of vertex-weighted trees with medians being a single vertex or two adjacent vertices. Secondly, it suffices to show that in a tree with vertex-weight function , any two vertices in the vertex-weighted median are adjacent. This not only implies that the vertex-weighted median is connected, but also excludes the possibility of the median having three or more vertices.
Let and be vertices with the minimum value, and suppose they are separated by a path . This is shown in Figure 3, where indicate the subtrees of the vertices .
Consider the adjacent vertices and . Since has the minimum value, . Then by Lemma 7,
Similarly, consider the adjacent vertices and . Since has the minimum value, , and hence
Summing Inequalities (2) and (3) leads to . But by Definition 7, the weights of vertices are all positive, so this is a contradiction. Therefore, two vertices with the minimum value must be adjacent, and hence the corollary holds.∎