Understanding the structure of very large graphs like social networks or the webgraph is a challenging task. Given the size of these networks, it is often hopeless to compute structural information exactly. A feasible approach is to design random sampling algorithms that only inspect a small portion of the graph and derive conclusions about the structure of the whole graph from this random sample. However, there are different ways to sample from graphs (random induced subgraphs, random sets of edges, random walks, random BFS, etc.) and also many structural graph properties. This raises the question, which sampling approaches (if any) are suitable to detect or approximate which structural properties.
Graph property testing provides a formal algorithmic framework that allows us to study the above setting from a complexity theory point of view. In this framework, given oracle access to an input graph, our goal is to distinguish between the case that the graph satisfies some property or that it is “far from” having the property by randomly sampling from the graph. Here, a graph property denotes a set of graphs that is invariant under graph isomorphism. Both oracle access and the notion “far from” depend on the representation of the graph. Several models have been proposed in the past two decades for dealing with different types of graphs (see the recent book [Gol17]).
For dense graphs, Goldreich et al. [GGR98:testing] introduced the adjacency matrix model, in which the algorithm can perform any vertex-pair query to the oracle. That is, upon an input vertex pair $(u,v)$, the oracle returns $1$ if there is an edge between $u$ and $v$, and $0$ otherwise. A graph is called $\varepsilon$-far from having a property $\Pi$ if one has to modify more than $\varepsilon n^2$ vertex pairs (i.e., entries of the adjacency matrix) to make it satisfy $\Pi$, for any small constant $\varepsilon > 0$. Since the model was introduced, many properties have been found to be testable, in the sense that there exists an algorithm, called a tester, that can distinguish whether a graph satisfies $\Pi$ or is $\varepsilon$-far from having $\Pi$ while making only a constant number of queries. The research in this model has culminated in the seminal work by Alon et al. [AFNS09:characterization], who gave a full characterization of constant-query testable properties via the regularity lemma.
Our understanding of property testing for sparse graphs (e.g., bounded degree graphs) is much more limited. Goldreich and Ron [GR02:testing] initiated the study of property testing for bounded degree graphs in the adjacency list model. A graph is called a $d$-bounded graph if its maximum degree is at most $d$, which is assumed to be a constant. The property tester for a $d$-bounded graph is given oracle access to the adjacency list of the graph, that is, upon an input $(v,i)$ such that $1 \le i \le d$, the oracle returns the $i$-th neighbor of $v$ if such a neighbor exists, and a special symbol otherwise. A $d$-bounded graph is said to be $\varepsilon$-far from having the property $\Pi$ if one needs to modify more than $\varepsilon d n$ edges to obtain a graph that satisfies $\Pi$. In this model, there exist several properties that are known to be testable with a constant number of queries (see discussion below). There also exist a number of properties whose query complexity is known to grow polynomially with $n$, including bipartiteness [GR99:bipartiteness], expansion [GR00:expansion, CS10:expansion, NS10:expansion, KS11:expansion], $k$-clusterability [CPS15:cluster] and one-sided error minor-freeness [CzuFin14, FicSub18, KSS18:minor]. For the property of being $3$-colorable there is a known $\Omega(n)$ lower bound on the number of queries needed to test the property [BOT02:color].
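To make the adjacency list model concrete, the following minimal sketch wraps a bounded-degree graph in an oracle that answers neighbor queries and counts them. The class name `AdjacencyListOracle`, its fields, and the use of `None` as the special "no such neighbor" symbol are illustrative assumptions, not part of the model's formal definition.

```python
class AdjacencyListOracle:
    """Illustrative query access to a d-bounded graph (adjacency list model)."""

    def __init__(self, adjacency, d):
        # adjacency: dict mapping each vertex to the list of its neighbors
        if any(len(nbrs) > d for nbrs in adjacency.values()):
            raise ValueError("graph is not d-bounded")
        self._adj = adjacency
        self.d = d
        self.n = len(adjacency)
        self.queries = 0  # number of oracle queries made so far

    def neighbor(self, v, i):
        """Return the i-th neighbor of v (1-indexed), or None if it does not exist."""
        if not 1 <= i <= self.d:
            raise ValueError("index must satisfy 1 <= i <= d")
        self.queries += 1
        nbrs = self._adj[v]
        return nbrs[i - 1] if i <= len(nbrs) else None


# Usage: a path 0 - 1 - 2 viewed as a 2-bounded graph.
oracle = AdjacencyListOracle({0: [1], 1: [0, 2], 2: [1]}, d=2)
first = oracle.neighbor(1, 1)    # vertex 0
missing = oracle.neighbor(0, 2)  # None: vertex 0 has only one neighbor
```

A tester's query complexity in this model is the value of `queries` at the end of its run; a constant-query tester keeps it independent of `n`.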
One of the most important questions in this area is to give a purely combinatorial characterization of the graph properties that are testable with a constant number of queries. Goldreich and Ron were the first to show that a number of fundamental graph properties, including connectivity, $k$-edge connectivity, subgraph-freeness, cycle-freeness, being Eulerian and degree regularity, can be tested with a constant number of queries in bounded degree graphs [GR02:testing]. A number of properties with small separators are now known to be testable with a constant number of queries, such as minor-closed properties [BSS10:minor, HKNO09:local] and hyperfinite properties [NS13:hyperfinite]. In particular, in the latter work it is proved that every property is constant-query testable in hyperfinite graphs. There are also constant-query testable properties that are closed under edge insertions, including $k$-vertex connectivity [YI12:vertex], perfect matching [YYI12:constant], sparsity matroid [ITY12:matroid] and the supermodular-cut condition [TY15:supermodular]. Furthermore, there exist global monotone properties (a graph property is called monotone if it is closed under edge deletions) that contain expander graphs and can be tested with a constant number of queries, including the property of being subdivision-free [KY13:subdivision]. There also exists some work on testable properties in special classes of bounded degree graphs. For example, it is known that every hereditary property (a graph property is called hereditary if it is closed under vertex deletions) is testable with a constant number of queries in non-expanding $d$-bounded graphs [CSS09:hereditary]. A property called a robust spectral property is constant-query testable in the class of high-girth graphs [CKSV17:spectra]. However, very little is known about characteristics of all testable properties in general.
1.1 Our Results
Although many properties are known to be constant-query testable in bounded degree graphs, our knowledge of the characteristics of all testable properties is fairly restricted. One prominent example of testable properties is the family of hyperfinite properties [NS13:hyperfinite], which includes planar graphs and graphs that exclude any fixed minor (see e.g., [BSS10:minor, HKNO09:local]). For the statement of our results and the discussion of techniques, we state the definition of hyperfinite graphs here.
Let $\varepsilon \in (0,1]$ and $k \ge 1$. A graph $G$ with maximum degree bounded by $d$ is called $(\varepsilon,k)$-hyperfinite if one can remove at most $\varepsilon d n$ edges from $G$ so that each connected component of the resulting graph has at most $k$ vertices. For a function $\rho: (0,1] \to \mathbb{N}$, a graph $G$ is called $\rho$-hyperfinite if $G$ is $(\varepsilon,\rho(\varepsilon))$-hyperfinite for every $\varepsilon > 0$. A set (or property) $\Pi$ of graphs is called $\rho$-hyperfinite if every graph in $\Pi$ is $\rho$-hyperfinite. A set (or property) $\Pi$ of graphs is called hyperfinite if it is $\rho$-hyperfinite for some function $\rho$.
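The definition above can be read as a certificate check: a set of removed edges witnesses hyperfiniteness if it is small enough and leaves only small components. The sketch below verifies such a certificate. The function names and the removal budget of `eps * d * n` edges are assumptions made for illustration; it checks a given witness and does not search for one.

```python
from collections import deque


def component_sizes(n, edges):
    """Sizes of the connected components of the graph on vertices 0..n-1."""
    adj = {v: [] for v in range(n)}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    seen, sizes = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sizes


def certifies_hyperfinite(n, d, edges, removed, eps, k):
    """Check that `removed` witnesses (eps, k)-hyperfiniteness (assumed budget eps*d*n)."""
    removed = {frozenset(e) for e in removed}
    kept = [e for e in edges if frozenset(e) not in removed]
    within_budget = len(removed) <= eps * d * n
    return within_budget and max(component_sizes(n, kept)) <= k


# Usage: cutting a 6-vertex path into three edges.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
ok = certifies_hyperfinite(6, 2, path, [(1, 2), (3, 4)], eps=0.2, k=2)
```

Deciding whether any valid witness exists is a different and harder task; the sketch only validates a proposed edge set.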
Also, many testable properties are known that are not hyperfinite. Our main result is that, nevertheless, for infinite properties the existence of an infinite set of hyperfinite graphs in the property is a necessary condition for its constant-query testability (finite properties are trivially hyperfinite). Since some of these testable properties, e.g., subdivision-freeness, contain expander graphs, the existence of a hyperfinite subproperty might seem somewhat surprising. (A subproperty of a property $\Pi$ is a subset of graphs in $\Pi$ that is also invariant under graph isomorphism.) Moreover, the complement of every non-trivially constant-query testable property also contains hyperfinite graphs, where a property $\Pi$ is non-trivially testable if it is testable and there exists an $\varepsilon > 0$ such that there is an infinite number of graphs that are $\varepsilon$-far from $\Pi$.
Theorem 1.2
Every constant-query testable property of bounded-degree graphs is either finite or contains an infinite hyperfinite subproperty. Also, the complement of every non-trivially constant-query testable graph property contains an infinite hyperfinite subproperty.
To the best of our knowledge, our theorem gives the first non-trivial result on the combinatorial structure of every constant-query testable property in bounded-degree graphs. A direct corollary of our main result is that expansion and the $k$-clusterability property are not constant-query testable, as any hyperfinite graph has many small subsets with small expansion and thus does not satisfy these properties. Indeed, a much stronger lower bound of $\Omega(\sqrt{n})$ on the query complexity for testing these two properties was already known prior to this work [GR00:expansion]. However, our result further implies that every infinite intersection of a family of expander graphs with any other property is also not testable.
Let $\Pi$ be a property that does not contain an infinite hyperfinite subproperty, and let $\Pi'$ be an arbitrary property such that $\Pi \cap \Pi'$ is an infinite set. Then, $\Pi \cap \Pi'$ is not testable.
Note that in general, the intersection of a property that is not constant-query testable with another property may be testable. For example, the property of being planar and bipartite is testable since it is a hyperfinite property [NS13:hyperfinite]. However, bipartiteness is not constant-query testable [GR02:testing].
We then study the question whether a similar result can be obtained for expander or near-expander subproperties in testable non-hyperfinite properties. Expander graphs are those that are well connected everywhere, and thus can be thought of as anti-hyperfinite graphs. Indeed, many known properties that are testable but not hyperfinite do contain infinite expander subproperties. Typical examples include $k$-connectivity, subgraph-freeness and subdivision-freeness. However, this turns out not to be the case in general. We show that there exists a testable property that is not hyperfinite and in which every graph has distance $\Omega(n)$ to being an expander graph: the property consists of all graphs that have a connected component on $n/2$ vertices and in which all other vertices are isolated.
There exists an infinite graph property of bounded-degree graphs such that
is testable (with -sided error) with query complexity ,
is not hyperfinite,
every graph in differs in edges from every connected graph.
Motivated by the above result, we also obtain a partitioning theorem showing that we can partition the set of vertices of every bounded degree graph into a constant number of subsets and a separator set, such that the separator set is small and the distribution of $k$-discs on every subset of a partition class is roughly the same as that of the partition class if the subset has small expansion.
1.2 Our Techniques
It is well known that constant-time property testing in the bounded-degree graph model is closely connected to the distribution of $k$-disc isomorphism types (see, for example, [BSS10:minor, NS13:hyperfinite]). The $k$-disc of a vertex $v$ is the rooted subgraph that is induced by all vertices at distance at most $k$ from $v$ and has root $v$, i.e., the local subgraph that can be explored by running a BFS up to depth $k$. Thus, the distribution of $k$-disc isomorphism types describes the local structure of the graph. We then show (in Theorem 3.2) that every constant-query property tester can be turned into a canonical tester that is based on approximating the $k$-disc distribution and decides based on a net over the space of all distribution vectors. Technically, our proof of this result mostly follows an earlier construction of canonical testers introduced in [GR11:proximity] (see also [CPS16:testing, MMPS17]).
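The exploration of a disc by a depth-bounded BFS can be sketched as follows. The function name `k_disc` and the dictionary-based graph representation are illustrative choices; the returned pair is the vertex set and the induced edge set of the disc around the given root.

```python
from collections import deque


def k_disc(adj, root, k):
    """Vertices and induced edges within distance k of `root` (BFS to depth k)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not explore beyond depth k
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    verts = set(dist)
    # Induced subgraph: keep every edge with both endpoints inside the disc.
    edges = {frozenset((u, w)) for u in verts for w in adj[u] if w in verts}
    return verts, edges


# Usage: discs of increasing radius in a 6-cycle.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
verts1, edges1 = k_disc(cycle, 0, 1)
```

In a graph of maximum degree $d$ this visits a number of vertices bounded in terms of $d$ and $k$ only, which is why a constant-query tester can afford to explore discs of constant radius.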
We then exploit a result by Alon [Lov12:large, Proposition 19.10] that is derived from open questions in graph limits theory. Alon proved that for every bounded-degree graph $G$, there exists a graph $H$ of constant size whose $k$-disc distribution can be made arbitrarily close (in terms of $\ell_1$-norm distance) to the $k$-disc distribution of $G$. Given a graph $G$ on $n$ vertices from some constant-query testable property $\Pi$, we can use multiple copies of $H$ to obtain a graph $G'$ that consists of connected components of constant size and whose distribution of $k$-discs is close to that of $G$. The latter implies that a canonical tester will behave similarly on $G$ and $G'$ and thus accepts $G'$ with high constant probability. Although $G'$ does not necessarily have the tested property, it must be $\varepsilon$-close to it. This implies that there exists a graph $G''$ in $\Pi$ from which we can remove $\varepsilon d n$ edges to partition it into small connected components. Thus, $G''$ is $(\varepsilon, s)$-hyperfinite, where $s$ is a constant depending on $\varepsilon$. However, $G''$ may not be $(\varepsilon', s)$-hyperfinite for $\varepsilon' < \varepsilon$. The challenge is how to construct such a graph.
In order to do so, we proceed as follows. For every suitable choice of $n$, we construct a series of $n$-vertex graphs $G_1, G_2, \ldots$ such that each $G_i$ approximately inherits the hyperfinite properties of all graphs $G_j$ for all $j < i$. The key idea is to maintain the hyperfinite properties of $G_j$ by causing only a small perturbation of its $k$-disc vector. Carefully choosing the parameters of this process, at the end we obtain a graph that is $(\varepsilon, \rho(\varepsilon))$-hyperfinite for a monotone function $\rho$ and every $\varepsilon > 0$.
In order to show that we cannot obtain a similar result for expander graphs in non-hyperfinite properties, we have designed the aforementioned property of graphs which consist of a connected component on half of the vertices and all other vertices are isolated. Our proof of testability combines earlier ideas of testing connectivity with simple sampling based estimation of the number of isolated vertices.
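The sampling-based ingredient mentioned above can be illustrated by estimating the fraction of isolated vertices from a few uniformly random vertices. This is only a sketch of that one ingredient, not the full tester; the function name, the sample size and the explicit random generator argument are assumptions made for illustration.

```python
import random


def estimate_isolated_fraction(adj, samples, rng):
    """Estimate the fraction of isolated vertices by uniform vertex sampling."""
    verts = list(adj)
    # A sampled vertex counts as a hit if its adjacency list is empty.
    hits = sum(1 for _ in range(samples) if not adj[rng.choice(verts)])
    return hits / samples


# Usage: a path on 50 vertices together with 50 isolated vertices.
graph = {v: [] for v in range(100)}
for v in range(49):
    graph[v].append(v + 1)
    graph[v + 1].append(v)
estimate = estimate_isolated_fraction(graph, samples=2000, rng=random.Random(0))
```

By a standard Chernoff bound, a number of samples depending only on the desired accuracy and confidence suffices, independently of the number of vertices.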
1.3 Other Related Work
Goldreich and Ron [GR11:proximity] gave characterizations of the graph properties that have constant-query proximity-oblivious testers for bounded-degree graphs and for dense graphs. As noted in [GR11:proximity], such a class of properties is a rather restricted subset of the class of all constant-query testable properties. Hyperfiniteness is also closely related to graphings that have been investigated in the theory of graph limits [Ele07:note, Sch08:hyperfinite, Lov12:large].
2 Preliminaries
Let $G = (V,E)$ be a graph with maximum degree bounded by $d$, which is assumed to be a constant. We also call $G$ a $d$-bounded graph.
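A graph property $\Pi$ is a set of graphs that is invariant under graph isomorphism. If all the graphs in $\Pi$ have maximum degree upper bounded by $d$, then we call $\Pi$ a $d$-bounded graph property. We let $\Pi_n$ denote the set of graphs in $\Pi$ with $n$ vertices. Note that $\Pi = \bigcup_{n \ge 1} \Pi_n$.
Let $\overline{\Pi}$ denote the complement of $\Pi$, i.e., $\overline{\Pi} = \mathcal{U} \setminus \Pi$, where $\mathcal{U}$ denotes the set of all $d$-bounded graphs. Let $\overline{\Pi}_n$ denote the set of $n$-vertex graphs that are not in $\Pi$, i.e., $\overline{\Pi}_n = \mathcal{U}_n \setminus \Pi_n$, where $\mathcal{U}_n$ denotes the set of all $d$-bounded $n$-vertex graphs.
A subset $\Pi' \subseteq \Pi$ is called a subproperty of $\Pi$ if $\Pi'$ is invariant under graph isomorphism.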
We have the following definition on graphs that are far from having some property.
Let $\Pi$ be a $d$-bounded graph property. An $n$-vertex graph $G$ is said to be $\varepsilon$-far from having property $\Pi$ if one has to modify more than $\varepsilon d n$ edges to make it satisfy $\Pi$.
Let denote the set of all -vertex graphs that are -far from . Let be the set of all graphs that are -far from , i.e., .
Given a property $\Pi$, an algorithm is called a tester for $\Pi$ if it takes as input the parameters $\varepsilon$, $d$ and $n$, has query access to the adjacency lists of an $n$-vertex $d$-bounded graph $G$, and, with probability at least $2/3$, accepts $G$ if $G \in \Pi_n$ and rejects $G$ if $G$ is $\varepsilon$-far from $\Pi$. The following gives the definition of constant-query testable properties.
We call a $d$-bounded graph property $\Pi$ (constant-query) testable if there exists a tester for $\Pi$ that makes at most $q(\varepsilon, d)$ queries for some function $q$ that depends only on $\varepsilon$ and $d$.
$k$-Discs and frequency vectors.
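The notions of $k$-discs and frequency vectors play an important role in analyzing constant-query testable properties. For any vertex $v \in V$, we let $\mathrm{disc}_k(G,v)$ denote the subgraph rooted at $v$ that is induced by all vertices that are at distance at most $k$ from $v$. For any two rooted subgraphs $H_1, H_2$, we say $H_1$ is isomorphic to $H_2$, denoted by $H_1 \simeq H_2$, if there exists a root-preserving bijection $f: V(H_1) \to V(H_2)$ such that $(u,w) \in E(H_1)$ if and only if $(f(u), f(w)) \in E(H_2)$. Note that for constant $d$ and $k$, the total number of possible non-isomorphic $k$-discs is also a constant, denoted by $N = N(d,k)$. Furthermore, we let $\mathcal{T}_k = \{\Delta_1, \ldots, \Delta_N\}$ be the set of all isomorphism types of $k$-discs in any $d$-bounded graph. Finally, we let $\mathrm{freq}_k(G)$ denote the frequency vector of $G$, which is indexed by $k$-disc types in $\mathcal{T}_k$ such that
$$\mathrm{freq}_k(G)_{\Delta} = \frac{|\{v \in V : \mathrm{disc}_k(G,v) \simeq \Delta\}|}{n}$$
for any $\Delta \in \mathcal{T}_k$, i.e., $\mathrm{freq}_k(G)_{\Delta}$ denotes the fraction of vertices in $G$ whose $k$-discs are isomorphic to $\Delta$. Furthermore, for any subset $S$ of $V$, we let $\mathrm{freq}_k(S \mid G)$ denote the vector that is indexed by types in $\mathcal{T}_k$ such that
$$\mathrm{freq}_k(S \mid G)_{\Delta} = \frac{|\{v \in S : \mathrm{disc}_k(G,v) \simeq \Delta\}|}{|S|}$$
for any $\Delta \in \mathcal{T}_k$, i.e., $\mathrm{freq}_k(S \mid G)_{\Delta}$ denotes the fraction of vertices in $S$ whose $k$-discs in $G$ are isomorphic to $\Delta$. Note that $\mathrm{freq}_k(G) = \mathrm{freq}_k(V \mid G)$. If $S$ contains a single element $v$, we write $\mathrm{freq}_k(v \mid G)$.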
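For very small graphs, the frequency vector can be computed directly from the definitions. The sketch below identifies disc types by a brute-force canonical form over all relabelings that fix the root, which is only practical for tiny discs. All function names are illustrative, and the vector is stored sparsely as a dictionary from observed types to fractions.

```python
from collections import Counter, deque
from itertools import permutations


def k_disc(adj, root, k):
    """Vertices and induced edges within distance k of `root`."""
    dist, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] < k:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    verts = set(dist)
    edges = {frozenset((u, w)) for u in verts for w in adj[u] if w in verts}
    return verts, edges


def canonical_form(verts, edges, root):
    """Smallest edge list over all relabelings that map the root to 0."""
    others = sorted(verts - {root})
    best = None
    for perm in permutations(range(1, len(others) + 1)):
        label = {root: 0, **dict(zip(others, perm))}
        code = tuple(sorted(tuple(sorted(label[x] for x in e)) for e in edges))
        if best is None or code < best:
            best = code
    return (len(verts), best)


def frequency_vector(adj, k):
    """Map each observed k-disc type to the fraction of vertices having it."""
    counts = Counter(canonical_form(*k_disc(adj, v, k), v) for v in adj)
    return {t: c / len(adj) for t, c in counts.items()}


def l1_distance(f, g):
    """l1 distance between two sparse frequency vectors."""
    return sum(abs(f.get(t, 0.0) - g.get(t, 0.0)) for t in set(f) | set(g))


# Usage: compare a 4-vertex path with a 4-cycle at radius 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
gap = l1_distance(frequency_vector(path, 1), frequency_vector(cycle, 1))
```

In the path, half of the vertices see a root with one neighbor and half see a root with two neighbors, while every vertex of the cycle sees the latter type, so the two vectors differ substantially even though the graphs differ in a single edge on four vertices.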
For any vector $f$, we let $\|f\|_1$ denote its $\ell_1$-norm. We have the following simple lemma on the $\ell_1$-norm distance of the frequency vectors of two graphs that are $\varepsilon$-close to each other. The proof follows from the proof of Corollary 3 in [FPS15:constant]; we provide a proof here for the sake of completeness.
Let and . Let be -bounded graphs such that is -close to . Then, .
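Let $F$ denote the set of edges that appear in only one of the two graphs $G_1, G_2$. Since $G_1$ is $\varepsilon$-close to $G_2$, it holds that $|F| \le \varepsilon d n$. Note that for any $e \in F$, the total number of vertices that are within distance at most $k$ of either of its endpoints is at most a constant depending only on $d$ and $k$. This further implies that the total number of vertices that may have different $k$-disc types in $G_1$ and $G_2$ is at most $|F|$ times this constant. Finally, we note that each vertex with different $k$-disc types in $G_1$ and $G_2$ contributes at most $2/n$ to the $\ell_1$-norm distance of $\mathrm{freq}_k(G_1)$ and $\mathrm{freq}_k(G_2)$, which implies the claimed bound.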
This completes the proof of the lemma.
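As a sanity check on the counting in the proof above, the following sketch carries it out with one crude choice of constants. The symbols $G_1$, $G_2$, $F$ and $\mathrm{freq}_k$, the assumption $d \ge 2$, and the resulting constant $4d^{k+2}$ are choices made here for illustration and need not match the bound stated in the lemma.

```latex
% Assumptions (ours): d >= 2, F is the symmetric difference of the two edge sets,
% and |F| <= \varepsilon d n because the graphs are \varepsilon-close.
\begin{align*}
\#\{w : \text{some } e \in F \text{ has an endpoint within distance } k \text{ of } w\}
  &\le |F| \cdot 2 \cdot \sum_{i=0}^{k} d^{i}
   \le 2\,|F|\,d^{k+1}
   \le 2\varepsilon d^{k+2} n, \\
\|\mathrm{freq}_k(G_1) - \mathrm{freq}_k(G_2)\|_1
  &\le \frac{2}{n} \cdot 2\varepsilon d^{k+2} n
   = 4\varepsilon d^{k+2}.
\end{align*}
```

The first line uses that a ball of radius $k$ around an endpoint contains at most $\sum_{i=0}^{k} d^i \le d^{k+1}$ vertices when $d \ge 2$. The second line uses that a vertex whose disc type changes moves a mass of $1/n$ from one coordinate of the frequency vector to another.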
The converse to the above lemma is not true in general, that is, it is not true that the closeness of the frequency vectors of two graphs implies the closeness of these two graphs. However, Benjamini et al. [BSS10:minor] showed that the converse somehow still holds for hyperfinite graphs. More precisely, they proved the following result.
Lemma 2.5 (Theorem 2.2 from [BSS10:minor])
Let and . Let be the set of -hyperfinite -bounded graphs, and let be the set of -bounded graphs that are not -hyperfinite. Then it holds that for any graph and graph ,
Frequency preservers and blow-up graphs.
The following lemma is due to Alon, and it roughly says that for any $n$-vertex $d$-bounded graph, there always exists a “small” graph whose size is independent of $n$ and that preserves the local structure well, i.e., its $k$-disc frequencies.
Lemma 2.6 (Proposition 19.10 in [Lov12:large])
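For any $\delta > 0$ and $d, k \ge 1$, there exists a function $M(\delta, d, k)$ such that for every $n$-vertex graph $G$, there exists a graph $H$ of size at most $M(\delta, d, k)$ such that $\|\mathrm{freq}_k(G) - \mathrm{freq}_k(H)\|_1 \le \delta$.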
Definition 2.7 ($(\delta,k)$-DFP)
We call the small graph $H$ obtained from Lemma 2.6 a $(\delta,k)$-disc frequency preserver (abbreviated as $(\delta,k)$-DFP) of $G$.
We remark that although we know that the function $M$ that upper bounds the size of some $(\delta,k)$-DFP exists, there is no known explicit bound on $M$ for arbitrary $d$-bounded graphs (see [FPS15:constant] for explicit bounds on $M$ for some special classes of graphs).
We use DFPs as a building block to construct $n$-vertex graphs that have constant-size connected components and approximately preserve the $k$-disc frequencies of a given $n$-vertex graph $G$. More precisely, we have the following definition.
Definition 2.8 (Blow-Up Graph)
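Let $\delta > 0$ and $k \ge 1$, and let $G$ be a $d$-bounded $n$-vertex graph. Let $H$ be a $(\delta,k)$-DFP of graph $G$ of size $s$. Let $G'$ be the $n$-vertex graph that is composed of $\lfloor n/s \rfloor$ disjoint copies of $H$ and $n - s \lfloor n/s \rfloor$ isolated vertices. We call $G'$ the $(\delta,k)$-blow-up graph of $G$.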
The following lemma follows directly from the above definition of blow-up graphs and the fact that the blow-up graph contains at most isolated vertices.
Let . Let . Let be any -bounded -vertex graph and let be the -blow-up graph of . We have .
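The blow-up construction is simple enough to state as code. The sketch below assumes the frequency preserver is already given as an adjacency dictionary (finding one is the non-constructive part) and only assembles the disjoint copies and the leftover isolated vertices; the function name `blow_up` and the vertex numbering are illustrative.

```python
def blow_up(h_adj, n):
    """n-vertex graph: floor(n/s) disjoint copies of H plus isolated vertices."""
    verts = sorted(h_adj)
    s = len(verts)
    index = {v: i for i, v in enumerate(verts)}
    copies = n // s
    # Start with n isolated vertices; the last n - copies*s of them stay isolated.
    adj = {v: [] for v in range(n)}
    for c in range(copies):
        offset = c * s
        for v in verts:
            adj[offset + index[v]] = [offset + index[w] for w in h_adj[v]]
    return adj


# Usage: blow up a triangle to 8 vertices (two copies, two isolated vertices).
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
big = blow_up(triangle, 8)
```

Since fewer than `s` vertices are left isolated, their influence on the frequency vector vanishes as `n` grows, which is the observation the lemma above relies on.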
Expansion and expander graphs.
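Let $G = (V,E)$ be a $d$-bounded graph. Let $S \subset V$ be a subset such that $|S| \le |V|/2$. The expansion or conductance of set $S$ is defined to be $\phi_G(S) = \frac{e_G(S, V \setminus S)}{d|S|}$, where $e_G(S, V \setminus S)$ denotes the number of crossing edges from $S$ to $V \setminus S$. The expansion of $G$ is defined as $\phi(G) = \min_{S : |S| \le |V|/2} \phi_G(S)$. We call $G$ a $\phi$-expander if $\phi(G) \ge \phi$. We simply call $G$ an expander if $G$ is a $\phi$-expander for some universal constant $\phi > 0$.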
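The following brute-force sketch computes the expansion of a set and of a whole graph for very small instances. The normalization of the number of crossing edges by $d|S|$ and the function names are assumptions made for illustration; the enumeration over all subsets is exponential and only meant to make the definition tangible.

```python
from itertools import combinations


def set_expansion(adj, d, subset):
    """Crossing edges of `subset` divided by d * |subset| (assumed normalization)."""
    subset = set(subset)
    crossing = sum(1 for u in subset for w in adj[u] if w not in subset)
    return crossing / (d * len(subset))


def graph_expansion(adj, d):
    """Minimum set expansion over all non-empty subsets of at most half the vertices."""
    verts = list(adj)
    return min(
        set_expansion(adj, d, subset)
        for size in range(1, len(verts) // 2 + 1)
        for subset in combinations(verts, size)
    )


# Usage: a 4-cycle versus two disjoint edges.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
matching = {0: [1], 1: [0], 2: [3], 3: [2]}
phi_cycle = graph_expansion(cycle, d=2)
phi_matching = graph_expansion(matching, d=1)
```

A disconnected graph always has expansion zero, witnessed by a smallest connected component, which is the sense in which graphs made of small components are the opposite of expanders.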
3 Constant-Query Testable Properties and Hyperfinite Properties
In this section, we give the proof of our main theorem, i.e., Theorem 1.2. We first give the necessary tools in Section 3.1, and then prove the first part of Theorem 1.2 in Section 3.2; the second part, which concerns the complement, is proved in a later section.
3.1 Basic Tools
The following is a direct corollary of Lemma 2.5 by Benjamini et al. [BSS10:minor].
Let . Let be a testable graph property. Suppose there exists a graph that is -hyperfinite. Then, every graph such that is -hyperfinite, where .
Our second tool is the following characterization of constant-query testable properties by the so-called canonical tester. This characterization is similar to the previous ones given in [GR11:proximity, CPS16:testing] for bounded-degree testable graph properties. The main difference is that our canonical tester makes decisions based on the frequency vectors, instead of the forbidden subgraphs considered in the previous work. We have the following theorem, whose proof is deferred to a later section.
Theorem 3.2 (Canonical Tester)
Let be a graph property that can be tested with query complexity . Then there exists for some constant , such that for any , , , there exists a tester that
accepts any -bounded -vertex graph with probability at least , if ,
rejects any -bounded -vertex graph with probability at least , if .
The canonical tester has query complexity .
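The decision rule of a frequency-based tester can be sketched as follows: estimate the disc-type distribution from sampled vertices and accept if the estimate is close to one of finitely many reference vectors. This is a hypothetical illustration of the idea only. The function names, the sparse dictionary representation, the list of accepted reference vectors and the threshold are all assumptions and do not reproduce the parameters of Theorem 3.2.

```python
from collections import Counter


def empirical_frequencies(sampled_types):
    """Empirical distribution of the disc types seen at the sampled vertices."""
    counts = Counter(sampled_types)
    total = len(sampled_types)
    return {t: c / total for t, c in counts.items()}


def l1_distance(f, g):
    """l1 distance between two sparse frequency vectors."""
    return sum(abs(f.get(t, 0.0) - g.get(t, 0.0)) for t in set(f) | set(g))


def frequency_based_decision(sampled_types, accepted_vectors, threshold):
    """Accept iff the estimate is within `threshold` of some reference vector."""
    estimate = empirical_frequencies(sampled_types)
    return any(l1_distance(estimate, ref) <= threshold for ref in accepted_vectors)


# Usage: ten sampled vertices whose disc types are abbreviated as "a" and "b".
samples = ["a"] * 6 + ["b"] * 4
accept = frequency_based_decision(samples, [{"a": 0.5, "b": 0.5}], threshold=0.25)
```

The point of such a rule is that its outcome depends on the input graph only through its approximate disc frequencies, so two graphs with close frequency vectors are treated alike.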
3.2 Infinite Testable Properties Contain Infinite Hyperfinite Subproperties
We now prove the first part of Theorem 1.2, i.e., every infinite testable property contains an infinite hyperfinite subproperty.
We start by showing that for any fixed , and any graph in a testable property , we can find another graph such that is -hyperfinite and the frequency vectors of and are close.
Let . Let . Let . Let be a testable graph property with query complexity and let . Then, there exists such that
is -hyperfinite, and
where and for some constant .
Let , where are the numbers in the statements of Lemma 2.9 and Theorem 3.2, respectively. Let for the constant from Theorem 3.2. By definition, it holds that . Let be the -blow-up graph of . By Lemma 2.9 and our assumption that , it holds that , which implies that
as satisfies that and that .
Let be the canonical tester for with parameter with corresponding query complexity . Then by Theorem 3.2, will accept with probability at least . This implies that is -close to . Let such that is -close to . We claim that is the graph we are looking for.
First, we show that is -hyperfinite. Recall that by definition, is composed of disjoint copies of a graph of size and isolated vertices, where . This implies that is -hyperfinite. It follows that is -hyperfinite because we can remove at most edges from to obtain a graph of which all connected components have size at most .
Second, we prove that . Note that the bound given by inequality (1) implies
as and . Now since and are -close to each other, by Lemma 2.4, we have that
where the last inequality follows from our setting of parameters. The claim then follows by applying the triangle inequality. This completes the proof of the lemma.
The above lemma only guarantees that for every fixed , and graph , one can find a graph that is -hyperfinite (for and as in Lemma 3.3). However, we cannot directly use to construct an infinite hyperfinite subproperty. Recall that a set of graphs is called a hyperfinite property if there exists a function such that is -hyperfinite for every . Now, for any , we cannot guarantee that after removing edges from , one can obtain a graph that is the union of connected components of constant size. Furthermore, it is not guaranteed that if .
Our idea of overcoming the above difficulty is to start with the above hyperfinite graph for some fixed , and then iteratively construct a sequence of graphs with from . The constructed graph is guaranteed to inherit hyperfinite properties from . The key idea is to maintain the hyperfinite properties of by causing only a small perturbation of its -disc vector. Choosing the parameters in this process carefully, we can maintain these hyperfinite properties for the whole sequence of graphs. Now we give the details in the following lemma. Note that the first part of Theorem 1.2 follows from this lemma.
Let be an infinite -bounded graph property that is testable with query complexity . Then, there exists such that
is an infinite subproperty of , and
there exists a monotonically decreasing function such that is -hyperfinite for every .
Let be the set of sizes of graphs in . Since is an infinite graph property, it holds that is also an infinite set. We show there exists a monotonically decreasing function such that for each , we can find a graph that is -hyperfinite for every . This will imply that the set is an infinite -hyperfinite property, which will then prove the lemma.
Let us now fix an arbitrary and let be an arbitrary graph in . We let FindHyper() denote the graph that is obtained by applying Lemma 3.3 on with parameters . Now we construct as follows.
Let . Let and let . If , where is the number given in Lemma 3.3, then we simply let , which is a finite graph of size at most . In the following, we assume that .
Let . We start by applying Lemma 3.3 to with parameters , and to obtain a graph that is -hyperfinite, where and , .
We now iteratively construct a new -vertex graph from a graph that is -hyperfinite, where . Let