Property testing is mainly concerned with understanding the amount of information one needs to extract from an unknown input function to approximately determine whether the function satisfies a property or is far from satisfying it. In this paper, the types of functions we consider are strings ; images or matrices ; and edge-colored graphs , where the set of possible colors for each edge is . In all cases is a finite alphabet. Note that the usual notion of a graph corresponds to the special case where .
The systematic study of property testing was initiated by Rubinfeld and Sudan , and Goldreich, Goldwasser and Ron  were the first to study property testing of combinatorial structures. An -test for a property of functions is an algorithm that, given query access to an unknown input function , distinguishes with good probability (say, with probability ) between the case that satisfies and the case that is -far from ; the latter meaning that one needs to change the values of at least an -fraction of the entries of to make it satisfy . In an -vertex graph, for example, changing an -fraction of the representation means adding or removing edges. (The representation model we consider here for graphs is the adjacency matrix. This is known as the dense model.)
In many cases, such as that of visual properties of images (where the input is often noisy to some extent), it is more natural to consider a robust variant of tests, that is tolerant to noise in the input. Such tests were first considered by Parnas, Ron and Rubinfeld . A test is -tolerant for some if it distinguishes, with good probability, between inputs that are -far from satisfying and those that are -close to (i.e., not -far from) satisfying .
One of the main goals in property testing is to characterize properties in terms of the number of queries required by an optimal test for them. If a property has, for any , an -test that makes a constant number of queries, depending only on and not on the size of the input, then is said to be testable. is tolerantly testable if for any it has a constant-query -test for some . Finally, is estimable if it has a constant query -test for any choice of . In other words, is estimable if the distance of an input to satisfying can be estimated up to a constant error, with good probability, using a constant number of queries.
The meta-question that we consider in this paper is the following.
What makes a certain property testable, tolerantly testable, or estimable?
1.1 Previous works: Characterizations of graphs and hypergraphs
For graphs, it was shown by Fischer and Newman  that the above three notions are equivalent, i.e., any testable graph property is estimable (and thus trivially also tolerantly testable). A combinatorial characterization of the testable graph properties was obtained by Alon, Fischer, Newman and Shapira  and analytic characterizations were obtained independently by Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi  and Lovász and Szegedy  through the study of graph limits. The combinatorial characterization relates testability with regular reducibility, meaning, roughly speaking, that a graph property is testable (or estimable) if and only if satisfying is equivalent to approximately having one of finitely many prescribed types of Szemerédi regular partitions . A formal definition of regular reducibility is given in Section 2.
Very recently, a similar characterization for hypergraphs was obtained by Joos, Kim, Kühn and Osthus , who proved that as in the graph case, testability, estimability and regular reducibility are equivalent for any hypergraph property.
A (partial) characterization of the graph properties that have a constant-query test whose error is one-sided (i.e., tests that always accept inputs satisfying ) was obtained by Alon and Shapira . They showed that the only properties testable using an important and natural type of one-sided tests, that are oblivious to the input size, are essentially the hereditary properties.
The above characterizations for graphs rely on a conversion of tests into canonical tests, due to Goldreich and Trevisan . A canonical test always behaves as follows: First it picks a set of vertices non-adaptively and uniformly at random in the input graph , and queries all pairs of these vertices, to get the induced subgraph . Then decides whether to accept or reject the input deterministically, based only on the identity of and the size of . The number of queries needed by the canonical test is only polynomial in the number of queries required by the original test, implying that any testable property is also canonically testable.
To summarize, all of the following conditions are equivalent for graphs: Testability, tolerant testability, canonical testability, estimability, and regular reducibility.
1.2 From unordered to ordered structures
Common to all of the above characterization results is the fact that they apply to unlabeled graphs and hypergraphs, which are unordered structures: Graph (and hypergraph) properties are symmetric in the sense that they are invariant under any relabeling (or equivalently, reordering) of the vertices. That is, if a labeled graph satisfies an unordered graph property , then any graph resulting from by changing the labels of the vertices is isomorphic to (as an unordered graph), and so it satisfies as well.
A natural question that one may ask is whether similar characterizations hold for the more general setting of ordered structures over a finite alphabet, such as images and vertex-ordered graphs in the two-dimensional case, and strings in the one-dimensional case. While an unordered property is defined as a family of (satisfying) instances that is closed under relabeling, in the ordered setting, any family of instances is considered a valid property. The ordered setting is indeed much more general than the unordered one, as best exemplified by string properties: On one hand, unordered string properties are essentially properties of distributions over the alphabet . On the other hand, any property of any finite discrete structure can be encoded as an ordered string property!
In general, the answer to the above question is negative. It is easy to construct simple string properties that are testable and even estimable, but are neither canonically testable nor regular reducible. (Footnote 1: To this end, canonical tests in ordered structures are similar to their unordered counterparts, but they act in an order-preserving manner. For example, a -query test for a property of strings is canonical if, given an unknown string , the test picks entries , queries them to get the values , and decides whether to accept or reject the input only based on the tuple . Canonical tests in ordered graphs or images are defined similarly, but instead of querying a random substring, we query a random induced ordered subgraph or a random submatrix, respectively.) As an example, consider the binary string property of “not containing three consecutive ones”. The following is an -test for (estimation is done similarly): Pick a random consecutive substring of the input, of length , and accept if and only if satisfies . On the other hand, global notions like canonical testability and regular reducibility cannot capture the local nature of . Moreover, it was shown by Fischer and Fortnow , building on ideas from probabilistically checkable proofs of proximity (PCPP), that there exist testable properties that are not tolerantly testable, as opposed to the situation in unordered graphs .
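The local test for “not containing three consecutive ones” described above can be sketched as follows. This is only an illustration: the function names and the `window_len` parameter are hypothetical, and the window length that the text ties to the proximity parameter is not reproduced here.

```python
import random


def has_three_consecutive_ones(s):
    """Return True if the binary string s contains '111'."""
    return "111" in s


def local_window_test(s, window_len, rng=random):
    """Accept iff a uniformly random window of s avoids '111'.

    window_len is a hypothetical parameter standing in for the
    (unspecified here) substring length used by the test in the text.
    """
    n = len(s)
    if n <= window_len:
        # The whole input fits in one window, so check it directly.
        return not has_three_consecutive_ones(s)
    start = rng.randrange(n - window_len + 1)
    return not has_three_consecutive_ones(s[start:start + window_len])
```

Since every window of a string that avoids `111` also avoids it, this sketch never rejects an input satisfying the property.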
However, it may still be possible that a positive answer holds for the above question if we restrict our view to a class of “well behaved” properties.
Does there exist a class of properties that is wide enough to capture many interesting properties, yet well behaved enough to allow simple characterizations for testability?
So far, we have seen that in general, properties in which the exact location of entries is important to some extent, like and the property from , do not admit characterizations of testability that are similar to those of unordered graphs. But what about properties that are ultimately global? Can one find, say, an ordered graph property that is canonically testable but not estimable, for example? Stated differently,
Do the characterizations of testability in unordered graphs have analogues for canonical testability in ordered graphs and images?
1.3 Our contributions
In this paper, we provide a partial positive answer to the first question, and a more complete positive answer to the second question. For the second question, we show that canonical testability in ordered graphs and images implies estimability and is equivalent to (an ordered version of) regular reducibility, similarly to the case in unordered graphs. Addressing the first question, we identify a wide class of well-behaved properties of ordered structures, called the earthmover resilient (ER) properties, providing characterizations of tolerant testability and estimability for these properties.
Earthmover resilient properties
Roughly speaking, a property of a certain type of functions is earthmover resilient if slight changes in the order of the “base elements” (Footnote 2: The base elements in an ordered graph are the vertices, and in images these are the rows and the columns; in strings the base elements are the entries themselves.) of a function satisfying cannot turn into a function that is far from satisfying . The class of ER properties captures several types of interesting properties:
Trivially, all properties of unordered graphs and hypergraphs.
Global visual properties of images. In particular, this includes any property of black-white images satisfying the following: Any image satisfying has a sparse black-white boundary. This includes, as special cases, properties like convexity and being a half plane, which were previously investigated in [10, 11, 15, 16, 32]. See Subsection 2.1 for the precise definitions and statement and Appendix A for the proof.
All hereditary properties of ordered graphs and images, as implied by a recent result of Alon and the authors . While all hereditary unordered graph properties obviously fit under this category, it also includes interesting order-based properties, such as the widely investigated property of monotonicity (see [17, 18] for results on strings and images over a finite alphabet), -monotonicity , forbidden poset type problems , and more generally forbidden submatrix type problems [1, 2, 3, 23].
The new results
ER properties behave well enough to allow us to fully characterize the tolerantly testable properties among them in images and ordered graphs. In strings, it turns out that earthmover resilience is equivalent to canonical testability.
Our first result relates earthmover resilience, tolerant testability and canonical testability in images and edge-colored ordered graphs.
Theorem 1.1 (See also Theorem 2.2).
The following conditions are equivalent for any property of edge colored ordered graphs or images.
is earthmover resilient and tolerantly testable.
is canonically testable.
Theorem 2.2, which is the more detailed version of Theorem 1.1, also states that efficient tolerant -tests – in which the query complexity is polynomial in – can be converted, under certain conditions, into efficient canonical tests, and vice versa.
Let us note that Theorem 1.1 can be extended to high-dimensional ordered structures, such as tensors (e.g. 3D images) or edge-colored ordered hypergraphs. As our focus in this paper is on one- and two-dimensional structures, the full proof of the extended statement is not given here, but it is a straightforward generalization of the proof.
In (one-dimensional) strings, it turns out that the tolerant testability condition of Theorem 1.1 is not needed. That is, ER and canonical testability are equivalent for string properties.
Theorem 1.2.
A string property is canonically testable if and only if it is earthmover resilient.
In the unordered graph case, it was shown that testability is equivalent to estimability  and to regular reducibility . Here, we establish analogous results for canonical tests in ordered structures. The notion of (ordered) regular reducibility that we use here is similar in spirit to the unordered variant, but is slightly more involved. The formal definition is given in Subsection 2.5.
Theorem 1.3.
Any canonically testable property of edge-colored ordered graphs and images is (canonically) estimable.
Theorem 1.4.
A property of edge-colored ordered graphs or images is canonically testable if and only if it is regular reducible.
The following conditions are equivalent for any earthmover resilient property of edge colored ordered graphs or images.
is tolerantly testable.
is canonically testable.
is regular reducible.
While the conversion between tolerant tests and canonical tests (and vice versa) among earthmover resilient properties has a reasonable polynomial blowup in the number of queries under certain conditions, for the relation between canonical testability and estimability or regular reducibility this is not known to be the case. The proofs of Theorems 1.3 and 1.4 go through Szemerédi-regularity type arguments, and thus yield at least a tower-type blowup in the number of queries. Currently, it is not known how to avoid this tower-type blowup in general, even for unordered graphs. However, interesting recent results of Hoppen, Kohayakawa, Lang, Lefmann and Stagni [27, 28] state that for hereditary properties of unordered graphs, the blowup between testability and estimability is at most exponential.
Alon and the authors  recently showed that any hereditary property of edge-colored ordered graphs and images is canonically testable, by proving an order-preserving removal lemma for all such properties. From Theorem 1.3 and  we derive the following very general result.
Any hereditary property of edge-colored ordered graphs or images is (canonically) estimable.
In particular, this re-proves the estimability of previously investigated properties such as monotonicity [17, 18] and more generally -monotonicity , and proves the estimability of forbidden-submatrix and forbidden-poset type properties [1, 2, 3, 22, 23].
The characterization of the one-sided error obliviously testable properties by Alon and Shapira , mentioned in Subsection 1.1, carries over to canonical tests in ordered graphs and images. That is, a property of such structures has a one-sided error oblivious canonical test if and only if it is (essentially) hereditary. The fact that hereditary properties are obliviously canonically testable with one-sided error is proved in ; the proof of the other direction is very similar to its analogue in unordered graphs , and is therefore omitted.
1.4 Related work
Canonical versus sample-based testing in strings
The notion of a sample-based test, already defined in the seminal work of Goldreich, Goldwasser and Ron , refers to tests that cannot choose which queries to make. A -query test for is sample-based if it receives pairs of the form where is the unknown input function and are picked uniformly at random from the domain of (compare this to the definition of canonical tests from Subsection 1.2). A recent work of Blais and Yoshida  characterizes the properties that have a constant query sample-based test.
In strings, sample-based testability might seem equivalent to canonical testability at first glance, but this is actually not the case, as sample-based tests have more power than canonical ones (canonical testability implies sample-based testability, but the converse is not true). Consider, e.g., the property of equality to the string , which is trivially sample-based testable, yet not canonically testable. Thus, sample-based testability does not imply canonical testability, so the results of Blais and Yoshida  are not directly comparable to Theorem 1.2 above.
Previously investigated properties of ordered structures
On top of the hereditary properties mentioned earlier, several different types of properties of ordered structures have been investigated in the property testing literature. Without trying to be comprehensive, here is a short summary of some of these types of properties.
Geometric & visual properties
Image properties that exhibit natural visual conditions, such as connectivity, convexity and being a half plane, were considered e.g. in [10, 11, 16, 32]. Typically in these cases, images with two colors – black and white – are considered, where the “shape” consists of all black pixels, and the “background” consists of all white pixels. For example, convexity simply means that the black shape is convex. As we shall see, some of these properties that are global in nature, such as convexity and being a half plane, are ER, while connectivity – a property that is sensitive to local modifications – is not ER.
Algebraic properties
String properties related to low-degree polynomials, PCPs and locally testable error correcting codes have been thoroughly investigated, starting with the seminal papers of Rubinfeld and Sudan  and Goldreich and Sudan . As shown in , there exist properties of this type that are testable but not tolerantly testable. In this sense, algebraic properties behave very differently from unordered graph properties. This should not come as a surprise: In a PCP or a code, the exact location of each bit strongly influences its “role”. Properties of this kind are therefore not ER in general.
Local properties
These are image properties where one can completely determine whether a given image satisfies based only on the statistics of the consecutive sub-images of , for a fixed constant . Recently, Ben-Eliezer, Korman and Reichman  observed that for almost all (large enough) patterns , the local property of not containing a consecutive copy of in the image is tolerantly testable. Note that monotonicity can also be represented as a local property, taking (but -monotonicity cannot be represented this way). Local properties are not ER in general, and obtaining characterizations of testability for them remains an intriguing open problem.
This section contains all required definitions, including those that are related to earthmover resilience (Subsection 2.1), a discussion on earthmover resilient properties (Subsection 2.2), property testing notation (Subsection 2.3), and finally, the definition of ordered regular reducibility (Subsection 2.5). Along the way, we state the full version of Theorem 1.1 (Subsection 2.4).
We start with some standard definitions. A property of functions is simply viewed as a collection of such functions, where is said to satisfy if . The absolute Hamming distance between two functions is , and the relative distance is ; note that always holds. and are -far if , and -close otherwise. The distance of to a property is . is -far from if the distance between and is larger than , and -close to otherwise.
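As a concrete reading of these definitions, here is a minimal sketch for functions given as equal-length sequences of values. The function names are hypothetical, and the property is assumed to be given as an explicit finite collection, which is only practical for toy examples.

```python
def absolute_hamming(f, g):
    """Number of positions on which the sequences f and g differ."""
    if len(f) != len(g):
        raise ValueError("f and g must have the same domain size")
    return sum(1 for a, b in zip(f, g) if a != b)


def relative_hamming(f, g):
    """Absolute Hamming distance divided by the domain size."""
    return absolute_hamming(f, g) / len(f)


def is_far(f, g, eps):
    """f and g are eps-far if their relative distance exceeds eps."""
    return relative_hamming(f, g) > eps


def distance_to_property(f, prop):
    """Relative distance from f to the closest member of prop.

    prop is assumed to be a non-empty iterable of sequences with the
    same length as f (an illustration-only representation).
    """
    return min(relative_hamming(f, g) for g in prop)
```

For example, `relative_hamming("0011", "0101")` evaluates to `0.5`, so the two strings are 0.4-far but 0.5-close.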
Representing images using ordered graphs
An image can be represented by an edge colored ordered graph , where can be thought of as a special “no edge” symbol. is defined as follows. for any pair satisfying (“pair of rows”) or (“pair of columns”); and for any . From now onwards, we almost exclusively use this representation of images as ordered graphs, usually giving our definitions and proofs only for strings and ordered graphs. It is not hard to verify that all results established for ordered graphs can be translated to images through this representation.
2.1 Earthmover resilience
We now formalize our notion of being “well behaved”. As both strings and ordered graphs are essentially functions of the form (for and , respectively), we simplify the presentation by giving here the general definition for functions of this type.
Definition (Earthmover distance).
Fix and let . A basic move between consecutive elements in is the operation of swapping and in . Formally, let be the permutation satisfying , , and for any . For any , define . The result of a basic move between and in is the composition .
The absolute earthmover distance between two functions is the minimum number of basic move operations needed to produce from . The distance is defined to be if cannot be obtained from using any number of basic moves. The normalized earthmover distance between and is , and we say that they are -earthmover-far if , and -earthmover-close otherwise.
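For strings, the absolute earthmover distance can be computed directly from the definition on very small inputs by a breadth-first search over adjacent swaps. The sketch below is a brute-force illustration only (its running time grows with the number of rearrangements), and the function name is hypothetical; `None` stands for an infinite distance.

```python
from collections import deque


def earthmover_distance(f, g):
    """Minimum number of adjacent swaps turning string f into string g.

    Returns None when g is not a rearrangement of f, i.e. when the
    distance is infinite. Brute force: intended for short strings only.
    """
    if sorted(f) != sorted(g):
        return None
    start, goal = tuple(f), tuple(g)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cur, dist = queue.popleft()
        for i in range(len(cur) - 1):
            if cur[i] == cur[i + 1]:
                continue  # swapping equal letters changes nothing
            nxt = cur[:i] + (cur[i + 1], cur[i]) + cur[i + 2:]
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

For instance, reversing `"abc"` takes three basic moves, and `"0011"` becomes `"1100"` in four.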
Definition (Earthmover resilience).
Fix a function . A property is -earthmover resilient if for any , function satisfying , and function which is -earthmover-close to , it holds that is -close to (in the usual Hamming distance). is earthmover resilient if it is -earthmover resilient for some choice of .
Intuitively, a property is earthmover resilient if it is insensitive to local changes in the order of the base elements.
Hereditary properties are earthmover resilient
It was shown in  that any hereditary property satisfies a removal lemma: If an ordered graph (or image) is -far from a hereditary property , then contains ordered copies of some -vertex subgraph not satisfying , for suitable choices of and . Since one basic move can destroy no more than such -copies (those that include both swapped vertices), one has to make at least basic moves to make satisfy . Thus, -farness implies -earthmover-farness from .
2.2 Earthmover resilience in visual properties
Convexity and being a half plane are earthmover resilient. This is a special case of a much wider phenomenon concerning properties of black-white images in which the number of pixels lying in the boundary between the black shape and the white background is small. Here, an white/black image is represented by a -matrix of the same dimensions, where the -pixel of the image is black if and only if . The definition below is given for square images, but can be easily generalized to images with .
Definition (Sparse boundary).
The boundary of an black-white image is the set of all pixels in that are black and have a white neighbor. (Footnote 3: Here, two pixels are neighbors if they share one coordinate and differ by one in the other coordinate. An alternative definition (that will yield the same results in our case) is that two pixels are neighbors if they differ by at most one in each of the coordinates, and are not equal.) is -sparse for a constant if . A property has a -sparse boundary if the boundaries of all images satisfying are -sparse.
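To make the boundary notion concrete, the following sketch computes the boundary of a small image under the first neighbor definition (pixels sharing one coordinate and differing by one in the other). It assumes, for illustration, that images are lists of rows with 1 for black and 0 for white; the function name is hypothetical.

```python
def boundary(image):
    """Return the set of black pixels (i, j) that have a white neighbor.

    Assumes image is a list of equal-length rows with 1 = black and
    0 = white, and uses the 4-neighborhood (up, down, left, right).
    """
    n_rows = len(image)
    n_cols = len(image[0]) if image else 0
    result = set()
    for i in range(n_rows):
        for j in range(n_cols):
            if image[i][j] != 1:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n_rows and 0 <= b < n_cols and image[a][b] == 0:
                    result.add((i, j))
                    break
    return result
```

For a half plane such as four rows of `[1, 1, 0, 0]`, the boundary is the single column of black pixels adjacent to the white region, so its size is linear in the side length. In a chessboard, by contrast, every black pixel lies on the boundary.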
For example, for any property of images such that the black area in any image satisfying is the union of at most convex shapes (that do not have to be disjoint), has a -sparse boundary. This follows from the fact that the boundary of each of the black shapes is of size at most . For , this captures both convexity and being a half plane as special cases. The following result states that -sparse properties are earthmover resilient.
Fix . Then any property with a -sparse boundary is -earthmover-resilient, where for some absolute constant and any .
The result still holds if is taken as a function of . The (non-trivial) proof serves as a good example showing how to prove earthmover resilience of properties, and is given in Appendix A.
Naturally, not all properties of interest are earthmover resilient. For example, the local property of “not containing two consecutive horizontal black pixels” in a black/white image is not earthmover resilient: Consider the chessboard image, which satisfies . By partitioning the board into quadruples of consecutive columns and switching between the second and the third column in each quadruple, we get an image that is -earthmover-close to yet -far from it in Hamming distance. A similar but slightly more complicated example shows that connectivity is not earthmover resilient either.
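The chessboard example can be checked on a small instance. In the sketch below the helper names are hypothetical, 1 denotes black, and the board side is assumed to be a multiple of four.

```python
def chessboard(n):
    """n x n chessboard: pixel (i, j) is black (1) iff i + j is even."""
    return [[1 if (i + j) % 2 == 0 else 0 for j in range(n)] for i in range(n)]


def swap_middle_columns(image):
    """Swap the 2nd and 3rd column inside each block of 4 consecutive columns."""
    out = [row[:] for row in image]
    n_cols = len(out[0])
    for start in range(0, n_cols - 3, 4):
        for row in out:
            row[start + 1], row[start + 2] = row[start + 2], row[start + 1]
    return out


def count_horizontal_black_pairs(image):
    """Count positions where two horizontally adjacent pixels are both black."""
    return sum(
        1
        for row in image
        for j in range(len(row) - 1)
        if row[j] == 1 and row[j + 1] == 1
    )
```

On an 8 x 8 board, two column swaps create 16 pairwise disjoint violating pairs, two in every row. Each pair requires changing at least one of its pixels, so a constant fraction of the pixels must be modified to restore the property.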
2.3 Definitions: Testing and estimation
A -query algorithm is said to be an -test for with confidence , if it acts as follows. Given an unknown input function (where and are known), picks elements of its choice, and queries the values . (Footnote 4: as defined here is a non-adaptive test, that chooses which queries to make in advance. Adaptivity does not matter for our discussion, since we are only interested in constant-query tests, and since an adaptive test making a constant number of queries can be turned into a non-adaptive one making queries, which is still a constant.) Then decides whether to accept or reject , so that
If satisfies then accepts with probability at least .
If is -far from , then rejects it with probability at least .
Now let be a function that satisfies for any . An -tolerant test is defined similarly to an -test, with the first condition replaced with the following strengthening: If is -close to , then accepts it with probability at least . Unless stated otherwise, the default choice for the confidence is . is testable if it has a constant-query -test (whose number of queries depends only on ) for any . Similarly, is -tolerantly testable, for a valid choice of , if it has a constant query -test for any . If is -tolerantly testable for some valid choice of , we say that it is tolerantly testable. Finally, is estimable if it is -tolerantly testable for any valid choice of .
Next, we formally define what it means for a test (or a tolerant test) to be canonical, starting with the definition for strings.
A -query test (or tolerant test) for a property of strings is canonical if it acts in two steps. First, it picks uniformly at random, and queries the entries . The second step only receives the ordered tuple and decides (possibly probabilistically) whether to accept or reject only based on the values of . Note that the second step does not “know” the values of themselves. As before, is canonically testable if it has a -query canonical test for any , where depends only on .
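The two-step structure of a canonical string test can be sketched as below. The names `canonical_string_test` and `is_nondecreasing` are hypothetical, and the decision rule is only a placeholder to show the interface; it is not claimed to be a valid test for any particular property.

```python
import random


def canonical_string_test(s, q, decide, rng=random):
    """Run a canonical q-query test on the string s.

    Step 1: pick q distinct positions uniformly at random, in increasing order.
    Step 2: pass only the queried values (not the positions) to decide.
    """
    positions = sorted(rng.sample(range(len(s)), q))
    values = tuple(s[i] for i in positions)
    return decide(values)


def is_nondecreasing(values):
    """Placeholder decision rule: accept iff the values never decrease."""
    return all(a <= b for a, b in zip(values, values[1:]))
```

The point of the interface is that `decide` never sees `positions`, which is exactly what separates a canonical test from one that also uses the locations of the queried entries.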
In contrast, a test for string properties is sample based if it has the exact same first step, but the second step receives more information: It also receives the values of . A sample-based test is more powerful than a canonical test in general. For example, the property of “being equal to the string ” is trivially sample-based -testable with queries, but is not canonically testable with a constant number of queries (that depends only on ).
For ordered graphs , a test (or a tolerant test) is canonical if, again, it acts in two steps. In the first step, picks vertices uniformly at random, and queries all values . The second step receives the ordered tuple , and decides (possibly probabilistically) whether to accept or reject only based on the value of .
We take a short detour to explain why asking to make a deterministic decision in the second step of the canonical test, rather than a probabilistic one, will not make an essential difference for our purposes. It was proved by Goldreich and Trevisan  that any probabilistic canonical test (for which the decision to accept or reject in the second step is not necessarily deterministic) can be converted into a deterministic one, with a blowup that is at most polynomial in the number of queries. The proof was given for unordered graph properties, but it can be translated to ordered structures like strings, ordered graphs and images in a straightforward manner. Thus, the requirement that the canonical test makes a deterministic decision is not restrictive.
2.4 The full statement of Theorem 1.1
We are finally ready to present the more precise version of Theorem 1.1. This version depicts an efficient transformation from earthmover resilience and tolerant testability to canonical testability, and vice versa.
Theorem 2.2.
Let be a property of edge-colored ordered graphs or images, and let and such that for any .
If is -earthmover resilient and -tolerantly testable, where the number of queries of a corresponding -tolerant non-adaptive test is denoted by , then is canonically testable. Moreover, if , and are polynomial in , then the number of queries of the canonical -test is also polynomial in .
If is canonically testable, where the number of queries of the canonical (non adaptive) -test is denoted by , then is both -earthmover resilient and -tolerantly testable where depends only on and . Moreover, if is polynomial in , then is polynomial in .
2.5 Regular reducibility
The last notion to be formally defined is that of ordered regular reducibility. This notion is a natural analogue of the unordered variant, and is rather complicated to describe and define. Since the intuition behind this definition is quite similar to that of the unordered case, we refer the reader to a more thorough discussion on regular reducibility (and the relation to Szemerédi’s regularity lemma) in . Here, we only provide the set of definitions required for our purposes.
Definition (Regularity, regular partition).
Let be an edge-colored ordered graph. For any , the -density of a disjoint pair is . A pair is -regular if for any two subsets and satisfying and , and any , it holds that . An equipartition of into parts is -regular if all but at most of the pairs are -regular.
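As a small illustration of the density of a pair, the sketch below assumes the standard normalization (the number of edges of the given color between the two sets, divided by the product of their sizes) and a hypothetical dictionary representation of the edge coloring keyed by ordered vertex pairs.

```python
def color_density(edge_color, U, V, sigma):
    """Fraction of pairs (u, v) in U x V whose edge has color sigma.

    edge_color is assumed to map (min(u, v), max(u, v)) to a color;
    U and V are assumed to be disjoint, non-empty vertex sets.
    """
    U, V = list(U), list(V)
    if not U or not V:
        raise ValueError("U and V must be non-empty")
    hits = sum(
        1
        for u in U
        for v in V
        if edge_color[(min(u, v), max(u, v))] == sigma
    )
    return hits / (len(U) * len(V))
```

Checking regularity then amounts to comparing this quantity on the pair itself with its value on all sufficiently large sub-pairs, which this sketch does not attempt.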
Definition (Interval partitions).
The -interval equipartition of is the unique partition of into sets , such that for any and for any . An interval partition of an ordered graph or a string is defined similarly.
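An interval equipartition can be sketched as follows. The exact rule that makes the partition unique is not reproduced here, so the sketch assumes one natural convention: the parts are consecutive intervals whose sizes differ by at most one, with the larger parts placed first. The function name is hypothetical.

```python
def interval_equipartition(n, k):
    """Split 1..n into k consecutive intervals of nearly equal size.

    Assumed convention: part sizes differ by at most one, and the
    larger parts (if any) come first.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    base, extra = divmod(n, k)
    parts, start = [], 1
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts
```

For example, `interval_equipartition(10, 3)` yields `[[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]`.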
Definition (Ordered regularity instance).
An ordered regularity instance for -colored ordered graphs is given by an error parameter , integers , a set of densities indexed by , and , and a set of tuples of size at most . An ordered graph satisfies the regularity instance if there is an equitable refinement of the -interval equipartition where for any and , such that for all the pair is -regular and satisfies for any . The complexity of the regularity instance is .
With some abuse of notation, when writing we mean that the number of -colored edges between and is or . This way we avoid divisibility issues, without affecting any of our arguments.
The definition of an ordered regularity instance differs slightly from the analogous definition for unordered graphs in : Here we insist that the regular partition will be a refinement of an interval equipartition, disregarding pairs of parts inside the same interval. We also allow a color set of size bigger than two. The definition of regular reducibility is analogous to the unordered case, though obviously the regularity instances used in the definition are of the ordered type.
Definition (Regular reducible).
An edge-colored ordered graph property is regular-reducible if for any there exists such that for any there is a family of at most regularity instances, each of complexity at most , such that the following holds for every and ordered graph :
If satisfies then for some , is -close to satisfying .
If is -far from satisfying , then for any , is -far from satisfying .
3 Proof outline
In this section, we briefly describe the main ingredients of our proofs.
Earthmover distance and mixingness
Suppose that are two ordered graphs with a finite earthmover distance between them (all results mentioned here also apply for strings). In this case, and are isomorphic as unordered graphs, meaning that the collection of vertex permutations that “turn” into is not empty. We define the (absolute) mixingness between and as the minimal number of pairs such that , over all possible choices of from the collection. We show, via a simple inductive proof, that the mixingness between and is exactly equal to the earthmover distance between them.
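For strings, mixingness can be evaluated by brute force on tiny inputs. The sketch below assumes that the pairs being counted are the inversions of the permutation (pairs of positions whose relative order is reversed), and that a permutation “turns” one string into the other when composing with it yields the target; both the reading and the function names are assumptions made for illustration.

```python
from itertools import permutations


def inversions(pi):
    """Number of pairs x < y with pi[x] > pi[y]."""
    n = len(pi)
    return sum(1 for x in range(n) for y in range(x + 1, n) if pi[x] > pi[y])


def mixingness(f, g):
    """Minimum inversion count over permutations pi with g[i] == f[pi[i]].

    Returns None if no such permutation exists. Enumerates all
    permutations, so it is intended for very short strings only.
    """
    n = len(f)
    if len(g) != n:
        return None
    best = None
    for pi in permutations(range(n)):
        if all(g[i] == f[pi[i]] for i in range(n)):
            inv = inversions(pi)
            if best is None or inv < best:
                best = inv
    return best
```

On small examples the values match the number of adjacent swaps needed: reversing `"abc"` gives 3, and turning `"0011"` into `"1100"` gives 4.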
With the tool of mixingness in hand, it is not hard to prove that canonical testability implies earthmover resilience and tolerant testability. The basic idea is that, if two graphs and are sufficiently close in terms of mixingness, then the distributions of their -vertex subgraphs are very similar, and so a -query canonical test cannot distinguish between them with good probability. See Section 5 for more details.
Earthmover resilience to piecewise-canonical testability
A test is piecewise-canonical if it acts in the following manner on the -interval partition of the unknown input graph (or string). First, chooses how many vertices (entries, respectively) to take from each interval, where the number of vertices may differ between different intervals. Then picks the vertices (entries) from the intervals in a uniformly random manner. Finally, queries precisely all pairs of picked vertices (or all entries, in the string case), and decides whether to accept or reject based on the ordered tuple of the values returned by the queries.
For strings of length over , if is earthmover resilient then it is also piecewise-canonically testable. The main idea of the proof is the following. If one takes a string and partitions it into sufficiently many equitable interval parts , then “shuffling” entries inside each of the interval parts will not change the distance of to significantly. With this idea in hand, it is not hard to observe that knowing the histograms of all parts (with respect to letters in ) is enough to estimate the distance of to up to a small additive constant error. These histograms cannot be computed exactly with a constant number of queries, but it is well known that each can be estimated up to a small constant error with a constant number of queries, which is enough for our purposes.
For properties of ordered graphs (or images), earthmover resilience by itself is not enough to imply piecewise-canonical testability, but earthmover resilience and tolerant testability are already enough. The idea is somewhat similar to the one we used for strings. We may assume that has a tolerant test whose set of queried pairs is always an induced subgraph of . Like before, we partition our input graph into sufficiently many interval parts . Now the piecewise canonical test simulates a run of the original tolerant test (without making the actual queries that decided on). Denote the vertices that decides to pick in by . picks exactly vertices uniformly at random in each part , and queries all edges between all chosen vertices. Now randomly “assigns” the labels to the vertices that it queried from , and returns the same answer that would have returned for this set of queries. It can be shown that is a test whose probability to return the same answer as is high, as desired. For the full details, see Section 6.
Piecewise-canonical testability to canonical testability
We describe the transformation for ordered graph properties; for strings this is very similar. Let be a piecewise-canonical test for that partitions the input into intervals . Consider the following canonical test : picks vertices uniformly at random, for large enough . Then partitions the vertices into intervals . Now simulates a run of . If chose to take vertices from , then picks exactly vertices from . Finally, queries all edges between all vertices it picked, and returns the same answer as (where the simulation of assumes here that the vertices that were actually picked from come from ).
Canonical testability, estimability and regular reducibility
The proofs of Theorems 1.3 and 1.4 are technically involved. Fortunately, the proofs follow the same spirit as those of the unordered case, considered in [4, 21], and in this paper we only describe how to adapt the unordered proofs to our case.
Sections 8 and 9 contain the proofs of Theorems 1.3 and 1.4, respectively. It is shown in these sections that for our ordered case, in some sense it is enough to make the proofs work for -partite graphs, for a fixed . The intuition is that for our purposes, it is enough to view an ordered graph as a -partite graph (for a large enough constant ), where the parts are the intervals of a -interval partition of .
4 Discussion and open problems
The earthmover resilient properties showcase, among other phenomena, an interesting connection between visual properties of images and the regularity-based machinery that was previously used to investigate unordered graphs. We believe that further research on the characterization problem for ordered structures would be interesting. It might also be interesting to investigate such problems using distance functions that are not Hamming distance, as was done, e.g., in . Finally we present two open questions.
Characterization of testable earthmover-resilient properties
In this work we provide a characterization of earthmover resilient tolerantly testable properties. Although using such tests might make more sense than using intolerant tests in the presence of noise in the input (a situation that is common in areas like image processing, that are related to image property testing), it would also be very interesting to provide a characterization of the testable earthmover resilient properties. In particular, does there exist an earthmover resilient property that is testable but not tolerantly testable? The only known example of a (non earthmover resilient) property that is testable but not tolerantly testable is the PCPP-based property of , and it will certainly be interesting to find more examples of properties that have this type of behavior.
Alternative classes of properties
The class of earthmover resilient properties captures properties that are global in nature, and it will be interesting to identify and analyze some other wide classes of properties. A natural candidate is the class of all local properties . We also believe that it might be possible to find other interesting classes of visual properties.
5 Earthmover-resilience and mixing
Let and be two distributions over a finite family of combinatorial structures. The variation distance between and is .
The following folklore fact regarding the variation distance will be useful later.
Let and be two distributions over a finite family . Then .
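As a small numerical illustration, the sketch below assumes the standard definition of the variation distance (half the L1 distance between the two probability vectors) and checks it against the largest gap in probability that the two distributions assign to any single event. The function names and the toy distributions are ours, and the identity checked here is the usual folklore one; it is not claimed to be the exact form of the fact stated above.

```python
from itertools import chain, combinations

def variation_distance(d1, d2):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(x, 0.0) - d2.get(x, 0.0)) for x in support)

def max_event_gap(d1, d2):
    """Largest |D1(E) - D2(E)| over all events E (brute force, tiny supports only)."""
    support = sorted(set(d1) | set(d2))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(abs(sum(d1.get(x, 0.0) - d2.get(x, 0.0) for x in e)) for e in events)

# Toy distributions over a three-element family (hypothetical values).
d1 = {"a": 0.5, "b": 0.3, "c": 0.2}
d2 = {"a": 0.2, "b": 0.3, "c": 0.5}
tv = variation_distance(d1, d2)
gap = max_event_gap(d1, d2)
```

On this toy input both quantities coincide, which is the behavior one expects from the standard definition.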
An unordered isomorphism between two ordered graphs is a permutation such that for any .
Given a permutation of , the mixing set of is , its mixingness is and its normalized mixingness is . Given graphs and , their normalized mixingness is defined as the minimal normalized mixingness of an unordered isomorphism from to (and if and are not isomorphic as unordered graphs).
Our next goal is to show that the earthmover distance between two ordered graphs is equal to the mixingness between them. Given a permutation , a basic move for transforms it to a permutation of the same length, such that for some , and , and for any . Let denote the minimal number of basic moves required to turn into the identity permutation satisfying for any .
for any permutation .
The inequality is trivial: Any basic move changes the relative order between a (single) pair of entries in the permutation, and thus cannot decrease the size of the mixing set by more than one. Next we show by induction that . implies that and in this case. Now assume that and pick some such that . Take to be the largest for which – such an exists since . Note that due to the maximality of . Take to be the result of the basic move between and in . , and by the induction assumption we know that . But since is the result of a basic move on , we conclude that , as desired. ∎
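The lemma can be sanity-checked by brute force on small permutations. The sketch below assumes that the mixing set of a permutation is its set of inversions (pairs i < j whose images appear in decreasing order) and that a basic move swaps two adjacent entries; both readings are our assumptions about the definitions above. It computes, by breadth-first search, the fewest basic moves needed to reach the identity and compares this with the number of inversions.

```python
from collections import deque
from itertools import permutations

def mixingness(perm):
    """Number of pairs i < j with perm[i] > perm[j] (assumed size of the mixing set)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def min_basic_moves(n):
    """BFS from the identity over all permutations of range(n).

    Returns a dict mapping each permutation to the fewest adjacent swaps
    connecting it to the identity (the swap graph is undirected).
    """
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for i in range(n - 1):
            q = list(p)
            q[i], q[i + 1] = q[i + 1], q[i]
            q = tuple(q)
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist

dist = min_basic_moves(5)
lemma_holds = all(dist[p] == mixingness(p) for p in permutations(range(5)))
```

The check is exhaustive only for the chosen length; it illustrates the statement and is not a substitute for the inductive proof.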
The equivalence between the earthmover distance and the mixingness is now immediate.
For any two graphs , .
is the minimum value of among all unordered isomorphisms from to , and is the minimum value of among all such isomorphisms. By Lemma 5, these two values are equal, and thus the corresponding relative measures are also equal. ∎
Let and let be a -earthmover-resilient property. If two graphs satisfy for some , then .
Suppose that and satisfy . By definition, there exists an unordered isomorphism such that . Let be the graph in that is closest to (in Hamming distance). Consider the graph satisfying for any , then . Note that is an unordered isomorphism between and . It follows, building on Lemma 5, that . This implies (by the earthmover resilience) that is -close to . The triangle inequality concludes the proof. ∎
Canonical testability implies earthmover resilience
Let and be -edge-colored ordered graphs on and vertices respectively. The number of (ordered) copies of in , i.e., the number of induced subgraphs of of size isomorphic to , is denoted by . The density of in is (where if ). The -statistic of is the vector , where is the family of all -edge-colored ordered graphs with vertices.
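To make the notion of a statistic concrete, here is a minimal brute-force sketch. It assumes that the density of a pattern is the number of its ordered induced copies divided by the total number of vertex subsets of the same size, and it encodes an ordered pattern as the tuple of colors of its pairs in lexicographic order. The function name, the encoding, and the example graph are ours.

```python
from collections import Counter
from itertools import combinations
from math import comb

def q_statistic(n, color, q):
    """Map each ordered q-vertex pattern to its density among all q-subsets of range(n).

    `color(u, v)` returns the color of the pair u < v. A pattern is encoded as
    the tuple of colors of its pairs, listed in lexicographic order of the pairs.
    """
    counts = Counter()
    for subset in combinations(range(n), q):  # each subset is in increasing order
        pattern = tuple(color(u, v) for u, v in combinations(subset, 2))
        counts[pattern] += 1
    total = comb(n, q)
    return {pattern: c / total for pattern, c in counts.items()}

# Hypothetical example: a 2-colored ordered graph on 6 vertices in which
# the pair u < v has color 1 exactly when v - u is odd.
def parity_color(u, v):
    return (v - u) % 2

stat3 = q_statistic(6, parity_color, 3)
stat2 = q_statistic(6, parity_color, 2)
```

Patterns that never occur are simply absent from the returned dictionary, which corresponds to a density of zero. The enumeration is exponential in the pattern size and is meant only for small examples.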
Every property of ordered graphs already testable by a canonical test is -earthmover-resilient for some (depending on the number of its query vertices as a function of ), as implied by the following lemma.
Let . For any canonical -test querying up to vertices and any two graphs and of either Hamming distance or earthmover distance at most , the difference between the acceptance probabilities of and of is at most .
We may assume that the test queries exactly vertices. For Hamming distance, the statement is well known, and follows easily by taking a union bound over all queried edges. Assume then that . Let be the -statistics of , respectively, where are two graphs with earthmover distance at most between them. By Lemma 5 it will be enough to show that . Lemma 5 implies that there is an unordered isomorphism with .
For any set of vertices, let , and note that is a bijective mapping from to itself. Observe that the induced subgraph can be non-isomorphic to (as an ordered graph on vertices) only if there exist two vertices satisfying . By a union bound, the probability of a uniformly random to have such a pair is at most , implying that . ∎
Let be an ordered graph property. Suppose that has a canonical -test making vertex queries for any . Then is -earthmover-resilient and -tolerantly -testable with vertex queries, where for any .
Let , and suppose that and are of earthmover distance at most between them, where satisfies ; to prove the earthmover resilience, we need to show that is -close to satisfying . Since , it is accepted by with probability at least . By Lemma 5, the acceptance probability of by is at least . Since rejects any graph -far from with probability at least , we conclude that must be -close to .
For the second part, regarding tolerant testability, Lemma 5 implies that for any graph that is -close to satisfying , the acceptance probability of is at least . By applying independently times and accepting if and only if the majority of the runs accepted, we get a test that accepts -close graphs with probability at least and rejects -far graphs with probability at least as well. This test can be made canonical with no need for additional queries. ∎
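The amplification step used in the second part of the proof can be sketched as follows. The wrapper runs a given test independently several times and accepts exactly when a strict majority of the runs accept. The interface (a test as a function of the input and a random generator), the number of repetitions, and the toy base test are all assumptions made for illustration; the proof fixes the number of repetitions as a function of the parameters.

```python
import random

def amplify(base_test, repetitions):
    """Return a test that runs `base_test` independently and accepts on a strict majority."""
    def amplified(graph, rng):
        accepts = sum(1 for _ in range(repetitions) if base_test(graph, rng))
        return 2 * accepts > repetitions
    return amplified

# Toy stand-in for a test that accepts a given input with probability 0.6.
def noisy_test(graph, rng):
    return rng.random() < 0.6

strong_test = amplify(noisy_test, 201)
rng = random.Random(0)
accept_rate = sum(strong_test(None, rng) for _ in range(200)) / 200
```

With an odd number of repetitions there are no ties, and a constant gap between the acceptance probability and one half is boosted toward certainty as the number of repetitions grows.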
Let us finish with two comments. First, in the last two lemmas it was implicitly assumed that the canonical test is a deterministic one, but they also hold for randomized ones: The fact that in Lemma 5 is actually enough to imply the statement of Lemma 5 for any (deterministic or randomized) canonical test, and Lemma 5 follows accordingly.
Second, the results in this section, along with Sections 6 and 7, are not exclusive to two-dimensional structures, and naturally generalize to -dimensional structures for any . Thus, in ordered hypergraphs and tensors in three dimensions or more, it is still true that the combination of earthmover resilience and tolerant testability is equivalent to canonical testability.
6 Piecewise-canonical testability
In this section, we show that ER string properties and ER tolerantly testable ordered graph properties have a constant-query piecewise canonical test. This is a test that considers a -interval partition of the input, picks a predetermined number of vertices (or entries, in the string case) uniformly at random from each interval (this number may differ between different intervals), and finally queries all edges between the picked vertices from all intervals. We always assume that our tolerant tests are non-adaptive and based on query vertices (we assume they query the entire induced subgraph even if they do not use all of it). Note that unlike the case of unordered graphs, the move from an adaptive test to a non-adaptive one can cause an exponential blowup in the query complexity (we may need to “unroll” the entire decision tree).
A (probabilistic) piecewise-canonical test with parts and query vertices for a property of functions works as follows. First, the test non-adaptively selects (possibly non-deterministically) numbers that sum up to , and then it considers a -interval partition of the input function , selecting a uniformly random set of vertices from for every . The test finally accepts or rejects based only on the selected numbers and the unique function that is isomorphic (in the ordered sense) to the restriction of on the selected vertices.
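The sampling pattern in this definition can be sketched as below for the ordered graph case. The sketch assumes that the interval partition splits the vertex set into consecutive intervals whose sizes differ by at most one, and that each requested count does not exceed the size of its interval. All function names, and the `decide` callback that stands in for the final accept or reject rule, are hypothetical.

```python
import random

def interval_partition(n, t):
    """Split range(n) into t consecutive intervals whose sizes differ by at most one."""
    bounds = [(i * n) // t for i in range(t + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(t)]

def piecewise_canonical_sample(n, counts, rng):
    """Pick counts[i] uniformly random vertices from the i-th interval.

    The result is sorted, since intervals are consecutive and each
    per-interval sample is sorted before being appended.
    """
    parts = interval_partition(n, len(counts))
    picked = []
    for part, q_i in zip(parts, counts):
        picked.extend(sorted(rng.sample(part, q_i)))
    return picked

def run_test(f, n, counts, decide, rng):
    """Query f on every pair of picked vertices and pass the ordered answers to `decide`."""
    picked = piecewise_canonical_sample(n, counts, rng)
    answers = [f(u, v) for i, u in enumerate(picked) for v in picked[i + 1:]]
    return decide(counts, answers)
```

Note that `decide` sees only the selected counts and the ordered tuple of answers, never the identities of the picked vertices, which mirrors the requirement that the decision depend only on the ordered isomorphism type of the queried restriction.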
A property is piecewise-testable if for every there exist and for which has a piecewise canonical -test with parts and query vertices.
In Section 2 it was noted that a probabilistic canonical test for a property can be transformed into a deterministic one, with the same confidence, as was shown in . This is true for any choice of confidence (not only the “default” confidence ). Since one can always amplify a (probabilistic or deterministic) test to get a test of the same type with confidence arbitrarily close to , we conclude that if a property has a probabilistic canonical test with a certain confidence , then for any , has a deterministic canonical test with confidence at least .
All of the above is also true for piecewise-canonical tests; the proof for canonical tests carries over naturally to this case, so we omit it. Here, the simulating deterministic test has the same number of parts as the original test.
6.1 Strings: Earthmover resilience to piecewise-canonical testability
In this subsection, we prove that ER properties of strings are piecewise canonically testable. In Section 7, we show that the latter condition implies canonical testability.
For a string let denote the density of in . Let denote the distribution vector of letters in . The following well known fact is important for the proof.
The distribution vector of a string over can be approximated up to variation distance , with probability at least , using queries.
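A minimal sketch of the sampling estimator behind this fact is given below. It queries uniformly random positions (with repetition) and returns the empirical letter frequencies. The function name and interface are ours, and the number of queries is left as a parameter because the bound in the statement above depends on the accuracy and confidence parameters.

```python
import random
from collections import Counter

def estimate_distribution(query, n, alphabet, num_queries, rng):
    """Estimate the letter frequencies of a length-n string from random position queries.

    `query(i)` returns the letter at position i; positions are sampled
    uniformly at random with repetition.
    """
    samples = Counter(query(rng.randrange(n)) for _ in range(num_queries))
    return {a: samples[a] / num_queries for a in alphabet}

# Hypothetical input: a string in which "a" and "b" each have density 1/2.
s = "ab" * 500
est = estimate_distribution(lambda i: s[i], len(s), "ab", 4000, random.Random(0))
```

The number of queries needed for a fixed accuracy does not depend on the length of the string, which is what makes the estimator usable inside a constant-query test.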
Fix a function , a -earthmover resilient property of strings over , and . Take . For any string over , let be the -interval partition of and let the -interval distribution denote the -tuple of the distribution vectors of . For as above and another string over with -interval partition , the -aggregated distance between and is ; recall that is the variation distance between and . As usual, we define . The next easy lemma relates the Hamming distance of to to its -aggregated distance to .
For any string over we have .
Let be the string that is closest to among those that can be generated from only using basic moves inside the intervals . In particular, it is trivial that and we know by Lemma 5 that . By Lemma 5, we get that . On the other hand, follows by the definitions of the distance functions and the minimality of . ∎
Finally we present the piecewise canonical test for . More accurately, we describe a piecewise-canonical algorithm that, given an unknown string over of an unknown length , approximates the -aggregated distance of to up to an additive error of , with probability at least . The test simply runs and accepts if and only if its output value is at most . The algorithm acts as follows. First, it runs the algorithm of Fact 6.2 in each interval of the -interval partition of , with parameters and . For any , let denote the distribution returned by this algorithm for interval . Then, Algorithm returns .
With probability , we get that for any . Suppose from now on that the latter happens. It follows from the triangle inequality for the variation distance that , where is the minimum defined above and is the string achieving this minimum. Conversely, there exists such that . But the minimality of implies that , and again, from the