Over the past decades, property testing has emerged as an important line of research in sublinear-time algorithms. The goal is to understand randomized algorithms for approximate decision making, where the algorithm needs to decide (with high probability) whether a huge object has some property by making a few queries to the object. Many different types of objects and properties have been studied from this property testing perspective (see the surveys by Ron [Ron08, Ron10] and the recent textbook by Goldreich [Gol17] for overviews of contemporary property testing research). This paper deals with property testing of Boolean functions and property testing of graphs with vertex set $[n]$.
In this paper we describe a new model of graph property testing, which we call the rejection sampling model. For $n \in \mathbb{N}$ and a subset $\mathcal{P}$ of graphs on the vertex set $[n]$, we say a graph $G$ on vertex set $[n]$ has property $\mathcal{P}$ if $G \in \mathcal{P}$, and say $G$ is $\varepsilon$-far from having property $\mathcal{P}$ if every graph with property $\mathcal{P}$ differs from $G$ on at least $\varepsilon n^2$ edges. (The distance definition can be modified accordingly when one considers bounded-degree or sparse graphs.) The problem of $\varepsilon$-testing $\mathcal{P}$ with rejection sampling queries is the following task:
Given some $\varepsilon > 0$ and access to an unknown graph $G$, output "accept" with probability at least $2/3$ if $G$ has property $\mathcal{P}$, and output "reject" with probability at least $2/3$ if $G$ is $\varepsilon$-far from having property $\mathcal{P}$. The access to $G$ is given by the following oracle queries: given a query set $A \subseteq [n]$, the oracle samples an edge $(u, v) \in E(G)$ uniformly at random and returns $\{u, v\} \cap A$.
We measure the complexity of algorithms with rejection sampling queries by considering the sizes of the queries. The complexity of an algorithm making queries $A_1, \dots, A_q \subseteq [n]$ is $\sum_{t=1}^{q} |A_t|$.
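To make the model concrete, here is a minimal sketch of a rejection sampling oracle in Python (the class and method names are ours, not part of the model's definition): the oracle holds the unknown graph as an edge list, and each query charges $|A|$ to a running cost counter.

```python
import random

class RejectionSamplingOracle:
    """Sketch of the rejection sampling oracle over vertex set {0, ..., n-1}."""

    def __init__(self, edges, seed=0):
        self.edges = list(edges)    # the unknown graph G as an edge list
        self.rng = random.Random(seed)
        self.cost = 0               # total complexity: sum of |A| over all queries

    def query(self, A):
        """Sample a uniformly random edge (u, v) of G and return {u, v} & A."""
        A = set(A)
        self.cost += len(A)         # a query A costs |A|
        u, v = self.rng.choice(self.edges)
        return {u, v} & A
```

On a triangle, querying the full vertex set always returns a complete edge, while querying a single vertex returns either that vertex or the empty set.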
The rejection sampling model allows us to study testers which rely on random sampling of edges, while providing the flexibility of making lower-cost queries. This type of query access strikes a delicate balance between simplicity and generality: queries are constrained enough for us to show high lower bounds, and at the same time, the flexibility of making queries allows us to reduce the rejection sampling model to Boolean function testing problems. Specifically, we reduce to tolerant junta testing and tolerant unateness testing (see Subsection 1.1).
Our main result in the rejection sampling model concerns non-adaptive algorithms. These algorithms must fix their queries in advance, so that queries cannot depend on the answers to previous queries (otherwise we say that the algorithm is adaptive). We show an $\widetilde{\Omega}(n^2)$ lower bound on the complexity of testing whether an unknown graph is bipartite using non-adaptive queries.
There exists a constant $\varepsilon > 0$ such that any non-adaptive $\varepsilon$-tester for bipartiteness in the rejection sampling model has cost $\widetilde{\Omega}(n^2)$. (We use the notations $\widetilde{O}(\cdot)$ and $\widetilde{\Omega}(\cdot)$ to hide polylogarithmic dependencies on the argument, i.e., for expressions of the form $O(f \log^c f)$ and $\Omega(f / \log^c f)$, respectively, for some absolute constant $c > 0$.)
More specifically, Theorem 1 follows from applying Yao’s principle to the following lemma.
Let $\mathcal{G}_1$ be the uniform distribution over the union of two disjoint cliques of size $n/2$, and let $\mathcal{G}_2$ be the uniform distribution over complete bipartite graphs with each part of size $n/2$. Any deterministic non-adaptive algorithm that can distinguish between $\mathcal{G}_1$ and $\mathcal{G}_2$ with constant probability using rejection sampling queries must have complexity $\widetilde{\Omega}(n^2)$.
We discuss a number of applications of the rejection sampling model (specifically, of Lemma 1.1) in the next subsection. In particular, we obtain new lower bounds in the tolerant testing framework introduced by Parnas, Ron, and Rubinfeld in [PRR06] for two well-studied properties of Boolean functions (specifically, $k$-juntas and unateness; see the next subsection for definitions of these properties). These lower bounds are obtained by a reduction from the rejection sampling model: we show that too-good-to-be-true Boolean function testers for these properties would imply the existence of rejection sampling algorithms which distinguish $\mathcal{G}_1$ and $\mathcal{G}_2$ with complexity $\widetilde{o}(n^2)$, contradicting Lemma 1.1. Therefore, we may view the rejection sampling model as a useful abstraction for studying the hard instances of tolerantly testing $k$-juntas and unateness.
1.1 Applications to Tolerant Testing: Juntas and Unateness
Given $n \in \mathbb{N}$ and a subset $\mathcal{P}$ of $n$-variable Boolean functions, a Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ has property $\mathcal{P}$ if $f \in \mathcal{P}$. The distance between Boolean functions $f$ and $g$ is $\mathrm{dist}(f, g) = \Pr_{\boldsymbol{x} \sim \{0,1\}^n}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})]$. The distance of $f$ to the property $\mathcal{P}$ is $\mathrm{dist}(f, \mathcal{P}) = \min_{g \in \mathcal{P}} \mathrm{dist}(f, g)$. We say that $f$ is $\varepsilon$-close to $\mathcal{P}$ if $\mathrm{dist}(f, \mathcal{P}) \le \varepsilon$ and that $f$ is $\varepsilon$-far from $\mathcal{P}$ if $\mathrm{dist}(f, \mathcal{P}) \ge \varepsilon$. The problem of tolerant property testing [PRR06] of $\mathcal{P}$ asks for query-efficient randomized algorithms for the following task:
Given parameters $0 \le \varepsilon_0 < \varepsilon_1 < 1/2$ and black-box query access to a Boolean function $f$, accept with probability at least $2/3$ if $f$ is $\varepsilon_0$-close to $\mathcal{P}$ and reject with probability at least $2/3$ if $f$ is $\varepsilon_1$-far from $\mathcal{P}$.
An algorithm which performs the above task is an $(\varepsilon_0, \varepsilon_1)$-tolerant tester for $\mathcal{P}$. A $(0, \varepsilon)$-tolerant tester is a standard property tester, also called a non-tolerant tester. As noted in [PRR06], tolerance is not only a natural generalization, but is also very often a desirable attribute of testing algorithms. This motivates the high-level question: how does the requirement of tolerance affect the complexity of testing well-studied properties? We make progress on this question by showing query-complexity separations for two well-studied properties of Boolean functions: $k$-juntas and unate functions.
($k$-junta) A function $f \colon \{0,1\}^n \to \{0,1\}$ is a $k$-junta if it depends on at most $k$ of its variables, i.e., there exist distinct indices $i_1, \dots, i_k \in [n]$ and a $k$-variable function $g \colon \{0,1\}^k \to \{0,1\}$ such that $f(x) = g(x_{i_1}, \dots, x_{i_k})$ for all $x \in \{0,1\}^n$.
(unateness) A function $f \colon \{0,1\}^n \to \{0,1\}$ is unate if $f$ is either non-increasing or non-decreasing in every variable. Namely, there exists a string $r \in \{0,1\}^n$ such that the function $g(x) = f(x \oplus r)$ is monotone with respect to the bit-wise partial order on $\{0,1\}^n$.
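Both definitions can be checked by brute force for small $n$; the following sketch (function names are ours) enumerates the hypercube, which is feasible only for toy examples.

```python
from itertools import product

def relevant_variables(f, n):
    """Variables on which f: {0,1}^n -> {0,1} depends; f is a k-junta
    iff at most k variables are relevant."""
    rel = set()
    for i in range(n):
        for x in product((0, 1), repeat=n):
            y = list(x)
            y[i] ^= 1                  # flip the i-th coordinate
            if f(tuple(x)) != f(tuple(y)):
                rel.add(i)
                break
    return rel

def is_unate(f, n):
    """f is unate iff, in every variable, f is non-decreasing or non-increasing."""
    for i in range(n):
        up = down = False
        for x in product((0, 1), repeat=n):
            if x[i] == 1:
                continue
            y = list(x)
            y[i] = 1                   # the edge (x, x with bit i set)
            a, b = f(tuple(x)), f(tuple(y))
            up |= a < b                # an increasing edge along direction i
            down |= a > b              # a decreasing edge along direction i
        if up and down:
            return False
    return True
```

For instance, $x_0 \oplus x_2$ is a $2$-junta but not unate, while $x_0 \vee \bar{x}_1$ is unate.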
The next theorem concerns non-adaptive tolerant testers for $k$-juntas.
There exist constants $0 < \varepsilon_0 < \varepsilon_1 < 1/2$ such that, for any $k \le n$, any non-adaptive $(\varepsilon_0, \varepsilon_1)$-tolerant $k$-junta tester must make $\widetilde{\Omega}(k^2)$ queries.
We note a noteworthy consequence of Theorem 2. In [Bla08], Blais gave a non-adaptive $\widetilde{O}(k^{3/2})/\varepsilon$-query tester for (non-tolerant) testing of $k$-juntas, which was shown to be optimal for non-adaptive algorithms (up to polylogarithmic factors) by Chen, Servedio, Tan, Waingarten, and Xie in [CST17]. Combined with Theorem 2, this shows a polynomial separation between the query complexity of non-adaptive tolerant junta testing and that of non-adaptive non-tolerant junta testing.
The next two theorems concern tolerant testers for unateness.
There exist constants $0 < \varepsilon_0 < \varepsilon_1 < 1/2$ such that any (possibly adaptive) $(\varepsilon_0, \varepsilon_1)$-tolerant unateness tester must make $\widetilde{\Omega}(n)$ queries.
There exist constants $0 < \varepsilon_0 < \varepsilon_1 < 1/2$ such that any non-adaptive $(\varepsilon_0, \varepsilon_1)$-tolerant unateness tester must make $\widetilde{\Omega}(n^{3/2})$ queries.
A similar separation between tolerant and non-tolerant testing occurs for the property of unateness as a consequence of Theorem 3 and Theorem 4. Recently, in [BCP17b], Baleshzar, Chakrabarty, Pallavoor, Raskhodnikova, and Seshadhri gave a non-adaptive $\widetilde{O}(n)$-query tester for (non-tolerant) unateness testing, and Chen, Waingarten, and Xie [CWX17b] gave an adaptive $\widetilde{O}(n^{3/4})$-query tester for (non-tolerant) unateness testing. We thus conclude, by Theorem 3 and Theorem 4, that tolerant unateness testing is polynomially harder than (non-tolerant) unateness testing, in both the adaptive and non-adaptive settings.
1.2 Related Work
The properties of $k$-juntas and unateness have received much attention in property testing research ([FKR04, CG04, Bla08, Bla09, BGSMdW13, STW15, CST17, BCE18] study $k$-juntas, and [GGL00, KS16, CS16, BCP17b, CWX17a, CWX17b] study unateness). We briefly review the current state of affairs in (non-tolerant) $k$-junta testing and unateness testing, and then discuss tolerant testing of Boolean functions and the rejection sampling model.
The problem of testing $k$-juntas, introduced by Fischer, Kindler, Ron, Safra, and Samorodnitsky [FKR04], is now well understood up to poly-logarithmic factors. Chockler and Gutfreund [CG04] showed that any tester for $k$-juntas requires $\Omega(k)$ queries. Blais [Bla09] gave a junta tester that uses $O(k \log k + k/\varepsilon)$ queries, matching the bound of [CG04] up to a factor of $\log k$ for constant $\varepsilon$. When restricted to non-adaptive algorithms, [FKR04] gave a non-adaptive tester making $\widetilde{O}(k^2)/\varepsilon$ queries, which was subsequently improved in [Bla08] to $\widetilde{O}(k^{3/2})/\varepsilon$. In terms of lower bounds,
Buhrman, García-Soriano, Matsliah, and de Wolf [BGSMdW13] gave a lower bound of $\Omega(k \log k)$ for non-adaptive testers, and Servedio, Tan, and Wright [STW15] gave a lower bound which showed a separation between adaptive and non-adaptive algorithms for small values of $\varepsilon$. These results were recently improved in [CST17] to $\widetilde{\Omega}(k^{3/2})/\varepsilon$, settling the non-adaptive query complexity of the problem up to poly-logarithmic factors.
The problem of testing unateness was introduced alongside the problem of testing monotonicity by Goldreich, Goldwasser, Lehman, Ron, and Samorodnitsky [GGL00], who gave the first non-adaptive tester, with query complexity $O(n^{3/2}/\varepsilon)$. Khot and Shinkar [KS16] gave the first improvement, an $\widetilde{O}(n/\varepsilon)$-query adaptive algorithm. A non-adaptive algorithm with $\widetilde{O}(n/\varepsilon)$ queries was given in [CC16, BCP17b]. Recently, [CWX17a, BCP17a] showed that $\widetilde{\Omega}(n)$ queries are necessary for non-adaptive one-sided testers. Subsequently, [CWX17b] gave an adaptive algorithm testing unateness with query complexity $\widetilde{O}(n^{3/4}/\varepsilon^2)$. The current best lower bound for general adaptive testers appears in [CWX17a], where it was shown that any adaptive two-sided tester must use $\widetilde{\Omega}(n^{2/3})$ queries.
Once we consider tolerant testing, i.e., the case $\varepsilon_0 > 0$, the picture is not as clear. In the paper introducing tolerant testing, [PRR06] observed that standard algorithms whose queries are uniform (but not necessarily independent) are inherently tolerant to some extent. Nevertheless, achieving $(\varepsilon_0, \varepsilon_1)$-tolerant testers for constants $0 < \varepsilon_0 < \varepsilon_1$ can require applying different methods and techniques (see, e.g., [GR05, PRR06, FN07, ACCL07, KS09, MR09, FR10, CGR13, BRY14, BMR16, Tel16]).
By applying the observation from [PRR06] to the unateness tester in [BCP17b], one obtains a tester that accepts, with constant probability, functions which are close to unate (with closeness on the order of the inverse of the query complexity). We similarly obtain weak tolerance guarantees for testing $k$-juntas. Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, and Wan [DLM07] observed that one of the (non-adaptive) junta testers from [FKR04] accepts functions that are sufficiently close to $k$-juntas. Chakraborty, Fischer, García-Soriano, and Matsliah [CFGM12] noted that the analysis of the junta tester of Blais [Bla09] implicitly implies a tolerant tester (with query complexity independent of $n$, though exponential in $k$) which accepts functions that are $c\varepsilon$-close to some $k$-junta (for some constant $c < 1$) and rejects functions that are $\varepsilon$-far from every $k$-junta. Recently, Blais, Canonne, Eden, Levi, and Ron [BCE18] showed that distinguishing the case that $f$ is close to a $k$-junta from the case that $f$ is far from every $k$-junta is possible with a number of queries independent of $n$.
For general properties of Boolean functions, tolerant testing can be much harder than standard testing. Fischer and Fortnow [FF06] used PCPs in order to construct a property $\mathcal{P}$ of Boolean functions which is $\varepsilon$-testable with a constant number of queries (depending on $\varepsilon$), but for which any tolerant tester requires $\Omega(n^c)$ queries for some constant $c > 0$. While [FF06] presents a strong separation between tolerant and non-tolerant testing, the complexity of tolerant testing of many natural properties remains open. For instance, when $\varepsilon_0 < \varepsilon_1$ are arbitrary constants, we currently have neither a $\mathrm{poly}(k, 1/\varepsilon_1)$-query tester which $(\varepsilon_0, \varepsilon_1)$-tests $k$-juntas, nor a $\mathrm{poly}(n)$-query tester that $(\varepsilon_0, \varepsilon_1)$-tests unateness or monotonicity.
Testing graphs with rejection sampling queries.
Even though the problem of testing graphs with rejection sampling queries has not been previously studied, the model shares characteristics with previously studied frameworks. These include sample-based testing, studied by Goldreich, Goldwasser, and Ron in [GGR98, GR16], where the oracle receives random samples from the input. One crucial difference between rejection sampling algorithms (even those that always query the full vertex set $[n]$) and sample-based testers is the fact that rejection sampling algorithms only receive positive examples (in the form of edges), as opposed to random positions in the adjacency matrix (which may be negative examples indicating the non-existence of an edge).
The rejection sampling model for graph testing also bears some resemblance to the conditional sampling framework for distribution testing, introduced by Canonne, Ron, and Servedio [CRS15], as well as Chakraborty, Fischer, Goldhirsh, and Matsliah [CFGM16], where the algorithm specifies a query set and receives a sample conditioned on it lying in the query set.
1.3 Techniques and High Level Overview
We first give an overview of how the lower bound in the rejection sampling model (Lemma 1.1) implies lower bounds for tolerant testing of $k$-juntas and unateness, and then we give an overview of how Lemma 1.1 is proved.
Reducing Boolean Function Testing to Rejection Sampling
This work should be considered alongside some recent works showing lower bounds for testing the properties of monotonicity, unateness, and juntas in the standard property testing model [BB16, CWX17a, CST17]. The lower bounds in [BB16, CWX17a] and [CST17] may be reinterpreted as following the same general paradigm. We discuss this general view next, followed by an overview of this work. At a high level, one may view the lower bounds from [BB16, CWX17a, CST17] as proceeding in three steps:
First, design a randomized indexing function $\boldsymbol{\Gamma} \colon \{0,1\}^n \to [N]$ that partitions the Boolean cube into $N$ roughly equal parts in a way compatible with the property (either monotonicity, unateness, or being a junta). We want to ensure that algorithms that make few queries cannot learn too much about $\boldsymbol{\Gamma}$, and that queries falling in the same part are close in Hamming distance.
Second, define two distributions over sub-functions $\boldsymbol{h}_i$ for each $i \in [N]$. The hard functions are defined by $\boldsymbol{f}(x) = \boldsymbol{h}_{\boldsymbol{\Gamma}(x)}(x)$, so that one distribution corresponds to functions with the property, and the other distribution corresponds to functions far from the property.
Third, show that any testing algorithm for the property is actually solving some algorithmic task (determined by the distributions of the $\boldsymbol{h}_i$) which is hard when queries are close in Hamming distance.
Belovs and Blais [BB16] used a construction of Talagrand [Tal96], known as the Talagrand function, to implement a randomized partition in a monotone fashion. The Talagrand function is a randomized DNF of $2^{\sqrt{n}}$ monotone terms, each of size $\sqrt{n}$, and one may define $\boldsymbol{\Gamma}(x)$ to output the index of the first term of a Talagrand function which satisfies input $x$. One can show that any two queries which are semi-balanced (we say $x \in \{0,1\}^n$ is semi-balanced if the number of its $1$-coordinates is $n/2 \pm O(\sqrt{n})$) with Hamming distance more than $\sqrt{n}$ will fall in different parts with high probability. The sub-functions are then given by random dictators or random anti-dictators, so the algorithmic task is simple: determine whether the distribution over functions is supported on dictators or anti-dictators when queries in the same part are at distance at most $\sqrt{n}$ from each other. An argument in the spirit of the one-sided error monotonicity lower bound from [FLN02] gives an $\widetilde{\Omega}(n^{1/4})$ lower bound for monotonicity testing. [CWX17a] further refined the idea by designing improved randomized partitions $\boldsymbol{\Gamma}$, which they called two-level Talagrand functions. The improved construction partitions $\{0,1\}^n$ in a monotone fashion, but has the property that semi-balanced queries at even smaller Hamming distance fall into different parts with high probability, thus bringing the lower bound to $\widetilde{\Omega}(n^{1/3})$ using the same algorithmic task as [BB16].
Higher lower bounds for unateness are possible because the unateness property allows for reductions to harder algorithmic tasks. Specifically, [CWX17a] consider the following algorithmic task: there are two classes of distributions supported on $[n] \times \{0,1\}$, and the task is to distinguish the two classes from random samples. One class consists solely of the uniform distribution over $[n] \times \{0,1\}$; the other class contains distributions which are uniform over their support, where each distribution's support contains, for each $i \in [n]$, either $(i, 0)$ or $(i, 1)$ but not both. Each sub-function is specified by a random sample $(\boldsymbol{i}, \boldsymbol{b})$, where the sub-function is a dictator in variable $\boldsymbol{i}$ if $(\boldsymbol{i}, 1)$ was sampled, and an anti-dictator in variable $\boldsymbol{i}$ if $(\boldsymbol{i}, 0)$ was sampled. The first key observation is that the distance of the resulting functions from unateness depends on whether the distribution comes from the first or the second class. The second key observation is that multiple random samples are required to distinguish the two classes of distributions. (For example, in order to distinguish whether a distribution belongs to the first or second class with one-sided error, an algorithm must observe two samples $(i, 0)$ and $(i, 1)$ with the same index $i$, which would indicate that the distribution is uniform over the whole set $[n] \times \{0,1\}$.) In fact, the adaptive algorithm for unateness testing in [CWX17b] can be interpreted as one based on solving this algorithmic task with a "rejection sampling"-style oracle.
For the case of $k$-juntas, [CST17] used a simple indexing function $\boldsymbol{\Gamma}$ that partitions $\{0,1\}^n$ according to projections onto randomly chosen variables. The second and third steps also follow the above strategy: they define the $\mathsf{SSSQ}$ (for Set-Size-Set-Queries) and $\mathsf{SSEQ}$ (for Set-Size-Element-Queries) problems as the hard algorithmic tasks, which give the lower bounds.
Our lower bounds for tolerant testing follow the same paradigm. For the randomized indexing function, we use the construction from [CST17] for the junta lower bound and a Talagrand-based construction (similar to [CWX17a], but somewhat simpler) for the unateness lower bounds. The hard algorithmic task we embed is distinguishing between the distributions $\mathcal{G}_1$ and $\mathcal{G}_2$ with access to a rejection sampling oracle.
At a high level, our reductions show that the class of functions which are close to $k$-juntas and the class of functions which are close to unate have much richer structure than $k$-juntas and unate functions themselves. In particular, the distance of the functions drawn from our hard distributions to $k$-juntas and to unateness will depend on a global parameter of an underlying graph used to define the functions. (The relevant graph parameter will be different for $k$-juntas and for unateness. Luckily, both graph parameters exhibit a gap in their value depending on which distribution the graphs were drawn from, either $\mathcal{G}_1$ or $\mathcal{G}_2$. This allows us to reuse the work of proving Lemma 1.1 to obtain Theorem 2, Theorem 3, and Theorem 4.) Thus, tolerant testing algorithms for $k$-juntas and unateness must explore the relationships between different variables to gain some information about the underlying graph. This lies in stark contrast to the algorithms of [Bla08], [CWX17b], and [BCP17b], which test $k$-juntas (non-adaptively) and unateness, since these three algorithms treat the variables independently.
Distinguishing $\mathcal{G}_1$ and $\mathcal{G}_2$ with Rejection Sampling Queries
In order to prove Lemma 1.1, one needs to rule out any deterministic non-adaptive algorithm which distinguishes between $\mathcal{G}_1$ and $\mathcal{G}_2$ with rejection sampling queries of complexity $\widetilde{o}(n^2)$. In order to keep the discussion at a high level, we identify three possible "strategies" for determining whether an underlying graph is a complete bipartite graph or a union of two disjoint cliques:
One approach is for the algorithm to sample edges and consider the subgraph obtained from the edges returned by the oracle. For instance, the algorithm may make all rejection sampling queries be the full vertex set $[n]$. These queries are expensive in the rejection sampling model (each costs $n$), but they guarantee that an edge from the graph will be observed. If the algorithm is lucky, and there exists a triangle in the subgraph observed, the graph must not be bipartite, so it must come from $\mathcal{G}_1$.
Another sensible approach is for the algorithm to forget about the structure of the graph, and simply view the distribution on edges generated by the randomness in the rejection sampling oracle as a distribution testing problem. Suppose for simplicity that the algorithm makes all rejection sampling queries equal to $[n]$. Then the corresponding distributions supported on edges from $\mathcal{G}_1$ and $\mathcal{G}_2$ will be far from each other in total variation distance, so a distribution testing algorithm can be used.
A third, more subtle, approach is for the algorithm to use the fact that $\mathcal{G}_2$ and $\mathcal{G}_1$ correspond to a complete bipartite graph and a union of two cliques, respectively, and to extract knowledge about the non-existence of edges when making queries which return either $\emptyset$ or a single vertex. More specifically, an algorithm may query a small random subset $A \subseteq [n]$. The subset will be split among the two sides of the graph (in the case of both $\mathcal{G}_1$ and $\mathcal{G}_2$), and when an edge sampled by the oracle is incident on exactly one vertex of $A$, the rejection sampling oracle will return that one vertex. At this point, the algorithm may extract some information about how $A$ is divided in the underlying graph, and eventually distinguish between $\mathcal{G}_1$ and $\mathcal{G}_2$.
The three strategies mentioned above all fail to give $\widetilde{o}(n^2)$-complexity rejection sampling algorithms. The first approach fails because with a budget of $\widetilde{o}(n^2)$, rejection sampling algorithms will observe subgraphs which consist of various small trees, and thus will not observe cycles. The second approach fails since the distributions are supported on $\Theta(n^2)$ edges, so distribution testing algorithms require observing many edges (each obtained at cost $n$) to distinguish between $\mathcal{G}_1$ and $\mathcal{G}_2$. Finally, the third approach fails since algorithms will only observe responses from the oracle corresponding to lone vertices, which will be split roughly evenly among the two unknown parts of the graph, so these observations will not be enough to distinguish between $\mathcal{G}_1$ and $\mathcal{G}_2$.
Our lower bound rules out the three strategies sketched above when the complexity is $\widetilde{o}(n^2)$, and shows that if the above three strategies do not work (in any possible combination with each other), then no non-adaptive algorithm of complexity $\widetilde{o}(n^2)$ will work. The main technical challenge is to show that the above strategies are essentially the only strategies to distinguish $\mathcal{G}_1$ and $\mathcal{G}_2$. In Section 6, we give a more detailed, yet still high-level, discussion of the proof of Lemma 1.1.
Finally, the analysis of Lemma 1.1 is tight up to polylogarithmic factors; there is a non-adaptive rejection sampling algorithm which distinguishes $\mathcal{G}_1$ and $\mathcal{G}_2$ with complexity $\widetilde{O}(n^2)$. The algorithm (based on the first approach mentioned above) is simple: make $\widetilde{O}(n)$ queries, each being the full vertex set $[n]$; if we observe an odd-length cycle, output "$\mathcal{G}_1$", and otherwise output "$\mathcal{G}_2$".
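This upper-bound strategy can be simulated directly; the following Python sketch (constants and names are ours) samples edges by querying the full vertex set and answers according to whether the observed subgraph is 2-colorable.

```python
import random
from itertools import combinations

def is_bipartite(n, edges):
    """Greedy 2-coloring of the observed subgraph; fails iff it has an odd cycle."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {}
    for s in range(n):
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False          # odd cycle observed
    return True

def distinguish(n, edges, rng, num_queries=None):
    """Query A = [n] repeatedly (cost n each) and test the sampled subgraph."""
    q = num_queries if num_queries is not None else 8 * n
    sample = [rng.choice(edges) for _ in range(q)]
    return "G1" if not is_bipartite(n, sample) else "G2"
```

With $O(n)$ sampled edges, a union of two cliques almost surely yields an odd cycle, while any sample from a complete bipartite graph stays 2-colorable.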
2 Preliminaries
We use boldfaced letters such as $\boldsymbol{f}$ and $\boldsymbol{x}$ to denote random variables. Given a string $x \in \{0,1\}^n$ and $i \in [n]$, we write $x^{(i)}$ to denote the string obtained from $x$ by flipping the $i$-th coordinate. An edge along the $i$-th direction in $\{0,1\}^n$ is a pair $(x, x^{(i)})$ of strings with $x_i = 0$. In addition, for $b \in \{0,1\}$ we use $x_{i \to b}$ to denote the string whose $i$-th coordinate is set to $b$ and which agrees with $x$ elsewhere. Given $x \in \{0,1\}^n$ and $S \subseteq [n]$, we use $x_S$ to denote the projection of $x$ onto the coordinates in $S$. For a distribution $\mathcal{D}$ we write $\boldsymbol{x} \sim \mathcal{D}$ to denote an element drawn according to the distribution. For a finite set $S$, we sometimes write $\boldsymbol{x} \sim S$ to denote a uniformly random element of $S$.
Throughout this paper, we extensively use a generalization of Chernoff bounds for negatively correlated random variables.
Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_m$ be binary random variables. We say that $\boldsymbol{X}_1, \dots, \boldsymbol{X}_m$ are negatively correlated if for all $I \subseteq [m]$ the following hold:
$$\Pr\Big[\bigwedge_{i \in I} \boldsymbol{X}_i = 1\Big] \le \prod_{i \in I} \Pr[\boldsymbol{X}_i = 1] \quad\text{and}\quad \Pr\Big[\bigwedge_{i \in I} \boldsymbol{X}_i = 0\Big] \le \prod_{i \in I} \Pr[\boldsymbol{X}_i = 0].$$
Theorem 5 (from [Doe11]).
Let $\boldsymbol{X}_1, \dots, \boldsymbol{X}_m$ be negatively correlated binary random variables. Let $\boldsymbol{X} = \sum_{i=1}^{m} \boldsymbol{X}_i$ and $\mu = \mathbb{E}[\boldsymbol{X}]$. Then, for $0 < \delta < 1$,
$$\Pr\big[|\boldsymbol{X} - \mu| \ge \delta \mu\big] \le 2\exp\!\big(-\delta^2 \mu / 3\big).$$
In addition, some of our proofs will use hypergeometric random variables. Consider a population of size $N$ that consists of $K$ objects of a special type, and suppose $m$ objects are picked without replacement. Let $\boldsymbol{X}$ be the random variable that counts the number of special objects picked in the sample. Then we say that $\boldsymbol{X}$ is a hypergeometric random variable, and we denote $\boldsymbol{X} \sim \mathrm{HG}(N, K, m)$. Hypergeometric random variables enjoy tight concentration inequalities (similar to Chernoff-type bounds).
Theorem 6 ([Hoe63]).
Let $\boldsymbol{X} \sim \mathrm{HG}(N, K, m)$ and $\mu = \mathbb{E}[\boldsymbol{X}] = mK/N$. Then for any $t > 0$,
$$\Pr\big[|\boldsymbol{X} - \mu| \ge tm\big] \le 2\exp\!\big(-2t^2 m\big).$$
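For intuition, one can compare the exact hypergeometric tail with the bound of Theorem 6 on a small instance; the following sketch computes both (the parameter values are arbitrary and chosen only for illustration).

```python
from math import comb, exp

def hg_pmf(N, K, m, x):
    """P[X = x] for X ~ HG(N, K, m): x special objects among m drawn
    without replacement from a population of N containing K special objects."""
    return comb(K, x) * comb(N - K, m - x) / comb(N, m)

def hg_tail(N, K, m, t):
    """P[|X - mK/N| >= t*m], computed exactly by summing the pmf."""
    mu = m * K / N
    lo, hi = max(0, m - (N - K)), min(K, m)
    return sum(hg_pmf(N, K, m, x)
               for x in range(lo, hi + 1) if abs(x - mu) >= t * m)

def hoeffding_bound(m, t):
    """The tail bound of Theorem 6: 2 * exp(-2 * t^2 * m)."""
    return 2 * exp(-2 * t * t * m)
```

On $\mathrm{HG}(100, 50, 20)$, for example, the exact tail is well below the Hoeffding bound at every deviation scale.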
3 The Rejection Sampling Model
In this section, we define the rejection sampling model and the distributions over graphs we will use throughout this work. We define the rejection sampling model tailored to our specific application of proving Lemma 1.1.
Consider two distributions $\mathcal{G}_1$ and $\mathcal{G}_2$ supported on graphs with vertex set $[n]$. The problem of distinguishing $\mathcal{G}_1$ and $\mathcal{G}_2$ with a rejection sampling oracle is to distinguish between the following two cases with a specific kind of query:
Cases: We have an unknown graph $\boldsymbol{G} \sim \mathcal{G}_1$ or $\boldsymbol{G} \sim \mathcal{G}_2$.
Rejection Sampling Oracle: Each query is a subset $A \subseteq [n]$; the oracle samples an edge $(\boldsymbol{u}, \boldsymbol{v})$ from $E(\boldsymbol{G})$ uniformly at random, and returns $\{\boldsymbol{u}, \boldsymbol{v}\} \cap A$. The complexity of a query is given by $|A|$.
A non-adaptive algorithm Alg for this problem is a sequence of query sets $A_1, \dots, A_q \subseteq [n]$, as well as a function $\phi$ mapping sequences of oracle responses to an output in $\{\mathcal{G}_1, \mathcal{G}_2\}$. The algorithm sends each query to the oracle, and for each query $A_t$, the oracle responds with $\boldsymbol{r}_t$, which is either a single element of $A_t$, an edge with both endpoints in $A_t$, or $\emptyset$. The algorithm succeeds if $\phi(\boldsymbol{r}_1, \dots, \boldsymbol{r}_q)$ equals the distribution $\boldsymbol{G}$ was drawn from with probability at least $2/3$.
The complexity of Alg is measured by the sum of the complexities of its queries, so we let $\mathrm{cost}(\mathrm{Alg}) = \sum_{t=1}^{q} |A_t|$.
While our interest in this work is primarily on lower bounds for the rejection sampling model, an interesting direction is to explore upper bounds of various natural graph properties with rejection sampling queries. Our specific applications only require ruling out non-adaptive algorithms, but one may define adaptive algorithms in the rejection sampling model and study the power of adaptivity in this setting as well.
3.1 The Distributions $\mathcal{G}_1$ and $\mathcal{G}_2$
Let $\mathcal{G}_1$ and $\mathcal{G}_2$ be two distributions supported on graphs with vertex set $[n]$, defined as follows. Let $\boldsymbol{M} \subseteq [n]$ be a uniformly random subset of size $n/2$.
A graph $\boldsymbol{G} \sim \mathcal{G}_1$ is the union $K_{\boldsymbol{M}} \cup K_{\overline{\boldsymbol{M}}}$, and a graph $\boldsymbol{G} \sim \mathcal{G}_2$ is the complete bipartite graph $K_{\boldsymbol{M}, \overline{\boldsymbol{M}}}$, where for a subset $M \subseteq [n]$, $K_M$ is the complete graph on the vertices in $M$ and $K_{M, \overline{M}}$ is the complete bipartite graph whose sides are $M$ and $\overline{M}$.
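The two distributions are simple to sample; a sketch in Python (function names are ours):

```python
import random
from itertools import combinations

def sample_M(n, rng):
    """A uniformly random subset M of {0, ..., n-1} of size n/2."""
    return set(rng.sample(range(n), n // 2))

def sample_G1(n, rng):
    """Union of the two disjoint cliques K_M and K_{complement of M}."""
    M = sample_M(n, rng)
    return [(u, v) for u, v in combinations(range(n), 2)
            if (u in M) == (v in M)]

def sample_G2(n, rng):
    """Complete bipartite graph with sides M and the complement of M."""
    M = sample_M(n, rng)
    return [(u, v) for u, v in combinations(range(n), 2)
            if (u in M) != (v in M)]
```

For $n = 8$, a graph from $\mathcal{G}_1$ has $2\binom{4}{2} = 12$ edges, while a graph from $\mathcal{G}_2$ has $4 \cdot 4 = 16$.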
4 Tolerant Junta Testing
In this section, we will prove that distinguishing the two distributions $\mathcal{G}_1$ and $\mathcal{G}_2$ using a rejection sampling oracle reduces to distinguishing two distributions $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$ over Boolean functions, where $\mathcal{D}_{\mathrm{yes}}$ is supported on functions that are close to $k$-juntas and $\mathcal{D}_{\mathrm{no}}$ is supported on functions that, with high probability, are far from every $k$-junta.
4.1 High Level Overview
We start by providing some intuition for how our constructions and reduction implement the plan set forth in Subsection 1.3 for the property of being a $k$-junta. We define two distributions supported on Boolean functions, $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$, so that functions in $\mathcal{D}_{\mathrm{yes}}$ are $\varepsilon_0$-close to being $k$-juntas and functions in $\mathcal{D}_{\mathrm{no}}$ are $\varepsilon_1$-far from being $k$-juntas (where $\varepsilon_0 < \varepsilon_1$ are appropriately defined constants).
As mentioned in the introduction, our distributions are based on the indexing function used in [CST17]. We draw a uniformly random subset $\boldsymbol{M} \subseteq [n]$ of the appropriate size, and our function projects each point onto the variables in $\boldsymbol{M}$ in order to index a part. Thus, it remains to define the sequence of sub-functions $\boldsymbol{h}_1, \dots, \boldsymbol{h}_N$.
We will sample a graph $\boldsymbol{G} \sim \mathcal{G}_1$ (in the case of $\mathcal{D}_{\mathrm{yes}}$), or a graph $\boldsymbol{G} \sim \mathcal{G}_2$ (in the case of $\mathcal{D}_{\mathrm{no}}$), whose vertices correspond to a subset of the variables outside $\boldsymbol{M}$. Each sub-function is given by first sampling an edge $(\boldsymbol{a}, \boldsymbol{b})$ of $\boldsymbol{G}$ and letting the sub-function be a parity (or a negated parity) of the variables $x_{\boldsymbol{a}}$ and $x_{\boldsymbol{b}}$. Thus, a function from $\mathcal{D}_{\mathrm{yes}}$ or $\mathcal{D}_{\mathrm{no}}$ will have all variables relevant; however, we will see that functions in $\mathcal{D}_{\mathrm{yes}}$ have a group of variables which can be eliminated efficiently (we say that a variable is eliminated if we change the function to remove the dependence on the variable).
We think of the sub-functions defined with respect to edges from $\boldsymbol{G}$ as implementing a sort of gadget: the gadget defined with respect to an edge $(a, b)$ will have the property that if a function eliminates the variable $x_a$, it is "encouraged" to eliminate the variable $x_b$ as well. In fact, each time an edge $(a, b)$ is used to define a sub-function $\boldsymbol{h}_i$, any $k$-junta in which variable $x_a$ or $x_b$ is irrelevant will have to change half of the corresponding part indexed by $i$. Intuitively, a function drawn from $\mathcal{D}_{\mathrm{yes}}$ or $\mathcal{D}_{\mathrm{no}}$ (which originally depends on all variables) wants to eliminate its dependence on some variables in order to become a $k$-junta. When it picks a variable $x_a$ to eliminate (variables indexing the parts being too expensive to eliminate), it must change points in the parts where the sampled edge is incident on $a$. The key observation is that when multiple variables need to be eliminated, picking both endpoints $x_a$ and $x_b$ of an edge means that whenever a part samples the edge $(a, b)$, changing the points of that one part eliminates the dependence on both variables at once. Thus, two variables are eliminated at the cost of changing the same number of points whenever there are edges between the eliminated variables.
At a high level, the gadgets encourage the function to remove its dependence on variables within a group that has many internal edges; i.e., the closest $k$-junta will correspond to a function which eliminates a group of variables with many edges within the group and few outgoing edges. More specifically, if we want to eliminate half of the graph's vertices, we must find a bisection of the graph whose cut value is small; in the case of $\mathcal{G}_1$, cutting along one of the cliques gives cut value $0$, whereas any bisection of a graph from $\mathcal{G}_2$ has a high cut value, which makes functions in $\mathcal{D}_{\mathrm{yes}}$ closer to $k$-juntas than functions in $\mathcal{D}_{\mathrm{no}}$.
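The gap in bisection cut value between the two distributions can be seen directly on small instances; the following brute-force sketch (feasible only for tiny graphs; names are ours) minimizes the fraction of cut edges over all bisections.

```python
from itertools import combinations

def min_bisection_fraction(n, edges):
    """Minimum, over all vertex sets S of size n/2, of the fraction of
    edges crossing the cut (S, complement); brute force over bisections."""
    best = 1.0
    for S in combinations(range(n), n // 2):
        S = set(S)
        cut = sum(1 for u, v in edges if (u in S) != (v in S))
        best = min(best, cut / len(edges))
    return best
```

With $n = 8$ and $M = \{0, 1, 2, 3\}$, the two-cliques graph has a bisection cutting no edges, whereas every bisection of the complete bipartite graph cuts half of its edges.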
The reduction from rejection sampling is straightforward. We consider all queries which are indexed to the same part, and if two queries indexed to the same part differ on a variable $x_a$, then the algorithm "explores" direction $a$. Each part that some query falls in has a corresponding rejection sampling query $A$, consisting of the directions explored by the Boolean function testing algorithm in that part.
4.2 The Distributions $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$
The goal of this subsection is to define the two distributions $\mathcal{D}_{\mathrm{yes}}$ and $\mathcal{D}_{\mathrm{no}}$, supported over Boolean functions on $n$ variables. Functions drawn from $\mathcal{D}_{\mathrm{yes}}$ will be close to being a $k$-junta with high probability, and functions drawn from $\mathcal{D}_{\mathrm{no}}$ will be far from every $k$-junta with high probability.
A function $\boldsymbol{f} \sim \mathcal{D}_{\mathrm{yes}}$ is generated from a tuple of three random variables $(\boldsymbol{M}, \boldsymbol{G}, \boldsymbol{h})$, and we set $\boldsymbol{f}(x) = \boldsymbol{h}_{\boldsymbol{\Gamma}(x)}(x)$. The tuple is drawn according to the following randomized procedure:
Sample a uniformly random subset $\boldsymbol{M} \subseteq [n]$ of the appropriate size. Let $N = 2^{|\boldsymbol{M}|}$, and let $\boldsymbol{\Gamma} \colon \{0,1\}^n \to [N]$ be the function that maps $x$ to the number encoded by the projection $x_{\boldsymbol{M}}$.
Sample a subset $\boldsymbol{T} \subseteq [n] \setminus \boldsymbol{M}$ of the appropriate size uniformly at random, and consider a graph $\boldsymbol{G}$ defined on the vertices of $\boldsymbol{T}$ as the union of two disjoint cliques of equal size; i.e., $\boldsymbol{G}$ is a uniformly random graph drawn according to $\mathcal{G}_1$.
Define a sequence of functions $\boldsymbol{h} = (\boldsymbol{h}_1, \dots, \boldsymbol{h}_N)$ as follows. For each $i \in [N]$, we generate $\boldsymbol{h}_i$ independently by sampling an edge $(\boldsymbol{a}_i, \boldsymbol{b}_i) \in E(\boldsymbol{G})$ uniformly at random, as well as a uniform random bit $\boldsymbol{\xi}_i \in \{0,1\}$. We let
$$\boldsymbol{h}_i(x) = x_{\boldsymbol{a}_i} \oplus x_{\boldsymbol{b}_i} \oplus \boldsymbol{\xi}_i.$$
Using $\boldsymbol{\Gamma}$ and $\boldsymbol{h}$, define $\boldsymbol{f}(x) = \boldsymbol{h}_{\boldsymbol{\Gamma}(x)}(x)$ for each $x \in \{0,1\}^n$.
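A toy implementation of this sampling procedure may help; in the sketch below (sizes and names are ours, chosen only for illustration), `M` lists the indexing coordinates and the graph is given as an edge list over the remaining coordinates.

```python
import random

def sample_f(M, edges, rng):
    """Draw f(x) = h_{Gamma(x)}(x): Gamma reads the coordinates in M as an
    integer index, and each part i gets an independent sub-function
    h_i(x) = x_a XOR x_b XOR xi for a random edge (a, b) and random bit xi."""
    N = 2 ** len(M)
    parts = []
    for _ in range(N):
        a, b = rng.choice(edges)     # uniformly random edge of the graph
        xi = rng.randrange(2)        # uniform negation bit
        parts.append((a, b, xi))

    def f(x):
        i = sum(x[j] << t for t, j in enumerate(M))   # Gamma(x)
        a, b, xi = parts[i]
        return x[a] ^ x[b] ^ xi      # h_i(x): a (possibly negated) parity
    return f
```

Note that flipping both endpoints of any edge of the graph simultaneously never changes the value of such a function, since each sub-function is a parity along a single edge.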
A function $\boldsymbol{f} \sim \mathcal{D}_{\mathrm{no}}$ is also generated by first drawing the tuple $(\boldsymbol{M}, \boldsymbol{G}, \boldsymbol{h})$ and setting $\boldsymbol{f}(x) = \boldsymbol{h}_{\boldsymbol{\Gamma}(x)}(x)$. Both $\boldsymbol{M}$ and $\boldsymbol{\Gamma}$ are drawn using the same procedure; the only difference is that the graph $\boldsymbol{G}$ is a uniformly random graph drawn according to $\mathcal{G}_2$, and $\boldsymbol{h}$ is then sampled with respect to this modified graph.
Consider a fixed subset $M \subseteq [n]$ and a fixed subset $T \subseteq [n] \setminus M$ of the appropriate sizes. Let $G$ be a graph defined over the vertices in $T$, and for any subsets $S_1, S_2 \subseteq T$, let $e(S_1, S_2)$ be the number of edges between the sets $S_1$ and $S_2$. Additionally, we let
$$\varphi(G) = \min_{\substack{S \subseteq T \\ |S| \text{ sufficiently large}}} \frac{e(S, T \setminus S)}{e(T, T)} \qquad (1)$$
be the minimum fraction of cut edges over all sufficiently large sets $S \subseteq T$. The following lemma relates the distance of a function $f$ defined from $G$ as above from being a $k$-junta to $\varphi(G)$. We then apply this lemma to the graphs in $\mathcal{G}_1$ and $\mathcal{G}_2$ to show that, with high probability, functions in $\mathcal{D}_{\mathrm{yes}}$ are close to being $k$-juntas and functions in $\mathcal{D}_{\mathrm{no}}$ are far from every $k$-junta.
Let $G$ be any graph defined over the vertices in $T$. If $f$ is defined from $G$ as above, then with high probability over the draw of $f$, the distance of $f$ from being a $k$-junta is determined by $\varphi(G)$, up to lower-order terms.
Proof: We first show the upper bound on the distance. Let $S$ be the subset achieving the minimum in (1), and consider the indicator random variables, defined for each $i \in [N]$ as:
and note that these variables are independent, each equal to $1$ with probability $\varphi(G)$. Consider the function $g$ defined as:
Note that the function $g$ is a $k$-junta, since it depends only on variables in $M$ and $T \setminus S$. In addition, we have that:
and by a Chernoff bound, we obtain the desired upper bound.
For the lower bound, let $S$ be the set of variables on which a candidate $k$-junta does not depend. We divide the proof into two cases: 1) $S \cap M \neq \emptyset$, and 2) $S \cap M = \emptyset$.
We handle the first case first, and let $i \in S \cap M$.
Suppose is the highest order bit of , so that and . For and , let , . For every ,
for some and . Thus, for at least half of all points in , . Therefore, for any function which does not depend on , for each where , either , or , thus,
Suppose is not the highest order bit of . Then, if , then . We note that for each and with , . Thus again, for any which does not depend on , , since half of all points satisfy .
Therefore, we may assume that $S \cap M = \emptyset$. Again, consider the indicator random variables, defined for each $i \in [N]$, given by
and by the definition of , we have that with probability at least . Suppose with and with with , then , which means that any function which does not depend on variables in , either or , thus, for all such functions ,
with high probability by a Chernoff bound. Thus, we union bound over the at most $2^{|T|}$ possible subsets $S$ to conclude that the bound holds for all such subsets simultaneously with high probability.
We have that $\boldsymbol{G} \sim \mathcal{G}_1$ has small $\varphi(\boldsymbol{G})$ with high probability, and that $\boldsymbol{G} \sim \mathcal{G}_2$ has $\varphi(\boldsymbol{G})$ bounded below by a constant with high probability.
Proof: For the upper bound in , when , we have