Let $D$ be a distribution over a finite set $[n]$, and let $\mathcal{P}$ be a property, that is, a set of distributions over $[n]$. Given access to independent random samples drawn from $[n]$ according to $D$, we are interested in the problem of distinguishing whether the distribution $D$ is $\varepsilon_1$-close to having the property $\mathcal{P}$, or is $\varepsilon_2$-far from having the property $\mathcal{P}$, where $\varepsilon_1$ and $\varepsilon_2$ are two fixed proximity parameters such that $0 \le \varepsilon_1 < \varepsilon_2$. The distance of the distribution $D$ from the property $\mathcal{P}$ is defined as $d(D, \mathcal{P}) = \min_{D' \in \mathcal{P}} d(D, D')$, where $d(D, D')$ denotes the $\ell_1$-distance between the distributions $D$ and $D'$ (strictly speaking it is an infimum, but since all properties we consider are compact sets, it is equal to the minimum). The goal is to design a tester that uses as few samples as possible. For $\varepsilon_1 > 0$, this problem is referred to as the tolerant distribution testing problem of $\mathcal{P}$, and the particular case where $\varepsilon_1 = 0$ is referred to as the non-tolerant distribution testing problem of $\mathcal{P}$. The sample complexity (tolerant and non-tolerant) is the number of samples required by the best algorithm that can distinguish with high probability (usually with probability at least $2/3$) whether the distribution is $\varepsilon_1$-close to having the property $\mathcal{P}$, or is $\varepsilon_2$-far from it.
While results and techniques from distribution testing are already interesting in their own right, they have also found numerous applications in various central problems in Theoretical Computer Science, and in particular in Property Testing. Goldreich and Ron [goldreich2011testing] used distribution testing for property testing in the bounded degree graph model. Fischer and Matsliah [fischer2008testing] employed identity testing of distributions for isomorphism testing of dense graphs. Recently, Goldreich [goldreich2019testing] used distribution testing for isomorphism testing in bounded degree graphs. Alon, Blais, Chakraborty, García-Soriano and Matsliah used distribution testing for function isomorphism testing [DBLP:journals/siamcomp/AlonBCGM13]. Distribution testing has found numerous applications in learning theory [ben2010theory, diakonikolas2017statistical, diakonikolas2016new]. It has also been studied in the differential privacy model [aliakbarpour2019private, gopi2020locally, zhang2021statistical, acharya2021inference]. Thus, understanding the tolerant and non-tolerant sample complexity of distribution testing is a central problem in theoretical computer science.
There have been extensive studies of non-tolerant and tolerant testing of some specific distribution properties, like uniformity, identity to a fixed distribution, equivalence between two distributions, and independence of a joint distribution [Batu00, batu2001testing, paninski2008coincidence, valiant2011testing, ValiantV11, valiant2017automatic]. Various other specific distribution properties have also been studied [batu2017generalized, diakonikolas2017sharp]. This paper proves general results about the gap between tolerant and non-tolerant distribution testing that hold for large classes of properties.
1.1 Our results:
We now informally present our results. The formal definitions are presented in Section 2. We assume that the distributions are supported over a set $[n]$. We first prove a result regarding label-invariant distribution properties (properties that are invariant under all permutations of $[n]$). We show that, for any label-invariant distribution property, there is at most a quadratic blowup in the tolerant sample complexity as compared to the non-tolerant counterpart.
Theorem 1.1 (Informal).
Any label-invariant distribution property that can be non-tolerantly tested with $s$ samples can also be tolerantly tested using $\widetilde{O}(s^2)$ samples, where $n$ is the size of the support of the distribution ($\widetilde{O}(\cdot)$ hides a poly-logarithmic factor).
This result gives a unified way for obtaining tolerant testers from their non-tolerant counterparts. The above result will be stated and proved formally in Section 3.
It is a natural question to investigate the extent to which the above theorem can be generalized. Though we do not resolve this question completely, as a first step towards extending the above theorem to properties that are not necessarily label-invariant, we consider the notion of non-concentrated properties. Intuitively, a non-concentrated distribution is one where no significant portion of the domain carries only a negligible weight, so that the probability mass of the distribution is well distributed among its indices. Specifically, any subset $S \subseteq [n]$ whose size is above some threshold (say $|S| \ge \eta n$ with $0 < \eta < 1$) has probability mass of at least another threshold (say $D(S) \ge \rho$ with $0 < \rho < 1$). A property is said to be non-concentrated if only non-concentrated distributions can satisfy the property. We prove a lower bound on the testing of any non-concentrated property (not necessarily label-invariant).
Theorem 1.2 (Informal).
In order to non-tolerantly test any non-concentrated distribution property, $\Omega(\sqrt{n})$ samples are required, where $n$ is the size of the support of the distribution.
The quadratic gap between tolerant testing and non-tolerant testing for any non-concentrated property follows from the above theorem, since by a folklore result, only $O(n)$ many samples are required to learn any distribution approximately.
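The folklore learner is simply the empirical distribution: draw enough samples and output the observed frequencies. A minimal self-contained sketch (the uniform target distribution and the concrete sample budget below are illustrative choices, not quantities from this paper):

```python
import random
from collections import Counter

def empirical_distribution(samples, n):
    """Empirical (maximum-likelihood) estimate of a distribution over {0, ..., n-1}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]

def l1_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Toy demonstration: with enough samples, the empirical distribution is close
# to the true one in l1 distance (the folklore O(n) bound for constant error).
random.seed(0)
n = 20
true_dist = [1 / n] * n  # uniform, for concreteness
samples = random.choices(range(n), weights=true_dist, k=50_000)
est = empirical_distribution(samples, n)
print(l1_distance(true_dist, est))  # small
```

With the empirical distribution in hand, tolerant testing reduces to checking the distance of the learned distribution from the property, which is the source of the quadratic-gap upper bound.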
The proof of Theorem 1.2 for label-invariant non-concentrated properties is a generalization of the proof of the lower bound for classical uniformity testing, while for the whole theorem, that is, for the general (not label-invariant) non-concentrated properties, a more delicate argument is required. The formal proof is presented in Section 5.
The next natural question is: what is the sample complexity of any tolerant tester for non-concentrated properties? We address this question for label-invariant non-concentrated properties by proving the following theorem in Section 4.2. However, the question is left open for non-label-invariant properties.
Theorem 1.3 (Informal).
The sample complexity for tolerantly testing any non-concentrated label-invariant distribution property is $\Omega(n/\log n)$, where $n$ is the size of the support of the distribution.
A natural question related to tolerant testing is:
How many samples are required to learn a distribution?
As pointed out earlier, any distribution can be learnt using $O(n)$ samples. But what if the distribution is promised to be very concentrated? We present an upper bound result for learning a distribution in which the sample complexity depends on the minimum cardinality of any set over which the unknown distribution is concentrated.
Theorem 1.4 (Informal).
To learn a distribution $D$ approximately, $\widetilde{O}(|M|)$ samples are enough, where $M \subseteq [n]$ is an unknown set of minimum cardinality whose mass is close to $1$. Note that $|M|$ is also unknown, and the algorithm adapts to it.
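One way to picture such an adaptive learner (this is an illustrative doubling heuristic, not the algorithm analyzed in Section 6): keep doubling the sample size until a fresh batch of samples rarely lands outside the set of elements already seen, and then output the empirical distribution on the seen elements. The sample size at termination scales with the size of the effective support, without knowing it in advance.

```python
import random
from collections import Counter

def learn_concentrated(sample, eps=0.25):
    """Hypothetical doubling strategy: double the batch size m until a fresh
    batch of m samples rarely falls outside the set of elements already seen,
    then return the empirical distribution restricted to the seen elements."""
    m = 2
    while True:
        seen = set(sample(m))
        fresh = sample(m)
        outside = sum(1 for x in fresh if x not in seen) / m
        if outside <= eps / 2:
            counts = Counter(sample(8 * m))
            total = 8 * m
            return {i: counts[i] / total for i in seen}
        m *= 2

random.seed(2)
n = 10_000
weights = [0.2] * 5 + [0.0] * (n - 5)  # all mass on 5 of 10000 elements
sample = lambda m: random.choices(range(n), weights=weights, k=m)
est = learn_concentrated(sample)
print(sorted(est))  # a small set of elements carrying (almost) all the mass
```

The point of the demonstration is that the loop stops after a number of samples governed by the five heavy elements, not by the domain size $n$.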
1.2 Related works
Several forms of distribution testing have been investigated for over a hundred years in statistical theory [king1997guide, corder2014nonparametric], while combinatorial properties of distributions have been explored over the last two decades in Theoretical Computer Science, Machine Learning and Information Theory [goldreich2017introduction, MacKay2003, cover1999elements]. In Theoretical Computer Science, the investigation into testing properties of distributions started with the work of Goldreich and Ron [DBLP:journals/eccc/ECCC-TR00-020], even though it was not directly stated there in these terms. Batu, Fortnow, Rubinfeld, Smith and White [Batu00] formally initiated the study of property testing of distributions with the problem of equivalence testing (given two unknown probability distributions that can be accessed via samples from their respective oracles, equivalence testing refers to the problem of distinguishing whether they are identical or far from each other). Later, Batu, Fischer, Fortnow, Kumar, Rubinfeld and White [batu2001testing] studied the problems of identity and independence testing of distributions (given an unknown distribution accessible via samples, identity testing refers to the problem of distinguishing whether it is identical to a known distribution or far from it). Since then there has been a flurry of interesting works in this model. For example, Paninski [paninski2008coincidence] proved tight bounds on uniformity testing, Valiant and Valiant [valiant2011testing] resolved the tolerant sample complexity for a large class of label-invariant properties that includes uniformity testing, Acharya, Daskalakis and Kamath [acharya2015optimal] proved various optimal testing results under several distance measures, and Valiant and Valiant [valiant2017automatic] studied the sample complexity of instance optimal identity testing. See the survey of Canonne [canonne2020survey] for a more exhaustive list.
While most of the existing works concentrate on non-tolerant testing of distributions, a natural extension is to test such properties tolerantly. Since the introduction of tolerant testing in the pioneering work of Parnas, Ron and Rubinfeld [parnas2006tolerant], which defined this notion for classical (non-distribution) property testing, there have been several works in this framework. Note that it might be nontrivial to construct tolerant testers from their non-tolerant counterparts, for example, as in the case of tolerant junta testing [blais2019tolerant]. In a series of works, it has been proven that tolerant testing of the most natural distribution properties, like uniformity, requires an almost linear number of samples [valiant2011testing, ValiantV11] (to be precise, the exact bound for non-tolerant uniformity testing is $\Theta(\sqrt{n})$, and for tolerant uniformity testing it is $\Theta(n/\log n)$, where $n$ is the support size of the distribution and the proximity parameter is a constant). Now a natural question arises about how the sample complexity of tolerant testing is related to that of non-tolerant testing of distributions. To the best of our knowledge, there is no known example with more than a quadratic gap.
It would also be interesting to bound the gap for sample-based testing as defined in the work of Goldreich and Ron [goldreich2016sample]. This model was investigated further in the work of Fischer, Lachish and Vasudev [fischer2015trading], where a general upper bound for strongly testable properties was proved.
Organization of the paper.
We formally define various related notions in Section 2. We prove Theorem 1.1 in Section 3. For exposition purposes, we present the proof of Theorem 1.2 in two parts. In Section 4 we present the lower bound results for non-concentrated label-invariant properties: both the non-tolerant sample complexity (Theorem 1.2 for label-invariant properties) and the tolerant sample complexity (Theorem 1.3). An additional argument is required to prove Theorem 1.2 in the general setting, the details of which are presented in Section 5. We prove Theorem 1.4 in Section 6.
2 Notation and definitions
For a probability distribution $D$ over a universe $[n]$, we refer to $D(i)$ as the mass of $i$ in $D$, where $i \in [n]$. For $S \subseteq [n]$, the mass of $S$ is defined as $D(S) = \sum_{i \in S} D(i)$. The support of a probability distribution $D$ on $[n]$ is denoted by $\mathrm{supp}(D)$. For an event $E$, $\Pr(E)$ denotes the probability of the event $E$. When we write $\widetilde{O}(\cdot)$, it suppresses a poly-logarithmic term in $n$ and the inverse of the proximity parameter(s). We subsume coefficients depending on the proximity parameters in our results for clarity of presentation.
The formal definitions of standard objects like distribution properties, label-invariant properties and testers are presented in Appendix A. Here we define the notions of non-concentrated distributions and non-concentrated properties.
Definition 2.1 (Non-Concentrated distribution).
A distribution $D$ over the domain $[n]$ is said to be $(\eta, \rho)$-non-concentrated if for any set $S \subseteq [n]$ with size $|S| \ge \eta n$, the probability mass $D(S)$ is at least $\rho$, where $\eta$ and $\rho$ are two parameters such that $0 \le \eta, \rho \le 1$.
Definition 2.2 (Non-Concentrated Property).
Let $0 \le \eta, \rho \le 1$. A distribution property $\mathcal{P}$ is defined to be $(\eta, \rho)$-non-concentrated if all distributions in $\mathcal{P}$ are $(\eta, \rho)$-non-concentrated.
Note that the uniform distribution is $(\eta, \eta)$-non-concentrated for every $\eta$, and hence so is the property of being identical to the uniform distribution. Also, for any $\eta$ such that $\eta n$ is an integer, the uniform distribution is the only $(\eta, \eta)$-non-concentrated one. Finally, observe that any arbitrary distribution is both $(\eta, 0)$-non-concentrated and $(1, 1)$-non-concentrated, for any $\eta$.
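Since the mass of a set of size $\lceil \eta n \rceil$ is minimized by taking the $\lceil \eta n \rceil$ lightest elements, $(\eta, \rho)$-non-concentration can be checked mechanically when a distribution is given explicitly. A small sketch (the parameter values are illustrative):

```python
import math

def is_non_concentrated(dist, eta, rho):
    """Check whether `dist` (a list of probabilities over [n]) is
    (eta, rho)-non-concentrated: every subset of size >= eta*n must carry
    mass >= rho.  It suffices to check the worst case, namely the
    ceil(eta*n) smallest probabilities."""
    n = len(dist)
    k = math.ceil(eta * n)
    return sum(sorted(dist)[:k]) >= rho

n = 8
uniform = [1 / n] * n
print(is_non_concentrated(uniform, 0.5, 0.5))      # True: uniform is (eta, eta)-non-concentrated
point_mass = [1.0] + [0.0] * (n - 1)
print(is_non_concentrated(point_mass, 0.5, 0.5))   # False: all mass on one element
```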
3 Non-tolerant vs. tolerant sample complexities of label-invariant properties (Proof of Theorem 1.1)
We will prove that for any label-invariant property, the sample complexities of tolerant and non-tolerant testing are at most separated by a quadratic factor (ignoring some poly-logarithmic factors). Formally, the result is stated as follows:
Theorem 3.1 (Theorem 1.1 formalized).
Let $\mathcal{P}$ be a label-invariant distribution property. If there exists an $\varepsilon$-tester (non-tolerant tester) for the property $\mathcal{P}$ with sample complexity $s$, where $s = s(n, \varepsilon)$, then for any $\varepsilon_1, \varepsilon_2$ with $0 < \varepsilon_1 < \varepsilon_2 \le 1$, there exists an $(\varepsilon_1, \varepsilon_2)$-tester (tolerant tester) that has sample complexity $\widetilde{O}(s^2)$, where $n$ is the size of the support of the distribution ($\widetilde{O}(\cdot)$ here hides a poly-logarithmic term in $n$ and a polynomial term in the inverses of $\varepsilon_1$ and $\varepsilon_2 - \varepsilon_1$).
Let us assume that $D$ is the unknown distribution. First note that if $s \ge \sqrt{n}$, then we can construct a distribution $D'$ that is close to $D$ in $\ell_1$-distance by using $\widetilde{O}(n) = \widetilde{O}(s^2)$ samples from $D$. Thereafter we can report $D$ to be close to the property if and only if $D'$ is close to the property. In what follows, we discuss an algorithm with sample complexity $\widetilde{O}(s^2)$ when $s < \sqrt{n}$. Also, we assume that $n$ and $s$ are larger than some suitable constant; otherwise, the theorem trivially follows.
The idea behind the proof is to classify the elements of $[n]$ with respect to their masses in $D$ into high and low, as formally defined below in Definition 3.2. In Lemma 3.3, we argue that it is not possible that there are two distributions $D_1$ and $D_2$ that have identical masses for all high elements, while one is in the property $\mathcal{P}$ and the other one is far from the property $\mathcal{P}$. Note that in Lemma 3.3, we need $\mathcal{P}$ to be label-invariant. Using Lemma 3.3, we prove Lemma 3.4, which (informally) says that if two distributions are close with respect to the high masses, then it is not possible that one distribution is close to $\mathcal{P}$ while the other one is far from $\mathcal{P}$.
For a distribution $D$ over $[n]$ and a threshold $\tau > 0$, we call an element $i \in [n]$ high if $D(i) \ge \tau$, and low otherwise; in our proof, $\tau$ will be chosen sufficiently smaller than $1/s^2$.
Let $\mathcal{P}$ be a label-invariant property that is $\varepsilon$-testable using $s$ samples. Let $D_1$ and $D_2$ be two distributions such that every low element has probability mass sufficiently smaller than $1/s^2$ in both of them, and for every high $i$, the probability of $i$ is the same for both distributions, that is, $D_1(i) = D_2(i)$. Then it is not possible that $D_1$ satisfies $\mathcal{P}$ while $D_2$ is $\varepsilon$-far from satisfying $\mathcal{P}$.
Let $\sigma$ be a permutation of $[n]$, and let $\mathcal{P}$ be a label-invariant property that is $\varepsilon$-testable using $s$ samples. If $D_1$ and $D_2$ are two distributions such that $D_1$ and the $\sigma$-permuted version of $D_2$ are close on the high elements, in the sense of Equation (1), then the following hold:

If $D_1$ is $\varepsilon_1$-close to $\mathcal{P}$, then there exists a distribution $D'$ in $\mathcal{P}$ that is close to the $\sigma$-permuted version of $D_2$ on the high elements, in the sense of Equation (2).

If $D_1$ is $\varepsilon_2$-far from $\mathcal{P}$ and $D'$ is a distribution that is close to the $\sigma$-permuted version of $D_2$ on the high elements, in the sense of Equation (3), then the distribution $D'$ does not satisfy the property $\mathcal{P}$.
3.1 Proof of Theorem 3.1
Let $D$ be the unknown distribution that we need to test, and assume (following the discussion above) that $s < \sqrt{n}$ and that $n$ and $s$ are larger than a suitable constant. We now provide a tolerant tester, that is, an $(\varepsilon_1, \varepsilon_2)$-tester for the property $\mathcal{P}$, as follows:
Draw $\widetilde{O}(s^2)$ many samples from the distribution $D$. Let $A$ be the set of samples obtained.

Draw an additional $\widetilde{O}(s^2)$ many samples to estimate the value of $D(i)$ for all $i \in A$. (Instead of two sets of random samples, the first one to generate the set $A$ and the other one the estimation multi-set, one can work with only one set of random samples. But in that case, the sample complexity becomes larger, as opposed to the $\widetilde{O}(s^2)$ that we are going to prove.)

Define a distribution $\widehat{D}$ such that, for $i \in A$, $\widehat{D}(i)$ is the fraction of the additional samples that are equal to $i$.

For each $i \in A$, if $\widehat{D}(i)$ is below the high threshold, remove $i$ from $A$. Let $A'$ be the resulting set (not multi-set). Note that $A' \subseteq A$.

Construct a distribution $D''$ such that $D''(i) = \widehat{D}(i)$ for all $i \in A'$. For each $i \notin A'$, we set $D''(i)$ so that the remaining mass is spread over the elements outside $A'$. (Here any distribution such that $D''(i) = \widehat{D}(i)$ for each $i \in A'$ would have been good enough for us. We explain below that it is in fact the case for the constructed $D''$.) Note that $D''$ is well defined, as $\sum_{i \in A'} \widehat{D}(i) \le 1$ and $A' \ne [n]$.

If there exists a distribution $D'$ in $\mathcal{P}$ and a permutation $\sigma$ such that the $\sigma$-permuted version of $D'$ is close to $D''$ on the high elements, in the sense of Equation (4), then ACCEPT $D$.

If there does not exist any $D'$ in $\mathcal{P}$ and $\sigma$ that satisfy Equation (4), then REJECT $D$.
The sample complexity of the tester is $\widetilde{O}(s^2)$, which follows from the above description.
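The sampling-and-pruning phase of the tester can be sketched as follows. The concrete sample counts and the pruning threshold below are illustrative placeholders; the actual quantities come from the analysis above.

```python
import random
from collections import Counter

def build_high_mass_estimate(sample, m, threshold):
    """Schematic version of the first steps of the tester: draw one batch of
    samples to discover candidate elements (the set A), a second independent
    batch to estimate their masses, then discard elements whose estimated
    mass falls below `threshold` (producing the pruned set A')."""
    discovered = set(sample(m))           # first batch: the set A
    counts = Counter(sample(m))           # second batch: estimation multi-set
    est = {i: counts[i] / m for i in discovered}
    return {i: p for i, p in est.items() if p >= threshold}

random.seed(1)
n = 100
weights = [0.3, 0.3, 0.2] + [0.2 / (n - 3)] * (n - 3)  # three heavy elements
sample = lambda m: random.choices(range(n), weights=weights, k=m)
high = build_high_mass_estimate(sample, m=2000, threshold=0.05)
print(sorted(high))  # only the heavy elements 0, 1, 2 survive the pruning
```

Using an independent second batch for estimation is what later allows a union bound over the elements of $A$ only, which is exactly the advantage noted in the algorithm description.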
Correctness of the algorithm.
The correctness of our algorithm is established via a sequence of lemmas. The first lemma is about how the distribution $\widehat{D}$ approximates $D$.
(i) For any high $i$, that is, any $i$ where $D(i)$ is at least the high threshold, both $i \in A$ and the event that $\widehat{D}(i)$ is a close estimate of $D(i)$ hold with high probability.

(ii) For any $i$ with $D(i)$ below the high threshold, either $i \notin A'$, or $\widehat{D}(i)$ is a close estimate of $D(i)$, with high probability.
For (i), consider some $i$ such that $D(i)$ is at least the high threshold. Note that the expected number of samples equal to $i$ is large. So, the probability that $i \notin A$ is small. Applying a Chernoff bound, we can show that $\widehat{D}(i)$ fails to be a close estimate of $D(i)$ only with small probability. So, we are done with (i). Note that (ii) also follows similarly from the Chernoff bound. ∎
The following lemma is about how the distribution $D''$ approximates $D$. This follows from Lemma 3.5 along with the way that the algorithm constructs $A'$ and the distribution $D''$.
(i) For any high $i$, that is, any $i$ where $D(i)$ is at least the high threshold, both $i \in A'$ and the event that $D''(i)$ is a close estimate of $D(i)$ hold with high probability.

(ii) For any $i$ with $D(i)$ below the high threshold, $D''(i)$ does not overestimate $D(i)$ by much, with high probability.
For (i), note that $D(i)$ is at least the high threshold. So, by Lemma 3.5 (i), $i \in A$ and $\widehat{D}(i)$ is a close estimate of $D(i)$, with high probability. This implies that $\widehat{D}(i)$ is not below the pruning threshold, and hence $i \in A'$, with high probability. As our algorithm assigns $D''(i) = \widehat{D}(i)$ for each $i \in A'$, we are done with the proof of (i).

For (ii), by Lemma 3.5 (ii), with high probability either $i \notin A'$, or $\widehat{D}(i)$ is a close estimate of $D(i)$. If $i \in A'$, then $D''(i) = \widehat{D}(i)$, and we are done. If $i \notin A'$, the claim follows from the way $D''(i)$ is assigned, along with the fact that the mass that the algorithm spreads outside $A'$ is small.
Informally speaking, the following lemma establishes the fact that it is enough to bound the distance between the distributions $D$ and $D''$ over $A'$ to show that $D$ and $D''$ satisfy Equation (1).
With high probability, $\sum_{i \in A'} |D(i) - D''(i)|$ is suitably small.

Applying Lemma 3.6 for each high $i \in A'$ and using the union bound over all such $i$ (at most $|A|$ many such $i$), we get that the contribution of the high elements of $A'$ is small, with high probability.

For the remaining elements of $A'$, recall the way that our algorithm assigns $D''(i)$ for each $i \in A'$: any such $i$ survived the pruning, so $\widehat{D}(i)$ is at least the pruning threshold, while $D(i)$ is below the high threshold. So, for each such $i$, $|D(i) - D''(i)|$ is bounded by the estimation error. Hence, the total contribution of these elements is also small. ∎
With high probability, $\sum_{i \notin A'} |D(i) - D''(i)|$ is suitably small as well.

As only members of $A$ can be removed during pruning, for each $i \in A \setminus A'$ we apply Lemma 3.6 and then the union bound over all such $i$ (at most $|A|$ many), obtaining that $D''(i)$ is close to $D(i)$ for all such $i$, with high probability. Recall that only members of $A$ can be present in $A'$. (Here we have crucially used the advantage of two sets of random samples, the first one to generate the set $A$ and the other one the estimation multi-set: as only members of $A$ can be present in $A'$, applying the union bound over all $i \in A$ is good enough. If we used only one set of random samples, we would possibly need to take many more random samples.) Note that, for each $i \notin A$, the mass $D(i)$ is below the high threshold with high probability, as such an element was never sampled. Applying Lemma 3.6 for each such $i$ and then using the union bound, we get the claim for all such $i$, with high probability.
So, putting everything together, the desired sum can be bounded with high probability, as follows: the bound of Equation (5) holds with high probability.
From Lemma 3.4, we know that if $D$ is $\varepsilon_1$-close to $\mathcal{P}$, then there exists $D'$ in $\mathcal{P}$ satisfying Equation (2) (which is the same as Equation (4)), so the algorithm will ACCEPT the distribution $D$. Again from Lemma 3.4, we know that if $D$ is $\varepsilon_2$-far from $\mathcal{P}$, then there does not exist any distribution $D'$ in $\mathcal{P}$ satisfying Equation (3) (which is the same as Equation (4)). Thus the algorithm will REJECT the distribution $D$.
Note that the total failure probability of the algorithm is bounded by the probability that the distribution $D''$ does not satisfy Equation (5), which is small.
Proof of Lemma 3.3.
We will prove this by contradiction. Let us assume that there are two distributions $D_1$ and $D_2$ such that:

$D_1$ satisfies $\mathcal{P}$, while $D_2$ is $\varepsilon$-far from $\mathcal{P}$;

For all high $i$, $D_1(i) = D_2(i)$.
Now, we argue that any $\varepsilon$-non-tolerant tester requires more than $s$ samples from the unknown distribution to distinguish whether it is in the property $\mathcal{P}$ or $\varepsilon$-far from it, contradicting the testability assumption.
Let $D'_1$ be the distribution obtained from $D_1$ by permuting the labels of the low elements using a uniformly random permutation. Specifically, consider a random permutation $\sigma$ of the low elements. The distribution $D'_1$ is as follows:

$D'_1(i) = D_1(i)$ for each high $i$, and

$D'_1(\sigma(i)) = D_1(i)$ for each low $i$.

Similarly, consider the distribution $D'_2$ obtained from $D_2$ by permuting the labels of the low elements using a uniformly random permutation. Note that the distribution $D'_1$ is in $\mathcal{P}$, whereas $D'_2$ is $\varepsilon$-far from $\mathcal{P}$, which follows from the property $\mathcal{P}$ being label-invariant.
We will now prove that $D'_1$ and $D'_2$ provide similar distributions over sample sequences. More formally, we will prove that any algorithm that takes at most $s$ many samples cannot distinguish $D'_1$ from $D'_2$ with probability at least $2/3$. We argue that this claim holds even if the algorithm is provided with additional information about the input: namely, for every high $i$, it is told the value of $D'_1(i)$ (which is the same as $D'_2(i)$). When the algorithm is provided with this information, it can ignore all samples obtained from the high elements.
By the definition of the low elements, for every low $i$, both $D'_1(i)$ and $D'_2(i)$ are sufficiently smaller than $1/s^2$. Let $\bar{x}$ be a sequence of at most $s$ samples from the low elements drawn according to $D'_1$. Then with probability close to $1$, the sequence $\bar{x}$ has no element that appears twice. In other words, the samples form a set of at most $s$ distinct elements of $[n]$. Since the low elements were permuted using a uniformly random permutation, with probability close to $1$, the sequence $\bar{x}$ is a uniformly random sequence of distinct low elements. Similarly, if $\bar{y}$ is a sequence of at most $s$ samples from the low elements drawn according to $D'_2$, then with probability close to $1$, the sequence $\bar{y}$ is a uniformly random sequence of distinct low elements. Thus, the distributions over the received sample sequences obtained from $D'_1$ or $D'_2$ are of small distance of each other.
Hence, if the algorithm obtains at most $s$ many samples from the unknown distribution, it cannot distinguish whether the samples are coming from $D'_1$ or $D'_2$. ∎
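The no-collision step is a birthday-paradox calculation: with $s$ samples from a distribution in which every mass is at most $\delta/s^2$, a union bound over the $\binom{s}{2}$ pairs bounds the probability of a repeated element by $\delta/2$. A quick numerical sanity check (the parameters are illustrative):

```python
import random

def collision_probability(dist, s, trials=2000, seed=0):
    """Empirically estimate the probability that s i.i.d. samples from `dist`
    contain a repeated element."""
    rng = random.Random(seed)
    n = len(dist)
    hits = 0
    for _ in range(trials):
        draws = rng.choices(range(n), weights=dist, k=s)
        hits += len(set(draws)) < s
    return hits / trials

s = 20
n = 10 * s * s            # uniform over n elements: every mass is 1/n <= 1/(10 s^2)
dist = [1 / n] * n
p = collision_probability(dist, s)
# Union bound over pairs: at most C(20, 2) / 4000 = 0.0475 here.
print(p)
```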
Proof of Lemma 3.4.
First we consider a distribution $D$ that is $\varepsilon_1$-close to the property $\mathcal{P}$. Let $D'$ be the distribution satisfying $\mathcal{P}$ such that $d(D, D') \le \varepsilon_1$. By the triangle inequality, the distribution $D'$ satisfies Equation (2).
Next, suppose towards a contradiction that $D$ is $\varepsilon_2$-far from $\mathcal{P}$, while there exists a distribution $D'$ in $\mathcal{P}$ satisfying Equation (3). Then there is a distribution $D''$ close to $D$ such that $D''$ and $D'$ are the same on the heavy support, that is, $D''(i) = D'(i)$ for all high $i$.

By Lemma 3.3, we know that since $D'$ satisfies the property $\mathcal{P}$, the distribution $D''$ is not $\varepsilon$-far from $\mathcal{P}$. As $D''$ is close to $D$, this means that $D$ is not $\varepsilon_2$-far from $\mathcal{P}$. Hence the result follows by contradiction. ∎
4 Sample complexity of testing non-concentrated label-invariant properties
In this section we first prove a lower bound of $\Omega(\sqrt{n})$ on the sample complexity of non-tolerant testing of any non-concentrated label-invariant property. Then we proceed to prove a tolerant lower bound of $\Omega(n/\log n)$ samples for such properties in Section 4.2.
4.1 Non-tolerant lower bound (Proof of Theorem 1.2 for label-invariant properties)
Here we first prove a lower bound result analogous to Theorem 1.2 where the properties are non-concentrated and label-invariant. In Section 5, we discuss why the proof of Theorem 4.1 does not directly work for Theorem 1.2, and then prove Theorem 1.2 using a different argument.
Theorem 4.1 (Analogous result of Theorem 1.2 for non-concentrated label-invariant properties).
Let $\mathcal{P}$ be any $(\eta, \rho)$-non-concentrated label-invariant distribution property, where $0 < \eta, \rho < 1$. For every sufficiently small $\varepsilon > 0$ (depending on $\eta$ and $\rho$), any $\varepsilon$-tester for the property $\mathcal{P}$ requires $\Omega(\sqrt{n})$ many samples, where $n$ is the size of the support of the distribution.
Let us first consider a distribution $D_{yes}$ that satisfies the property. Since $\mathcal{P}$ is an $(\eta, \rho)$-non-concentrated property, by Definition 2.2, $D_{yes}$ is an $(\eta, \rho)$-non-concentrated distribution. From $D_{yes}$, we generate a distribution $D_{no}$ such that the support of $D_{no}$ is a subset of that of $D_{yes}$, and $D_{no}$ is $\varepsilon$-far from $\mathcal{P}$. Then we show that if we apply a uniformly random permutation over the elements of $[n]$, the resulting versions of $D_{yes}$ and $D_{no}$ are indistinguishable unless we query $\Omega(\sqrt{n})$ many samples. Below we formally prove this idea.
We will partition the domain $[n]$ into two parts, depending on the probability mass of $D_{yes}$ on the elements of $[n]$. Given the distribution $D_{yes}$, let us first order the elements of $[n]$ according to their probability masses. In this ordering, let $B$ be the set of the $\eta n$ smallest elements of $[n]$ (assume for simplicity that $\eta n$ is an integer). We denote $[n] \setminus B$ by $T$. Before proceeding further, note that the following observation gives an upper bound on the probabilities of the elements in $B$.
For all $i \in B$, $D_{yes}(i) \le \frac{1-\rho}{(1-\eta)n}$.
Proof of Observation 4.2.
By contradiction, assume that there exists $i \in B$ such that $D_{yes}(i) > \frac{1-\rho}{(1-\eta)n}$. This implies, for every $j \in T$, that $D_{yes}(j) > \frac{1-\rho}{(1-\eta)n}$ as well. So,
$$D_{yes}(T) > (1-\eta)n \cdot \frac{1-\rho}{(1-\eta)n} = 1-\rho.$$
As $|B| = \eta n$ and $D_{yes}$ is an $(\eta, \rho)$-non-concentrated distribution, $D_{yes}(B) \ge \rho$. Also, $D_{yes}(B) + D_{yes}(T) = 1$. Plugging these into the above inequality, we get a contradiction. ∎
Note that Observation 4.2 implies that if $\bar{x}$ is a multi-set of $o(\sqrt{n})$ samples from $D_{yes}$, then with probability $1 - o(1)$, no element from $B$ appears in $\bar{x}$ more than once. Now using the distribution $D_{yes}$ and the set $B$, let us define a distribution $D_{no}$ such that $D_{no}$ is $\varepsilon$-far from $\mathcal{P}$. Note that $D_{no}$ is a distribution that comes from a distribution over a set of distributions, all of which are not $(\eta, \rho)$-non-concentrated. The distribution $D_{no}$ is generated using the following random process:
We partition $B$ randomly into two equal sets of size $\eta n / 2$. Let the sets be $B_1$ and $B_2$. We then pair the elements of $B$ randomly into $\eta n / 2$ pairs. Let $P$ be a random pairing of the elements in $B$, which is represented as pairs $(u_j, v_j)$ with $u_j \in B_1$ and $v_j \in B_2$, that is, $P = \{(u_j, v_j) : 1 \le j \le \eta n / 2\}$.
The probability mass of $D_{no}$ at $i \in [n]$ is defined as follows:

If $i \in T$, then $D_{no}(i) = D_{yes}(i)$.

For every pair $(u_j, v_j) \in P$, $D_{no}(u_j) = D_{yes}(u_j) + D_{yes}(v_j)$, and $D_{no}(v_j) = 0$.
We start by observing that the distribution $D_{no}$ constructed above is supported on a set of at most $n - \eta n / 2$ elements. So, any distribution constructed using the above procedure is $\varepsilon$-far from satisfying the property $\mathcal{P}$, for any sufficiently small $\varepsilon$ (depending on $\eta$ and $\rho$).
We will now prove that $D_{yes}$ and $D_{no}$ both have similar distributions over the sequences of samples. More formally, we will prove that any algorithm that takes $o(\sqrt{n})$ many samples cannot distinguish $D_{yes}$ from $D_{no}$ with probability at least $2/3$.
Since any $D_{no}$ produced using the above procedure has exactly the same probability mass on the elements of $T$ as $D_{yes}$, any tester that distinguishes between $D_{yes}$ and $D_{no}$ must rely on samples obtained from $B$. Recall that the algorithm is given a uniformly randomly permuted version of the distribution. Since the masses of the elements of $B$ are small (in particular, at most $\frac{1-\rho}{(1-\eta)n}$ by Observation 4.2), it is not possible to distinguish between $D_{yes}$ and $D_{no}$ unless an element of $B$ appears at least twice among the samples: otherwise, as in the proof of Lemma 3.3, the elements drawn from $B$ are distributed identically to a uniformly random non-repeating sequence. But observe that with $o(\sqrt{n})$ samples, an element of $B$ repeats only with probability $o(1)$, both when the unknown distribution is $D_{yes}$ and when it is $D_{no}$. Thus any sequence of $o(\sqrt{n})$ samples will provide only a distance of $o(1)$ between the two distributions, completing the proof. ∎
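The random process generating $D_{no}$ can be sketched as follows (the variable names and the uniform starting distribution are illustrative): pair up the lightest elements and move each pair's mass onto a single endpoint. The result preserves the total mass and the heavy elements while zeroing out $\eta n / 2$ elements, which is why it fails to be non-concentrated.

```python
import math
import random

def pair_and_merge(dist, eta, seed=0):
    """Schematic 'far' distribution from the lower-bound construction: take
    the eta*n lightest elements, pair them up at random, and move each
    pair's mass onto one endpoint, zeroing out the other."""
    rng = random.Random(seed)
    n = len(dist)
    k = math.ceil(eta * n / 2) * 2       # an even number of light elements
    light = sorted(range(n), key=lambda i: dist[i])[:k]
    rng.shuffle(light)
    out = list(dist)
    for u, v in zip(light[::2], light[1::2]):
        out[u] += out[v]                 # merge the pair's mass onto u ...
        out[v] = 0.0                     # ... and zero out v
    return out

n = 12
uniform = [1 / n] * n
far = pair_and_merge(uniform, eta=0.5)
print(sum(far), sum(1 for q in far if q == 0.0))  # mass preserved; 3 elements zeroed
```

Note that the heavy part of the starting distribution is untouched, mirroring the fact that $D_{no}$ agrees with $D_{yes}$ on $T$.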
4.2 Tolerant lower bound (Proof of Theorem 1.3)
Theorem 4.3 (Theorem 1.3 formalized).
Let $\mathcal{P}$ be any $(\eta, \rho)$-non-concentrated label-invariant distribution property, where $0 < \eta, \rho < 1$. For any $\varepsilon_1$ and $\varepsilon_2$ with $0 \le \varepsilon_1 < \varepsilon_2$, both sufficiently small with respect to $\eta$ and $\rho$, any $(\varepsilon_1, \varepsilon_2)$-tester for $\mathcal{P}$ requires $\Omega(n/\log n)$ samples, where $n$ is the size of the support of the distribution.
To prove the above theorem, we recall some notions and a theorem from Valiant’s paper on a lower bound for the sample complexity of tolerant testing of symmetric properties [valiant2011testing]. These definitions refer to invariants of distributions, which are essentially a generalization of properties.
Let $\pi$ denote a real-valued function over the set of all distributions over $[n]$.

$\pi$ is said to be label-invariant if for any distribution $D$ the following holds: $\pi(D) = \pi(D_\sigma)$ for any permutation $\sigma$ of $[n]$. Here $D_\sigma$ is the probability distribution such that $D_\sigma(\sigma(i)) = D(i)$ for every $i \in [n]$.

For any $\delta$ and $\mu$ with $\delta, \mu > 0$, $\pi$ is said to be $(\delta, \mu)$-weakly-continuous if for all distributions $D_1, D_2$ satisfying $\|D_1 - D_2\|_1 \le \delta$, we have $|\pi(D_1) - \pi(D_2)| \le \mu$.
For a property $\mathcal{P}$ of distributions, we define $\pi_{\mathcal{P}}(D)$ to be the distance of $D$ from the given property $\mathcal{P}$, where $\pi_{\mathcal{P}}(D) = \min_{D' \in \mathcal{P}} \|D - D'\|_1$. From the triangle inequality property of distances, $\pi_{\mathcal{P}}$ (which refers to the distance function from the property $\mathcal{P}$) is $(\delta, \delta)$-weakly continuous, for any $\delta > 0$.
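The weak continuity of the distance function is nothing more than the triangle inequality, which can be sanity-checked directly for a finite toy property (the property and the distributions below are arbitrary illustrative choices):

```python
def l1(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def dist_to_property(d, prop):
    """pi_P(d): the l1 distance from d to a property, here represented as a
    finite set of distributions for illustration."""
    return min(l1(d, p) for p in prop)

# Weak continuity via the triangle inequality:
# |pi_P(d1) - pi_P(d2)| <= l1(d1, d2).
prop = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5, 0.0, 0.0]]
d1 = [0.4, 0.3, 0.2, 0.1]
d2 = [0.3, 0.3, 0.2, 0.2]
gap = abs(dist_to_property(d1, prop) - dist_to_property(d2, prop))
print(gap <= l1(d1, d2))  # True
```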
Theorem 4.5 (Low Frequency Blindness [valiant2011testing]).
Consider a function $\pi$ that is label-invariant and $(\delta, \mu)$-weakly-continuous, where $\delta, \mu > 0$. Let there exist two distributions $D^+$ and $D^-$, with $n$ being the size of their supports, such that $\pi(D^+) > b$ and $\pi(D^-) < a$ (for some $a < b$), and they are identical for any index occurring with probability at least $1/k$ in either distribution, where $k$ is a size parameter. Then any tester that has sample access to an unknown distribution $D$ and distinguishes between $\pi(D) > b - \mu$ and $\pi(D) < a + \mu$ requires $\Omega(k/\log k)$ many samples from $D$.
Now we are ready to prove Theorem 4.3.
Proof of Theorem 4.3.
Let us define $\pi$ with respect to the property $\mathcal{P}$ as the distance function $\pi(D) = \pi_{\mathcal{P}}(D)$.

As $\mathcal{P}$ is a label-invariant property, the function $\pi$ is also label-invariant. We have already noted that $\pi$ is $(\delta, \delta)$-weakly continuous for any $\delta > 0$, as "distance from a property" satisfies the triangle inequality. Now recall the distributions $D_{yes}$ and $D_{no}$ considered in the proof of Theorem 4.1. Note that $D_{yes}$ is in $\mathcal{P}$ and $D_{no}$ is $\varepsilon$-far from $\mathcal{P}$ for any sufficiently small $\varepsilon$, and both of them have a support size of $\Theta(n)$. By Observation 4.2, the two distributions are identical on every index occurring with probability at least $\frac{1-\rho}{(1-\eta)n}$; here we take $k = \Theta(n)$ accordingly. Now, we apply Theorem 4.5 with $D^+ = D_{no}$, $D^- = D_{yes}$, and suitable values of $a$, $b$ and $\mu$, with $\mu$ small compared to $b - a$. Observe that this completes the proof of Theorem 4.3. ∎
5 Sample complexity of non-concentrated properties (Proof of Theorem 1.2)
Theorem 5.1 (Theorem 1.2 formalized).
Let $\mathcal{P}$ be any $(\eta, \rho)$-non-concentrated distribution property. For any $\varepsilon$ with