1.1 $k$-wise uniformity and almost $k$-wise uniformity
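We say that a probability distribution over $\{-1,1\}^n$ is $k$-wise uniform if its marginal distribution on every subset of $k$ coordinates is the uniform distribution. For Fourier analysis of the Hamming cube, it is convenient to identify the distribution with its density function $\varphi : \{-1,1\}^n \to \mathbb{R}^{\geq 0}$ satisfying
$$\mathbb{E}_{\boldsymbol{x} \sim \{-1,1\}^n}[\varphi(\boldsymbol{x})] = 1.$$
We write $\boldsymbol{x} \sim \varphi$ to denote that $\boldsymbol{x}$ is a random variable drawn from the associated distribution with density $\varphi$:
$$\Pr_{\boldsymbol{x} \sim \varphi}[\boldsymbol{x} = x] = \frac{\varphi(x)}{2^n}$$
for any $x \in \{-1,1\}^n$. Then a well-known fact is that a distribution is $k$-wise uniform if and only if the Fourier coefficient of $\varphi$ is $0$ on every subset $S \subseteq [n]$ of size between $1$ and $k$:
$$\widehat{\varphi}(S) = \mathbb{E}_{\boldsymbol{x} \sim \varphi}\Big[\prod_{i \in S} \boldsymbol{x}_i\Big] = 0.$$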
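The Fourier characterization above can be checked by brute force on small cubes. The following is a minimal sketch, not taken from the paper: the helper names `fourier_coefficient` and `is_k_wise_uniform` and the dictionary representation of a density are assumptions made for illustration. The example density is uniform on the strings of $\{-1,1\}^3$ whose coordinates multiply to $+1$, which is pairwise uniform but not 3-wise uniform.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(density, S):
    """Return E_{x uniform}[density(x) * prod_{i in S} x_i] for a density given as a dict."""
    n = len(next(iter(density)))
    return sum(p * prod(x[i] for i in S) for x, p in density.items()) / 2 ** n

def is_k_wise_uniform(density, k, tol=1e-12):
    """Check that every Fourier coefficient on a set of size 1..k vanishes (up to tol)."""
    n = len(next(iter(density)))
    return all(abs(fourier_coefficient(density, S)) <= tol
               for size in range(1, k + 1)
               for S in combinations(range(n), size))

# Hypothetical example: uniform distribution on the even-parity half of {-1,1}^3.
# Its density equals 2 on the support, so that the average over the cube is 1.
cube = list(product((-1, 1), repeat=3))
even = {x: (2.0 if prod(x) == 1 else 0.0) for x in cube}

print(is_k_wise_uniform(even, 2))  # pairwise uniform
print(is_k_wise_uniform(even, 3))  # the degree-3 coefficient is nonzero
```

The enumeration takes time exponential in $n$, so this is only meant as a sanity check of the definition on toy instances.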
$k$-wise uniformity is an essential tool in theoretical computer science. Its study dates back to the work of Rao [Rao47], who studied $k$-wise uniform sets, which are special cases of $k$-wise uniform distributions. A subset of $\{-1,1\}^n$ is a $k$-wise uniform set if the uniform distribution on this subset is $k$-wise uniform. Rao gave constructions of a pairwise-uniform set of size $n+1$ (when $n = 2^r - 1$ for any integer $r$), a $3$-wise uniform set of size $2n$ (when $n = 2^r$ for any integer $r$), and a lower bound (reproved in [ABI86, CGH85]) that a $k$-wise uniform set on $\{-1,1\}^n$ requires size at least $\Omega(n^{\lfloor k/2 \rfloor})$. An alternative proof of the lower bound for even $k$ is shown in [AGM03] using a hypercontractivity-type technique, as opposed to the linear algebra method. Coding theorists have also heavily studied $k$-wise uniformity, since MacWilliams and Sloane showed in [MS77] that linear codes with dual minimum distance $k+1$ correspond to $k$-wise uniform sets. The importance in theoretical computer science of $k$-wise independence for derandomization arose simultaneously in many papers, with [KW85, Lub86] emphasizing derandomization via the most common pairwise-uniformity case, and [ABI86, CGH85] emphasizing derandomization based on $k$-wise independence more generally.
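A distribution is “almost $k$-wise uniform” if its marginal distribution on every $k$ coordinates is very close to the uniform distribution. Typically we say two distributions $\varphi, \psi$ are $\delta$-close if the total variation distance between $\varphi$ and $\psi$ is at most $\delta$, and we say they are $\delta$-far if the total variation distance between them is more than $\delta$. However, the precise notion of “close to uniform” has varied in previous work. Suppose $\psi$ is the density function for the marginal distribution of $\varphi$ restricted to some $k$ specific coordinates and $1$ is the density function for the uniform distribution. Several standard ways are introduced in [AGM03, AAK07] to quantify closeness to uniformity, corresponding to the $L_1$, $L_2$, and $L_\infty$ norms:
($L_1$ norm): $\|\psi - 1\|_1 = 2\, d_{\mathrm{TV}}(\psi, 1) \leq \delta$, where $d_{\mathrm{TV}}$ denotes total variation distance;
($L_2$ norm): $\|\psi - 1\|_2 = \sqrt{\chi^2(\psi, 1)} \leq \delta$, where $\chi^2(\psi, 1)$ denotes the $\chi^2$-divergence of $\psi$ from the uniform distribution;
($L_\infty$ norm): $\|\psi - 1\|_\infty \leq \delta$, or in other words, for any $x \in \{-1,1\}^k$, $\big|\Pr_{\boldsymbol{x} \sim \psi}[\boldsymbol{x} = x] - 2^{-k}\big| \leq 2^{-k}\delta$.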
Note the following: First, closeness in $L_1$ norm is the most natural for algorithmic derandomization purposes: it tells us that an algorithm cannot tell that $\psi$ is different from the uniform distribution, up to $\delta$ error. Second, these definitions of closeness are in increasing order of strength, since $\|\psi - 1\|_1 \leq \|\psi - 1\|_2 \leq \|\psi - 1\|_\infty$. On the other hand, we also have that $\|\psi - 1\|_\infty \leq 2^k \|\psi - 1\|_1$; thus all of these notions are within a factor of $2^k$. We generally consider $k$ to be constant (or at worst, $O(\log n)$), so that these notions are roughly the same.
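A fourth reasonable notion, proposed by Naor and Naor in [NN93], is that the distribution has a small bias over every non-empty subset of at most $k$ coordinates. We say density function $\varphi$ is $(\epsilon, k)$-wise uniform if for every non-empty set $S \subseteq [n]$ with size at most $k$,
$$\Big|\mathbb{E}_{\boldsymbol{x} \sim \varphi}\Big[\prod_{i \in S} \boldsymbol{x}_i\Big]\Big| = |\widehat{\varphi}(S)| \leq \epsilon.$$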
Here we also have $\epsilon = 0$ if and only if $\varphi$ is exactly $k$-wise uniform. Clearly if the marginal density of $\varphi$ over every $k$ coordinates is $\epsilon$-close to the uniform distribution in total variation distance, then $\varphi$ is $(2\epsilon, k)$-wise uniform, since the bias of a parity is at most twice the total variation distance. On the other hand, if $\varphi$ is $(\epsilon, k)$-wise uniform, then the marginal density of $\varphi$ over every $k$ coordinates is $2^{k/2}\epsilon$-close to the uniform distribution in total variation distance. Again, if $k$ is considered constant, this bias notion is also roughly the same as the previous notions. In the rest of the paper we prefer this $(\epsilon, k)$-wise uniform notion for “almost $k$-wise uniform” because of its convenience for Fourier analysis.
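To make the comparison between the bias notion and total variation distance concrete, here is a small brute-force sketch. It is an illustration only: the function names `max_bias` and `worst_marginal_tv` and the example density are assumptions, not objects from the paper. It computes the smallest $\epsilon$ for which a density is $(\epsilon, k)$-wise uniform and the largest total variation distance of a $k$-coordinate marginal from uniform, and checks the two standard inequalities $\epsilon \leq 2\,d_{\mathrm{TV}}$ and $d_{\mathrm{TV}} \leq 2^{k/2}\epsilon$ on one example.

```python
from itertools import combinations, product
from math import prod

def max_bias(density, k):
    """Smallest eps such that the density is (eps, k)-wise uniform."""
    n = len(next(iter(density)))
    return max(abs(sum(p * prod(x[i] for i in S) for x, p in density.items())) / 2 ** n
               for size in range(1, k + 1)
               for S in combinations(range(n), size))

def worst_marginal_tv(density, k):
    """Largest total variation distance from uniform over all k-coordinate marginals."""
    n = len(next(iter(density)))
    worst = 0.0
    for T in combinations(range(n), k):
        marginal = {}
        for x, p in density.items():
            key = tuple(x[i] for i in T)
            marginal[key] = marginal.get(key, 0.0) + p / 2 ** n
        tv = 0.5 * sum(abs(marginal.get(y, 0.0) - 2 ** -k)
                       for y in product((-1, 1), repeat=k))
        worst = max(worst, tv)
    return worst

# Hypothetical example density on {-1,1}^3: phi(x) = 1 + 0.3*x0*x1 + 0.2*x2.
cube = list(product((-1, 1), repeat=3))
phi = {x: 1 + 0.3 * x[0] * x[1] + 0.2 * x[2] for x in cube}

eps = max_bias(phi, 2)
tv = worst_marginal_tv(phi, 2)
print(eps, tv)
```

For this density the largest bias sits on the pair $\{0,1\}$, and the corresponding marginal is also the one farthest from uniform.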
The original paper about almost $k$-wise uniformity, [NN93], was concerned with derandomization; e.g., they use $(\epsilon, k)$-wise uniformity for derandomizing the “set balancing (discrepancy)” problem. Alon et al. gave a further discussion of the relationship between almost $k$-wise uniformity and derandomization in [AGM03]. The key idea is the following: In many randomized algorithms, the analysis only relies on the property that the random bits are $k$-wise uniform, as opposed to fully uniform. Since there exists an efficiently samplable $k$-wise uniform distribution on a set of size at most $O(n^{\lfloor k/2 \rfloor})$, one can reduce the number of random unbiased bits used in the algorithm down to $O(k \log n)$. To further reduce the number of random bits used, a natural line of thinking is to consider distributions which are “almost $k$-wise uniform”. Alon et al. [AGHP92] showed that we can deterministically construct $(\epsilon, k)$-wise uniform sets that are of size $\mathrm{poly}(2^k, \log n, 1/\epsilon)$, much smaller than exact $k$-wise uniform ones (roughly $n^{k/2}$ size). Therefore we can use substantially fewer random bits by taking random strings from an almost $k$-wise uniform distribution.
However, we need to ensure that the original analysis of the randomized algorithm still holds under the almost $k$-wise uniform distribution. That is, if the randomized algorithm behaves well on a $k$-wise uniform distribution, it may or may not also work as well with an $(\epsilon, k)$-wise uniform distribution, even when the parameter $\epsilon$ is small.
1.2 The Closeness Problem
For the analysis of derandomization, it would be very convenient if $(\epsilon, k)$-wise uniformity (which means that “every $k$-local view looks close to uniform”) implied global $\delta$-closeness to $k$-wise uniformity. A natural question that arises, posed in [AGM03], is the following:
How small can $\delta$ be such that the following is true? For every $(\epsilon, k)$-wise uniform distribution $\varphi$ on $\{-1,1\}^n$, $\varphi$ is $\delta$-close to some $k$-wise uniform distribution.
In this paper, we will refer to this question as the Closeness Problem.
1.2.1 Previous work and applications
On one hand, the main message of [AGM03] is a lower bound: For every even constant , they gave an -wise uniform distribution with , yet which is -far from every -wise uniform distribution in total variation distance.
On the other hand, [AGM03] proved a very simple theorem that $\delta \leq n^k \epsilon$ always holds. Despite its simplicity, this upper bound has been used many times in well-known results.
One application is in circuit complexity. [AGM03]’s upper bound is used for fooling disjunctive normal formulas (DNF) [Baz09] and $\mathrm{AC}^0$ [Bra10]. In these works, once the authors showed that $k$-wise uniformity suffices to fool DNF/$\mathrm{AC}^0$, they deduced that $(n^{-O(k)}, k)$-wise uniform distributions suffice, and hence that $n^{-O(k)}$-biased sets suffice trivially. [AGM03]’s upper bound is also used as a tool for the construction of two-source extractors for a similar reason in [CZ16, Li16].
Another application is for hardness of constraint satisfaction problems (CSPs). Austrin and Mossel [AM09] show that one can obtain integrality gaps and UGC-hardness for CSPs based on $k$-wise uniform distributions of small support size. If a predicate is $k$-wise uniform, Kothari et al. [KMOW17] showed that one can get SOS-hardness of refuting random instances of it when there are around $n^{(k+1)/2}$ constraints. Indeed, [KMOW17] shows that if we have a predicate that is $\delta$-close to $k$-wise uniform, then with roughly $n^{(k+1)/2}$ random constraints, SOS cannot refute that a $(1 - \delta - o(1))$-fraction of constraints are satisfiable. This also motivates studying $\delta$-closeness to $k$-wise uniformity and how it relates to Fourier coefficients. $\delta$-closeness to $k$-wise uniformity is also relevant for hardness of random CSPs, as shown in [AOW15].
Alon et al. [AAK07] investigated the Closeness Problem further by improving the upper bound to $\delta \leq O((n \log n)^{k/2} \epsilon)$. Indeed, they showed a strictly stronger fact that a distribution $\varphi$ is $\delta$-close to some $k$-wise uniform distribution, where $\delta \leq O\big(\sqrt{\sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)^2} \cdot \log^{k/2} n\big)$. Rubinfeld and Xie [RX13] generalized some of these results to non-uniform $k$-wise independent distributions over larger product spaces.
Let us briefly summarize the method [AAK07] used to prove their upper bounds. Given an $(\epsilon, k)$-wise uniform $\varphi$, they first try to generate a $k$-wise uniform “pseudo-distribution” by forcing all Fourier coefficients of degree between $1$ and $k$ to be zero. It is a “pseudo-distribution” because some points might have negative density. After this, they use a fully uniform distribution and $k$-wise uniform distributions with small support size to try to mend all points to be nonnegative. They bound the weight of these mending distributions to upper-bound the distance incurred by the mending process. This mending process uses the fully uniform distribution to mend the small negative weights and uses $k$-wise uniform distributions with small support size to correct the large negative weights point by point. Optimizing the threshold between small and large weights introduces a factor of $(\log n)^{k/2}$.
Though they did not mention it explicitly, they also give a lower bound for the Closeness Problem of for by considering the uniform distribution on a set of random chosen strings. No previous work gave any lower bound for the most natural case of .
1.2.2 Our result
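In this paper, we show sharper upper and lower bounds for the Closeness Problem, which are tight for even $k$ and $k = 1$. Compared to the result in [AAK07], we get rid of the factor of $(\log n)^{k/2}$.
Theorem. Any density $\varphi$ over $\{-1,1\}^n$ is $\delta$-close to some $k$-wise uniform distribution, where
$$\delta \leq e^k \sqrt{\sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)^2}.$$
Consequently, if $\varphi$ is $(\epsilon, k)$-wise uniform, i.e., $|\widehat{\varphi}(S)| \leq \epsilon$ for every non-empty set $S$ with size at most $k$, then
$$\delta \leq e^k n^{k/2} \epsilon.$$
For the special case $k = 1$, the corresponding $\delta$ can be further improved to $\delta \leq \epsilon$.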
Our new technique is to mend the original distribution to be $k$-wise uniform all at once. We want to show that some mixture distribution $\frac{1}{1+w}(\varphi + w\psi)$ is $k$-wise uniform with small mixture weight $w$. The distance between the final mixture distribution and the original distribution $\varphi$ is bounded by $w$. Therefore we only need to show that the mending distribution $\psi$ exists for some small weight $w$. Showing the existence of such a distribution $\psi$ can be written as the feasibility of a linear program (LP). We upper bound $w$ by bounding the dual LP, using the hypercontractivity inequality.
Our result is sharp for all even $k$, and is also sharp for $k = 1$. We state the matching lower bound for even $k$:
Theorem. For any $n$ and even $k$, and small enough $\epsilon$, there exists some $(\epsilon, k)$-wise uniform distribution $\varphi$ over $\{-1,1\}^n$, such that $\varphi$ is $\delta$-far from every $k$-wise uniform distribution in total variation distance, where
$$\delta \geq \Omega(n^{k/2} \epsilon),$$
with the hidden constant depending only on $k$.
Our method for proving this lower bound is again LP duality. Our examples in the lower bound are symmetric distributions with Fourier weight only on level $k$. The density functions can then be written as binary Krawtchouk polynomials, which behave similarly to Hermite polynomials when $n$ is large. Our dual LP bounds use various properties of Krawtchouk and Hermite polynomials.
Interestingly both our upper and lower bound utilize LP-duality, which we believe is the most natural way of looking at this problem.
We remark that we can derive a lower bound for odd $k$ from Theorem 1.2.2 trivially by replacing $k$ by $k - 1$. There exists a gap of $\sqrt{n}$ between the resulting upper and lower bounds for odd $k$. We believe that the lower bound is tight, and that the upper bound may be improvable by a factor of $\sqrt{n}$, as it is in the special case $k = 1$. We leave it as a conjecture for further work:
Conjecture. Suppose the distribution $\varphi$ over $\{-1,1\}^n$ is $(\epsilon, k)$-wise uniform. Then $\varphi$ is $\delta$-close to some $k$-wise uniform distribution in total variation distance, where
$$\delta \leq O(n^{\lfloor k/2 \rfloor} \epsilon).$$
1.3 The Testing Problem
Another application of the Closeness Problem is to property testing of $k$-wise uniformity. Suppose we have sample access to an unknown and arbitrary distribution; we may wonder whether the distribution has a certain property. This question has received tremendous attention in the field of statistics. The main goal in the study of property testing is to design algorithms that use as few samples as possible, and to establish lower bounds matching these sample-efficient algorithms. In particular, we consider the property of being $k$-wise uniform:
Given sample access to an unknown and arbitrary distribution $\varphi$ on $\{-1,1\}^n$, how many samples do we need to distinguish between the case that $\varphi$ is $k$-wise uniform versus the case that $\varphi$ is $\delta$-far from every $k$-wise uniform distribution?
In this paper, we will refer to this question as the Testing Problem.
We say a testing algorithm is a $\delta$-tester for $k$-wise uniformity if the algorithm outputs “Yes” with high probability when the distribution $\varphi$ is $k$-wise uniform, and the algorithm outputs “No” with high probability when the distribution $\varphi$ is $\delta$-far from any $k$-wise uniform distribution (in total variation distance).
Property testing is well studied for Boolean functions and distributions. Previous work studied the testing of related properties of distributions, including uniformity [GR11, BFR00, RS09] and independence [BFF01, BKR04, ADK15, DK16].
The papers [AGM03, AAK07, Xie12] discussed the problem of testing $k$-wise uniformity. [AGM03] constructed a $\delta$-tester for $k$-wise uniformity with sample complexity $O(n^{2k}/\delta^2)$, and [AAK07] improved it to $O(n^k \log^{k+1} n / \delta^2)$. As for lower bounds, [AAK07] showed that $\Omega(n^{(k-1)/2}/\delta)$ samples are necessary, albeit only for $k > 2$. This lower bound is in particular for distinguishing the uniform distribution from $\delta$-far-from-$k$-wise distributions.
We show a better upper bound for sample complexity:
Theorem. There exists a $\delta$-tester for $k$-wise uniformity of distributions on $\{-1,1\}^n$ with sample complexity $O(n^k/\delta^2)$. For the special case of $k = 1$, the sample complexity is $O(\log n/\delta^2)$.
A natural $\delta$-tester of $k$-wise uniformity is mentioned in [AAK07]: Estimate all Fourier coefficients up to level $k$ from the samples; if they are all smaller than $\epsilon$ then output “Yes”. In fact this algorithm is exactly attempting to check whether the distribution is $(\epsilon, k)$-wise uniform. Hence the sample complexity depends on the upper bound for the Closeness Problem. Therefore we can reduce the sample complexity of this algorithm down to $O(n^k \log n/\delta^2)$ via our improved upper bound for the Closeness Problem. One $\log n$ factor remains because we need to union-bound over the $O(n^k)$ Fourier coefficients up to level $k$. To further get rid of the last $\log n$ factor, we present a new algorithm that estimates the Fourier weight up to level $k$, $\sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)^2$, rather than estimating these Fourier coefficients one by one.
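One way to estimate the low-level Fourier weight directly from samples is sketched below. This is an illustration under stated assumptions and not necessarily the estimator analyzed in Section 4: it uses the identity $\widehat{\varphi}(S)^2 = \mathbb{E}[\prod_{i \in S} \boldsymbol{x}_i \boldsymbol{y}_i]$ for independent $\boldsymbol{x}, \boldsymbol{y} \sim \varphi$, together with the fact that $\sum_{|S| = j} \prod_{i \in S} x_i y_i$ depends only on the Hamming distance between $x$ and $y$ and equals a Krawtchouk polynomial. The names `krawtchouk`, `estimate_low_weight`, `accepts`, and the `threshold` parameter are hypothetical.

```python
from itertools import combinations
from math import comb

def krawtchouk(n, j, t):
    """Binary Krawtchouk polynomial: sum over |S| = j of prod_{i in S} z_i, where z has t entries equal to -1."""
    return sum((-1) ** i * comb(t, i) * comb(n - t, j - i) for i in range(j + 1))

def estimate_low_weight(samples, k):
    """Average over all unordered pairs of samples of sum_{1<=|S|<=k} x^S y^S.

    For independent samples this is an unbiased estimate of the Fourier weight
    on levels 1..k of the underlying density.
    """
    n = len(samples[0])
    total, pairs = 0, 0
    for x, y in combinations(samples, 2):
        dist = sum(a != b for a, b in zip(x, y))  # Hamming distance
        total += sum(krawtchouk(n, j, dist) for j in range(1, k + 1))
        pairs += 1
    return total / pairs

def accepts(samples, k, threshold):
    """Hypothetical decision rule: say "Yes" when the estimated weight is small."""
    return estimate_low_weight(samples, k) <= threshold

# Degenerate example: five copies of the same point, i.e. samples from a point mass,
# whose density has weight C(4,1) + C(4,2) on levels 1 and 2.
point_mass = [(1, 1, 1, 1)] * 5
print(estimate_low_weight(point_mass, 2))
```

The point of working with pairs is that a single pass over pairwise Hamming distances replaces the estimation of every individual coefficient; the choice of threshold and number of samples is where the Closeness Problem bound would enter.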
Unfortunately, a lower bound for the Closeness Problem does not imply a lower bound for the Testing Problem directly. In [AAK07], they showed that a uniform distribution over a random subset of $\{-1,1\}^n$ of size $O(n^{k-1}/\delta^2)$ is almost surely $\delta$-far from any $k$-wise uniform distribution. On the other hand, by the Birthday Paradox, it is hard to distinguish between the fully uniform distribution on all strings of length $n$ and a uniform distribution over a random set of such size. This gives a lower bound for the Testing Problem of $\Omega(n^{(k-1)/2}/\delta)$. Their result only holds for $k > 2$; there was no previous non-trivial lower bound for testing pairwise uniformity. We show a lower bound for the pairwise case.
Theorem. Any $\delta$-tester for pairwise uniformity of distributions on $\{-1,1\}^n$ needs at least $\Omega(n/\delta^2)$ samples.
For this lower bound we analyze a symmetric distribution with non-zero Fourier coefficients only on level 2. We prove that it is hard to distinguish a randomly shifted version of this distribution from the fully uniform distribution. This lower bound is also better than that of [AAK07] in that we have a better dependence on the parameter $\delta$ ($1/\delta^2$ rather than $1/\delta$). Unfortunately we are unable to generalize our lower bound to higher $k$.
Notice that for our new upper and lower bounds for $k$-wise uniformity testing, there still remains a quadratic gap for $k = 2$, indicating that the upper bound might be improvable. Both the lower bound in our paper and that in [AAK07] show that it is hard to distinguish between the fully uniform distribution and some specific sets of distributions that are far from $k$-wise uniform. We show that if one wants to improve the lower bound, one will need to use a distribution in the “Yes” case that is not fully uniform, because we give a sample-efficient algorithm for distinguishing between fully uniform and $\delta$-far from $k$-wise uniform:
Theorem. For any constant $k$, for testing whether a distribution is fully uniform or $\delta$-far from every $k$-wise uniform distribution, there exists an algorithm with sample complexity $\tilde{O}(n^{k/2}/\delta^2)$.
In fact, for testing whether a distribution is -wise uniform or -far from -wise uniform with , there exists an algorithm with sample complexity .
We remark that testing full uniformity can be treated as a special case of testing -wise uniformity approximately, by setting .
Testing full uniformity has been studied in [GR11, BFR00]. Paninski [Pan08] showed that testing whether an unknown distribution on is -close to fully uniform requires samples. Rubinfeld and Servedio [RS09] studied testing whether an unknown monotone distribution is fully uniform or not.
The fully uniform distribution has the nice property that every pair of samples is different in bits with high probability when the sample size is small. Our algorithm first rejects those distributions that disobey this property. We show that the remaining distributions have small Fourier weight up to level . Hence by following a similar analysis as the tester in Theorem 1.3, we can get an improved upper bound when these lower Fourier weights are small.
The lower bound remains the same as testing -wise vs. far from -wise. Our tester is tight up to a logarithmic factor for the pairwise case, and is tight up to a factor of when .
Section 2 contains definitions and notation. We discuss upper and lower bounds for the Closeness Problem in Section 3. We discuss the sample complexity of testing $k$-wise uniformity in Section 4. We present a tester for distinguishing between $\alpha k$-wise uniformity (or full uniformity) and far-from-$k$-wise uniformity in Section 5.
2.1 Fourier analysis of Boolean functions
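We use $[n]$ to denote the set $\{1, \dots, n\}$. We denote the symmetric difference of two sets $S$ and $T$ by $S \oplus T$. For Fourier analysis we use notation consistent with [O’D14]. Every function $f : \{-1,1\}^n \to \mathbb{R}$ has a unique representation as a multilinear polynomial
$$f(x) = \sum_{S \subseteq [n]} \widehat{f}(S)\, x^S, \quad \text{where } x^S = \prod_{i \in S} x_i.$$
We call $\widehat{f}(S)$ the Fourier coefficient of $f$ on $S$. We use $\boldsymbol{x} \sim \{-1,1\}^n$ to denote that $\boldsymbol{x}$ is uniformly distributed on $\{-1,1\}^n$. We can represent Fourier coefficients as
$$\widehat{f}(S) = \mathbb{E}_{\boldsymbol{x} \sim \{-1,1\}^n}\big[f(\boldsymbol{x})\, \boldsymbol{x}^S\big].$$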
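We define an inner product $\langle \cdot, \cdot \rangle$ on pairs of functions $f, g : \{-1,1\}^n \to \mathbb{R}$ by
$$\langle f, g \rangle = \mathbb{E}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x}) g(\boldsymbol{x})] = \sum_{S \subseteq [n]} \widehat{f}(S)\widehat{g}(S).$$
We introduce the following $p$-norm notation: $\|f\|_p = \mathbb{E}[|f(\boldsymbol{x})|^p]^{1/p}$, and the Fourier $\ell_p$-norm is $\big(\sum_{S \subseteq [n]} |\widehat{f}(S)|^p\big)^{1/p}$.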
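We say the degree of a Boolean function $f$ is $k$ if its Fourier polynomial has degree $k$. We denote $f^{=k}(x) = \sum_{|S| = k} \widehat{f}(S)\, x^S$, and $f^{\leq k}(x) = \sum_{|S| \leq k} \widehat{f}(S)\, x^S$. We denote the Fourier weight on level $k$ by $W^{=k}[f] = \sum_{|S| = k} \widehat{f}(S)^2$. We denote $W^{1 \ldots k}[f] = \sum_{1 \leq |S| \leq k} \widehat{f}(S)^2$.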
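We define the convolution $f * g$ of a pair of functions $f, g : \{-1,1\}^n \to \mathbb{R}$ to be
$$(f * g)(x) = \mathbb{E}_{\boldsymbol{y} \sim \{-1,1\}^n}[f(\boldsymbol{y})\, g(x \circ \boldsymbol{y})],$$
where $\circ$ denotes entry-wise multiplication. The effect of convolution on Fourier coefficients is that $\widehat{f * g}(S) = \widehat{f}(S)\,\widehat{g}(S)$.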
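The rule that convolution multiplies Fourier coefficients can be verified numerically on a small cube. The sketch below assumes the standard definition $(f * g)(x) = \mathbb{E}_{\boldsymbol{y}}[f(\boldsymbol{y}) g(x \circ \boldsymbol{y})]$; the functions `f` and `g` are arbitrary examples chosen for illustration.

```python
from itertools import combinations, product
from math import prod

N = 3
CUBE = list(product((-1, 1), repeat=N))
SUBSETS = [S for size in range(N + 1) for S in combinations(range(N), size)]

def fourier(f, S):
    """Fourier coefficient of f (a dict over the cube) on the set S."""
    return sum(f[x] * prod(x[i] for i in S) for x in CUBE) / 2 ** N

def convolve(f, g):
    """(f * g)(x) = E_y[f(y) g(x o y)], with o the entry-wise product."""
    return {x: sum(f[y] * g[tuple(a * b for a, b in zip(x, y))] for y in CUBE) / 2 ** N
            for x in CUBE}

# Arbitrary example functions, written directly as multilinear polynomials.
f = {x: 3 + x[0] + 2 * x[1] * x[2] for x in CUBE}
g = {x: 1 - x[0] + x[1] * x[2] + 4 * x[0] * x[1] * x[2] for x in CUBE}

h = convolve(f, g)
max_gap = max(abs(fourier(h, S) - fourier(f, S) * fourier(g, S)) for S in SUBSETS)
print(max_gap)
```

Only the sets on which both `f` and `g` have a nonzero coefficient survive in `h`, which is the mechanism behind the pair-of-samples estimators used for testing.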
2.2 Densities and distances
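When working with probability distributions on $\{-1,1\}^n$, we prefer to define them via density functions. A density function is a nonnegative function $\varphi : \{-1,1\}^n \to \mathbb{R}^{\geq 0}$ satisfying $\widehat{\varphi}(\emptyset) = \mathbb{E}_{\boldsymbol{x} \sim \{-1,1\}^n}[\varphi(\boldsymbol{x})] = 1$. We write $\boldsymbol{y} \sim \varphi$ to denote that $\boldsymbol{y}$ is a random variable drawn from the distribution $\varphi$, defined by
$$\Pr_{\boldsymbol{y} \sim \varphi}[\boldsymbol{y} = y] = \frac{\varphi(y)}{2^n}$$
for all $y \in \{-1,1\}^n$. We identify distributions with their density functions when there is no risk of confusion.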
We denote . We denote by the density function for the uniform distribution on support set . The density function associated to the fully uniform distribution is the constant function .
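The following lemma about density functions of degree at most $k$ derives from Fourier analysis and hypercontractivity.
Lemma 2.2. Let $\varphi$ be a density function of degree at most $k$. Then
$$\|\varphi\|_2 \leq e^k.$$
A distribution $\varphi$ over $\{-1,1\}^n$ is $k$-wise uniform if and only if $\widehat{\varphi}(S) = 0$ for all $1 \leq |S| \leq k$ (see Chapter 6.1 in [O’D14]). We say that a distribution $\varphi$ over $\{-1,1\}^n$ is $(\epsilon, k)$-wise uniform if $|\widehat{\varphi}(S)| \leq \epsilon$ for all $1 \leq |S| \leq k$.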
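The most common way to measure the distance between two probability distributions is via their total variation distance. If the distributions have densities $\varphi$ and $\psi$, then the total variation distance is defined to be
$$d_{\mathrm{TV}}(\varphi, \psi) = \frac{1}{2}\, \mathbb{E}_{\boldsymbol{x} \sim \{-1,1\}^n}\big[|\varphi(\boldsymbol{x}) - \psi(\boldsymbol{x})|\big] = \frac{1}{2}\|\varphi - \psi\|_1.$$
We say that $\varphi$ and $\psi$ are $\delta$-close if $d_{\mathrm{TV}}(\varphi, \psi) \leq \delta$.
Supposing $\mathcal{P}$ is a set of distributions, we denote
$$d_{\mathrm{TV}}(\varphi, \mathcal{P}) = \min_{\psi \in \mathcal{P}} d_{\mathrm{TV}}(\varphi, \psi).$$
In particular, we denote the set of $k$-wise uniform densities by kWISE. We say that density $\varphi$ is $\delta$-close to $k$-wise uniform if $d_{\mathrm{TV}}(\varphi, \text{kWISE}) \leq \delta$, and is $\delta$-far otherwise.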
2.3 Krawtchouk and Hermite polynomials
Krawtchouk polynomials were introduced in [Kra29], and arise in the analysis of Boolean functions as shown in [Lev95, Kal02]. Consider the following Boolean function of degree $k$ and input length $n$: $f(x) = \sum_{|S| = k} x^S$. It is symmetric and therefore only depends on the Hamming weight of $x$. Let $t$ be the number of $-1$’s in $x$. Then the output of $f$ is exactly the same as the Krawtchouk polynomial $K_k^{(n)}(t)$. We denote by $K_k^{(n)}$ the Krawtchouk polynomial:
$$K_k^{(n)}(t) = \sum_{j=0}^{k} (-1)^j \binom{t}{j} \binom{n-t}{k-j}.$$
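The correspondence between the symmetric degree-$k$ function and the Krawtchouk polynomial can be confirmed exhaustively for a small input length. The sketch below assumes the standard binary Krawtchouk formula $K_k^{(n)}(t) = \sum_{j} (-1)^j \binom{t}{j}\binom{n-t}{k-j}$ with $t$ the number of $-1$ entries; the helper names are illustrative.

```python
from itertools import combinations, product
from math import comb, prod

def symmetric_character_sum(x, k):
    """Evaluate f(x) = sum over all |S| = k of prod_{i in S} x_i by direct enumeration."""
    return sum(prod(x[i] for i in S) for S in combinations(range(len(x)), k))

def krawtchouk(n, k, t):
    """Binary Krawtchouk polynomial K_k^{(n)}(t) via its combinatorial formula."""
    return sum((-1) ** j * comb(t, j) * comb(n - t, k - j) for j in range(k + 1))

n = 5
# Collect every (input, degree) pair where the two evaluations disagree.
mismatches = [(x, k)
              for x in product((-1, 1), repeat=n)
              for k in range(n + 1)
              if symmetric_character_sum(x, k) != krawtchouk(n, k, x.count(-1))]
print(len(mismatches))
```

All quantities are integers, so the comparison is exact rather than up to floating-point tolerance.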
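We will also use Hermite polynomials in our analysis. We denote by $h_k$ the normalized Hermite polynomial:
$$h_k(z) = \frac{(-1)^k}{\sqrt{k!}}\, e^{z^2/2}\, \frac{d^k}{dz^k} e^{-z^2/2}.$$
Its explicit formula is
$$h_k(z) = \sqrt{k!} \sum_{j=0}^{\lfloor k/2 \rfloor} \frac{(-1)^j}{j!\,(k-2j)!\,2^j}\, z^{k-2j}.$$
One useful fact is that the derivative of a Hermite polynomial is a scalar multiple of a Hermite polynomial (see Exercise 11.10 in [O’D14]):
Fact 2.3. For any integer $k \geq 1$, we have
$$\frac{d}{dz} h_k(z) = \sqrt{k}\, h_{k-1}(z).$$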
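The derivative identity can be checked coefficient by coefficient. The sketch below assumes the standard explicit formula for the normalized probabilists' Hermite polynomials, $h_k(z) = \sqrt{k!}\sum_{j=0}^{\lfloor k/2 \rfloor} \frac{(-1)^j}{j!\,(k-2j)!\,2^j} z^{k-2j}$, and verifies $h_k' = \sqrt{k}\, h_{k-1}$ for small $k$. The function names are illustrative.

```python
from math import factorial, sqrt

def hermite_coeffs(k):
    """Coefficient list c, where c[d] multiplies z^d in the normalized Hermite polynomial h_k."""
    c = [0.0] * (k + 1)
    for j in range(k // 2 + 1):
        c[k - 2 * j] = (sqrt(factorial(k)) * (-1) ** j
                        / (factorial(j) * factorial(k - 2 * j) * 2 ** j))
    return c

def derivative(c):
    """Coefficient list of the derivative of the polynomial with coefficients c."""
    return [d * c[d] for d in range(1, len(c))]

def max_derivative_gap(k):
    """Largest coefficient-wise difference between h_k' and sqrt(k) * h_{k-1}."""
    lhs = derivative(hermite_coeffs(k))
    rhs = [sqrt(k) * a for a in hermite_coeffs(k - 1)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

worst_gap = max(max_derivative_gap(k) for k in range(1, 9))
print(worst_gap)
```

For instance, `hermite_coeffs(2)` represents $(z^2 - 1)/\sqrt{2}$, whose derivative $\sqrt{2}\,z$ is $\sqrt{2}\, h_1(z)$.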
The relationship between Krawtchouk and Hermite polynomials is that we can treat Hermite polynomials as a limit version of Krawtchouk polynomials when $n$ goes to infinity (see Exercise 11.14 in [O’D14]).
For all $k \in \mathbb{N}$ and $z \in \mathbb{R}$ we have
$$\binom{n}{k}^{-1/2} K_k^{(n)}\Big(\frac{n - z\sqrt{n}}{2}\Big) \xrightarrow{\; n \to \infty \;} h_k(z).$$
Instead of analyzing Krawtchouk polynomials, it is easier to study Hermite polynomials when $n$ is large because Hermite polynomials have a more explicit form. We present some basic properties of Hermite polynomials with brief proofs.
The following are properties of :
for any ;
is positive and increasing when ;
for any constant .
We will treat the case of for some integer . The proof for the general case is similar. When , we can group adjacent terms into pairs:
Notice that is always between and when . Both the upper and lower bound have absolute value at most . Therefore by the triangle inequality we have .
It is easy to check that is positive when . Then by Fact 2.3, when .
This is trivial from the explicit formula since each term is exactly smaller than the previous term when . ∎
3 The Closeness Problem
In this section, we prove the upper bound in Theorem 1.2.2 and the lower bound in Theorem 1.2.2. One interesting fact is that we use duality of linear programming (LP) in both the upper and lower bound. We think this is the proper perspective for analyzing these questions.
3.1 Upper bound
The key idea for proving the upper bound is mixture distributions. Given an $(\epsilon, k)$-wise uniform density $\varphi$, we try to mix it with some other distribution $\psi$ using mixture weight $w$, such that the mixture distribution is $k$-wise uniform and is close to the original distribution. The following lemma shows that the distance between the original distribution and the mixture distribution is bounded by the weight $w$.
Lemma 3.1. If $\varphi' = (1-w)\varphi + w\psi$ for some $0 \leq w \leq 1$ and density functions $\varphi, \psi$, then $d_{\mathrm{TV}}(\varphi, \varphi') \leq w$.
Therefore we only need to show the existence of an appropriate $\psi$ for some small $w$. The constraints on $\psi$ can be written as an LP feasibility problem. Therefore by Farkas’ Lemma we only need to show that its dual is not feasible. The variables in the dual LP can be seen as a density function of degree at most $k$.
Proof of Theorem 1.2.2 (general case).
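Given density function $\varphi$, we try to find another density function $\psi$ with constraints
$$\widehat{\psi}(S) = -\frac{1}{w}\,\widehat{\varphi}(S)$$
for all $1 \leq |S| \leq k$. Suppose such a density function $\psi$ exists. Then it is trivial that $\frac{1}{1+w}(\varphi + w\psi)$ is also a density function and is $k$-wise uniform. By Lemma 3.1 (applied with mixture weight $\frac{w}{1+w}$), we conclude that $d_{\mathrm{TV}}\big(\varphi, \frac{1}{1+w}(\varphi + w\psi)\big) \leq \frac{w}{1+w} \leq w$.
The rest of the proof is to show that such a $\psi$ exists when $w = e^k \sqrt{W^{1 \ldots k}[\varphi]}$ (if $W^{1 \ldots k}[\varphi] = 0$ then $\varphi$ is already $k$-wise uniform). We can write the existence as an LP feasibility problem with variables $\psi(x)$ for $x \in \{-1,1\}^n$ and constraints:
$$\widehat{\psi}(\emptyset) = 1, \qquad \widehat{\psi}(S) = -\frac{1}{w}\,\widehat{\varphi}(S) \ \ \forall\, 1 \leq |S| \leq k, \qquad \psi(x) \geq 0 \ \ \forall\, x \in \{-1,1\}^n,$$
where $\widehat{\psi}(S) = \mathbb{E}_{\boldsymbol{x}}[\psi(\boldsymbol{x})\, \boldsymbol{x}^S]$ is a linear combination of variables $\psi(x)$.
The dual LP has variables $\widehat{g}(S)$ for $0 \leq |S| \leq k$ with constraints:
$$\widehat{g}(\emptyset) = 1, \qquad \sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)\,\widehat{g}(S) > w, \qquad g(x) = \sum_{|S| \leq k} \widehat{g}(S)\, x^S \geq 0 \ \ \forall\, x \in \{-1,1\}^n.$$
The original LP is feasible if and only if its dual LP is infeasible, by Farkas’ Lemma. This completes the proof, since when $w = e^k \sqrt{W^{1 \ldots k}[\varphi]}$, for any density function $g$ with degree at most $k$ we have
$$\sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)\,\widehat{g}(S) \leq \Big|\sum_{1 \leq |S| \leq k} \widehat{\varphi}(S)\,\widehat{g}(S)\Big| \leq \sqrt{W^{1 \ldots k}[\varphi]}\,\sqrt{W^{1 \ldots k}[g]} \leq \sqrt{W^{1 \ldots k}[\varphi]}\;\|g\|_2 \leq e^k \sqrt{W^{1 \ldots k}[\varphi]} = w,$$
where the second inequality holds by Cauchy–Schwarz, and the last inequality holds by Lemma 2.2 since $g$ has degree at most $k$. ∎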
For $k = 1$, further improvement can be achieved. We still try to use mixture distributions. Here we want to mix the distribution $\varphi$ with indicator distributions on subsets of coordinates that have opposite biases to those of the original distribution.
Proof of Theorem 1.2.2 (case $k = 1$).
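By identifying each $x_i$ with $-x_i$ if necessary, we may assume without loss of generality that $\widehat{\varphi}(\{i\}) \geq 0$ for all $i$. In addition, by reordering the coordinates, we may assume without loss of generality that $0 \leq \widehat{\varphi}(\{1\}) \leq \cdots \leq \widehat{\varphi}(\{n\}) \leq \epsilon$. Define $\varphi_j$ to be the density of the distribution over $\{-1,1\}^n$ which is uniform on coordinates $1, \dots, j-1$, and has $x_i$ constantly fixed to be $-1$ for $j \leq i \leq n$. It is easy to check $\widehat{\varphi_j}(\{i\}) = 0$ for $i < j$ and $\widehat{\varphi_j}(\{i\}) = -1$ for $i \geq j$.
We define $\psi$ as
$$\psi = \frac{1}{w} \sum_{j=1}^{n} \big(\widehat{\varphi}(\{j\}) - \widehat{\varphi}(\{j-1\})\big)\, \varphi_j, \qquad w = \widehat{\varphi}(\{n\}),$$
with the convention $\widehat{\varphi}(\{0\}) = 0$ (if $w = 0$ then $\varphi$ is already 1-wise uniform).
It is easy to check that $\psi$ is a density function and
$$\widehat{\psi}(\{i\}) = -\frac{1}{w} \sum_{j \leq i} \big(\widehat{\varphi}(\{j\}) - \widehat{\varphi}(\{j-1\})\big) = -\frac{1}{w}\,\widehat{\varphi}(\{i\}) \quad \text{for all } i \in [n].$$
Therefore $\frac{1}{1+w}(\varphi + w\psi)$ is 1-wise uniform. Then by Lemma 3.1,
$$d_{\mathrm{TV}}\Big(\varphi, \frac{1}{1+w}(\varphi + w\psi)\Big) \leq \frac{w}{1+w} \leq \epsilon. \qquad ∎$$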
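The mixture idea for $k = 1$ can be traced on a toy example. The sketch below is a reconstruction under assumptions, not the authors' code: instead of flipping and reordering coordinates, it sorts coordinates by $|\widehat{\varphi}(\{i\})|$ and pins each one to the sign opposite to its bias, which is equivalent to the normalization used in the proof. The names `mend_to_1_wise`, `degree_one_coeffs`, `tv_distance`, and the example density are hypothetical.

```python
from itertools import product

def degree_one_coeffs(density, n):
    """Fourier coefficients on singletons: E_{x uniform}[density(x) * x_i]."""
    return [sum(p * x[i] for x, p in density.items()) / 2 ** n for i in range(n)]

def tv_distance(a, b, n):
    """Total variation distance between two densities: half the L1 distance."""
    return 0.5 * sum(abs(a[x] - b[x]) for x in a) / 2 ** n

def mend_to_1_wise(density, n):
    """Mix the density with 'opposite-bias' distributions until it is 1-wise uniform.

    Returns the mended density and the total mixture weight w (the largest bias).
    """
    cube = list(product((-1, 1), repeat=n))
    bias = degree_one_coeffs(density, n)
    order = sorted(range(n), key=lambda i: abs(bias[i]))
    mended = dict(density)           # accumulates phi + sum_j w_j * phi_j
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        w_j = abs(bias[i]) - previous
        previous = abs(bias[i])
        pinned = order[pos:]         # coordinates whose bias is at least |bias[i]|
        target = {c: (-1 if bias[c] > 0 else 1) for c in pinned}
        value = 2 ** len(pinned)     # density value of phi_j on its support
        for x in cube:
            if all(x[c] == s for c, s in target.items()):
                mended[x] += w_j * value
        total += w_j
    return {x: v / (1 + total) for x, v in mended.items()}, total

# Hypothetical example density on {-1,1}^3: phi(x) = 1 + 0.3*x0 - 0.1*x1.
n = 3
phi = {x: 1 + 0.3 * x[0] - 0.1 * x[1] for x in product((-1, 1), repeat=n)}
mended, w = mend_to_1_wise(phi, n)
print(degree_one_coeffs(mended, n), w, tv_distance(phi, mended, n))
```

The returned weight equals the largest singleton bias, and the distance moved is at most $w/(1+w)$, in line with the mixture lemma.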
3.2 Lower bound
Interestingly, our proof of the lower bound also utilizes LP duality. We can write the Closeness Problem in the form of linear programming with variables for , as follows:
We ignore the factor of in the minimization for convenience in the following analysis.
The dual LP, which has variables for , is the following:
Thus given a pair of Boolean functions satisfying the constraints, the quantity is a lower bound for our Closeness Problem. Our distribution achieving the lower bound is a symmetric polynomial, homogeneous of degree (except that it has a constant term of , as is necessary for every density function). We can use Krawtchouk and Hermite polynomials to simplify the analysis.
Proof of Theorem 1.2.2.
where is a small parameter to be chosen later that will ensure and for all . We have .
Since , the objective function of the dual LP is
where the last inequality holds by Cauchy–Schwarz. It is easy to calculate the inner products , and
Assuming , we have .
Now we need to upper bound . Define satisfying . Then
By Fact 2.3, we know that when , for sufficient large ,
Now we set with some constant . It is easy to check that . Using the properties in Lemma 2.3, we get
Then using Cauchy–Schwarz again, we get