1 Introduction
Randomness extractors are central objects in the theory of computation. Loosely speaking, a
seeded extractor [NZ96] is a randomized algorithm that extracts nearly uniform bits from biased random sources, using a short random seed. A non-malleable extractor [DW09] is a seeded extractor that satisfies a much stronger requirement concerning the lack of correlation between the outputs of the extractor on different seeds. More accurately, a non-malleable extractor is a function such that for every (weak) source of min-entropy
and a random variable
uniformly distributed on it holds that is close to uniform, even given the seed and the value for any seed that is determined as an arbitrary function of . More generally, if is close to uniform, even given for adversarially chosen seeds such that for all , we say it is a non-malleable extractor [CRS14]. The notion of non-malleable extractors is strongly motivated by applications to privacy amplification protocols, and has also proven to be a fundamental notion in the theory of pseudorandomness, as recently exemplified by the key role it played in the breakthrough construction of explicit two-source extractors by Chattopadhyay and Zuckerman [CZ16]. Moreover, it also has an important connection to Ramsey theory [BKS05].
Non-malleable extractors can be thought of as a strengthening of the notion of strong seeded extractors. These are functions such that for a weak source and seed it holds that is close to uniform, even given the seed . We stress that this is a much weaker guarantee than that of non-malleable extractors. In particular, there exists a black-box transformation of seeded extractors into strong seeded extractors with roughly the same parameters [RSW06], whereas no such transformation is known for non-malleable extractors.
By a simple probabilistic argument (see, e.g., [Vad12]), there exists a (strong) seeded extractor for sources of seed length and min-entropy . Moreover, by a long line of research, starting with the seminal work of Nisan and Zuckerman [NZ96] and culminating with [GUV09, DKSS13, TU12], we now know of explicit constructions that nearly achieve the optimal parameters.
For non-malleable extractors, the parameters achievable by current constructions are weaker. Dodis and Wichs showed the existence of non-malleable extractors with seed length and entropy ; in particular, for . The best explicit construction, due to [Coh17], achieves seed length for entropy .
Note that while for (strong) seeded extractors there are constructions that support sources of entropy , without any dependence on , all known constructions of non-malleable extractors require the entropy of the source to be at least doubly logarithmic in . This naturally raises the question of whether the dependence on is indeed necessary for non-malleable extractors.
Question: Is it true that in any non-malleable extractor the entropy must grow with ?
In this paper we give a positive answer to this question, and also reveal a simple yet fundamental connection between non-malleable extractors and error-correcting codes, which we believe to be of independent interest.
1.1 Our results
Our main result is a lower bound on the entropy required by non-malleable extractors, which essentially matches the one obtained by the probabilistic construction. In particular, we show that any non-malleable extractor requires the source entropy to be at least . In fact, we prove the entropy lower bound for the more general notion of non-malleable extractors.
Theorem 1 (Main result).
Let be parameters such that , and let for some absolute constant . If is a non-malleable extractor, then and for an absolute constant .
We remark that by a recent result of Ben-Aroya et al. [BCD17] (see Theorem 2.4), the lower bound on in the theorem is tight up to an additive factor of , and our lower bound on is almost tight in , up to an additive factor of . Furthermore, since, as mentioned above, there exist (strong) seeded extractors for sources of entropy , Theorem 1 implies a chasm between non-malleable extractors and (strong) seeded extractors; in particular, it rules out the possibility of transforming seeded extractors into non-malleable extractors while preserving the parameters.
A key technical tool that we use to prove Theorem 1 is a lemma showing that any non-malleable extractor induces an error-correcting code with good distance. We believe this lemma is of independent interest.
Lemma 2.
If there exists a non-malleable extractor , then there exists an error-correcting code with relative distance and rate .
In fact, we prove a more general lemma, which shows that non-malleable extractors induce codes with rate that grows with . See Section 4 for details.
1.2 Technical overview
We provide a high-level overview of the proof of our main result, the entropy lower bound in Theorem 1, for the simple case of (i.e., for standard non-malleable extractors). See Section 4 for the complete details of the proof of the general case. We assume basic familiarity with coding theory and extractors (see Section 2 for the necessary preliminaries).
Consider a non-malleable extractor . Our strategy for showing a lower bound on the source entropy of consists of the following two steps.
1. Derive a binary code with high distance and rate from , as captured by Lemma 2.
2. Show refined bounds on the rate of binary codes with a given minimum distance, and apply them to to obtain an entropy lower bound.
That is, we show that if the parameters of were too good, then the implied code would have parameters that would violate the rate bounds in the second step. Below, we elaborate on each of the steps.
Deriving codes from non-malleable extractors.
We start with a non-malleable extractor . Denote , and consider a (flat) source , which we view as a collection of vectors . We show that there is a large subset of the seeds such that the evaluations of , with respect to and , constitute a code with high distance and rate.
More accurately, denote by the evaluation vector of on the source and seed ; that is, . We show that there exists a large subset of seeds such that
is a code with distance and rate .
As a warm-up, it is instructive to note that the definition of (standard) seeded extractors only requires that a random coordinate of a random is nearly uniformly distributed. Strong seeded extractors also imply that most evaluation vectors are roughly balanced (i.e., contain a similar number of zeros and ones), as a strong seeded extractor needs to output a nearly uniform bit even given the seed (i.e., even when the identity of is known). (We stress that elements of a set of nearly-balanced vectors are not necessarily pairwise-far, unless this set is a linear space. Hence, the foregoing property of strong seeded extractors does not imply a good code in general.)
The key observation is that the structure of non-malleable extractors asserts that there exists a large subset of seeds whose corresponding evaluation vectors are (close to) pairwise uncorrelated, and hence constitute a code with large distance. Details follow.
Denote the number of seeds by . We wish to show that there exists a subset of seeds whose corresponding evaluation vectors are pairwise far. Suppose the contrary, i.e., that every set of seeds contains at least two distinct seeds such that is close to . This means that we can iteratively select a set of “bad” seeds such that and are close in Hamming distance, for every . (See Fig. 1.)
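The iterative selection of bad seed pairs can be made concrete with a short sketch. The following Python snippet is an illustration of ours, not part of the paper's formal development; the function name `extract_close_pairs` and the threshold parameter `tau` are hypothetical. It greedily pulls out disjoint pairs of close evaluation vectors until no close pair remains:

```python
def extract_close_pairs(vectors, tau):
    """Greedily remove disjoint pairs of vectors at relative Hamming distance
    below `tau`, mirroring the iterative selection of 'bad' seed pairs.
    Returns the list of removed index pairs and the surviving indices,
    which are pairwise tau-far."""
    def rel_dist(u, v):
        return sum(a != b for a, b in zip(u, v)) / len(u)

    remaining = list(range(len(vectors)))
    pairs = []
    while True:
        close = next(((a, b)
                      for i, a in enumerate(remaining)
                      for b in remaining[i + 1:]
                      if rel_dist(vectors[a], vectors[b]) < tau), None)
        if close is None:
            return pairs, remaining
        pairs.append(close)
        remaining = [r for r in remaining if r not in close]
```

The indices left in `remaining` correspond to a set of pairwise-far vectors; the argument in the text shows that for a non-malleable extractor this surviving set must be large.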
The crux is that having many pairs of correlated evaluation vectors violates the assumption that is a non-malleable extractor. Intuitively, this holds because for each corresponding to a bad seed , the output of is biased given . Hence, a non-malleable extractor cannot have a large set of bad seeds.
In Section 4.1 we make this intuition precise by exhibiting an adversarial function (with no fixed points) that matches pairs of bad seeds, for which we can construct a distinguisher that, for a random variable uniformly distributed on the seeds , can tell apart with confidence between and a uniform bit, even when given and .
Refined rate bounds for binary codes.
Having derived a binary code with distance and rate from a non-malleable extractor , we wish to apply upper bounds on the rate of binary codes, which will in turn imply lower bounds on the entropy that requires.
Our starting point is the state-of-the-art upper bound of McEliece, Rodemich, Rumsey and Welch [MRR77], which, loosely speaking, states that any binary code with relative distance has rate for all sufficiently small .
Alas, the aforementioned bound does not suffice for the entropy lower bound, as we need a quantitative bound in terms of the blocklength of the code. We thus prove the following theorem, which provides the refined bound that we need.
Theorem 3.
Fix a constant , and let . For let be a code with relative distance . Then .
We prove Theorem 3 in Section 3, relying on the spectral approach of Navon and Samorodnitsky [NS09].
To conclude the proof of the entropy lower bound, we argue that if the non-malleable extractor could support entropy smaller than stated in Theorem 1, then the code we derive via Lemma 2 would have rate violating the bound in Theorem 3.
1.3 Organization
In Section 2 we present the required preliminaries. In Section 3 we prove the refined bounds on the rate of binary codes. Finally, in Section 4 we prove our main result, Theorem 1, as well as Lemma 2, which captures the connection between non-malleable extractors and error-correcting codes.
2 Preliminaries
We cover the notation and basic definitions used in this paper.
2.1 Notation
For , we denote by the set , and by the random variable that is uniformly distributed over . Throughout, is defined as . The binary entropy function is given by . We denote by the indicator of an event . For a finite set , we denote by the probability over an element that is chosen uniformly at random from .

Distance.
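As a quick illustration of the binary entropy function used throughout, here is a minimal Python implementation (ours; the name `binary_entropy` is not from the paper), with the standard convention H(0) = H(1) = 0:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```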
The relative Hamming distance (or just distance), over alphabet , between two vectors is denoted . If , we say that is close to , and otherwise we say that is far from . Similarly, the relative distance of from a nonempty set is denoted . If , we say that is close to , and otherwise we say that is far from .
The total variation distance between two random variables over domain is denoted by , and is equivalent, up to a factor , to their distance . We say that is close to if , and otherwise we say that is far from .
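Both distance notions can be computed directly; the following Python sketch (illustrative, with function names of our own choosing) implements the relative Hamming distance and the total variation distance, the latter as half the L1 distance between probability mass functions:

```python
def rel_hamming(x, y):
    """Relative Hamming distance: fraction of coordinates where x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def total_variation(p, q):
    """Total variation distance between two distributions, given as dicts
    mapping values to probabilities: half the L1 distance between them."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)
```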
Remark. In order to show that is far from , it suffices to exhibit a randomized distinguisher such that , where the probabilities are over the random variables and the randomness of . Note that if such a randomized distinguisher exists, then, by averaging, there is also a deterministic distinguisher with the same property. This naturally defines an event for which we have , and hence is far from .
2.2 Error correcting codes
Let , and let be a finite alphabet. An error-correcting code is a set , and the elements of are called its codewords. The parameter is called the blocklength of , and is the dimension of . The relative distance of a code is the minimal relative Hamming distance between its codewords, and is denoted by . The rate of the code, measuring the redundancy of the encoding, is the ratio of its dimension and blocklength, and is denoted by . If the alphabet is binary, i.e., , we say that is a binary code.
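For a concrete example of these parameters, the snippet below (an illustration of ours) computes the relative distance and rate of a code given explicitly as a list of codewords; for instance, the binary repetition code {000, 111} has relative distance 1 and rate 1/3:

```python
import math
from itertools import combinations

def relative_distance(code):
    """Minimum relative Hamming distance over all pairs of distinct codewords."""
    n = len(code[0])
    return min(sum(a != b for a, b in zip(u, v)) / n
               for u, v in combinations(code, 2))

def rate(code, q=2):
    """Rate = log_q(|C|) / n, i.e. dimension over blocklength."""
    return math.log(len(code), q) / len(code[0])
```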
2.3 Randomness extractors
We recall the standard definitions of random sources and several types of extractors, as well as state known bounds that we will need.
Weak sources.
For integers , an random source of min-entropy is a random variable taking values in such that for every it holds that . An random source is flat if it is uniformly distributed over some subset of size .
It is well known [CG88] that the distribution of any random source is a convex combination of distributions of flat random sources, and thus it typically suffices to consider flat sources. Following the literature, we restrict our attention to flat random sources and refer to them simply as sources.
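To make the definitions concrete: the min-entropy of a distribution is minus the log of its largest point mass, and a flat source spreads its mass uniformly over a subset. A minimal Python sketch (the names are ours, not the paper's):

```python
import math

def min_entropy(dist):
    """Min-entropy: -log2 of the largest point probability of the source."""
    return -math.log2(max(dist.values()))

def flat_source(subset):
    """A flat source: the uniform distribution over the given subset."""
    p = 1.0 / len(subset)
    return {x: p for x in subset}
```

For example, a flat source over four strings has min-entropy exactly 2 bits.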
Seeded extractors.
A function is a seeded extractor if for any source , the distribution of is close to , i.e., . (Recall that denotes the random variable that is uniformly distributed on .)
A function is a strong seeded extractor if for any source the distribution of is close to . We will need the following lower bound on the source entropy required by strong seeded extractors, due to Radhakrishnan and Ta-Shma [RT00] (see also [NZ96]).
Theorem 2.1 ([RT00, Theorem 1.9]).
Let be a strong seeded extractor. Then, it holds that
and ,
for some absolute constant .
Non-malleable extractors.
Informally, a non-malleable extractor is a seeded extractor that, for any source and seed , outputs a bit that is nearly uniform even given the seed and the value for an adversarially selected seed .
Formally, we say that a function is an adversarial function if it has no fixed points, i.e., if for all . Non-malleable extractors are defined as follows.
Definition 2.2.
A function is a non-malleable extractor if for any source , and for any adversarial function , it holds that the distribution of the 3-tuple is close to ; that is,
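In symbols, the condition reads as follows; this display is our reconstruction of the standard formulation of [DW09], and the notation (nmExt for the extractor, X for the source, Y for the seed, and the adversarial function A) is ours:

```latex
% Non-malleability (standard formulation, notation ours): the 3-tuple on the
% left is epsilon-close in total variation to the one on the right, in which
% the extractor's output is replaced by a truly uniform bit.
\[
  \bigl(\mathrm{nmExt}(X,Y),\; Y,\; \mathrm{nmExt}(X,\mathcal{A}(Y))\bigr)
  \;\approx_{\varepsilon}\;
  \bigl(U_1,\; Y,\; \mathrm{nmExt}(X,\mathcal{A}(Y))\bigr).
\]
```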
We will also consider the more general notion of non-malleable extractors, in which it is possible to extract randomness even given multiple (namely, ) outputs of the extractor with respect to adversarially chosen seeds.
Definition 2.3.
A function is a non-malleable extractor if for any source and for any adversarial functions it holds that
We conclude this section by stating a recent result, due to Ben-Aroya et al. [BCD17], extending a result by Dodis and Wichs [DW09], which complements our Theorem 1 by showing that the lower bound on the seed length in Theorem 1 is tight up to an additive factor of , and the lower bound on is almost tight in , up to an additive factor of .
3 Refined coding bounds
As mentioned in the technical overview (Section 1.2), we prove our entropy lower bound for non-malleable extractors by deriving codes from extractors and bounding the rate of these codes. To this end, in this section we prove refined bounds on the rate of binary codes with a given minimum distance. Our starting point is the seminal result of McEliece, Rodemich, Rumsey and Welch [MRR77].
Theorem 3.1 ([MRR77]).
Any code with relative distance has rate at most , where is some function that tends to zero as grows to infinity.
Observe that, in particular, by plugging in for sufficiently small and letting be sufficiently large, Theorem 3.1 implies that any family of binary codes with blocklength and relative distance has rate .
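The first linear-programming bound of [MRR77] can be stated explicitly as R ≤ h(1/2 − √(δ(1−δ))) for a binary code of relative distance δ, where h is the binary entropy function. The following Python sketch (ours; the function names are not from the paper) evaluates this expression:

```python
import math

def h(p):
    """Binary entropy function, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mrrw_rate_bound(delta):
    """First MRRW linear-programming bound: any binary code of relative
    distance delta has rate at most h(1/2 - sqrt(delta * (1 - delta)))."""
    return h(0.5 - math.sqrt(delta * (1 - delta)))
```

For δ = 1/2 − ε the argument of h is roughly ε², so the bound behaves like h(ε²) = Θ(ε² log(1/ε)), matching the asymptotic form quoted above.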
However, the above does not suffice for our needs, as to prove our main result (Theorem 1) we need a quantitative bound on . We thus prove the following theorem, which provides the refined bound that we seek.
Theorem 3.2.
Fix some constant , and let . For , let be a code with relative distance . Then .
Proof.
The proof follows the general approach of Navon and Samorodnitsky [NS09], who provide a spectral graph theoretic framework to prove upper bounds on the rate of binary codes.
We will need the following definition, which generalizes the notion of a maximal eigenvalue to subsets of the hypercube.

Definition 3.3.
Let be the adjacency matrix of the hypercube graph; that is, if and only if and differ in exactly one coordinate. Given a set , we define
To better understand the definition of , it is convenient to consider the subgraph of the hypercube graph induced by the vertices in , and observe that is the maximal eigenvalue of the adjacency matrix of . Navon and Samorodnitsky [NS09] prove the following result.
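For small sets, the quantity in Definition 3.3 can be computed directly. The sketch below is an illustration of ours (the name `max_eigenvalue` is not from [NS09]); it runs power iteration on A + nI, where the shift by n removes the sign oscillation caused by the bipartiteness of hypercube subgraphs, and vertices are encoded as n-bit integers:

```python
def max_eigenvalue(vertices, n, iters=200):
    """Largest adjacency eigenvalue of the subgraph of the n-dimensional
    hypercube induced by `vertices` (given as n-bit integers).

    Power iteration is run on A + n*I rather than A itself: the shift makes
    all eigenvalues nonnegative, so the iteration cannot oscillate between
    the +lambda and -lambda eigenvectors of a bipartite induced subgraph.
    """
    idx = {v: i for i, v in enumerate(vertices)}
    # Neighbours inside the set: flip each of the n coordinates.
    nbrs = [[idx[v ^ (1 << b)] for b in range(n) if (v ^ (1 << b)) in idx]
            for v in vertices]
    x = [1.0] * len(vertices)
    lam = float(n)
    for _ in range(iters):
        y = [n * x[i] + sum(x[j] for j in nbrs[i]) for i in range(len(x))]
        lam = max(y)  # top eigenvalue of A + n*I, since x has max entry 1
        x = [t / lam for t in y]
    return lam - n
```

For example, the full 3-cube gives λ = 3 (it is 3-regular), and the radius-1 Hamming ball {000, 001, 010, 100} gives λ = √3, the top eigenvalue of the star K_{1,3}.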
Proposition 3.4 ([NS09, Proposition 1.1]).
Let be a code with relative distance , and let . Suppose that for a subset it holds that . Then .
The foregoing proposition naturally suggests the following proof strategy: to upper bound the rate of a binary code with relative distance , it suffices to exhibit a set , as small as possible, whose corresponding maximal eigenvalue satisfies ; note that the smaller is, the better the upper bound we get on the rate of .
Towards this end, let be a parameter to be chosen later, and let
We lower bound the maximal eigenvalue by showing a particular function that is supported on , such that . Specifically, for some to be chosen later, we define as
Clearly . Observe that
By choosing to be an integer in the interval and letting we get that (note that by the assumption in the theorem we have ; in particular, the interval contains an integer)
where the last inequality uses the assumption that , which implies that . Therefore, by applying Proposition 3.4 we get that
which concludes the proof of Theorem 3.2. ∎
4 Proof of Theorem 1
In this section we prove Theorem 1, which we restate here with slightly more specific parameters than those stated above.
Theorem 1 (restated):
Let be parameters such that , and let for , where is the constant from Theorem 2.1. If is a non-malleable extractor, then
and .
We start, in Section 4.1, with the proof of Theorem 1 for the special case where (i.e., for standard non-malleable extractors). Then, in Section 4.2, we provide the full proof for general values of .
4.1 Proof of Theorem 1 for
Following the outline provided in Section 1.2, we start the proof with the following lemma, showing that any non-malleable extractor induces an error-correcting code with good distance.
Lemma 4.1 (Lemma 2, restated).
If there exists a non-malleable extractor , then there exists an error-correcting code with relative distance and rate .
Proof.
Let be a non-malleable extractor, and let be an source. That is, is a collection of vectors, which we denote by . For each seed , let be the bit evaluation vector defined as
We claim that the (multi)set contains an error-correcting code with relative distance and rate .
Claim 4.2.
There exists a subset of size such that for every two distinct it holds that .
Proof.
Suppose towards contradiction that for every subset of size at least there exist distinct seeds such that . We show below that this contradicts the assumption that is a non-malleable extractor.
Indeed, by the assumption, we can find such that . Then, we can remove from , and apply the assumption again, to obtain such that . By iteratively repeating this argument times, where , we obtain pairs of distinct elements such that
(1) 
Let denote the set of all such “bad” seeds, and define an adversarial function that matches each pair of bad seeds by mapping and for all , and defining arbitrarily for all other seeds .
Next, we prove that is not a non-malleable extractor by arguing that the distribution of the random variable consisting of the tuple is far from , where recall that denotes the random variable that is uniformly distributed over . Indeed, consider the following distinguisher , defined as
Clearly . On the other hand, by Eq. 1, for sampled from we have
thus contradicting the assumption that is a non-malleable extractor. This concludes the proof of Claim 4.2. ∎
Therefore, by Claim 4.2 there exists a set of size such that for every it holds that , i.e., is an error-correcting code with relative distance and rate , which completes the proof of Lemma 4.1. ∎
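The adversarial function and the equality distinguisher used in the proof of Claim 4.2 can be sketched in code. The snippet below is illustrative and all names are ours; the toy "extractor" in the usage example is deliberately malleable (it ignores its seed), so the distinguisher achieves the maximal advantage of 1/2:

```python
import random

def build_adversary(pairs, seeds):
    """Fixed-point-free function on `seeds` that swaps each matched pair of
    'bad' seeds (y, y'); every unmatched seed is sent to some matched seed."""
    g = {}
    for y, y2 in pairs:
        g[y], g[y2] = y2, y
    for y in seeds:
        if y not in g:
            g[y] = pairs[0][0]  # differs from y, since y is unmatched
    return g

def equality_advantage(ext, source, seeds, g, trials=5000):
    """Estimate Pr[ext(X,Y) == ext(X, g(Y))] - 1/2, i.e. the advantage of
    the distinguisher that outputs 1 iff the two extractor outputs agree."""
    rng = random.Random(0)
    agree = 0
    for _ in range(trials):
        x, y = rng.choice(source), rng.choice(seeds)
        agree += ext(x, y) == ext(x, g[y])
    return agree / trials - 0.5

# Toy example: an "extractor" that ignores its seed is maximally malleable.
toy_ext = lambda x, y: x % 2
g = build_adversary([(0, 1), (2, 3)], list(range(8)))
adv = equality_advantage(toy_ext, list(range(16)), list(range(8)), g)
```

For a genuine non-malleable extractor the same estimate would be close to 0, since its output is nearly unbiased even given the output on the adversarially chosen seed.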
By applying the bound from Theorem 3.2 to the code obtained in Lemma 4.1, we prove Theorem 1 for the case of .
Proof of Theorem 1 for .
Since every non-malleable extractor is, in particular, a strong seeded extractor, Theorem 2.1 implies that the seed length is , as required. Furthermore, Theorem 2.1 also implies that
(2) 
By Lemma 4.1, if is a non-malleable extractor, then there exists an error-correcting code with relative distance and rate .
Next, we wish to apply Theorem 3.2 to the code . Recall that by the assumption it holds that and , and observe that by Eq. 2 we have . Therefore, by applying Theorem 3.2, with respect to (recall that ) and we get that
and thus , as required. ∎
4.2 Proof of Theorem 1 for general
Next, we extend the idea presented in Section 4.1 to larger values of . The key step is the following lemma.
Lemma 4.3.
If there exists a non-malleable extractor , then there exists an error-correcting code with relative distance such that .
Proof.
Let be a non-malleable extractor. Similarly to the proof of Lemma 4.1, we set , and let be an source, which we view as a collection of vectors . For each seed , let be the bit evaluation vector, defined as . Hereafter, all sums involving binary vectors are summations over . For , we denote by the (absolute) Hamming weight of .
Whereas before, in the proof of Lemma 4.1, we showed that the multiset of evaluation vectors simply contains an error correcting code with good parameters, here we will derive our code by considering all linear combinations of elements of a carefully selected subset of the evaluation vectors.
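The code in the general case consists of XORs of small subsets of evaluation vectors, so its distance is governed by the minimum Hamming weight of such combinations. A brute-force Python sketch of ours (not the paper's argument itself) that checks all combinations of up to t vectors:

```python
from itertools import combinations

def min_weight_xor(vectors, t):
    """Minimum Hamming weight over all XOR-combinations of between 1 and t
    of the given binary vectors, encoded as Python integers."""
    best = None
    for size in range(1, t + 1):
        for combo in combinations(vectors, size):
            acc = 0
            for v in combo:
                acc ^= v  # XOR accumulates the linear combination over F_2
            weight = bin(acc).count("1")
            best = weight if best is None else min(best, weight)
    return best
```

Note that linearly dependent vectors yield a combination of weight 0, which is exactly what the selection of seeds in Claim 4.4 must rule out.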
Towards that end, the next claim shows that there exists a large subset of seeds such that any linear combination of of the evaluation vectors that corresponds to these seeds has large Hamming weight.
Claim 4.4.
There is a subset of size such that for every subset of size it holds that .
Proof.
Assume towards contradiction that for every subset of size at least there are distinct seeds such that
We show below that this contradicts the assumption that is a non-malleable extractor.
By our assumption, there is a subset of seeds for which there exists of size such that . We remove from , and apply the assumption again to obtain of size such that