Unifying computational entropies via Kullback-Leibler divergence

02/28/2019 ∙ by Rohit Agrawal, et al. ∙ Harvard University

We introduce KL-hardness, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible-entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, KL-hardness unifies the latter two notions of computational entropy and sheds light on the apparent "duality" between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC '12).


1 Introduction

1.1 One-way functions and computational entropy

One-way functions [DH76] are on one hand the minimal assumption for complexity-based cryptography [IL89], but on the other hand can be used to construct a remarkable array of cryptographic primitives, including such powerful objects as CCA-secure symmetric encryption, zero-knowledge proofs and statistical zero-knowledge arguments for all of $\mathsf{NP}$, and secure multiparty computation with an honest majority [GGM86, GMW91, GMW87, HILL99, Rom90, Nao91, HNO09]. All of these constructions begin by converting the “raw hardness” of a one-way function (OWF) to one of the following more structured cryptographic primitives: a pseudorandom generator (PRG) [BM82, Yao82], a universal one-way hash function (UOWHF) [NY89], or a statistically hiding commitment scheme (SHC) [BCC88].

The original constructions of these three primitives from arbitrary one-way functions [HILL99, Rom90, HNO09] were all very complicated and inefficient. Over the past decade, there has been a series of simplifications and efficiency improvements to these constructions [HRVW09, HRV13, HHR10, VZ12], leading to a situation where the constructions of two of these primitives — PRGs and SHCs — share a very similar structure and seem “dual” to each other. Specifically, these constructions proceed as follows:

  1. Show that every OWF $f : \{0,1\}^n \to \{0,1\}^n$ has a gap between its “real entropy” and an appropriate form of “computational entropy”. Specifically, for constructing PRGs, it is shown that the function $G(x) = (f(x), x_1, \ldots, x_n)$ has “next-block pseudoentropy” at least $n + \omega(\log n)$ while its real entropy is $H(G(U_n)) = n$ [VZ12], where $H(\cdot)$ denotes Shannon entropy. For constructing SHCs, it is shown that the function $\tilde{G}(x) = (f(x)_1, \ldots, f(x)_n, x)$ has “next-block accessible entropy” at most $n - \omega(\log n)$ while its real entropy is again $n$ [HRVW09]. Note that the differences between the two cases are whether we break $x$ or $f(x)$ into individual bits (which matters because the “next-block” notions of computational entropy depend on the block structure) and whether the form of computational entropy is larger or smaller than the real entropy.

  2. An “entropy equalization” step that converts the generator into a similar one where the real entropy in each block, conditioned on the prefix before it, is known. This step is exactly the same in both constructions.

  3. A “flattening” step that converts the (real and computational) Shannon entropy guarantees of the generator into ones on (smoothed) min-entropy and max-entropy. This step is again exactly the same in both constructions.

  4. A “hashing” step where high (real or computational) min-entropy is converted to uniform (pseudo)randomness and low (real or computational) max-entropy is converted to a small-support or disjointness property. For PRGs, this step only requires randomness extractors [HILL99, NZ96], while for SHCs it requires (information-theoretic) interactive hashing [NOVY98, DHRS04]. (Constructing full-fledged SHCs in this step also utilizes UOWHFs, which can be constructed from one-way functions [Rom90]. Without UOWHFs, we obtain a weaker binding property, which nevertheless suffices for constructing statistical zero-knowledge arguments for all of $\mathsf{NP}$.)

This common construction template came about through a back-and-forth exchange of ideas between the two lines of work. Indeed, the uses of computational entropy notions, flattening, and hashing originate with PRGs [HILL99], whereas the ideas of using next-block notions, obtaining them from breaking into short blocks, and entropy equalization originate with SHCs [HRVW09]. All this leads to a feeling that the two constructions, and their underlying computational entropy notions, are “dual” to each other and should be connected at a formal level.
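As a compact way to view the duality, the two starting points from step 1 can be placed side by side (using the $\omega(\log n)$-gap form of the bounds discussed above):

\[ \begin{array}{lll} \text{PRGs:} & G(x) = (f(x), x_1, \ldots, x_n), & \text{next-block pseudoentropy} \;\geq\; n + \omega(\log n) \;>\; n = \text{real entropy},\\ \text{SHCs:} & \tilde{G}(x) = (f(x)_1, \ldots, f(x)_n, x), & \text{next-block accessible entropy} \;\leq\; n - \omega(\log n) \;<\; n = \text{real entropy}. \end{array} \]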

In this paper, we make progress on this project of unifying the notions of computational entropy, by introducing a new computational entropy notion that yields both next-block pseudoentropy and next-block accessible entropy in a clean and modular fashion. It is inspired by the proof of [VZ12] that $(f(X), X_1, \ldots, X_n)$ has next-block pseudoentropy at least $n + \omega(\log n)$, which we will describe now.

1.2 Next-block pseudoentropy via KL-hardness

We recall the definition of next-block pseudoentropy, and the result of [VZ12] relating it to one-wayness.

[next-block pseudoentropy, informal] Let $n$ be a security parameter, and $X = (X_1, \ldots, X_m)$ be a random variable distributed on strings of length $\mathrm{poly}(n)$, divided into $m$ blocks. We say that $X$ has next-block pseudoentropy at least $k$ if there is a random variable $Y = (Y_1, \ldots, Y_m)$, jointly distributed with $X$, such that:

  1. For all $i$, $(X_1, \ldots, X_{i-1}, Y_i)$ is computationally indistinguishable from $(X_1, \ldots, X_{i-1}, X_i)$.

  2. $\sum_{i=1}^m H(Y_i \mid X_1, \ldots, X_{i-1}) \geq k$.

Equivalently, for $I$ uniformly distributed in $\{1, \ldots, m\}$, $X_I$ has conditional pseudoentropy at least $k/m$ given $(X_1, \ldots, X_{I-1})$.

It was conjectured in [HRV10] that next-block pseudoentropy could be obtained from any OWF by breaking its input into bits, and this conjecture was proven in [VZ12]:

[[VZ12], informal] Let $f : \{0,1\}^n \to \{0,1\}^n$ be a one-way function, let $X$ be uniformly distributed in $\{0,1\}^n$, and let $X = (X_1, \ldots, X_m)$ be a partition of $X$ into blocks of length $O(\log n)$. Then $(f(X), X_1, \ldots, X_m)$ has next-block pseudoentropy at least $n + \omega(\log n)$.

The intuition behind Theorem 1.2 is that since $X$ is hard to sample given $f(X)$, it should have some extra computational entropy given $f(X)$. This intuition is formalized using the following notion of being “hard to sample”:

[KL-hard for Sampling] Let $n$ be a security parameter, and $(Y, X)$ be a pair of random variables, jointly distributed over strings of length $\mathrm{poly}(n)$. We say that $X$ is $\delta$-KL-hard for sampling given $Y$ if for all probabilistic polynomial-time $A$, we have

\[ \mathrm{KL}\bigl( Y, X \;\big\|\; Y, A(Y) \bigr) \;\geq\; \delta, \]

where $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes Kullback–Leibler divergence (a.k.a. relative entropy). (Recall that for random variables $P$ and $Q$ with $\mathrm{Supp}(P) \subseteq \mathrm{Supp}(Q)$, the Kullback–Leibler divergence is defined by $\mathrm{KL}(P \| Q) = \mathbb{E}_{x \leftarrow P}\bigl[\log \frac{\Pr[P = x]}{\Pr[Q = x]}\bigr]$.) That is, it is hard for any efficient adversary to sample the conditional distribution of $X$ given $Y$, even approximately.

The first step of the proof of Theorem 1.2 is to show that one-wayness implies KL-hardness of sampling (which can be done with a one-line calculation): Let $f : \{0,1\}^n \to \{0,1\}^n$ be a one-way function and let $X$ be uniformly distributed in $\{0,1\}^n$. Then $X$ is $\omega(\log n)$-KL-hard to sample given $f(X)$.
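For intuition, the calculation can be sketched as follows (a sketch using the data-processing inequality; the exact form in the paper may differ). Write $\delta = \mathrm{KL}(f(X), X \,\|\, f(X), A(f(X)))$ and $p = \Pr[f(A(f(X))) = f(X)]$. Applying the map $(y, x) \mapsto \mathbf{1}\{f(x) = y\}$ to both sides of the divergence gives

\[ \log \frac{1}{p} \;=\; \mathrm{KL}\Bigl( \mathbf{1}\{f(X) = f(X)\} \;\Big\|\; \mathbf{1}\{f(A(f(X))) = f(X)\} \Bigr) \;\leq\; \delta , \]

so $p \geq 2^{-\delta}$. Since one-wayness forces $p$ to be negligible, $\delta$ must be $\omega(\log n)$.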

Next, we break $X$ into short blocks, and show that the KL-hardness is preserved: Let $n$ be a security parameter, let $(Y, X)$ be random variables distributed on strings of length $\mathrm{poly}(n)$, let $X = (X_1, \ldots, X_m)$ be a partition of $X$ into blocks, and let $I$ be uniformly distributed in $\{1, \ldots, m\}$. If $X$ is $\delta$-KL-hard to sample given $Y$, then $X_I$ is $(\delta/m)$-KL-hard to sample given $(Y, X_1, \ldots, X_{I-1})$.

Finally, the main part of the proof is to show that, once we have short blocks, KL-hardness of sampling is equivalent to a gap between conditional pseudoentropy and real conditional entropy. Let $n$ be a security parameter, $Y$ be a random variable distributed on strings of length $\mathrm{poly}(n)$, and $X$ a random variable distributed on strings of length $O(\log n)$. Then $X$ is $\delta$-KL-hard to sample given $Y$ if and only if $X$ has conditional pseudoentropy at least $H(X \mid Y) + \delta$ given $Y$.

Putting these three lemmas together, we see that when $f$ is a one-way function and we break $X$ into blocks of length $O(\log n)$ to obtain $(f(X), X_1, \ldots, X_m)$, on average, the conditional pseudoentropy of $X_I$ given $(f(X), X_1, \ldots, X_{I-1})$ is larger than its real conditional entropy by $\omega(\log n)/m$. This tells us that the next-block pseudoentropy of $(f(X), X_1, \ldots, X_m)$ is larger than its real entropy $n$ by $\omega(\log n)$, as claimed in Theorem 1.2.

We remark that Lemma 1.2 explains why we need to break the input of the one-way function into short blocks: it is false when $X$ is long. Indeed, if $f$ is a one-way function, then we have already seen that $X$ is $\omega(\log n)$-KL-hard to sample given $f(X)$ (Lemma 1.2), but it does not have conditional pseudoentropy noticeably larger than $H(X \mid f(X))$ given $f(X)$ (as correct preimages can be efficiently distinguished from incorrect ones using $f$).

1.3 Inaccessible entropy

As mentioned above, for constructing SHCs from one-way functions, the notion of next-block pseudoentropy is replaced with next-block accessible entropy:

[next-block inaccessible entropy, informal] Let $n$ be a security parameter, and $X = (X_1, \ldots, X_m)$ be a random variable distributed on strings of length $\mathrm{poly}(n)$. We say that $X$ has next-block accessible entropy at most $k$ if the following holds.

Let $\widetilde{G}$ be any probabilistic $\mathrm{poly}(n)$-time algorithm that takes a sequence of uniformly random strings $R = (R_1, \ldots, R_m)$ and outputs a sequence $\widetilde{X} = (\widetilde{X}_1, \ldots, \widetilde{X}_m)$ in an “online fashion”, by which we mean that $\widetilde{X}_i$ depends on only the first $i$ random strings $(R_1, \ldots, R_i)$ of $R$, for $i = 1, \ldots, m$. Suppose further that $\mathrm{Supp}(\widetilde{X}) \subseteq \mathrm{Supp}(X)$.

Then we require:

\[ \sum_{i=1}^m H\bigl( \widetilde{X}_i \,\big|\, R_1, \ldots, R_{i-1} \bigr) \;\leq\; k. \]

(Next-block) accessible entropy differs from (next-block) pseudoentropy in two ways:

  1. Accessible entropy is useful as an upper bound on computational entropy, and is interesting when it is smaller than the real entropy $H(X)$. We refer to the gap $H(X) - k$ as the inaccessible entropy of $X$.

  2. The accessible entropy adversary $\widetilde{G}$ is trying to generate the random variables $\widetilde{X}_i$ conditioned on the history rather than recognize them. Note that we take the “history” to consist not only of the previous blocks $(\widetilde{X}_1, \ldots, \widetilde{X}_{i-1})$, but also of the coin tosses $(R_1, \ldots, R_{i-1})$ used to generate those blocks.

Note that one unsatisfactory aspect of the definition is that when the random variable $X$ is not flat (i.e. uniform on its support), then there can be an adversary achieving accessible entropy even larger than $H(X)$, for example by making $\widetilde{X}$ uniform on $\mathrm{Supp}(X)$.
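As a tiny numerical illustration (an example added here for concreteness, not taken from the original analysis): consider a single-block random variable $X$ over $\{0, 1\}$ with

\[ \Pr[X = 0] = \tfrac{3}{4}, \qquad \Pr[X = 1] = \tfrac{1}{4}, \qquad H(X) = \tfrac{3}{4}\log\tfrac{4}{3} + \tfrac{1}{4}\log 4 \approx 0.81 \text{ bits}. \]

An adversary that simply outputs a uniform bit stays within $\mathrm{Supp}(X)$ and achieves accessible entropy $1 > H(X)$.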

Similarly to (and predating) Theorem 1.2, it is known that one-wayness implies next-block inaccessible entropy.

[[HRVW09]] Let $f : \{0,1\}^n \to \{0,1\}^n$ be a one-way function, let $X$ be uniformly distributed in $\{0,1\}^n$, and let $f(X) = (Y_1, \ldots, Y_m)$ be a partition of $f(X)$ into blocks of length $O(\log n)$. Then $(Y_1, \ldots, Y_m, X)$ has next-block accessible entropy at most $n - \omega(\log n)$.

Unfortunately, however, the existing proof of Theorem 1.3 is not modular like that of Theorem 1.2. In particular, it does not isolate the step of relating one-wayness to entropy-theoretic measures (like Lemma 1.2 does) or the significance of having short blocks (like Lemma 1.2 does).

1.4 Our results

We remedy the above state of affairs by providing a new, more general notion of KL-hardness that allows us to obtain next-block inaccessible entropy in a modular way while also encompassing what is needed for next-block pseudoentropy.

Like in KL-hardness for sampling, we will consider a pair of jointly distributed random variables $(Y, X)$. Following the spirit of accessible entropy, the adversary $G$ for our new notion will try to generate $Y$ together with $X$, rather than taking $Y$ as input. That is, $G$ will take randomness $R$ and output a pair $(Y', X') = G(R)$, which we require to be always within the support of $(Y, X)$. Note that $G$ need not be an online generator; it can generate both $Y'$ and $X'$ using the same randomness $R$. Of course, if $(Y, X)$ is efficiently samplable (as it would be in most cryptographic applications), $G$ could generate $(Y', X')$ identically distributed to $(Y, X)$ by just using the “honest” sampler for $(Y, X)$. So, in addition, we require that the adversary also come with a simulator $S$ that can simulate the coin tosses of $G$ given only $Y$. The goal of the adversary is to minimize the KL divergence

\[ \mathrm{KL}\bigl( R,\, Y' \;\big\|\; S(Y),\, Y \bigr) \]

for a uniformly random $R$. This divergence measures both how well $Y'$ approximates the distribution of $Y$ as well as how well $S$ simulates the corresponding coin tosses of $G$. Note that when $G$ is the honest sampler for $(Y, X)$, the task of $S$ is exactly to sample from the conditional distribution of the honest coin tosses given $Y$. However, the adversary may reduce the divergence by instead designing the sampler $G$ and simulator $S$ to work in concert, potentially trading off how well $G$ approximates $(Y, X)$ in exchange for easier simulation by $S$. Explicitly, the definition is as follows. [KL-hard, informal version of Definition 3.2] Let $n$ be a security parameter, and $(Y, X)$ be a pair of random variables jointly distributed over strings of length $\mathrm{poly}(n)$. We say that $(Y, X)$ is $\delta$-KL-hard if the following holds.

Let $G$ and $S$ be probabilistic $\mathrm{poly}(n)$-time algorithms such that $\mathrm{Supp}(G(R)) \subseteq \mathrm{Supp}((Y, X))$, where $R$ is uniformly distributed. Then, writing $(Y', X') = G(R)$, we require that

\[ \mathrm{KL}\bigl( R,\, Y' \;\big\|\; S(Y),\, Y \bigr) \;\geq\; \delta. \]

Similarly to Lemma 1.2, we can show that one-way functions achieve this notion of KL-hardness: Let $f : \{0,1\}^n \to \{0,1\}^n$ be a one-way function and let $X$ be uniformly distributed in $\{0,1\}^n$. Then $(f(X), X)$ is $\omega(\log n)$-KL-hard. Note that this lemma implies Lemma 1.2. If we take $G$ to be the “honest” sampler $x \mapsto (f(x), x)$ (whose coin tosses are $x$ itself), then we have:

\[ \mathrm{KL}\bigl( X,\, f(X) \;\big\|\; S(f(X)),\, f(X) \bigr) \;=\; \mathrm{KL}\bigl( f(X),\, X \;\big\|\; f(X),\, S(f(X)) \bigr), \]

which is $\omega(\log n)$ by Lemma 1.4. That is, KL-hardness for sampling preimages (as in Definition 1.2 and Lemma 1.2) is obtained by fixing $G$ and focusing on the hardness for the simulator $S$, i.e. the divergence $\mathrm{KL}\bigl( f(X), X \,\big\|\, f(X), S(f(X)) \bigr)$.

Conversely, we show that inaccessible entropy is obtained by removing the simulator $S$ from the definition, and focusing on the hardness for the generator $G$. It turns out that this removal is possible when we break $Y$ into short blocks, yielding the following definition.

[next-block-KL-hard, informal] Let $n$ be a security parameter, and $X = (X_1, \ldots, X_m)$ be a random variable distributed on strings of length $\mathrm{poly}(n)$. We say that $X$ is $\delta$-next-block-KL-hard if the following holds.

Let $\widetilde{G}$ be any probabilistic $\mathrm{poly}(n)$-time algorithm that takes a sequence of uniformly random strings $R = (R_1, \ldots, R_m)$ and outputs a sequence $\widetilde{X} = (\widetilde{X}_1, \ldots, \widetilde{X}_m)$ in an “online fashion”, by which we mean that $\widetilde{X}_i$ depends on only the first $i$ random strings $(R_1, \ldots, R_i)$ of $R$, for $i = 1, \ldots, m$. Suppose further that $\mathrm{Supp}(\widetilde{X}) \subseteq \mathrm{Supp}(X)$.

We require that for all such $\widetilde{G}$, we have:

\[ \sum_{i=1}^m \mathrm{KL}\Bigl( \widetilde{X}_i \,\Big|\, R_{<i} \;\Big\|\; X_i \,\Big|\, X_{<i} \Bigr) \;\geq\; \delta, \]

where we use the notation $R_{<i} = (R_1, \ldots, R_{i-1})$, and $R_{<1}$ is a dummy random variable independent of everything else. That is, the goal of the online generator $\widetilde{G}$ is to generate $\widetilde{X}_i$ given the history of coin tosses $R_{<i}$ with the same conditional distribution as $X_i$ given $X_{<i}$. Notice that there is no explicit simulator in the definition of next-block-KL-hardness. Nevertheless we can obtain it from KL-hardness by using sufficiently short blocks:

Let $n$ be a security parameter, let $(Y, X)$ be random variables distributed on strings of length $\mathrm{poly}(n)$, and let $Y = (Y_1, \ldots, Y_m)$ be a partition of $Y$ into blocks of length $O(\log n)$.

If $(Y, X)$ is $\delta$-KL-hard, then $(Y_1, \ldots, Y_m, X)$ is $\delta'$-next-block-KL-hard for $\delta'$ essentially as large as $\delta$ (up to small losses in the parameters). An intuition for the proof is that since the blocks are of logarithmic length, given $(Y_1, \ldots, Y_i)$ we can simulate the corresponding coin tosses of the online generator by rejection sampling, and succeed with high probability in $2^{O(\log n)} = \mathrm{poly}(n)$ tries.

A nice feature of the definition of next-block-KL-hardness compared to inaccessible entropy is that it is meaningful even for non-flat random variables, as KL divergence is always nonnegative. Moreover, for flat random variables, it equals the inaccessible entropy:

Suppose $X = (X_1, \ldots, X_m)$ is a flat random variable. Then $X$ is $\delta$-next-block-KL-hard if and only if $X$ has accessible entropy at most $H(X) - \delta$. Intuitively, this lemma comes from the identity that if $X$ is a flat random variable and $x \in \mathrm{Supp}(X)$, then $H_X(x) = \log|\mathrm{Supp}(X)| = H(X)$. We stress that we do not require that the individual blocks have flat distributions, only that the random variable $X$ as a whole is flat. For example, if $f$ is a function and $X$ is uniform, then $(f(X), X)$ is flat even though $f(X)$ itself may be far from flat.
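To see why flatness gives this correspondence, here is a short supporting calculation (a sketch; roughly, the lemma applies this reasoning blockwise via the chain rule): if $X$ is flat and $\widetilde{X}$ is any random variable with $\mathrm{Supp}(\widetilde{X}) \subseteq \mathrm{Supp}(X)$, then

\[ \mathrm{KL}\bigl( \widetilde{X} \,\big\|\, X \bigr) \;=\; \mathop{\mathbb{E}}_{x \leftarrow \widetilde{X}}\Bigl[ \log\bigl( \Pr[\widetilde{X} = x] \cdot |\mathrm{Supp}(X)| \bigr) \Bigr] \;=\; \log|\mathrm{Supp}(X)| - H(\widetilde{X}) \;=\; H(X) - H(\widetilde{X}), \]

so lower-bounding the divergence by $\delta$ is the same as upper-bounding the entropy achieved by the adversary by $H(X) - \delta$.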

Putting together the three lemmas above, we obtain a new, more modular proof of Theorem 1.3. The reduction implicit in the combination of these lemmas is the same as the one in [HRVW09], but the analysis is different. (In particular, [HRVW09] makes no use of KL divergence.) Like the existing proof of Theorem 1.2, this proof separates the move from one-wayness to a form of KL-hardness, the role of short blocks, and the move from KL-hardness to computational entropy. Moreover, this further illumination of and toolkit for notions of computational entropy may open the door to other applications in cryptography.

We remark that another interesting direction for future work is to find a construction of universal one-way hash functions (UOWHFs) from one-way functions that follows a similar template to the above constructions of PRGs and SHCs. There is now a construction of UOWHFs based on a variant of inaccessible entropy [HHR10], but it remains more complex and inefficient than those of PRGs and SHCs.

2 Preliminaries

Notations.

For a tuple $x = (x_1, \ldots, x_m)$, we write $x_{<i}$ for $(x_1, \ldots, x_{i-1})$, and $x_{\leq i}$ for $(x_1, \ldots, x_i)$.

$\mathrm{poly}$ denotes the set of polynomial functions and $\mathrm{negl}$ the set of all negligible functions: $\varepsilon \in \mathrm{negl}$ if for all $c > 0$ and large enough $n$, $\varepsilon(n) \leq n^{-c}$. We will sometimes abuse notation and write $\mathrm{poly}(n)$ to mean $p(n)$ for some $p \in \mathrm{poly}$, and similarly for $\mathrm{negl}(n)$.

ppt stands for probabilistic polynomial time and can be either in the uniform or non-uniform model of computation. All our results are stated as uniform polynomial time oracle reductions and are thus meaningful in both models.

For a random variable $X$ over a set $\mathcal{X}$, $\mathrm{Supp}(X) = \{x \in \mathcal{X} : \Pr[X = x] > 0\}$ denotes the support of $X$. A random variable is flat if it is uniform over its support. Random variables will be written with uppercase letters and the associated lowercase letter represents a generic element from its support.

Information theory.

[Entropy] For a random variable $X$ and $x \in \mathrm{Supp}(X)$, the sample entropy (also called surprise) of $x$ is $H_X(x) = \log \frac{1}{\Pr[X = x]}$. The entropy of $X$ is the expected sample entropy: $H(X) = \mathbb{E}_{x \leftarrow X}[H_X(x)]$.

[Conditional entropy] Let $(X, Y)$ be a pair of random variables and consider $(x, y) \in \mathrm{Supp}((X, Y))$. The conditional sample entropy of $(x, y)$ is $H_{X \mid Y}(x \mid y) = \log \frac{1}{\Pr[X = x \mid Y = y]}$, and the conditional entropy of $X$ given $Y$ is the expected conditional sample entropy:

\[ H(X \mid Y) = \mathop{\mathbb{E}}_{(x, y) \leftarrow (X, Y)}\bigl[ H_{X \mid Y}(x \mid y) \bigr]. \]

[Chain rule for entropy] Let $(X, Y)$ be a pair of random variables. Then $H(X, Y) = H(Y) + H(X \mid Y)$, and for $(x, y) \in \mathrm{Supp}((X, Y))$, $H_{X, Y}(x, y) = H_Y(y) + H_{X \mid Y}(x \mid y)$.

[KL-divergence] For a pair of random variables $A$ and $B$ with $\mathrm{Supp}(A) \subseteq \mathrm{Supp}(B)$ and $a \in \mathrm{Supp}(A)$, the sample KL-divergence (log-probability ratio) is:

\[ \mathrm{KL}_{A \| B}(a) = \log \frac{\Pr[A = a]}{\Pr[B = a]}, \]

and the KL-divergence between $A$ and $B$ is the expected sample KL-divergence:

\[ \mathrm{KL}(A \| B) = \mathop{\mathbb{E}}_{a \leftarrow A}\bigl[ \mathrm{KL}_{A \| B}(a) \bigr]. \]
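As a small worked example to fix intuition (added here; it is not part of the development), for $A \sim \mathrm{Bernoulli}(3/4)$ and $B \sim \mathrm{Bernoulli}(1/2)$,

\[ \mathrm{KL}(A \| B) = \tfrac{3}{4}\log\frac{3/4}{1/2} + \tfrac{1}{4}\log\frac{1/4}{1/2} = \tfrac{3}{4}\log\tfrac{3}{2} - \tfrac{1}{4} \approx 0.189 \text{ bits}, \]

while $\mathrm{KL}(B \| A) = \tfrac{1}{2}\log\tfrac{2}{3} + \tfrac{1}{2}\log 2 \approx 0.208$ bits, illustrating that KL divergence is not symmetric.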

[Conditional KL-divergence] For pairs of random variables $(Z_1, A_1)$ and $(Z_2, A_2)$, and $(z, a) \in \mathrm{Supp}((Z_1, A_1))$, the conditional sample KL-divergence is:

\[ \mathrm{KL}_{A_1 \mid Z_1 \,\|\, A_2 \mid Z_2}(a \mid z) = \log \frac{\Pr[A_1 = a \mid Z_1 = z]}{\Pr[A_2 = a \mid Z_2 = z]}, \]

and the conditional KL-divergence is:

\[ \mathrm{KL}\bigl( A_1 \mid Z_1 \,\big\|\, A_2 \mid Z_2 \bigr) = \mathop{\mathbb{E}}_{(z, a) \leftarrow (Z_1, A_1)}\Bigl[ \mathrm{KL}_{A_1 \mid Z_1 \,\|\, A_2 \mid Z_2}(a \mid z) \Bigr]. \]

[Chain rule for KL-divergence] For pairs of random variables $(Z_1, A_1)$ and $(Z_2, A_2)$:

\[ \mathrm{KL}\bigl( Z_1, A_1 \,\big\|\, Z_2, A_2 \bigr) = \mathrm{KL}\bigl( Z_1 \,\big\|\, Z_2 \bigr) + \mathrm{KL}\bigl( A_1 \mid Z_1 \,\big\|\, A_2 \mid Z_2 \bigr), \]

and for $(z, a) \in \mathrm{Supp}((Z_1, A_1))$:

\[ \mathrm{KL}_{Z_1, A_1 \,\|\, Z_2, A_2}(z, a) = \mathrm{KL}_{Z_1 \| Z_2}(z) + \mathrm{KL}_{A_1 \mid Z_1 \,\|\, A_2 \mid Z_2}(a \mid z). \]

[Data-processing inequality] Let $A$ and $B$ be a pair of random variables and let $g$ be a function defined on their supports. Then:

\[ \mathrm{KL}\bigl( g(A) \,\big\|\, g(B) \bigr) \;\leq\; \mathrm{KL}\bigl( A \,\big\|\, B \bigr). \]

[Max-KL-divergence] Let $A$ and $B$ be a pair of random variables with $\mathrm{Supp}(A) \subseteq \mathrm{Supp}(B)$, and let $\varepsilon \in [0, 1]$. We define

\[ \mathrm{KL}^{\varepsilon}_{\max}(A \| B) \]

to be the quantile of level $1 - \varepsilon$ of the sample KL-divergence $\mathrm{KL}_{A \| B}(A)$; equivalently, it is the smallest $\delta$ satisfying:

\[ \Pr_{a \leftarrow A}\bigl[ \mathrm{KL}_{A \| B}(a) \leq \delta \bigr] \;\geq\; 1 - \varepsilon, \]

and it is characterized by the following equivalence: $\mathrm{KL}^{\varepsilon}_{\max}(A \| B) \leq \delta$ if and only if $\Pr_{a \leftarrow A}\bigl[ \Pr[A = a] \leq 2^{\delta} \cdot \Pr[B = a] \bigr] \geq 1 - \varepsilon$.

Block generators

[Block generator] An $m$-block generator is a function $G : \{0,1\}^s \to \{0,1\}^{\ell_1} \times \cdots \times \{0,1\}^{\ell_m}$. $G_i(x)$ denotes the $i$-th block of $G$ on input $x$, and $\ell_i$ denotes the bit length of the $i$-th block.

[Online generator] An online $m$-block generator is an $m$-block generator $G$ whose input is divided into blocks $r = (r_1, \ldots, r_m)$ such that for all $i$ and $r$, $G_i(r)$ only depends on $(r_1, \ldots, r_i)$. We sometimes write $G_i(r_1, \ldots, r_i)$ when the remaining input blocks are unspecified.

[Support] The support of a generator $G$ is the support of the random variable $G(R)$ for uniform input $R$. If $G$ is an $m$-block generator and $\mathcal{R}$ is a binary relation, we say that $G$ is supported on $\mathcal{R}$ if every output $(y_1, \ldots, y_{m-1}, w)$ in the support of $G$ satisfies $\bigl((y_1, \ldots, y_{m-1}), w\bigr) \in \mathcal{R}$.

When $G$ is an $m$-block generator supported on a binary relation $\mathcal{R}$, we will often write its output as $(G_1, \ldots, G_{m-1}, G_W)$ to emphasize that the last block corresponds to a witness for the first blocks.

Cryptography.

[One-way Function] Let $n$ be a security parameter, and $t \in \mathbb{N}$, $\varepsilon \in [0, 1]$. A function $f : \{0,1\}^n \to \{0,1\}^*$ is a $(t, \varepsilon)$-one-way function if:

  1. For all time-$t$ randomized algorithms $A$: $\Pr[f(A(f(X))) = f(X)] \leq \varepsilon$, where $X$ is uniform over $\{0,1\}^n$.

  2. There exists a ppt algorithm $F$ such that $F(x) = f(x)$ for all $x \in \{0,1\}^n$.

If $f$ is $(p(n), 1/p(n))$-one-way for every $p \in \mathrm{poly}$, we say that $f$ is (strongly) one-way.
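As a purely illustrative sketch (not part of the paper's formalism), the inversion experiment in condition 1 can be written out in Python as follows; the candidate function and the trivial adversary below are hypothetical placeholders used only to show the shape of the experiment.

    import hashlib
    import os

    N_BYTES = 16  # toy input length (128 bits)

    def f(x: bytes) -> bytes:
        # Toy candidate function (illustrative only): truncated SHA-256.
        return hashlib.sha256(x).digest()[:N_BYTES]

    def trivial_adversary(y: bytes) -> bytes:
        # Placeholder inverter: ignores y and guesses a uniformly random input.
        return os.urandom(N_BYTES)

    def inversion_experiment(trials: int = 10_000) -> float:
        # Estimates Pr[f(A(f(X))) = f(X)] over uniform X, as in condition 1.
        successes = 0
        for _ in range(trials):
            x = os.urandom(N_BYTES)
            y = f(x)
            if f(trivial_adversary(y)) == y:
                successes += 1
        return successes / trials

    if __name__ == "__main__":
        print("estimated inversion probability:", inversion_experiment())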

3 Search Problems and KL-hardness

In this section, we first present the classical notion of hard-on-average search problems and introduce the new notion of KL-hardness. We then relate the two notions by proving that average-case hardness implies KL-hardness.

3.1 Search problems

For a binary relation $\mathcal{R}$, we write $\mathcal{R}(y, w)$ for the predicate that is true iff $(y, w) \in \mathcal{R}$, and say that $w$ is a witness for the instance $y$. (We use the unconventional notation $y$ for the instance, instead of $x$, because our relations will often be of the form $\mathcal{R}_f = \{(f(x), x)\}$ for some function $f$; in this case an instance is some $y$ in the range of $f$ and a witness for $y$ is any preimage $x \in f^{-1}(y)$.) To each relation $\mathcal{R}$, we naturally associate (1) a search problem: given $y$, find $w$ such that $\mathcal{R}(y, w)$ holds or state that no such $w$ exists, and (2) the decision problem defined by the language $L_{\mathcal{R}} = \{y : \exists w,\ \mathcal{R}(y, w)\}$. We restrict our attention to relations computable by a polynomial-time algorithm and such that there exists a polynomial $p$ with $|w| \leq p(|y|)$ for every $(y, w) \in \mathcal{R}$; for such relations, the associated decision problem is in $\mathsf{NP}$. We now define average-case hardness.

[Distributional search problem] A distributional search problem is a pair $(\mathcal{R}, Y)$ where $\mathcal{R}$ is a binary relation and $Y$ is a random variable supported on $L_{\mathcal{R}}$.

The problem is $(t, \varepsilon)$-hard if for all time-$t$ randomized algorithms $A$, $\Pr[\mathcal{R}(Y, A(Y))] \leq \varepsilon$, where the probability is over the distribution of $Y$ and the randomness of $A$.

For a function $f$, the problem of inverting $f$ is the search problem associated with the relation $\mathcal{R}_f = \{(f(x), x)\}$. If $f$ is a $(t, \varepsilon)$-one-way function, then the distributional search problem $(\mathcal{R}_f, f(X))$ of inverting $f$ on a uniform random input $X$ is $(t, \varepsilon)$-hard.

Consider a distributional search problem $(\mathcal{R}, Y)$. Without loss of generality, there exists a (possibly inefficient) two-block generator $G$ supported on $\mathcal{R}$ such that $G_1(R)$ is distributed identically to $Y$ for uniform input $R$. If $G$ is polynomial-time computable, it is easy to see that the search problem of inverting $G_1$ (given $G_1(R)$ for uniform $R$, find consistent coins) is at least as hard as $(\mathcal{R}, Y)$. The advantage of writing the problem in this “functional” form is that the distribution over (instance, witness) pairs is flat, which is a necessary condition to relate hardness to inaccessible entropy (see Theorem 4.2).

Furthermore, if $G$ is also polynomial-time computable and $(\mathcal{R}, Y)$ is hard, then $G_1$ is a one-way function. Combined with the previous example, we see that the existence of one-way functions is equivalent to the existence of hard search problems for which (instance, witness) pairs can be efficiently sampled.

3.2 KL-hardness

Instead of considering an adversary directly attempting to solve a search problem $(\mathcal{R}, Y)$, the adversary in the definition of KL-hardness comprises a pair of algorithms $(G, S)$, where $G$ is a two-block generator outputting valid (instance, witness) pairs for $\mathcal{R}$ and $S$ is a simulator for $G$: given an instance $y$, the goal of $S$ is to output randomness $r$ for $G$ such that $G_1(r) = y$. Formally, the definition is as follows.

[KL-hard] Let $(\mathcal{R}, Y)$ be a distributional search problem. We say that $(\mathcal{R}, Y)$ is $(t, \delta)$-KL-hard if:

\[ \mathrm{KL}\bigl( R,\, G_1(R) \;\big\|\; S(Y),\, Y \bigr) \;\geq\; \delta \]

for all pairs of time-$t$ algorithms $(G, S)$ where $G$ is a two-block generator supported on $\mathcal{R}$ and $R$ is uniform randomness for $G$. Similarly, $(\mathcal{R}, Y)$ is $(t, \varepsilon, \delta)$-max-KL-hard if for all such pairs:

\[ \mathrm{KL}^{\varepsilon}_{\max}\bigl( R,\, G_1(R) \;\big\|\; S(Y),\, Y \bigr) \;\geq\; \delta. \]

Note that if a pair $(G, S)$ achieves a KL-divergence of zero in Definition 3.2, then $G_1(R)$ has the same distribution as $Y$ and $G_1(S(y)) = y$ for all $y \in \mathrm{Supp}(Y)$. In this case, writing $w = G_2(S(y))$, we have that $w$ is a valid witness for $y$ since $G$ is supported on $\mathcal{R}$.

More generally, the composition $G_2 \circ S$ solves the search problem whenever $G_1(S(Y)) = Y$. When the KL-divergences in Definition 3.2 are upper-bounded, we can lower bound the probability of the search problem being solved (Lemma 3.2). This immediately implies that hard search problems are also KL-hard.

Let $(\mathcal{R}, Y)$ be a distributional search problem. If $(\mathcal{R}, Y)$ is $(t, \varepsilon)$-hard, then it is $(t', \delta)$-KL-hard and $(t', \varepsilon', \delta')$-max-KL-hard for every $\varepsilon' \in (0, 1)$, where $t' = \Omega(t)$ (for the theorems in this paper that relate two notions of hardness, the notation $t' = \Omega(t)$ means that there exists a constant $c$ depending only on the computational model such that $t' \geq t/c$), $\delta = \log(1/\varepsilon)$, and $\delta' = \log\bigl((1 - \varepsilon')/\varepsilon\bigr)$.

As we see, a “good” simulator $S$ for a generator $G$ is one for which $G_1(S(Y)) = Y$ holds often. It will be useful in Section 4 to consider simulators which are allowed to fail by outputting a failure string $\bot$ (e.g. the empty string), and to adopt the convention that a failed simulation never counts as producing a valid instance. With this convention, we can without loss of generality add the requirement that $G_1(S(y)) = y$ whenever $S(y) \neq \bot$: indeed, $S$ can always check that this is the case and if not output the failure symbol. For such a simulator $S$, observe that for all $y$, the second variable on both sides of the KL-divergences in Definition 3.2 is obtained by applying $G_1$ on the first variable and can thus be dropped, leading to a simpler definition of KL-hardness: $\mathrm{KL}\bigl( R \,\big\|\, S(Y) \bigr) \geq \delta$.

Theorem 3.2 is an immediate consequence of the following lemma.

Let $(\mathcal{R}, Y)$ be a distributional search problem and $(G, S)$ be a pair of algorithms with $G$ a two-block generator supported on $\mathcal{R}$. Define the linear-time oracle algorithm $A^{G, S}(y) = G_2(S(y))$. For $\delta \geq 0$ and $\varepsilon \in [0, 1]$:

  1. If $\mathrm{KL}\bigl( R, G_1(R) \,\big\|\, S(Y), Y \bigr) \leq \delta$, then $\Pr\bigl[\mathcal{R}\bigl(Y, A^{G, S}(Y)\bigr)\bigr] \geq 2^{-\delta}$.

  2. If $\mathrm{KL}^{\varepsilon}_{\max}\bigl( R, G_1(R) \,\big\|\, S(Y), Y \bigr) \leq \delta$, then $\Pr\bigl[\mathcal{R}\bigl(Y, A^{G, S}(Y)\bigr)\bigr] \geq (1 - \varepsilon) \cdot 2^{-\delta}$.

Proof.

We have:

\[ \Pr\bigl[ \mathcal{R}\bigl( Y, A^{G, S}(Y) \bigr) \bigr] \;\geq\; \Pr\bigl[ G_1(S(Y)) = Y \bigr] \qquad (G \text{ is supported on } \mathcal{R}) \]

\[ \Pr\bigl[ G_1(S(Y)) = Y \bigr] \;=\; \sum_{y} \Pr[Y = y] \sum_{r :\, G_1(r) = y} \Pr[S(y) = r] \;=\; \mathop{\mathbb{E}}_{(r, y) \leftarrow (R,\, G_1(R))}\Bigl[ 2^{-\mathrm{KL}_{R, G_1(R) \,\|\, S(Y), Y}(r, y)} \Bigr]. \]

Now, the first claim follows by Jensen’s inequality (since $z \mapsto 2^{-z}$ is convex) and the second claim follows by Markov’s inequality when considering the event that the sample-KL is smaller than $\delta$ (which occurs with probability at least $1 - \varepsilon$ by assumption). ∎
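For concreteness, the Jensen step can be spelled out in the notation above as

\[ \mathop{\mathbb{E}}_{(r, y) \leftarrow (R,\, G_1(R))}\Bigl[ 2^{-\mathrm{KL}_{R, G_1(R) \,\|\, S(Y), Y}(r, y)} \Bigr] \;\geq\; 2^{-\mathrm{KL}\bigl( R, G_1(R) \,\|\, S(Y), Y \bigr)} \;\geq\; 2^{-\delta}, \]

and restricting the expectation to the event that the sample-KL is at most $\delta$ gives the $(1 - \varepsilon) \cdot 2^{-\delta}$ bound of the second claim.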

Relation to KL-hardness for sampling.

In [VZ12], the authors introduced the notion of KL-hardness for sampling: for jointly distributed variables $(Y, X)$, $X$ is hard to sample given $Y$ if it is hard for a polynomial-time adversary to approximate—measured in KL-divergence—the conditional distribution of $X$ given $Y$. Formally:

[KL-hard for sampling, Def. 3.4 in [VZ12]] Let $(Y, X)$ be a pair of random variables. We say that $X$ is $(t, \delta)$-KL-hard to sample given $Y$ if for all time-$t$ randomized algorithms $A$, we have:

\[ \mathrm{KL}\bigl( Y, X \;\big\|\; Y, A(Y) \bigr) \;\geq\; \delta. \]

As discussed in Section 1.2, it was shown in [VZ12] that if $f$ is a one-way function, then $(f(X), X)$ has next-bit pseudoentropy at least $n + \omega(\log n)$ for uniform $X$ (see Theorem 1.2). The first step in proving this result was to prove that $X$ is $\omega(\log n)$-KL-hard to sample given $f(X)$ (see Lemma 1.2).

We observe that when $(Y, X)$ is of the form $(f(X), X)$ for some function $f$ and variable $X$ over $\{0,1\}^n$, then KL-hardness for sampling is implied by KL-hardness by simply fixing $G$ to be the “honest sampler” $x \mapsto (f(x), x)$. Indeed, in this case we have:

\[ \mathrm{KL}\bigl( X,\, f(X) \;\big\|\; S(f(X)),\, f(X) \bigr) \;=\; \mathrm{KL}\bigl( f(X),\, X \;\big\|\; f(X),\, S(f(X)) \bigr). \]

We can thus recover Lemma 1.2 as a direct corollary of Theorem 3.2.

Consider a function $f$ and define $Y = f(X)$ and $\mathcal{R}_f = \{(f(x), x)\}$ for $X$ uniform over $\{0,1\}^n$. If $f$ is $(t, \varepsilon)$-one-way, then $(\mathcal{R}_f, Y)$ is $(t', \delta)$-KL-hard and $X$ is $(t', \delta)$-KL-hard to sample given $f(X)$, with $t' = \Omega(t)$ and $\delta = \log(1/\varepsilon)$.

Witness KL-hardness.

We also introduce a relaxed notion of KL-hardness called witness-KL-hardness. In this notion, we further require the adversary to approximate the joint distribution of (instance, witness) pairs rather than only instances. For example, the problem of inverting a function $f$ over a random input is naturally associated with the distribution $(f(X), X)$. The relaxation in this case is analogous to the notion of a distributional one-way function, for which the adversary is required to approximate the uniform distribution over preimages.

[Witness KL-hardness] Let $\mathcal{R}$ be a binary relation and $(Y, W)$ be a pair of random variables supported on $\mathcal{R}$. We say that $(Y, W)$ is $(t, \delta)$-witness-KL-hard if for all pairs of time-$t$ algorithms $(G, S)$ where $G$ is a two-block generator supported on $\mathcal{R}$, for uniform $R$:

\[ \mathrm{KL}\bigl( R,\, G_1(R),\, G_2(R) \;\big\|\; S(Y),\, Y,\, W \bigr) \;\geq\; \delta. \]

Similarly, for $\varepsilon \in [0, 1]$, $(Y, W)$ is $(t, \varepsilon, \delta)$-witness-max-KL-hard if for all such pairs:

\[ \mathrm{KL}^{\varepsilon}_{\max}\bigl( R,\, G_1(R),\, G_2(R) \;\big\|\; S(Y),\, Y,\, W \bigr) \;\geq\; \delta. \]

We introduced KL-hardness first, since it is the notion which is most directly obtained from the hardness of distributional search problems. Observe that by the data-processing inequality for KL divergence (Proposition 2), dropping the third variable on both sides of the KL divergences in Definition 3.2 only decreases the divergences. Hence, KL-hardness implies witness-KL-hardness, as stated in Theorem 3.2. As we will see in Section 4, witness-KL-hardness is the “correct” notion to obtain inaccessible entropy from: it is in fact equal to inaccessible entropy up to small losses in the parameters.

Let be a binary relation and be a pair of random variables supported on . If is -hard, then is -witness--hard and -witness--hard for every where , and .

The data-processing inequality does not hold exactly for the max-KL-divergence; hence the max-KL statement in Theorem 3.2 does not follow with the claimed parameters in a black-box manner from Theorem 3.2. However, an essentially identical proof, given in Appendix A, yields the result.

4 Inaccessible Entropy and Witness KL-hardness

In this section, we relate our notion of witness-KL-hardness to the inaccessible entropy definition of [HRVW16]. Roughly speaking, we “split” the KL-hard definition into blocks to obtain an intermediate blockwise KL-hardness property (Section 4.1) that we then relate to inaccessible entropy (Section 4.2). Together, these results show that if $f$ is a one-way function, the generator $G(x) = (f(x)_1, \ldots, f(x)_m, x)$ has superlogarithmic inaccessible entropy.

4.1 Next-block KL-hardness and rejection sampling

Consider a binary relation $\mathcal{R}$ and a pair of random variables $(Y, W)$ supported on $\mathcal{R}$. For an online $m$-block generator $G$ supported on $\mathcal{R}$, it is natural to consider the simulator $S_k$ that exploits the block structure of $G$: on input $y = (y_1, \ldots, y_m)$, $S_k$ generates randomness block by block using rejection sampling until $G_i(r_1, \ldots, r_i) = y_i$. The subscript $k$ is the maximum number of attempts after which $S_k$ gives up and outputs $\bot$. The formal definition of $S_k$ is given in Algorithm 1.

for $i = 1$ to $m$ do
     repeat
         sample $r_i$ uniformly at random
     until $G_i(r_1, \ldots, r_i) = y_i$ or $k$ attempts
     if $G_i(r_1, \ldots, r_i) \neq y_i$ then
          $r_j \leftarrow \bot$ for $j \geq i$; return $r$
     end if
end for
return $r = (r_1, \ldots, r_m)$
Algorithm 1 Rejection sampling simulator $S_k$ on input $y = (y_1, \ldots, y_m)$
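As an informal illustration of the rejection-sampling idea (the toy generator and identifiers below are hypothetical stand-ins, not the paper's objects), a Python sketch of such a simulator might look like this:

    import random
    from typing import Callable, List, Optional

    def rejection_sampling_simulator(
        y: List[int],
        gen_block: Callable[[List[int], int], int],  # gen_block(coins_prefix, i) -> i-th output block
        coin_values: List[int],                      # possible values of a single coin block
        k: int,
    ) -> Optional[List[int]]:
        # Try to find coins r = (r_1, ..., r_m) such that gen_block(r[:i], i) == y[i-1]
        # for every i, sampling each r_i at most k times; return None on failure.
        r: List[int] = []
        for i in range(1, len(y) + 1):
            for _ in range(k):
                candidate = random.choice(coin_values)
                if gen_block(r + [candidate], i) == y[i - 1]:
                    r.append(candidate)
                    break
            else:
                return None  # gave up on block i after k attempts (the simulator fails)
        return r

    # Toy online generator (hypothetical): the i-th output block is the parity of the first i coins.
    def toy_gen_block(coins_prefix: List[int], i: int) -> int:
        return sum(coins_prefix[:i]) % 2

    if __name__ == "__main__":
        target = [0, 1, 1, 0]
        coins = rejection_sampling_simulator(target, toy_gen_block, coin_values=[0, 1], k=8)
        print("simulated coins:", coins)

Intuitively, when the blocks are of logarithmic length, each block has only polynomially many possible values, so a polynomial cap $k$ on the number of attempts per block suffices with high probability, which is the role of the parameter $k$ in Algorithm 1.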

As will be established in Theorem 4.1, the “approximation error” of the pair