I. Introduction
A fundamental paradigm in the design and analysis of symmetric encryption schemes is the following two-step process: (1) Design a symmetric encryption scheme assuming the availability of a uniformly-random permutation; (2) Analyze the security of the scheme assuming that the permutation is switched to a uniformly-random function.
Step (1) relies on the widely-believed existence of pseudorandom permutations (see, for example, [1, 2]), which are efficiently-computable and efficiently-invertible keyed permutations $\{\pi_k\}_{k \in \mathcal{K}}$ over $\{0,1\}^n$ that are computationally indistinguishable from a uniformly-random permutation in a standard cryptographic sense, where $\mathcal{K}$ is the set of all possible keys. Pseudorandom permutations are realized via a variety of known practical constructions, such as the well-studied and standardized Advanced Encryption Standard (AES), for which $n = 128$.
Step (2) relies on the fact that a uniformly-random function can serve as a perfectly-secure one-time pad for the encryption of an exponentially-large number of messages. For example, assuming that two parties secretly share a uniformly-random permutation $\pi$ over $\{0,1\}^n$ (this would correspond to actually sharing a key for a pseudorandom permutation), they can use the widely-deployed counter mode for the encryption of multiple messages, and encrypt their $i$th message $m_i$ as the pair $(i, \pi(i) \oplus m_i)$. Modifying the scheme by replacing its random permutation $\pi$ with a random function $f$ makes it possible to argue that an attacker observing the sequence of ciphertexts $(1, f(1) \oplus m_1), \dots, (q, f(q) \oplus m_q)$ obtains no information on their corresponding messages $m_1, \dots, m_q$. Note, however, that these ciphertexts result from the modified scheme that uses the function $f$, and not from the original one that uses the permutation $\pi$. Thus, it must be argued that the security of the modified scheme provides a meaningful guarantee for the security of the original one.

The switching lemma. The security of the modified scheme and that of the original scheme are tied together via a simple argument, commonly referred to as the "switching lemma". This lemma captures the advantage of distinguishing between a random permutation and a random function. For an algorithm (an attacker) that observes $q$ ciphertexts, this translates to upper bounding its advantage in distinguishing a sequence of $q$ values that are sampled uniformly with replacement from $\{0,1\}^n$ (corresponding to the values $f(1), \dots, f(q)$ in the case of a random function $f$) from a sequence of $q$ values that are sampled uniformly without replacement from $\{0,1\}^n$ (corresponding to the values $\pi(1), \dots, \pi(q)$ in the case of a random permutation $\pi$). The distinguishing advantage of such an algorithm is defined by the dissimilarity between the distributions of its output as induced under the two cases. Note that the total variation distance between these two distributions is $\Theta(q^2/2^n)$ (for $q \le 2^{n/2}$), and this serves as a tight bound on the distinguishing advantage when no restrictions are placed on the distinguisher.
This implies, in particular, that encryption in the widely-deployed counter mode cannot be used once the number $q$ of encrypted messages approaches $2^{n/2}$. In fact, the switching lemma is applicable, and places rather similar bounds on the number of encrypted messages, not only for symmetric encryption in the above-described counter mode but also for other fundamental modes of encryption. We refer the reader to the work of Jaeger and Tessaro [3] for an in-depth discussion of the cryptographic applications of the switching lemma.
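To make the counter-mode example concrete, the following Python sketch simulates both variants for a toy block size (the names and parameters are illustrative, not taken from the paper; a truly random permutation can only be sampled explicitly for small $n$):

```python
import random

N_BITS = 16                      # toy block size; AES would use n = 128
N = 1 << N_BITS

# A uniformly random permutation over {0,1}^n, sampled explicitly
# (only feasible for toy n; a real scheme uses a pseudorandom permutation).
perm_table = list(range(N))
random.shuffle(perm_table)
pi = lambda i: perm_table[i]

# A uniformly random function over {0,1}^n, sampled lazily.
func_table = {}
def f(i):
    if i not in func_table:
        func_table[i] = random.randrange(N)
    return func_table[i]

def ctr_encrypt(messages, keyed_map):
    """Counter mode: the i-th ciphertext is the pair (i, map(i) XOR m_i)."""
    return [(i, keyed_map(i) ^ m) for i, m in enumerate(messages)]

messages = [random.randrange(N) for _ in range(5)]
print(ctr_encrypt(messages, pi))   # original scheme (random permutation)
print(ctr_encrypt(messages, f))    # modified scheme (random function)
```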
The streaming switching lemma. As discussed above, the bound provided by the switching lemma is tight when no restrictions are placed on the distinguisher. Specifically, the following simple algorithm achieves the bound: When given a sequence of $q$ values as input, the algorithm outputs $1$ if there is some value that appears more than once (i.e., if a "collision" exists), and outputs $0$ otherwise. Note that when given a sequence of $q$ values that are sampled uniformly with replacement this algorithm outputs $1$ with probability $\Theta(q^2/2^n)$ (for $q \le 2^{n/2}$), and when given a sequence of $q$ values that are sampled uniformly without replacement this algorithm always outputs $0$. However, a significant drawback of this algorithm is that it needs an internal memory of size $\Omega(q \cdot n)$ bits for storing the entire sequence in order to identify whether or not a collision exists.

This observation motivated Jaeger and Tessaro [3] to refine the framework of the switching lemma by restricting the amount of internal memory used by the distinguisher. That is, they analyzed the advantage of distinguishing the above two distributions where: (1) the $q$ values are provided one by one in a streaming manner, and (2) the internal memory of the distinguisher is restricted to at most $s$ bits. The most interesting regime is where there is a noticeable gap between $s$ and $q \cdot n$, which is motivated by the fact that large amounts of data cannot always be stored in their entirety.
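A minimal sketch of this unbounded-memory collision distinguisher, reusing the toy parameters above (illustrative names; with $q \approx 2 \cdot 2^{n/2}$ the with-replacement stream contains a collision with constant probability, while the without-replacement stream never does):

```python
import random

N_BITS = 16
N = 1 << N_BITS
q = 1 << (N_BITS // 2 + 1)       # q ~ 2^{n/2}: the birthday regime

def collision_distinguisher(stream):
    """Output 1 iff some value repeats; stores the whole stream (~ q*n bits)."""
    seen = set()
    for x in stream:
        if x in seen:
            return 1
        seen.add(x)
    return 0

trials = 1000
hits = sum(collision_distinguisher([random.randrange(N) for _ in range(q)])
           for _ in range(trials))
print("Pr[output 1 | with replacement] ~", hits / trials)
print("Pr[output 1 | without replacement] =",
      collision_distinguisher(random.sample(range(N), q)))   # always 0
```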
Known bounds. Jaeger and Tessaro proved a conditional upper bound on the distinguishing advantage of any streaming algorithm that uses at most $s$ bits of internal memory. Specifically, they introduced a combinatorial conjecture regarding certain hypergraphs, and showed that based on their conjecture the advantage of any such distinguisher is at most $O\!\left(\frac{q \cdot s}{2^n}\right)$, when measured as the KL divergence between the output distributions of the memory-bounded streaming algorithm under the two cases. Applying Pinsker's inequality, this implies an upper bound of $O\!\left(\sqrt{\frac{q \cdot s}{2^n}}\right)$ when measured via the total variation distance, which is more standard for cryptographic applications.
In a follow-up work, Dinur [4] proved an unconditional upper bound of $O\!\left(\frac{\log q \cdot q \cdot s}{2^n}\right)$ on the distinguishing advantage of any such algorithm, when measured as the total variation distance between the output distributions of the memory-bounded streaming algorithm under the two cases. Note that this should be compared to the upper bound of $O\!\left(\sqrt{\frac{q \cdot s}{2^n}}\right)$ on the total variation distance obtained by applying Pinsker's inequality to the result of Jaeger and Tessaro.
Dinur's result is based on reducing the task of distinguishing between these two distributions via a memory-bounded algorithm to constructing communication-efficient protocols for the two-party set-disjointness problem. Three decades of extensive research on the communication complexity of this canonical problem (e.g., [5, 6, 7]) have recently led to new lower bounds [8] on which Dinur relied via his reduction.
Our contributions. We present an information-theoretic and unconditional proof showing that the distinguishing advantage of any streaming algorithm that uses at most $s$ bits of internal memory is at most $O\!\left(\frac{q \cdot (s + \log q)}{2^n}\right)$, measured via KL divergence as in the work of Jaeger and Tessaro [3]. When $s \ge \epsilon \cdot \log q$ for any constant $\epsilon > 0$, we obtain an improved upper bound of $O\!\left(\frac{q \cdot s}{2^n}\right)$, which is asymptotically tight with respect to the KL divergence.
Moreover, we prove our results within a more refined framework that considers the accumulated memory usage of streaming algorithms throughout their computation, and not only their worst-case memory usage. This shows that any non-negligible advantage must be obtained by using a substantial amount of internal memory on average throughout the computation, and not only in the worst case.
II. Setup and Main Results
Notation. All logarithms in this paper are to the natural base unless denoted otherwise in a subscript. For two probability distributions $P$ and $Q$ on a common discrete alphabet $\mathcal{X}$, where $P$ is absolutely continuous with respect to $Q$, the KL divergence is defined as $D(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$. For probability distributions $P_{Y|X}$ and $Q_{Y|X}$ on a common discrete alphabet $\mathcal{Y}$, where $P_{Y|X=x}$ is absolutely continuous with respect to $Q_{Y|X=x}$, we further define the conditional divergence as $D\left(P_{Y|X} \| Q_{Y|X} \mid P_X\right) = \sum_{x} P_X(x) D\left(P_{Y|X=x} \| Q_{Y|X=x}\right)$. The mutual information between $X$ and $Y$ with respect to the probability distribution $P_{XY}$ is $I(X; Y) = D(P_{XY} \| P_X P_Y)$, where $P_X$ and $P_Y$ are the marginals of $P_{XY}$.

Setup. For stating our results we briefly describe the notion of memory-bounded streaming indistinguishability, introduced by Jaeger and Tessaro [3], as well as our refinement that considers accumulated memory usage. For an algorithm $\mathcal{A}$ and a sequence $x^q = (x_1, \dots, x_q)$, $x_i \in \{0,1\}^n$, the streaming computation of $\mathcal{A}$ on $x^q$ is defined via the following process:

1) Set $M_0 = \varepsilon$, where $\varepsilon$ is the empty string.
2) For $i = 1, \dots, q$: let $M_i = \mathcal{A}(M_{i-1}, x_i)$.
3) Output $M_q$.
We abuse notation and denote the output of this computation by $\mathcal{A}(x^q)$. Following Jaeger and Tessaro, we say that an algorithm $\mathcal{A}$ is $s$-memory-bounded if for every input $x^q$ and for every $i \in [q]$ it holds that $|M_i| \le s$, where $|M_i|$ is the bit length of the internal state $M_i$. For our purpose of considering accumulated memory usage, we naturally extend this notion to that of an $(s_1, \dots, s_q)$-memory-bounded algorithm, where for every input $x^q$ and for every $i \in [q]$ it holds that $|M_i| \le s_i$. From this point, and without loss of generality, we assume that for any $(s_1, \dots, s_q)$-memory-bounded algorithm it holds that $s_1 \le n$ and that $s_{i+1} \le s_i + n$ for all $i \in [q-1]$.

Footnote 1: For any sequence $(s_1, \dots, s_q)$, we may recursively define $(\tilde{s}_1, \dots, \tilde{s}_q)$ by $\tilde{s}_1 = \min\{s_1, n\}$ and $\tilde{s}_{i+1} = \min\{s_{i+1}, \tilde{s}_i + n\}$. Then, any $(s_1, \dots, s_q)$-memory-bounded algorithm $\mathcal{A}$ with internal states $M_1, \dots, M_q$ can be transformed into an $(\tilde{s}_1, \dots, \tilde{s}_q)$-memory-bounded algorithm $\tilde{\mathcal{A}}$ with internal states $\tilde{M}_1, \dots, \tilde{M}_q$ by defining $\tilde{M}_i = M_i$ if $|M_i| \le \tilde{s}_i$ and defining $\tilde{M}_i = (\tilde{M}_{i-1}, x_i)$ otherwise, where $x^q$ is the input sequence (i.e., $x_i$ can always be stored explicitly together with the previous state instead of updating the state to $M_i$). Note that $\tilde{\mathcal{A}}$ perfectly simulates the execution of $\mathcal{A}$ for any input, and thus achieves the same distinguishing advantage.
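As an illustration of this streaming model, here is a minimal Python sketch (hypothetical names, not from the paper): the algorithm is given by a state-update function, states are bit strings, and the per-step bound $|M_i| \le s_i$ is checked explicitly:

```python
def run_streaming(update, stream, bounds):
    """Run the streaming computation M_i = A(M_{i-1}, x_i) and return M_q.

    update(state, x) -> new state; states are bytes objects, so their
    size in bits is 8 * len(state); bounds[i] plays the role of s_{i+1}.
    """
    state = b""                              # M_0 is the empty string
    for i, x in enumerate(stream):
        state = update(state, x)             # M_i = A(M_{i-1}, x_i)
        assert 8 * len(state) <= bounds[i], f"memory bound violated at step {i+1}"
    return state                             # the output M_q

# Toy 8-bit-state algorithm: remember the XOR of all elements seen so far.
xor_update = lambda st, x: bytes([(st[0] if st else 0) ^ (x & 0xFF)])
print(run_streaming(xor_update, [5, 3, 8, 1], bounds=[8, 8, 8, 8]))
```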
From this point on we let $Q$ and $P$ denote the probability distributions on $(\{0,1\}^n)^q$ corresponding to sampling the sequence $X^q = (X_1, \dots, X_q)$ uniformly with and without replacement, respectively, from $\{0,1\}^n$. Namely, under $Q$ the random variables $X_1, \dots, X_q$ are i.i.d. and uniform on $\{0,1\}^n$, whereas under $P$ we have that $X_1, \dots, X_q$ are the first $q$ entries of a uniform random permutation of $\{0,1\}^n$. The distribution of the algorithm's output under $Q$ (respectively $P$) is denoted by $Q_{\mathcal{A}(X^q)}$ (respectively $P_{\mathcal{A}(X^q)}$).
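As a quick sanity check of the $\Theta(q^2/2^n)$ behavior of the total variation distance between these two distributions, note that it admits a closed form, since the without-replacement distribution simply reweights the distinct sequences; a small Python sketch (illustrative parameter choices):

```python
import math

def tv_with_vs_without(N, q):
    """TV distance between q uniform samples with and without replacement.

    Without replacement is uniform over distinct sequences, where its mass
    exceeds the with-replacement mass, so TV = 1 - (N)_q / N^q.
    """
    falling = math.prod(range(N - q + 1, N + 1))   # (N)_q = N(N-1)...(N-q+1)
    return 1 - falling / N**q

N = 2**16
for q in (2**4, 2**6, 2**8):
    print(q, tv_with_vs_without(N, q), "vs q(q-1)/2N =", q * (q - 1) / (2 * N))
```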
Main results. The following theorem states our main result, upper bounding the distinguishing advantage of any $(s_1, \dots, s_q)$-memory-bounded streaming algorithm, when measured via KL divergence:
Theorem 1
For any $n, q \in \mathbb{N}$ with $q \le 2^{n-1}$, and $(s_1, \dots, s_q)$ such that $s_i \le 2^{n-1}$ for all $i \in [q]$, and for any $(s_1, \dots, s_q)$-memory-bounded algorithm $\mathcal{A}$ it holds that

$D\left(P_{\mathcal{A}(X^q)} \,\middle\|\, Q_{\mathcal{A}(X^q)}\right) = O\!\left(\frac{1}{2^n} \sum_{i=1}^{q} \left(s_i + \log i\right)\right).$
In particular, when $s_i \ge \epsilon \cdot \log i$ for any constant $\epsilon > 0$, then also $s_i + \log i = O(s_i)$, and we obtain the following corollary:
Corollary 2
For any $n, q \in \mathbb{N}$ with $q \le 2^{n-1}$, constant $\epsilon > 0$, and $(s_1, \dots, s_q)$ such that $\epsilon \cdot \log i \le s_i \le 2^{n-1}$ for all $i \in [q]$, and for any $(s_1, \dots, s_q)$-memory-bounded algorithm $\mathcal{A}$ it holds that

$D\left(P_{\mathcal{A}(X^q)} \,\middle\|\, Q_{\mathcal{A}(X^q)}\right) = O\!\left(\frac{1}{2^n} \sum_{i=1}^{q} s_i\right).$
Finally, for this range of parameters we observe that our bound is nearly tight:
Theorem 3
For any $n, q \in \mathbb{N}$ with $q \le 2^{n-1}$, and $(s_1, \dots, s_q)$ such that $s_i \le 2^{n-1}$ for all $i \in [q]$, there exists an $(s_1, \dots, s_q)$-memory-bounded algorithm $\mathcal{A}$ for which

$D\left(P_{\mathcal{A}(X^q)} \,\middle\|\, Q_{\mathcal{A}(X^q)}\right) = \Omega\!\left(\frac{1}{2^n} \sum_{i=1}^{q} \left\lfloor \frac{s_i}{n} \right\rfloor\right).$
III. Proof of Theorem 1
Our proof is based on an induction argument showing that $D\left(P_{\mathcal{A}(X^q)} \| Q_{\mathcal{A}(X^q)}\right) \le \sum_{i=0}^{q-1} I(X_{i+1}; M_i)$, where the mutual information is computed with respect to $P$, and $M_i$ is the state of the internal memory at step $i$ of the computation. Then, we leverage the fact that $X_{i+1} - (X_1, \dots, X_i) - M_i$ form a Markov chain in this order, and that $H(M_i) \le s_i \log 2$ due to the memory constraints, in order to derive an information bottleneck [9] upper bound on $I(X_{i+1}; M_i)$.

III-A. An Induction Argument
We prove the following lemma which is similar to a lemma proved by Jaeger and Tessaro [3].
Lemma 4

Let $P$ and $Q$ be two distributions on $\mathcal{X}^q$, where the induced marginals satisfy $P_{X_i} = Q_{X_i}$ for all $i \in [q]$, and in addition $Q = \prod_{i=1}^{q} Q_{X_i}$ (i.e., under the distribution $Q$ the random variables $X_1, \dots, X_q$ are independent, where each $X_i$ is distributed according to the marginal distribution $Q_{X_i}$). For a streaming computation performed by the algorithm $\mathcal{A}$, let $M_i$ be the random variable corresponding to the state produced in the $i$th step of the computation. Then

$D\left(P_{M_q} \,\middle\|\, Q_{M_q}\right) \le \sum_{i=0}^{q-1} I(X_{i+1}; M_i),$

where the mutual information is computed with respect to the joint distribution of $(X^q, M_1, \dots, M_q)$ induced by $P$.

Proof. By definition of the streaming computation we have that $M_q = \mathcal{A}(M_{q-1}, x_q)$. Moreover, since $M_q$ is obtained by processing $(M_{q-1}, X_q)$, the data processing inequality yields

$D\left(P_{M_q} \,\middle\|\, Q_{M_q}\right) \le D\left(P_{M_{q-1} X_q} \,\middle\|\, Q_{M_{q-1} X_q}\right).$

Applying the chain rule yields

$D\left(P_{M_{q-1} X_q} \,\middle\|\, Q_{M_{q-1} X_q}\right) = D\left(P_{M_{q-1}} \,\middle\|\, Q_{M_{q-1}}\right) + D\left(P_{X_q | M_{q-1}} \,\middle\|\, Q_{X_q} \,\middle|\, P_{M_{q-1}}\right) \quad (1)$

$= D\left(P_{M_{q-1}} \,\middle\|\, Q_{M_{q-1}}\right) + I(X_q; M_{q-1}), \quad (2)$

where (1) follows from the fact that $Q$ is memoryless, so that under this distribution $X_q$ is statistically independent of $M_{q-1}$, and (2) follows from the assumption that $Q_{X_q} = P_{X_q}$. Thus, by induction we obtain that $D\left(P_{M_q} \| Q_{M_q}\right) \le \sum_{i=0}^{q-1} I(X_{i+1}; M_i)$. Recalling that $M_0 = \varepsilon$ and that $P_{X_1} = Q_{X_1}$, our claim follows.
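Lemma 4 can be checked numerically by brute force on a toy instance; the sketch below (illustrative algorithm and parameters, not from the paper) computes both sides exactly for $N = 4$, $q = 3$, with a state that stores a collision flag and the last value seen:

```python
import itertools, math
from collections import defaultdict

N, q = 4, 3

def update(state, x):
    """Toy streaming algorithm: state = (collision-with-previous flag, last value)."""
    flag, last = state
    return (flag or x == last, x)

def sequences(with_replacement):
    """All length-q sequences together with their probabilities."""
    if with_replacement:
        return [(s, 1 / N**q) for s in itertools.product(range(N), repeat=q)]
    seqs = list(itertools.permutations(range(N), q))
    return [(s, 1 / len(seqs)) for s in seqs]

def state_after(prefix):
    state = (False, None)
    for x in prefix:
        state = update(state, x)
    return state

# Left-hand side: KL divergence between the final flag distributions.
dists = []
for wr in (False, True):                      # P (without), then Q (with)
    d = defaultdict(float)
    for s, pr in sequences(wr):
        d[state_after(s)[0]] += pr            # keep only the flag bit
    dists.append(d)
P_out, Q_out = dists
lhs = sum(p * math.log(p / Q_out[k]) for k, p in P_out.items() if p > 0)

# Right-hand side: sum of I(X_{i+1}; M_i) computed under P.
rhs = 0.0
for i in range(1, q):                         # I(X_1; M_0) = 0 is omitted
    joint, pm, px = defaultdict(float), defaultdict(float), defaultdict(float)
    for s, pr in sequences(False):
        m = state_after(s[:i])
        joint[(s[i], m)] += pr
        pm[m] += pr
        px[s[i]] += pr
    rhs += sum(p * math.log(p / (px[x] * pm[m]))
               for (x, m), p in joint.items() if p > 0)

print(lhs, "<=", rhs)                         # Lemma 4: lhs <= rhs
```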
III-B. An Information-Bottleneck Argument
We make use of the following functions:

- For $x \in [0, 1]$, the binary entropy function (with respect to the natural base) is $h(x) = -x \log x - (1 - x) \log(1 - x)$, and we let $h^{-1}$ be its inverse restricted to $[0, \frac{1}{2}]$ (see the numeric sketch after this list).

- For we let .
- For we let , and for we let .
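For concreteness, a minimal numeric sketch of the binary entropy function $h$ and of its inverse $h^{-1}$ on $[0, \frac{1}{2}]$, computed by bisection (illustrative code, not from the paper):

```python
import math

def h(x):
    """Binary entropy in nats: h(x) = -x ln x - (1-x) ln(1-x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def h_inv(y, tol=1e-12):
    """Inverse of h restricted to [0, 1/2], found by bisection.

    h is increasing on [0, 1/2] with h(0) = 0 and h(1/2) = ln 2,
    so for any y in [0, ln 2] there is a unique root in [0, 1/2].
    """
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

assert abs(h_inv(h(0.1)) - 0.1) < 1e-9
print(h(0.5), math.log(2))   # both equal ln 2
```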
We claim that $\varphi$ is nondecreasing, that $\psi$ is nondecreasing and convex, and that for every $x$ it holds that $\varphi(x) \le \psi(x)$. We defer the proofs to Section V. We now state and prove our main technical lemma.
Lemma 5
Let $i < q \le N$ be integers, and let $X_1, \dots, X_q$ be the random process of sampling elements of $[N]$ uniformly without replacement. Denote $X^i = (X_1, \dots, X_i)$, and let $M$ be a random variable such that $X_{i+1} - X^i - M$ form a Markov chain in this order. Then, it holds that
Proof. We first note that
(3) 
and that
(4) 
Consequently, we derive a lower bound on in terms of . To that end, we first compute the distribution for and . We have
It follows that
(5) 
Defining the random variables , we further write
(6) 
where in the last step we used the convexity of . Next, using the fact that conditioning reduces entropy, we note that
Note that dictate the elements that belong to . Let be the order at which these elements appear. Together, and completely determine , and vice versa. We have that
Plugging this into (6) (using the monotonicity of ), and then into (5), we obtain
Recalling that and , and using the convexity of , we obtain
(7) 
and the statement follows by plugging (7) into (3) using (4).
Next, we simplify the bound of Lemma 5.
Corollary 6
In the setting of Lemma 5, if and then
Proof. We will use the following well-known estimate (proved using Stirling's approximation; see, e.g., [10]): for $p = k/m \in (0, 1)$,

$\frac{e^{m \cdot h(p)}}{\sqrt{8 m p (1-p)}} \le \binom{m}{k} \le \frac{e^{m \cdot h(p)}}{\sqrt{2 \pi m p (1-p)}}.$

In particular, for large enough it holds that
Let . Using the monotonicity of , the bound of Lemma 5 reads as
(8) 
Let . Due to the convexity of it holds that
where . Recall that and that is increasing at , and hence
and we can further upper bound (8) by
(9) 
Denoting and recalling the definition of , we further develop (9)
(10)  
where in (10) we used the inequality that holds for every . Since we can estimate
Since we assumed that , it also holds that , and we conclude that
III-C. Application to Our Setup
Finally, we can derive Theorem 1 from Corollary 6. Recall that $P$ and $Q$ designate the probability distributions corresponding to sampling $X^q$ uniformly without and with replacement, respectively, from $\{0,1\}^n$. Thus, $P_{X_i} = Q_{X_i}$ for all $i \in [q]$, and furthermore, $Q$ is a memoryless distribution. Thus, the conditions of Lemma 4 hold, and $D\left(P_{\mathcal{A}(X^q)} \| Q_{\mathcal{A}(X^q)}\right) \le \sum_{i=0}^{q-1} I(X_{i+1}; M_i)$, where the mutual information is with respect to $P$. Now, recalling that $X_{i+1} - X^i - M_i$ forms a Markov chain in this order, that under $P$ we have that $X^q$ is a random process of sampling elements of $\{0,1\}^n$ uniformly without replacement, and that $H(M_i) \le s_i \log 2$ by the constraints on the internal memory, we can apply Corollary 6 to obtain a bound on each $I(X_{i+1}; M_i)$, and Theorem 1 follows by summing over all $i \in \{0, 1, \dots, q-1\}$. This settles the proof of Theorem 1.
IV. Proof of Theorem 3
Informally, given $(s_1, \dots, s_q)$ we construct an $(s_1, \dots, s_q)$-memory-bounded algorithm $\mathcal{A}$ that stores a list of values that it saw, where every new value is added to the list if the state size allows it (a code sketch follows the formal description below). More formally, with a loss of at most $n$ bits per each $s_i$, we may assume that $s_i$ is of the form $s_i = k_i \cdot n$ for an integer $k_i$, for all $i \in [q]$. We remind that we assume that $s_1 \le n$ and that $s_{i+1} \le s_i + n$ for all $i \in [q-1]$; thus it holds that $k_i \le i$ for all $i \in [q]$ and that $k_{i+1} \le k_i + 1$ for all $i \in [q-1]$. We also assume without loss of generality that $|M_q| = 1$ (i.e., the final output of $\mathcal{A}$ is a single bit). For $i \in [q]$ we define the computation $M_i = \mathcal{A}(M_{i-1}, x_i)$ as follows:

1) If $i = q$, output $1$ if the first bit of $M_{q-1}$ is $1$ or $x_q$ appears in the list stored in $M_{q-1}$, and output $0$ otherwise.
2) Else, if the first bit of $M_{i-1}$ is $1$, output $1$.
3) Else, parse $M_{i-1} = (0, y_1, \dots, y_k)$.
4) If $x_i \in \{y_1, \dots, y_k\}$, output $1$.
5) Else, if $k < k_i$, output $(0, y_1, \dots, y_k, x_i)$.
6) Else, output $M_{i-1}$.
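A minimal Python sketch of this list-storing distinguisher (illustrative names and parameters; the capacities correspond to $k_i$ and grow by at most one per step, as assumed above):

```python
import random

def make_list_distinguisher(capacities):
    """Streaming distinguisher that keeps a bounded list of seen values.

    capacities[i] is the number of values the state may hold after
    reading the (i+1)-st stream element (roughly s_i / n).
    Returns 1 iff a stream element collides with a stored one.
    """
    def run(stream):
        found, stored = 0, []            # state: (flag bit, list of values)
        for i, x in enumerate(stream):
            if found:                    # collision already recorded
                continue
            if x in stored:              # collision with a stored value
                found = 1
            elif len(stored) < capacities[i]:
                stored.append(x)         # room left: remember this value
        return found
    return run

n, q = 16, 4096
N = 1 << n
caps = [min(i + 1, 64) for i in range(q)]    # k_i <= i and k_{i+1} <= k_i + 1
dist = make_list_distinguisher(caps)

trials = 200
adv_f = sum(dist([random.randrange(N) for _ in range(q)])
            for _ in range(trials)) / trials
adv_p = dist(random.sample(range(N), q))     # without replacement: always 0
print(adv_f, adv_p)
```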
Note that for this algorithm $P\left(\mathcal{A}(X^q) = 1\right) = 0$ (a sequence sampled without replacement contains no collision) and that $Q\left(\mathcal{A}(X^q) = 1\right) = \Omega\!\left(\frac{1}{2^n} \sum_{i=1}^{q} k_i\right)$, so it holds that

$D\left(P_{\mathcal{A}(X^q)} \,\middle\|\, Q_{\mathcal{A}(X^q)}\right) = \log \frac{1}{Q\left(\mathcal{A}(X^q) = 0\right)} = \Omega\!\left(\frac{1}{2^n} \sum_{i=1}^{q} \left\lfloor \frac{s_i}{n} \right\rfloor\right),$

and this settles the proof of Theorem 3.
V. Proofs for the Properties of $\varphi$ and $\psi$
In this section we give proofs for the properties of $\varphi$ and $\psi$ that we used.
We start by showing that $\varphi$ is increasing over its domain. Indeed , so as long as . Now, we show that $\psi$ is increasing. Recall that , and the claim follows from the fact that is increasing and is increasing over . Next, we show that $\psi$ is convex by showing that its derivative is increasing. It holds that
Thus, where
Computing the derivative, for we get
So is increasing, thus is increasing and $\psi$ is convex as claimed. Finally, we show that for every $x$ it holds that $\varphi(x) \le \psi(x)$. When it simply holds that . When , it holds that , thus we need to show that . Let . Then, (when ), so is concave over . Together with the fact that we get that when .
References
 [1] O. Goldreich, Foundations of Cryptography – Volume 1: Basic Techniques. Cambridge University Press, 2001.
 [2] J. Katz and Y. Lindell, Introduction to Modern Cryptography (2nd Edition). CRC Press, 2014.
 [3] J. Jaeger and S. Tessaro, "Tight time-memory trade-offs for symmetric encryption," in Advances in Cryptology – EUROCRYPT, 2019, pp. 467–497.
 [4] I. Dinur, "On the streaming indistinguishability of a random permutation and a random function," to appear in Advances in Cryptology – EUROCRYPT, 2020.
 [5] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar, "An information statistics approach to data stream and communication complexity," J. Comput. Syst. Sci., vol. 68, no. 4, pp. 702–732, 2004.
 [6] B. Kalyanasundaram and G. Schnitger, “The probabilistic communication complexity of set intersection,” SIAM J. Discrete Math., vol. 5, no. 4, pp. 545–557, 1992.
 [7] A. A. Razborov, “On the distributional complexity of disjointness,” Theor. Comput. Sci., vol. 106, no. 2, pp. 385–390, 1992.
 [8] M. Göös and T. Watson, "Communication complexity of set-disjointness for all probabilities," Theory of Computing, vol. 12, no. 1, pp. 1–23, 2016.
 [9] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” in 37th Annual Allerton Conference on Communications, Control, and Computing, 1999, pp. 368–377.
 [10] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Elsevier Science, 1977.