On Finding Quantum Multi-collisions

11/13/2018 ∙ by Qipeng Liu, et al.

A $k$-collision for a compressing hash function $H$ is a set of $k$ distinct inputs that all map to the same output. In this work, we show that for any constant $k$, $\Theta\left(N^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)}\right)$ quantum queries are both necessary and sufficient to achieve a $k$-collision with constant probability. This both improves on the best prior upper bound (Hosoyamada et al., ASIACRYPT 2017) and provides the first non-trivial lower bound, completely resolving the problem.


1 Introduction

Collision resistance is one of the central concepts in cryptography. A collision for a hash function $H:\{0,1\}^m \to \{0,1\}^n$ is a pair of distinct inputs $x_1 \neq x_2$ that map to the same output: $H(x_1) = H(x_2)$.

Multi-collisions.

Though receiving comparatively less attention in the literature, multi-collision resistance is nonetheless an important problem. A $k$-collision for $H$ is a set of $k$ distinct inputs $\{x_1, \dots, x_k\}$, with $x_i \neq x_j$ for $i \neq j$, such that $H(x_i) = H(x_j)$ for all $i, j$.

Multi-collisions frequently surface in the analysis of hash functions and other primitives. Examples include MicroMint [RS97], RMAC [JJV02], chopMD [CN08], Lesamnta-LW [HIK11], PHOTON and Parazoa [NO14], and the Keyed-Sponge [JLM14], all of which assume the multi-collision resistance of a certain function. Multi-collision algorithms have also been used in attacks, such as those on MDC-2 [KMRT09], HMAC [NSWY13], Even-Mansour [DDKS14], and LED [NWW14]. Multi-collision resistance for polynomial $k$ has also recently emerged as a theoretical way to avoid keyed hash functions [BKP18, BDRV18], or as a useful cryptographic primitive, for example, to build statistically hiding commitment schemes with succinct interaction [KNY18].

Quantum.

Quantum computing stands to fundamentally change the field of cryptography. Importantly for our work, Grover’s algorithm [Gro96] can speed up brute force searching by a quadratic factor, greatly increasing the speed of pre-image attacks on hash functions. In turn, Grover’s algorithm can be used to find ordinary collisions ($k=2$) in time $O(2^{n/3})$, speeding up the classical “birthday” attack which requires $O(2^{n/2})$ time. It is also known that, in some sense (discussed below), these speedups are optimal [AS04, Zha15a]. These attacks require updated symmetric primitives with longer keys in order to make such attacks intractable.

1.1 This Work: Quantum Query Complexity of Multi-collision Resistance

In this work, we consider quantum multi-collision resistance. Unfortunately, little is known of the difficulty of finding multi-collisions for $k \geq 3$ in the quantum setting. The only prior work on this topic is that of Hosoyamada et al. [HSX17], who give an $O(2^{4n/9})$ algorithm for 3-collisions, as well as algorithms for general constant $k$. On the lower bounds side, the $\Omega(2^{n/3})$ bound from the $k=2$ case applies as well for higher $k$, and this is all that is known.

We completely resolve this question, giving tight upper and lower bounds for any constant $k$. In particular, we consider the quantum query complexity of multi-collisions. We will model the hash function $H$ as a random oracle. This means, rather than getting concrete code for a hash function $H$, the adversary is given black box access to a function $H$ chosen uniformly at random from the set of all functions from $\{0,1\}^m$ into $\{0,1\}^n$. Since we are in the quantum setting, black box access means the adversary can make quantum queries to $H$. Each query will cost the adversary 1 time step. The adversary’s goal is to solve some problem (in our case, find a $k$-collision) with the minimal cost. Our results are summarized in Table 1. Both our upper bounds and lower bounds improve upon the prior work for $k \geq 3$; for example, for $k=3$, we show that the quantum query complexity is $\Theta(2^{3n/7})$.

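| | Upper Bound (Algorithm) | Lower Bound |
|---|---|---|
| [BHT98] | $O(2^{m/3})$ for $k=2$ (2-to-1) | |
| [AS04] | | $\Omega(2^{m/3})$ for $k=2$ (2-to-1) |
| [Zha15a] | $O(2^{n/3})$ for $k=2$ (Random, $m \geq n/2$) | $\Omega(2^{n/3})$ for $k=2$ (Random) |
| [HSX17] | $O\left(2^{\frac{1}{2}\left(1-\frac{1}{3^{k-1}}\right)n}\right)$ ($m \geq n + \log k$) | |
| This Work | $O\left(2^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)n}\right)$ ($m \geq n + \log k$) | $\Omega\left(2^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)n}\right)$ (Random) |

Table 1: Quantum query complexity results for $k$-collisions. $k$ is taken to be a constant, and all Big $O$ and $\Omega$ notations hide constants that depend on $k$. In parentheses are the main restrictions for the lower bounds provided. We note that in the case of 2-to-1 functions, $m = n+1$, so implicitly these bounds only apply in this regime. In these cases, $m$ characterizes the query complexity. On the other hand, for random or arbitrary functions, $n$ is the more appropriate way to measure query complexity. We also note that for arbitrary functions, when $m \leq n + \log(k-1)$, it is possible that $H$ contains no $k$-collisions, so the problem becomes impossible. Hence, $m \geq n + \log k$ is essentially tight. For random functions, there will be no collisions w.h.p. unless $m \gtrsim (1 - \frac{1}{k})n$, so algorithms on random functions must always operate in this regime.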

1.2 Motivation

Typically, the parameters of a hash function are set to make finding collisions intractable. One particularly important parameter is the output length of the hash function, since the output length in turn affects storage requirements and the efficiency of other parts of a cryptographic protocol.

Certain attacks, called generic attacks, apply regardless of the implementation details of the hash function $H$, and simply work by evaluating $H$ on several inputs. For example, the birthday attack shows that it is possible to find a collision in time approximately $2^{n/2}$ by a classical computer. Generalizations show that $k$-collisions can be found in time $\Theta(2^{(1-1/k)n})$. (Here, the Big Theta notation hides a constant that depends on $k$.)

These are also known to be optimal among classical generic attacks. This is demonstrated by modeling $H$ as an oracle, and counting the number of queries needed to find ($k$-)collisions in an arbitrary hash function $H$. In cryptographic settings, it is common to model $H$ as a random function, giving stronger average case lower bounds.

Understanding the effect of generic attacks is critical. First, they cannot be avoided, since they apply no matter how $H$ is designed. Second, other parameters of the function, such as the number of iterations of an internal round function, can often be tuned so that the best known attacks are in fact generic. Therefore, for many hash functions, the complexity of generic attacks accurately represents the actual cost of breaking them.

Therefore, for “good” hash functions where generic attacks are optimal, in order to achieve security against classical adversaries, $n$ must be chosen so that $t = 2^{n/2}$ time steps are intractable. This often means setting $t = 2^{128}$, so $n = 256$. In contrast, generic classical attacks can find $k$-collisions in time $\Theta(2^{(1-1/k)n})$. For example, this means that $n$ must be set to $192$ to avoid $3$-collisions, or $171$ to avoid $4$-collisions.
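The arithmetic behind such parameter choices can be checked with a short sketch. It assumes the generic classical cost $2^{(1-1/k)n}$ for $k$-collisions and a 128-bit security target; the helper name `min_output_bits` is ours and does not come from the paper.

```python
def min_output_bits(k: int, security_bits: int = 128) -> int:
    """Smallest n such that (1 - 1/k) * n >= security_bits.

    Assumes the generic classical k-collision attack costs 2^((1 - 1/k) n).
    """
    # (1 - 1/k) * n >= s  is equivalent to  n >= s * k / (k - 1).
    # The double negation turns Python's floor division into a ceiling.
    return -(-security_bits * k // (k - 1))


for k in (2, 3, 4):
    print(f"k={k}: n >= {min_output_bits(k)}")
```

The same helper can be reused with a different `security_bits` value if another target cost is of interest.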

Once quantum computers enter the picture, we need to consider quantum queries to $H$ in order to model actual attacks that evaluate $H$ in superposition. This changes the query complexity, and makes proving bounds much more difficult. Just as understanding query complexity in the classical setting was crucial to guide parameter choices, it will be critical in the quantum world as well.

We also believe that quantum query complexity is an important study in its own right, as it helps illuminate the effects quantum computing will have on various areas of computer science. It is especially important to cryptography, as many of the questions have direct implications to the post-quantum security of cryptosystems. Even more, the techniques involved are often closely related to proof techniques in post-quantum cryptography. For example, bounds for the quantum query complexity of finding collisions in random functions [Zha15a], as well as more general functions [EU17, BES17], were developed from techniques for proving security in the quantum random oracle model [BDF11, Zha12, TU16]. Similarly, the lower bounds in this work build on techniques for proving quantum indifferentiability [Zha18]. On the other hand, proving the security of MACs against superposition queries [BZ13] resulted in new lower bounds for the quantum oracle interrogation problem [van98] and generalizations [Zha15b].

Lastly, multi-collision finding can be seen as a variant of $k$-distinctness, which is essentially the problem of finding a $k$-collision in a function $H$, where the $k$-collision may be unique and all other points are distinct. The quantum query complexity of $k$-distinctness is currently one of the main open problems in quantum query complexity. An upper bound of $O\left((2^m)^{\frac{3}{4} - \frac{1}{4(2^k-1)}}\right)$ was shown by Belovs [Bel12]. The best known lower bound is $\tilde{\Omega}\left((2^m)^{\frac{3}{4} - \frac{1}{2k}}\right)$ [BKT18]. Interestingly, the dependence of the exponent on $k$ is exponential for the upper bound, but polynomial for the lower bound, suggesting a fundamental gap in our understanding of the problem.

Note that our results do not immediately apply in this setting, as our algorithm operates only in a regime where there are many ($k$-)collisions, whereas $k$-distinctness applies even if the $k$-collision is unique and all other points are distinct (in particular, no $(k-1)$-collisions). On the other hand, our lower bound is always lower than $2^{n/2}$, which is trivial for this problem. Nonetheless, both problems are searching for the same thing, namely $k$-collisions, just in different settings. We hope that future work may be able to extend our techniques to solve the problem of $k$-distinctness.

1.3 The “Reciprocal Plus 1” Rule

For many search problems over random functions, such as pre-image search, collision finding, $k$-sum, quantum oracle interrogation, and more, a very simple folklore rule of thumb translates the classical query complexity into quantum query complexity.

In particular, all of these problems have a classical query complexity $\Theta(N^{1/\alpha})$ for some rational number $\alpha$. Curiously, the quantum query complexity of all these problems is always $\Theta(N^{\frac{1}{\alpha+1}})$.

In slightly more detail, for all of these problems the best classical $q$-query algorithm solves the problem with probability $\Theta(q^c/N^d)$ for some constants $c, d$, where $N = 2^n$. Then the classical query complexity is $\Theta(N^{d/c})$. For this class of problems, the success probability of the best $q$-query quantum algorithm is obtained simply by increasing the power of $q$ by $d$, giving $\Theta(q^{c+d}/N^d)$. This results in a quantum query complexity of $\Theta(N^{d/(c+d)})$. Examples:

  • Grover’s pre-image search [Gro96] improves success probability from $q/N$ to $q^2/N$, which is known to be optimal [BBBV97]. The result is a query complexity improvement from $N$ to $N^{1/2}$.

    Similarly, finding, say, 2 pre-images has classical success probability $q^2/N^2$; it is straightforward to adapt known techniques to prove that the best quantum success probability is $q^4/N^2$. Again, the query complexity goes from $N$ to $N^{1/2}$. Analogous statements hold for any constant number of pre-images.

  • The BHT collision finding algorithm [BHT98] finds a collision with probability $q^3/N$, improving on the classical birthday attack’s $q^2/N$. Both of these are known to be optimal [AS04, Zha15a]. Thus quantum algorithms improve the query complexity from $N^{1/2}$ to $N^{1/3}$.

    Similarly, finding, say, 2 distinct collisions has classical success probability $q^4/N^2$, whereas we show that the quantum success probability is $q^6/N^2$. More generally, any constant number of distinct collisions conforms to the Reciprocal Plus 1 Rule.

  • $k$-sum asks to find a set of $k$ inputs such that the sum of the outputs is 0. This is a different generalization of collision finding than what we study in this work. Classically, the best algorithm succeeds with probability $q^k/N$. Quantumly, the best algorithm succeeds with probability $q^{k+1}/N$ [BS13, Zha18]. Hence the query complexity goes from $N^{1/k}$ to $N^{\frac{1}{k+1}}$.

    Again, solving for any constant number of distinct $k$-sum solutions also conforms to the Reciprocal Plus 1 Rule.

  • In the oracle interrogation problem, the goal is to compute $q+1$ input/output pairs, using only $q$ queries. Classically, the best success probability is clearly $1/N$. Meanwhile, Boneh and Zhandry [BZ13] give a quantum algorithm with success probability roughly $q/N$, which is optimal.
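As a rough illustration of the rule of thumb, the sketch below takes the exponents $c, d$ of a classical success probability of the form $q^c/N^d$ and returns the query-complexity exponents that the Reciprocal Plus 1 heuristic would predict. The function name `rp1_exponents` and the choice of examples are ours; the rule itself is folklore rather than a theorem.

```python
from fractions import Fraction


def rp1_exponents(c: int, d: int):
    """Return (classical, quantum) exponents e with query complexity N^e.

    Assumes classical success ~ q^c / N^d with c >= 1, and the heuristic
    quantum success ~ q^(c+d) / N^d.
    """
    classical = Fraction(d, c)      # q^c / N^d = 1  =>  q = N^(d/c)
    quantum = Fraction(d, c + d)    # q^(c+d) / N^d = 1  =>  q = N^(d/(c+d))
    return classical, quantum


examples = {
    "pre-image search": (1, 1),
    "collision": (2, 1),
    "two distinct collisions": (4, 2),
    "3-sum": (3, 1),
}
for name, (c, d) in examples.items():
    print(name, rp1_exponents(c, d))
```

Applied to $3$-collisions, where the classical success probability is $q^3/N^2$, the same helper returns a quantum exponent of $2/5$; the bound established in this work for $k=3$ is $3/7$, so the heuristic does not carry over.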

Some readers may have noticed that the Reciprocal Plus 1 (RP1) rule does not immediately appear to apply to Element Distinctness. The Element Distinctness problem asks to find a collision in $H$ where the collision is unique. Writing $M = 2^m$ for the domain size, the best classical algorithm succeeds with probability $\Theta(q^2/M^2)$. On the other hand, quantum algorithms can succeed with probability $\Theta(q^3/M^2)$, which is optimal [Amb04, Zha15a]. This does not seem to follow the prediction of the RP1 rule, which would have predicted $q^4/M^2$. However, we note that unlike the settings above, which make sense when $N \ll M$ and where the complexity is characterized by $N$, the Element Distinctness problem requires $M \leq N$ and the complexity is really characterized by the domain size $M$. Interestingly, we note that for a random expanding function, when $N \approx M^2$, there will with constant probability be exactly one collision in $H$. Thus, in this regime the collision problem matches the Element Distinctness problem, and the RP1 rule gives the right query complexity!

Similarly, the quantum complexity for $k$-sum is usually written as $M^{\frac{k}{k+1}}$, not $N^{\frac{1}{k+1}}$. But again, this is because most of the literature considers $H$ for which there is a unique $k$-sum and $H$ is non-compressing, in which case the complexity is better measured in terms of $M$. Notice that a random function will contain a unique $k$-sum when $N \approx M^k$, in which case the bound we state (which follows the RP1 rule) exactly matches the statement usually given.

On the other hand, the RP1 rule does not give the right answer for $k$-distinctness for $k \geq 3$, since the RP1 rule would predict the exponent to approach $1/2$ for large $k$, whereas prior work shows that it approaches $3/4$ for large $k$. That RP1 does not apply perhaps makes sense, since there is no setting of $N$ where a random function will become an instance of $k$-distinctness: for any setting of parameters where a random function has a $k$-collision, it will also most likely have many $(k-1)$-collisions.

The takeaway is that the RP1 Rule seems to apply for natural search problems that make sense on random functions when $N \ll M$. Even for problems that do not immediately fit this setting, such as Element Distinctness, the rule often still gives the right query complexity by choosing $N$ so that a random function is likely to give an instance of the desired problem.

Enter $k$-collisions.

In the case of $k$-collisions, the classical best success probability is $q^k/N^{k-1}$, giving a query complexity of $N^{(k-1)/k} = N^{1-1/k}$. Since the $k$-collision problem is a generalization of collision finding, is similar in spirit to the problems above, and applies to compressing random functions, one may expect that the Reciprocal Plus 1 Rule applies. If true, this would give a quantum success probability of $q^{2k-1}/N^{k-1}$, and a query complexity of $N^{\frac{k-1}{2k-1}} = N^{\frac{1}{2}\left(1-\frac{1}{2k-1}\right)}$.

Even more, for small enough $q$, it is straightforward to find a $k$-collision with probability $q^{2k-1}/N^{k-1}$ as desired. In particular, divide the $q$ queries into $k-1$ blocks. Using the first $q/(k-1)$ queries, find a 2-collision with probability $\Theta(q^3/N)$. Let $y$ be the image of the collision. Then, for each of the remaining $k-2$ blocks of queries, find a pre-image of $y$ with probability $\Theta(q^2/N)$ using Grover search. The result is $k$ colliding inputs with probability $\Theta(q^{3+2(k-2)}/N^{k-1}) = \Theta(q^{2k-1}/N^{k-1})$. It is also possible to prove that this is a lower bound on the success probability (see lower bound discussion below). Now, this algorithm works as long as $q \leq N^{1/3}$, since beyond this range $q^3/N$ exceeds 1 while the 2-collision success probability is bounded by 1. Nonetheless, it is asymptotically tight in the regime for which it applies. This seems to suggest that the limitation to small $q$ might be an artifact of the algorithm, and that a more clever algorithm could operate beyond the $N^{1/3}$ barrier. In particular, this strongly suggests $k$-collisions conform to the Reciprocal Plus 1 Rule.

Note that the RP1 prediction gives an exponent that depends polynomially on $k$, asymptotically approaching $1/2$. In contrast, the exponent in the prior work of [HSX17] approaches $1/2$ exponentially fast in $k$. Thus, prior to our work we see an exponential vs polynomial gap for $k$-collisions, similar to the case of $k$-distinctness.

Perhaps surprisingly given the above discussion (at least, the authors found it surprising!), our work demonstrates that the right answer is in fact exponential, refuting the RP1 rule for $k$-collisions.

As mentioned above, our results do not immediately give any indication for the query complexity of $k$-distinctness. However, our results may hint that $k$-distinctness also exhibits an exponential dependence on $k$. We hope that future work, perhaps building on our techniques, will be able to resolve this question.

1.4 Technical Details

1.4.1 The Algorithm

At their heart, the algorithms for pre-image search, collision finding, $k$-sum, and the recent algorithm for $k$-collision, all rely on Grover’s algorithm. Let $f:\{0,1\}^m \to \{0,1\}$ be a function with a fraction $\delta$ of accepting inputs. Grover’s algorithm finds an accepting input with probability $O(\delta q^2)$ using $q$ quantum queries to $f$. Grover’s algorithm finds a pre-image of a point $y$ in $H$ by setting $f(x)$ to be 1 if and only if $H(x) = y$.

The BHT algorithm [BHT98] uses Grover’s to find a collision in $H$. First, it queries $H$ on $q$ random points, assembling a database $D$. As long as $q \ll N^{1/2}$, all the images in $D$ will be distinct. Now, it lets $f$ be the function that equals 1 on $x$ if and only if $H(x)$ is found amongst the images in $D$, and $x$ is not among the pre-images. By finding an accepting input to $f$, one immediately finds a collision. Notice that the fraction of accepting inputs is approximately $q/N$.

By running Grover’s for $q$ steps, one obtains such a pre-image, and hence a collision, with probability $O\left(\frac{q}{N} \cdot q^2\right) = O(q^3/N)$.

Hosoyamada et al. show how this idea can be recursively applied to find multi-collisions. For $k=3$, the first step is to find a database $D_2$ consisting of $r$ distinct 2-collisions. By recursively applying the BHT algorithm, each 2-collision takes time $N^{1/3}$. Then, to find a 3-collision, set up $f$ as before: $f(x) = 1$ if and only if $H(x)$ is amongst the images in $D_2$ and $x$ is not among the pre-images. The fraction of accepting inputs is approximately $r/N$, so Grover’s algorithm will find a 3-collision in time $(N/r)^{1/2}$. Setting $r$ to be $N^{1/9}$ optimizes the total query count $r N^{1/3} + (N/r)^{1/2}$ as $N^{4/9}$. For $k=4$, recursively build a table $D_3$ of 3-collisions, and set up $f$ to find a collision with the database.

The result is an algorithm for $k$-collisions for any constant $k$, using $O\left(N^{\frac{1}{2}\left(1-\frac{1}{3^{k-1}}\right)}\right)$ queries.
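The balancing step in this recursion can be replayed with exact rational arithmetic. The sketch below assumes the cost model described above: building $t = N^{\tau}$ many $(k-1)$-collisions at cost $N^{a}$ each, followed by one Grover search at cost $(N/t)^{1/2}$, with $\tau$ chosen so that the two terms match. The function name `hsx17_exponent` is ours.

```python
from fractions import Fraction


def hsx17_exponent(k: int) -> Fraction:
    """Exponent e such that the recursive algorithm uses about N^e queries."""
    a = Fraction(1, 3)                 # base case k = 2: the BHT exponent
    for _ in range(3, k + 1):
        # Balance tau + a = (1 - tau) / 2, i.e. table cost = Grover cost.
        tau = (1 - 2 * a) / 3
        a = a + tau
    return a


for k in range(2, 6):
    print(k, hsx17_exponent(k))
```

The values produced this way agree with the closed form $\frac{1}{2}\left(1 - 3^{-(k-1)}\right)$ for the small $k$ checked here.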

Our algorithm improves on Hosoyamada et al.’s, yielding a query complexity of $O\left(N^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)}\right)$. Note that for Hosoyamada et al.’s algorithm, when constructing $D_2$, many different $D_1$ databases are being constructed, one for each entry in $D_2$. Our key observation is that a single database $D_1$ can be re-used for the different entries of $D_2$. This allows us to save on some of the queries being made. These extra queries can then be used in other parts of the algorithm to speed up the computation. By balancing the effort correctly, we obtain our algorithm. Put another way, the cost of finding many ($k$-)collisions can be amortized over many instances, and then recursively used for finding collisions with higher $k$. Since the recursive steps involve solving many instances, this leads to an improved computational cost.

In more detail, we iteratively construct databases $D_1, D_2, \dots, D_k$. Each $D_i$ will have $r_i$ $i$-collisions. We set $r_k = 1$, indicating that we only need a single $k$-collision. To construct database $D_1$, simply query on $r_1$ arbitrary points. To construct database $D_{i+1}$, define the function $f_i$ that accepts inputs that collide with $D_i$ but are not contained in $D_i$. The fraction of points accepted by $f_i$ is approximately $r_i/N$. Therefore, Grover’s algorithm returns an accepting input in time $(N/r_i)^{1/2}$. We simply run Grover’s algorithm $r_{i+1}$ times using the same database $D_i$ to construct $D_{i+1}$ in time $r_{i+1}(N/r_i)^{1/2}$.
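The data flow of this iterative construction can be sketched classically. In the toy code below every Grover call is replaced by a scan over a shuffled domain, so it models only how each database is built from the previous one and says nothing about quantum query counts. The function `find_k_collision`, the list `sizes` (playing the role of $r_1, \dots, r_k$), and the toy 3-to-1 instance are all our own illustrative assumptions.

```python
import random


def find_k_collision(F, domain, k, sizes, rng):
    """Classical stand-in for the iterative database construction.

    sizes[i - 1] plays the role of r_i; sizes[k - 1] should be 1.
    """
    # D_1: query r_1 arbitrary points, keeping one representative per image.
    db = {}
    for x in rng.sample(domain, sizes[0]):
        db.setdefault(F(x), (x,))
    # D_{i+1}: repeatedly extend entries of the *same* D_i.
    for i in range(1, k):
        nxt = {}
        for x in rng.sample(domain, len(domain)):   # stand-in for Grover runs
            y = F(x)
            if y in db and y not in nxt and x not in db[y]:
                nxt[y] = db[y] + (x,)
                if len(nxt) == sizes[i]:
                    break
        db = nxt
    return next(iter(db.values()), None)


# Hypothetical toy instance: a 3-to-1 function on 150 points with 50 images.
rng = random.Random(0)
N, k = 50, 3
perm = list(range(k * N))
rng.shuffle(perm)
F = lambda x: perm[x] % N
collision = find_k_collision(F, range(k * N), k, sizes=[20, 5, 1], rng=rng)
print(collision, [F(x) for x in collision])
```

The point of the sketch is the reuse: all entries of the next database are found against one shared previous database, rather than rebuilding a fresh one per entry.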

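Now we just optimize $r_1, \dots, r_{k-1}$ by setting the number of queries $q$ to construct each database to be identical, that is, $r_{i+1}(N/r_i)^{1/2} = q$. Notice that $r_1 = q$, so solving for $r_i$ gives us

$$r_i = \frac{q^{\,2 - 1/2^{i-1}}}{N^{\,1 - 1/2^{i-1}}}.$$

Setting $r_k = 1$ and solving for $q$ gives $q = N^{\frac{2^{k-1}-1}{2^k-1}} = N^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)}$, the desired result. In particular, in the case $k=3$, our algorithm finds a collision in time $O(N^{3/7})$.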
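The balancing can be verified with exact fractions. The sketch below tracks $r_i = q^{a_i}/N^{b_i}$ under the recurrence $r_{i+1} = q\,(r_i/N)^{1/2}$ with $r_1 = q$, then solves $r_k = 1$ for $q$. The function name `balanced_exponent` is ours.

```python
from fractions import Fraction


def balanced_exponent(k: int) -> Fraction:
    """Exponent e with q = N^e, obtained by setting r_k = 1."""
    a, b = Fraction(1), Fraction(0)          # r_1 = q^1 / N^0
    for _ in range(k - 1):
        # r_{i+1} = q * (r_i / N)^(1/2): halve both exponents, then add
        # one power of q and half a power of 1/N.
        a, b = 1 + a / 2, (b + 1) / 2
    return b / a                              # q^a = N^b  =>  q = N^(b/a)


for k in range(2, 6):
    print(k, balanced_exponent(k))
```

For $k=2$ this recovers the BHT exponent $1/3$, and for $k=3$ it gives $3/7$.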

1.4.2 The Lower Bound.

Notice that our algorithm fails to match the result one would get by applying the “Reciprocal Plus 1 Rule”. Given the discussion above, one may expect that our iterative algorithm could potentially be improved on even more. To the contrary we prove that, in fact, our algorithm is asymptotically optimal for any constant $k$.

Toward that end, we employ a recent technique developed by Zhandry [Zha18] for analyzing quantum queries to random functions. We use this technique to show that our algorithm is tight for random functions, giving an average-case lower bound.

Zhandry’s “Compressed Oracles.”

Zhandry demonstrates that the information an adversary knows about a random oracle $H$ can be summarized by a database $D^*$ of input/output pairs, which is updated according to special rules. In Zhandry’s terminology, $D^*$ is the “compressed standard/phase oracle”.

This $D^*$ is not a classical database, but technically a superposition of databases, meaning certain amplitudes are assigned to each possible database. $D^*$ can be measured, obtaining an actual classical database $D$ with probability equal to its amplitude squared. In the following discussion, we will sometimes pretend that $D^*$ is actually a classical database. While inaccurate, this will give the intuition for the lower bound techniques we employ. In Section 4 we take care to correctly analyze $D^*$ as a superposition of databases.

Zhandry shows roughly the following:

  • Consider any “pre-image problem”, whose goal is to find a set of pre-images $x_1, \dots, x_k$ such that the images satisfy some property. For example, $k$-collision is the problem of finding $k$ pre-images such that the corresponding images are all the same.

    Then after $q$ queries, consider measuring $D^*$. The adversary can only solve the pre-image problem after $q$ queries if the measured $D^*$ has a solution to the pre-image problem.

    Thus, we can always upper bound the adversary’s success probability by upper bounding the probability $D^*$ contains a solution.

  • $D^*$ starts off empty, and each query can only add one point to the database.

  • For any image point $y$, consider the amplitude on databases containing $y$ as a function of $q$ (remember that amplitude is the square root of the probability). Zhandry shows that this amplitude can only increase by $O(\sqrt{1/N})$ from one query to the next. More generally, for a set $S$ of $r$ different images, the amplitude on databases containing any point in $S$ can only increase by $O(\sqrt{r/N})$.

The results above immediately imply the optimality of Grover’s search. In particular, the amplitude on databases containing $y$ is at most $O(q\sqrt{1/N})$ after $q$ queries, so the probability of obtaining a solution is the square of this amplitude, or $O(q^2/N)$. This also readily gives a lower bound for the collision problem. Namely, in order to introduce a collision to $D^*$, the adversary must add a point that collides with one of the existing points in $D^*$. Since there are at most $q$ such points, the amplitude on such $D^*$ can only increase by $O(\sqrt{q/N})$. This means the overall amplitude after $q$ queries is at most $O(q\sqrt{q/N}) = O(q^{3/2}/N^{1/2})$. Squaring to get a probability gives the correct lower bound of $O(q^3/N)$ on the success probability.
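The accumulation argument can be mimicked numerically. The sketch below sums per-query amplitude increases of the form $\sqrt{r_i/N}$, where $r_i$ bounds the number of target images before query $i$, and compares the squared total with $q^2/N$ (single target) and $q^3/N$ (collision). It sets all hidden constants to 1, which is purely an assumption for illustration, and the helper name `amplitude_bound` is ours.

```python
import math


def amplitude_bound(q: int, N: int, targets) -> float:
    """Sum the per-query increases sqrt(targets(i) / N) for i = 0..q-1.

    targets(i) bounds how many image points would count as a hit at query i.
    Hidden constants are taken to be 1 (an assumption).
    """
    return sum(math.sqrt(targets(i) / N) for i in range(q))


N, q = 2 ** 30, 1000
grover = amplitude_bound(q, N, lambda i: 1)    # one fixed target image
collide = amplitude_bound(q, N, lambda i: i)   # up to i earlier points to hit
print("pre-image:", grover ** 2, "vs q^2/N =", q ** 2 / N)
print("collision:", collide ** 2, "vs q^3/N =", q ** 3 / N)
```

In the collision case the summed value stays below $q^{3/2}/N^{1/2}$ because each term is at most $\sqrt{q/N}$.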

A First Attempt.

Our core idea is to attempt a lower bound for $k$-collision by applying these ideas recursively. The idea is that, in order to add, say, a 3-collision to $D^*$, there must be an existing 2-collision in the database. We can then use the 2-collision lower bound to bound the increase in amplitude that results from each query.

More precisely, for very small $q$, we can bound the amplitude on databases containing $\ell$ distinct 2-collisions as $O\left((q^{3/2}/N^{1/2})^{\ell}\right)$. If $q \ll N^{1/3}$, $\ell$ must be a constant, else this term is negligible. So we can assume for $q \ll N^{1/3}$ that $\ell$ is a constant.

Then, we note that in order to introduce a 3-collision, the adversary’s new point must collide with one of the existing 2-collisions. Since there are at most $\ell$, we know that the amplitude increases by at most $O(\sqrt{\ell/N}) = O(1/N^{1/2})$ since $\ell$ is a constant. This shows that the amplitude on databases with 3-collisions is at most $O(q/N^{1/2})$.

We can make the bound on the amplitude increase even smaller by using not only the fact that the database contains at most $\ell$ 2-collisions, but the fact that the amplitude on databases containing even a single 2-collision is much less than 1. In particular, it is $O(q^{3/2}/N^{1/2})$ as demonstrated above. Intuitively, it turns out we can actually just multiply the amplitude increase in the case where the database contains a 2-collision by the amplitude on databases containing any 2-collision to get an overall amplitude increase of $O(q^{3/2}/N)$.

Overall then, we upper bound the amplitude after $q$ queries by $O(q^{5/2}/N)$, giving an upper bound of $O(q^5/N^2)$ on the probability of finding a 3-collision. This lower bound can be extended recursively to any constant $k$-collisions, resulting in a bound that exactly matches the Reciprocal Plus 1 Rule, as well as the algorithm for small $q$! This again seems to suggest that our algorithm is not optimal.

Our Full Proof.

There are two problems with the argument above that, when resolved, actually do show our algorithm is optimal. First, when $q \gg N^{1/3}$, the $q^{3/2}/N^{1/2}$ part of the amplitude bound becomes vacuous, as amplitudes can never be more than 1. Second, the argument fails to consider algorithms that find many 2-collisions, which is possible when $q \gg N^{1/3}$. Finding many 2-collisions of course takes more queries, but then it makes extending to 3-collisions easier, as there are more collisions in the database to match in each iteration.

In our full proof, we examine the amplitude on the databases containing a 3-collision as well as $r$ 2-collisions, after $q$ queries. We call this amplitude $g_{q,r}$. We show a careful recursive formula for bounding $g_{q,r}$ using Zhandry’s techniques, which we then solve.

More generally, for any constant $k$, we let $g^{(k)}_{q,r,s}$ be the amplitude on databases containing exactly $r$ distinct $(k-1)$-collisions and at least $s$ distinct $k$-collisions after $q$ queries. We develop a multiply-recursive formula for the $g^{(k)}$ in terms of the $g^{(k)}$ and $g^{(k-1)}$. We then recursively plug in our solution to $g^{(k-1)}$ so that the recursion is just in terms of $g^{(k)}$, which we then solve using delicate arguments.

Interestingly, this recursive structure for our lower bound actually closely matches our algorithm. Namely, our proof lower bounds the difficulty of adding an $(i+1)$-collision to a database containing many $i$-collisions, exactly the problem our algorithm needs to solve. Our techniques essentially show that every step of our algorithm is tight, resulting in a lower bound of $\Omega\left(N^{\frac{1}{2}\left(1-\frac{1}{2^k-1}\right)}\right)$, exactly matching our algorithm. Thus, we solve the quantum query complexity of $k$-collisions.

Acknowledgements

This work is supported in part by NSF. Opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSF.

2 Preliminaries

Here, we recall some basic facts about quantum computation, and review the relevant literature on quantum search problems.

2.1 Quantum Computation

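A quantum system $Q$ is defined over a finite set $B$ of classical states. In this work we will mostly consider $B = \{0,1\}^n$. A pure state over $Q$ is a unit vector in $\mathbb{C}^{|B|}$, which assigns a complex number to each element in $B$. In other words, let $|\phi\rangle$ be a pure state in $Q$; we can write $|\phi\rangle$ as:

$$|\phi\rangle = \sum_{x \in B} \alpha_x |x\rangle$$

where $\sum_{x \in B} |\alpha_x|^2 = 1$ and $\{|x\rangle\}_{x \in B}$ is called the “computational basis” of $\mathbb{C}^{|B|}$. The computational basis forms an orthonormal basis of $\mathbb{C}^{|B|}$.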

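Given two quantum systems $Q_1$ over $B_1$ and $Q_2$ over $B_2$, we can define a product quantum system $Q_1 \otimes Q_2$ over the set $B_1 \times B_2$. Given $|\phi_1\rangle \in Q_1$ and $|\phi_2\rangle \in Q_2$, we can define the product state $|\phi_1\rangle \otimes |\phi_2\rangle \in Q_1 \otimes Q_2$.

We say $|\phi\rangle \in Q_1 \otimes Q_2$ is entangled if there do not exist $|\phi_1\rangle \in Q_1$ and $|\phi_2\rangle \in Q_2$ such that $|\phi\rangle = |\phi_1\rangle \otimes |\phi_2\rangle$. For example, consider $B_1 = B_2 = \{0,1\}$ and $Q_1 = Q_2 = \mathbb{C}^2$; then $|\phi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$ is entangled. Otherwise, we say $|\phi\rangle$ is un-entangled.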

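A pure state $|\phi\rangle \in Q$ can be manipulated by a unitary transformation $U$. The resulting state is $|\phi'\rangle = U|\phi\rangle$.

We can extract information from a state $|\phi\rangle$ by performing a measurement. A measurement specifies an orthonormal basis, typically the computational basis, and the probability of getting result $x$ is $|\langle x|\phi\rangle|^2$. After the measurement, $|\phi\rangle$ “collapses” to the state $|x\rangle$ if the result is $x$.

For example, given the pure state $|\phi\rangle = \frac{3}{5}|0\rangle + \frac{4}{5}|1\rangle$ measured under $\{|0\rangle, |1\rangle\}$, with probability $9/25$ the result is $0$ and $|\phi\rangle$ collapses to $|0\rangle$; with probability $16/25$ the result is $1$ and $|\phi\rangle$ collapses to $|1\rangle$.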

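We finally assume a quantum computer can implement any unitary transformation (by using so-called Hadamard, phase, CNOT and $\frac{\pi}{8}$ gates), especially the following two gates:

  • Classical Computation: Given a function $f: X \to Y$, one can implement a unitary $U_f$ over $\mathbb{C}^{|X| \cdot |Y|}$ such that for any $|\phi\rangle = \sum_{x \in X, y \in Y} \alpha_{x,y} |x, y\rangle$,

    $$U_f |\phi\rangle = \sum_{x \in X, y \in Y} \alpha_{x,y} |x, y \oplus f(x)\rangle$$

    Here, $\oplus$ is a commutative group operation defined over $Y$.

  • Quantum Fourier Transform: Let $N = 2^n$. Given a quantum state $|\phi\rangle = \sum_{j=0}^{N-1} x_j |j\rangle$, by applying only $O(n^2)$ basic gates, one can compute $|\psi\rangle = \sum_{k=0}^{N-1} y_k |k\rangle$ where the sequence $\{y_k\}_{k=0}^{N-1}$ is the sequence achieved by applying the classical Fourier transform $\mathsf{QFT}_N$ to the sequence $\{x_j\}_{j=0}^{N-1}$:

    $$y_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j \, \omega_n^{jk}$$

    where $\omega_n = e^{2\pi i/N}$ and $i$ is the imaginary unit.

    One interesting property of QFT is that by preparing $|0^n\rangle$ and applying $\mathsf{QFT}_2$ to each qubit, we obtain

    $$\left(\mathsf{QFT}_2 |0\rangle\right)^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle$$

    which is a uniform superposition over all possible $x \in \{0,1\}^n$.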
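Both facts can be checked on small state vectors. The sketch below implements the discrete Fourier transform $y_k = \frac{1}{\sqrt{N}}\sum_j x_j \omega^{jk}$ with $\omega = e^{2\pi i/N}$ directly, and separately applies the 2-point transform (the Hadamard gate) to each qubit of $|0^n\rangle$. It is a plain classical simulation with helper names of our choosing, not a gate-level construction.

```python
import cmath
import math


def qft(x):
    """y_k = (1/sqrt(N)) * sum_j x_j * omega^(j*k), with omega = exp(2*pi*i/N)."""
    N = len(x)
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(xj * omega ** (j * k) for j, xj in enumerate(x)) / math.sqrt(N)
            for k in range(N)]


def qft2_on_each_qubit(state, n):
    """Apply the 2-point transform (Hadamard) to each of the n qubits in turn."""
    s = 1 / math.sqrt(2)
    for qubit in range(n):
        bit = 2 ** qubit
        out = list(state)
        for idx in range(len(state)):
            if (idx & bit) == 0:
                a, b = state[idx], state[idx | bit]
                out[idx], out[idx | bit] = s * (a + b), s * (a - b)
        state = out
    return state


n = 3
zero = [1.0] + [0.0] * (2 ** n - 1)        # the basis state |0...0>
uniform = qft2_on_each_qubit(zero, n)
print(uniform)
print([abs(a) for a in qft(zero)])
```

On the all-zero basis state both routines return the same uniform vector, although the two transforms differ on general inputs.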

For convenience, we sometimes ignore the normalization of a pure state which can be calculated from the context.

2.2 Grover’s algorithm and BHT algorithm

Definition 1 (Database Search Problem).

Suppose there is a function/database encoded as $F: X \to \{0,1\}$ and $F^{-1}(1)$ is non-empty. The problem is to find $x^* \in X$ such that $F(x^*) = 1$.

We will consider adversaries with quantum access to $F$, meaning they submit queries as $\sum_{x \in X, y \in \{0,1\}} \alpha_{x,y} |x, y\rangle$ and receive in return $\sum_{x \in X, y \in \{0,1\}} \alpha_{x,y} |x, y \oplus F(x)\rangle$. Grover’s algorithm [Gro96] finds a pre-image using an optimal number of queries:

Theorem 2 ([Gro96, BBHT98]).

Let $F$ be a function $F: X \to \{0,1\}$. Let $t = |F^{-1}(1)|$ be the number of pre-images of $1$. There is a quantum algorithm that finds $x^* \in X$ such that $F(x^*) = 1$ with an expected number of quantum queries to $F$ at most $O\left(\sqrt{|X|/t}\right)$, even without knowing $t$ in advance.

We will normally think of the number of queries as being fixed, and consider the probability of success given the number of queries. The algorithm from Theorem 2, when run for $q$ queries, can be shown to have a success probability $\min(1, O(q^2 t/|X|))$. For the rest of the paper, “Grover’s algorithm” will refer to this algorithm.

Now let us look at another important problem: the $2$-collision finding problem on $2$-to-$1$ functions.

Definition 3 (Collision Finding on 2-to-1 Functions).

Assume $|X| = 2|Y| = 2N$. Consider a function $F: X \to Y$ such that for every $y \in Y$, $|F^{-1}(y)| = 2$. In other words, every image has exactly two pre-images. The problem is to find $x \neq x'$ such that $F(x) = F(x')$.

Brassard, Høyer and Tapp proposed a quantum algorithm [BHT98] that solves the problem using only $O(N^{1/3})$ quantum queries. The idea is the following:

  • Prepare a list of input and output pairs, $L = \{(x_i, y_i = F(x_i))\}_{i=1}^{t}$, where $x_i$ is drawn uniformly at random and $t = N^{1/3}$;

  • If there is a $2$-collision in $L$, output that pair. Otherwise,

  • Run Grover’s algorithm on the following function $F'$: $F'(x) = 1$ if and only if there exists $i \in \{1, \dots, t\}$ such that $F(x) = y_i = F(x_i)$ and $x \neq x_i$. Output the solution $x$, as well as whatever it collides with.

This algorithm takes $O(t + \sqrt{N/t})$ quantum queries and when $t = \Theta(N^{1/3})$, the algorithm finds a $2$-collision with $O(N^{1/3})$ quantum queries.
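The three steps above can be traced in a classical mock-up. The sketch below keeps the list-then-search structure but replaces Grover's algorithm with a plain scan over the domain, so it shows what $F'$ accepts rather than any quantum speedup. The function name `bht_structure` and the toy 2-to-1 instance are our own assumptions.

```python
import random


def bht_structure(F, domain, t, rng):
    """List-then-search skeleton of BHT, with a scan standing in for Grover."""
    # Step 1: sample t inputs and record their images in the list L.
    seen = {}
    for x in rng.sample(domain, t):
        y = F(x)
        if y in seen:                 # Step 2: L already holds a 2-collision.
            return seen[y], x
        seen[y] = x
    # Step 3: find x with F'(x) = 1, i.e. F(x) is a listed image and x is new.
    for x in domain:
        y = F(x)
        if y in seen and seen[y] != x:
            return seen[y], x
    return None


# Hypothetical toy instance: a random 2-to-1 function from 128 points to 64.
rng = random.Random(1)
N = 64
perm = list(range(2 * N))
rng.shuffle(perm)
F = lambda x: perm[x] % N
pair = bht_structure(F, range(2 * N), t=4, rng=rng)
print(pair, F(pair[0]), F(pair[1]))
```

On a 2-to-1 function the search step always succeeds, since every listed image has exactly one other pre-image.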

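2.3 Multi-collision Finding and [HSX17]

Hosoyamada, Sasaki and Xagawa proposed an algorithm for $k$-collision finding on any function $F: X \to Y$ where $|X| \geq k|Y|$ ($k$ is a constant). They generalized the idea of [BHT98] and gave the proof even for arbitrary functions. We now briefly describe their idea. For simplicity in this discussion, we assume $F$ is a $k$-to-$1$ function and write $N = |Y|$.

For $k = 3$, the algorithm prepares $t$ pairs of $2$-collisions $(x_1, x_1'), \dots, (x_t, x_t')$ by running the BHT algorithm $t$ times. If two pairs of $2$-collisions collide, there is at least a $3$-collision (possibly a $4$-collision). Otherwise, it uses Grover’s algorithm to find an $x'$ such that, for some $i$, $x' \neq x_i$, $x' \neq x_i'$, and $F(x') = F(x_i) = F(x_i')$. The number of queries is $O(t N^{1/3} + \sqrt{N/t})$. When $t = \Theta(N^{1/9})$, the query complexity is minimized to $O(N^{4/9})$.

By induction, finding a $(k-1)$-collision requires $O\left(N^{\frac{3^{k-2}-1}{2 \cdot 3^{k-2}}}\right)$ quantum queries. By preparing $t$ $(k-1)$-collisions and applying Grover’s algorithm to them, it takes $O\left(t N^{\frac{3^{k-2}-1}{2 \cdot 3^{k-2}}} + \sqrt{N/t}\right)$ quantum queries to get one $k$-collision. It turns out that $t = \Theta(N^{1/3^{k-1}})$ and the complexity of finding a $k$-collision is $O\left(N^{\frac{3^{k-1}-1}{2 \cdot 3^{k-1}}}\right)$.

In Section 3, we improve their algorithm to $O\left(N^{\frac{2^{k-1}-1}{2^k-1}}\right)$ quantum queries.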

2.4 Compressed Fourier Oracles and Compressed Phase Oracles

In [Zha18], Zhandry showed a new technique for analyzing cryptosystems in the random oracle model. He also showed that his technique can be used to re-prove several known quantum query lower bounds. In this work, we will extend his technique in order to prove a new optimal lower bound for multi-collisions.

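The basic idea of Zhandry’s technique is the following: assume $\mathcal{A}$ is making a query to a random oracle $H$ and the query is $\sum_{x,u,z} a_{x,u,z} |x, u, z\rangle$. Instead of only considering the adversary’s state $\sum_{x,u,z} a_{x,u,z} |x, u + H(x), z\rangle$ for a random oracle $H$, we can actually treat the whole system as

$$\sum_{x,u,z} \sum_{H} a_{x,u,z} |x, u + H(x), z\rangle \otimes |H\rangle$$

where $|H\rangle$ is the truth table of $H$. By looking at random oracles that way, Zhandry showed that these five random oracle models are equivalent: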

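  1. Standard Oracles:

    $$\mathsf{StO} \sum_{x,u,z} a_{x,u,z} |x, u, z\rangle \otimes \sum_{H} |H\rangle \Rightarrow \sum_{x,u,z} \sum_{H} a_{x,u,z} |x, u + H(x), z\rangle \otimes |H\rangle$$

  2. Phase Oracles:

    $$\mathsf{PhO} \sum_{x,u,z} a_{x,u,z} |x, u, z\rangle \otimes \sum_{H} |H\rangle \Rightarrow \sum_{x,u,z} \sum_{H} a_{x,u,z} \, \omega_n^{H(x) u} |x, u, z\rangle \otimes |H\rangle$$

    where $\omega_n = e^{2\pi i/N}$ and $\mathsf{PhO} = (I \otimes \mathsf{QFT}^{\dagger} \otimes I) \cdot \mathsf{StO} \cdot (I \otimes \mathsf{QFT} \otimes I)$. In other words, apply the QFT to the $u$ registers, apply the Standard query, and then apply the QFT one more time.

  3. Fourier Oracles: We can view $\sum_{H} |H\rangle$ as $\mathsf{QFT}\,|0 \cdots 0\rangle$. In other words, if we perform the Fourier transform on a function that always outputs $0$, we will get a uniform superposition over all the possible functions $H$.

    Moreover, $\sum_{H} \omega_n^{H(x) u} |H\rangle$ is equivalent to $\mathsf{QFT}\,|0 \cdots 0 \oplus (x, u)\rangle$. Here $\oplus$ means updating (xor) the $x$-th entry in the database with $u$.

    So in this model, we start with $\sum_{x,u,z} a^{0}_{x,u,z} |x, u, z\rangle \otimes \mathsf{QFT}\,|D_0\rangle$ where $D_0$ is an all-zero function. By making the $i$-th query, we have

    $$\mathsf{PhO} \sum_{x,u,z,D} a^{i-1}_{x,u,z,D} |x, u, z\rangle \otimes \mathsf{QFT}\,|D\rangle \Rightarrow \sum_{x,u,z,D} a^{i-1}_{x,u,z,D} |x, u, z\rangle \otimes \mathsf{QFT}\,|D \oplus (x, u)\rangle$$

    The Fourier oracle incorporates the $\mathsf{QFT}$ and operates directly on the $D$ registers:

    $$\mathsf{FourierO} \sum_{x,u,z,D} a^{i-1}_{x,u,z,D} |x, u, z\rangle \otimes |D\rangle \Rightarrow \sum_{x,u,z,D} a^{i-1}_{x,u,z,D} |x, u, z\rangle \otimes |D \oplus (x, u)\rangle$$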

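  4. Compressed Fourier Oracles: The idea is basically the same as Fourier oracles. But when the algorithm only makes $q$ queries, every database function $D$ that appears in the state contains at most $q$ non-zero entries.

    So to describe $D$, we only need at most $q$ different pairs $(x_i, u_i)$ (with $u_i \neq 0$), which say that the database outputs $u_i$ on $x_i$ and $0$ everywhere else. And $D \oplus (x, u)$ does the following: 1) if $x$ is not in the list $D$ and $u \neq 0$, put $(x, u)$ in $D$; 2) if $(x, u')$ is in the list $D$ and $u' \oplus u \neq 0$, update $u'$ to $u' \oplus u$ in $D$; 3) if $(x, u')$ is in the list $D$ and $u' \oplus u = 0$, remove $(x, u')$ from $D$.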

    In the model, we start with where is an empty list. After making the -th query, we have

  5. Compressed Standard/Phase Oracles: These two models are essentially equivalent up to an application of the QFT to the query response register. From now on we only consider compressed phase oracles.

    By applying QFT on the entries of the database registers of a compressed Fourier oracle, we get a compressed phase oracle.

    In this model, contains all the pair which means the oracle outputs on and uniformly at random on other inputs. When making a query on ,

    • if is in the database for some , a phase will be added to the state; it corresponds to update to in the compressed Fourier oracle model;

    • otherwise a superposition is appended to the state ; it corresponds to put a new pair in the list in the compressed Fourier oracle model;

    • also make sure that the list will never have an pair in the compressed Fourier oracle model (in other words, it is in the compressed phase oracle model); if there is one, delete that pair;

    • All the ‘append’ and ‘delete’ operations above mean applying QFT.
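The list update in the compressed Fourier oracle (item 4) can be illustrated on a single classical basis state. This is only a sketch under stated assumptions: it tracks one database in a Python dict, uses XOR for the update as the text indicates, and ignores superposition and amplitudes entirely. The name `update` is hypothetical.

```python
from typing import Dict


def update(db: Dict[int, int], x: int, u: int) -> Dict[int, int]:
    """Return the database obtained by XOR-ing u into entry x.

    Entries equal to 0 are never stored, so the dict holds only the
    non-zero points of the underlying function.
    """
    new_db = dict(db)
    value = new_db.get(x, 0) ^ u
    if value == 0:
        new_db.pop(x, None)   # case 3: entry cancels, remove it (no-op if absent)
    else:
        new_db[x] = value     # case 1 (new pair) or case 2 (updated pair)
    return new_db


db: Dict[int, int] = {}
for x, u in [(5, 3), (9, 1), (5, 4), (9, 1)]:
    db = update(db, x, u)
    print((x, u), db)
```

Each query changes at most one entry, which is why a database reached after $q$ queries never holds more than $q$ pairs.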

3 Algorithm for Multi-collision Finding

In this section, we give an improved algorithm for $k$-collision finding. We use the same idea as [HSX17] but carefully reorganize the algorithm to reduce the number of queries.

As a warm-up, let us consider the case $k = 3$ where $H:[M]\to[N]$ is a $3$-to-$1$ function, $M = 3N$. [HSX17] gives an algorithm with $O(N^{4/9})$ quantum queries. Here is our algorithm with only $O(N^{3/7})$ quantum queries:

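  • Prepare a list $L = \{(x_i, y_i = H(x_i))\}_{i=1}^{t_1}$ where the $x_i$ are distinct and $t_1 = N^{3/7}$. This requires $O(N^{3/7})$ classical queries on random points.

  • Define the following function $f$ on $[M]$:

    $$f(x) = \begin{cases} 1, & x \notin \{x_1, \dots, x_{t_1}\} \text{ and } H(x) = y_j \text{ for some } j \le t_1, \\ 0, & \text{otherwise.} \end{cases}$$

    Run Grover’s algorithm on function $f$. Wlog (by reordering $L$), we find $x'_1$ such that $x'_1 \neq x_1$ and $H(x'_1) = y_1$ using $O(\sqrt{N/t_1}) = O(N^{2/7})$ quantum queries.

  • Repeating the last step $t_2 = N^{1/7}$ times, we will have $t_2$ $2$-collisions $L' = \{(x_i, x'_i, y_i)\}_{i=1}^{t_2}$. This takes $O(N^{1/7} \cdot N^{2/7}) = O(N^{3/7})$ quantum queries.

  • If two elements in $L'$ collide, simply output a $3$-collision. Otherwise, run Grover’s algorithm on function $g$:

    $$g(x) = \begin{cases} 1, & x \notin \{x_1, x'_1, \dots, x_{t_2}, x'_{t_2}\} \text{ and } H(x) = y_j \text{ for some } j \le t_2, \\ 0, & \text{otherwise.} \end{cases}$$

    A $3$-collision will be found when Grover’s algorithm finds a pre-image of $1$ on $g$. It takes $O(\sqrt{N/t_2}) = O(N^{3/7})$ quantum queries.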

Overall, the algorithm finds a $3$-collision using $O(N^{3/7})$ quantum queries.
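The control flow of the warm-up can be traced with a purely classical toy. This is a sketch under loud assumptions: Grover search is replaced by exhaustive search over the domain (so it says nothing about quantum query counts), the hash is the toy $3$-to-$1$ map $x \mapsto x \bmod N$, and the list sizes `t1` and `t2` are passed in by hand. All names are hypothetical.

```python
import random


def find_marked(domain, is_marked, rng):
    """Classical stand-in for Grover search: return a uniformly random marked point."""
    return rng.choice([x for x in domain if is_marked(x)])


def three_collision(N, t1, t2, seed=0):
    rng = random.Random(seed)
    M = 3 * N
    domain = range(M)

    def H(x):
        return x % N  # toy 3-to-1 function on [3N] (assumption for illustration)

    # Step 1: list of t1 points with pairwise distinct images (image -> point).
    L = {}
    while len(L) < t1:
        x = rng.randrange(M)
        L.setdefault(H(x), x)

    # Steps 2-3: extend list entries to 2-collisions (image -> (x, x')).
    L2 = {}
    for _ in range(t2):
        x2 = find_marked(domain, lambda x: H(x) in L and x != L[H(x)], rng)
        y = H(x2)
        if y in L2 and x2 not in L2[y]:
            return tuple(sorted((*L2[y], x2)))  # two 2-collisions collided
        L2[y] = (L[y], x2)

    # Step 4: extend one of the 2-collisions to a 3-collision.
    x3 = find_marked(domain, lambda x: H(x) in L2 and x not in L2[H(x)], rng)
    return tuple(sorted((*L2[H(x3)], x3)))


print(three_collision(N=2 ** 7, t1=2 ** 3, t2=2))
```

With $N = 2^7$ the sizes $t_1 = 2^3$ and $t_2 = 2$ match $N^{3/7}$ and $N^{1/7}$ exactly, which is why those values are used in the example call.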

A similar algorithm and analysis works for any constant $k$ and any $k$-to-$1$ function, and only requires $O(N^{(2^{k-1}-1)/(2^k-1)})$ quantum queries. Let $t_i = N^{(2^{k-i}-1)/(2^k-1)}$. The algorithm works as follows:

  • Assume $H:[M]\to[N]$ is a $k$-to-$1$ function and $M = kN$.

  • Prepare a list $L_1$ of input-output pairs of size $t_1$. With overwhelming probability, $L_1$ does not contain a collision. By letting $t_1 = N^{(2^{k-1}-1)/(2^k-1)}$, this step makes $O(N^{(2^{k-1}-1)/(2^k-1)})$ quantum queries.

  • Define a function $f_1$ that returns $1$ if the input $x$ is not in $L_1$ but the image $H(x)$ collides with one of the images in $L_1$, otherwise it returns $0$. Run Grover’s algorithm on $f_1$ $t_2$ times. Every time Grover’s algorithm outputs $x'$, it gives a $2$-collision. With high probability (explained below), all these collisions do not collide. So we have a list $L_2$ of $t_2$ different $2$-collisions. This step makes $O(t_2 \sqrt{N/t_1})$ quantum queries.

  • For $2 \le i \le k-2$, define a function $f_i$ that returns $1$ if the input $x$ is not in $L_i$ but the image $H(x)$ collides with one of the images of $i$-collisions in $L_i$, otherwise it returns $0$. Run Grover’s algorithm on $f_i$ $t_{i+1}$ times. Every time Grover’s algorithm outputs $x'$, it gives an $(i+1)$-collision. With high probability, all these collisions do not collide. So we have a list $L_{i+1}$ of $t_{i+1}$ different $(i+1)$-collisions. This step makes $O(t_{i+1} \sqrt{N/t_i})$ quantum queries.

  • Finally, given $t_{k-1}$ $(k-1)$-collisions, use Grover’s algorithm to find a single $x'$ that makes a $k$-collision with one of the $(k-1)$-collisions in $L_{k-1}$. This step makes $O(\sqrt{N/t_{k-1}})$ quantum queries by letting $t_{k-1} = N^{1/(2^k-1)}$.

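The number of quantum queries made by the algorithm is simply:

$$O(t_1) + \sum_{i=1}^{k-2} O\left(t_{i+1}\sqrt{N/t_i}\right) + O\left(\sqrt{N/t_{k-1}}\right) = O\left(N^{(2^{k-1}-1)/(2^k-1)}\right).$$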

So we have the following theorem:

Theorem 4.

For any constant $k$, any $k$-to-$1$ function $H:[M]\to[N]$ ($M = kN$), there is an algorithm that finds a $k$-collision using $O(N^{(2^{k-1}-1)/(2^k-1)})$ quantum queries.
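The balance behind this bound can be checked with exact arithmetic. The sketch below assumes list sizes $t_i = N^{(2^{k-i}-1)/(2^k-1)}$ for $1 \le i \le k-1$, a choice that is consistent with the stated query bound, and assumes each Grover stage costs $t_{i+1}\sqrt{N/t_i}$ queries. It works only with exponents of $N$; the function names are illustrative.

```python
from fractions import Fraction


def list_size_exponents(k: int):
    """Exponents e_1, ..., e_{k-1} with t_i = N^{e_i} (assumed parameter choice)."""
    d = 2 ** k - 1
    return [Fraction(2 ** (k - i) - 1, d) for i in range(1, k)]


def step_cost_exponents(k: int):
    """Exponent of N for the query cost of every stage of the algorithm."""
    e = list_size_exponents(k)
    costs = [e[0]]                                # building the first list: t_1
    for i in range(len(e) - 1):
        costs.append(e[i + 1] + (1 - e[i]) / 2)   # t_{i+1} * sqrt(N / t_i)
    costs.append((1 - e[-1]) / 2)                 # final search: sqrt(N / t_{k-1})
    return costs


for k in range(2, 6):
    print(k, step_cost_exponents(k))
```

Under these assumptions every stage has the same exponent, so for constant $k$ the total is dominated by any single stage; for $k = 2$ the check reduces to the familiar $N^{1/3}$ balance.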

We now show the above conclusion holds for an arbitrary function $H:[M]\to[N]$ as long as $M \ge kN$. To prove this, we use the following lemma:

Lemma 5.

Let $H:[M]\to[N]$ be a function and $M \ge kN$. Let $\mu_H$ be the probability that if we choose $x$ uniformly at random and $y = H(x)$, the number of pre-images of $y$ is at least $k$. We have $\mu_H \ge 1/k$.

Proof.

To make the probability as small as possible, we want that if $y$ has fewer than $k$ pre-images, $y$ should have exactly $k-1$ pre-images. So the probability is at least

$$\frac{M - (k-1)N}{M} \ge \frac{1}{k}.$$
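The bound can be sanity-checked numerically on small tables. The sketch below assumes the lemma reads as follows: for a function from a domain of size $M \ge kN$ to a range of size $N$, at least a $1/k$ fraction of inputs have an image with at least $k$ pre-images. It tests a random table and a deliberately skewed one; both constructions and all names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def heavy_fraction(table, k):
    """Fraction of inputs x whose image table[x] has at least k pre-images."""
    counts = Counter(table)
    heavy = sum(1 for y in table if counts[y] >= k)
    return Fraction(heavy, len(table))


def random_table(M, N, seed=0):
    """Truth table of a random function from range(M) to range(N)."""
    rng = random.Random(seed)
    return [rng.randrange(N) for _ in range(M)]


def skewed_table(N, k):
    """N - 1 images get k - 1 pre-images each; the last image absorbs the rest (M = k * N)."""
    table = [y for y in range(N - 1) for _ in range(k - 1)]
    table += [N - 1] * (k * N - len(table))
    return table


k, N = 3, 50
for name, table in [("random", random_table(k * N, N)), ("skewed", skewed_table(N, k))]:
    print(name, heavy_fraction(table, k), ">=", Fraction(1, k))
```

The skewed table follows the proof's extremal idea: light images are filled to $k-1$ pre-images, and the inputs that remain are forced onto a heavy image.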

Theorem 6.

Let $H:[M]\to[N]$ be a function and $M \ge kN$. The above algorithm finds a $k$-collision using $O(N^{(2^{k-1}-1)/(2^k-1)})$ quantum queries with constant probability.

Proof.

We prove the case . The case