The subset sum problem is one of the most fundamental problems in theoretical computer science and one of the most famous NP-hard problems. Because of its difficulty, the subset sum problem is often used to design cryptosystems [8, 10, 17, 21, 22]. An instance of this problem consists of integers $a_1, \ldots, a_n$ and a target $t$. Given an instance, there are two forms of the subset sum problem. The first is the decision subset sum problem, in which we must decide whether there exists a subset of $\{a_1, \ldots, a_n\}$ with sum $t$. The second is the computational subset sum problem, in which we must find a subset of $\{a_1, \ldots, a_n\}$ that sums to $t$. The decision subset sum problem is NP-complete. It is well known that, given access to an oracle that solves the decision problem, the computational problem can be solved using $n$ calls to this oracle.
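The reduction from the computational problem to the decision problem can be sketched as follows. This is a minimal illustration, not part of the original text: the function names are hypothetical, the weights are assumed to be non-negative integers, and the toy oracle `decide_subset_sum` is only a stand-in for an arbitrary decision oracle. The sketch spends one extra call on an initial feasibility check, so it uses one call more than the number of weights.

```python
from typing import Callable, List, Optional, Sequence


def decide_subset_sum(a: Sequence[int], t: int) -> bool:
    """Toy decision oracle (assumption): non-negative weights, exact sum t."""
    reachable = {0}
    for x in a:
        # Extend every reachable sum by x, discarding sums that overshoot t.
        reachable |= {s + x for s in reachable if s + x <= t}
    return t in reachable


def search_with_oracle(
    a: Sequence[int],
    t: int,
    oracle: Callable[[Sequence[int], int], bool],
) -> Optional[List[int]]:
    """Return indices of a subset summing to t, using only oracle calls."""
    if not oracle(a, t):
        return None
    chosen: List[int] = []
    remaining = t
    for i in range(len(a)):
        rest = a[i + 1:]
        # If the remaining target is reachable without a[i], skip a[i].
        if oracle(rest, remaining):
            continue
        # Otherwise a[i] must belong to every remaining solution.
        chosen.append(i)
        remaining -= a[i]
    return chosen
```

Each loop iteration fixes one decision (take or skip a weight), which is why the number of oracle calls grows only linearly in the number of weights.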
In 1974, Horowitz and Sahni (HS) introduced a Meet-in-the-Middle algorithm with time and space complexity $\tilde{O}(2^{n/2})$. In 1981, Schroeppel and Shamir (SS) improved this to time complexity $\tilde{O}(2^{n/2})$ with space complexity of only $\tilde{O}(2^{n/4})$. These algorithms are still the fastest known for solving general instances of subset sum.
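The Meet-in-the-Middle idea can be illustrated with a short sketch. It is an assumption-laden toy version rather than the HS algorithm as published: the function names are hypothetical, and a hash table replaces the sorted-list merge, which keeps the same idea of enumerating the subset sums of each half separately and then matching them.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def all_subset_sums(values: Sequence[int], offset: int) -> List[Tuple[int, Tuple[int, ...]]]:
    """Enumerate (sum, indices) for every subset of values."""
    sums: List[Tuple[int, Tuple[int, ...]]] = [(0, ())]
    for i, v in enumerate(values):
        # Double the list: every existing subset, with and without values[i].
        sums += [(s + v, idx + (offset + i,)) for s, idx in sums]
    return sums


def meet_in_the_middle(a: Sequence[int], t: int) -> Optional[List[int]]:
    """Return indices of a subset of a summing to t, or None."""
    half = len(a) // 2
    left = all_subset_sums(a[:half], 0)
    right = all_subset_sums(a[half:], half)
    # Index the left half by sum (hash lookup instead of a sorted merge).
    table: Dict[int, Tuple[int, ...]] = {}
    for s, idx in left:
        table.setdefault(s, idx)
    for s, idx in right:
        match = table.get(t - s)
        if match is not None:
            return list(match + idx)
    return None
```

Both halves contain about $2^{n/2}$ subset sums, which is where the square-root saving over exhaustive search comes from.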
For random subset sum instances, Lagarias and Odlyzko showed that instances with density $d < 0.6463$ can be solved, given an oracle solving the shortest vector problem (SVP) in lattices. The density of a random subset sum instance is defined as $d := n / \log_2 \max_i a_i$. In 1991 this bound was improved by Coster et al. and Joux, Stern to $d < 0.9408$. Note that this transformation does not rule out the hardness of the subset sum problem in the low-density regime, since solving SVP is known to be NP-hard. In the high-density regime, dynamic programming solves the subset sum problem efficiently.
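As a small worked example, the density of a concrete instance can be computed directly. The sketch assumes the commonly used definition $d = n / \log_2 \max_i a_i$ for positive integer weights; the function name is hypothetical.

```python
import math
from typing import Sequence


def density(a: Sequence[int]) -> float:
    """Density n / log2(max a_i), assuming positive integer weights."""
    if not a or max(a) <= 1:
        raise ValueError("need at least one weight larger than 1")
    return len(a) / math.log2(max(a))
```

For instance, three weights whose maximum is $2^{10}$ give density $3/10$, a low-density instance, whereas three weights bounded by $7$ give a density above $1$.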
However, for the case $d \approx 1$ only exponential time algorithms are known. In a breakthrough paper at Eurocrypt 2010, Howgrave-Graham and Joux (HGJ) showed that random subset sum instances can be solved in time $\tilde{O}(2^{0.337n})$. At Eurocrypt 2011, Becker, Coron and Joux (BCJ) proposed a modification of the HGJ algorithm with heuristic run time $\tilde{O}(2^{0.291n})$. In 2019, Esser and May (EM) proposed a new heuristic algorithm based on the representation and sampling techniques with run time $\tilde{O}(2^{0.255n})$. The sampling technique introduces variance that increases the number of representations and brings more optimization flexibility. A remarkable property is that the complexity of the EM algorithm improves with increasing search tree depth.
Quantum complexity of subset sum.
In 2013, Bernstein, Jeffery, Lange and Meurer constructed quantum subset sum algorithms, inspired by the HS algorithm, the SS algorithm and the HGJ algorithm. In detail, Bernstein et al. showed that the quantum HS algorithm achieves run time $\tilde{O}(2^{n/3})$. Moreover, a first quantum version of the SS algorithm with Grover search runs in time $\tilde{O}(2^{3n/8})$ using only space $\tilde{O}(2^{n/8})$. A second quantum version of the SS algorithm using quantum walks [3, 2] achieves time $\tilde{O}(2^{0.3n})$. Finally, Bernstein et al. used the quantum walk framework of Magniez et al. to obtain a quantum version of the HGJ algorithm with time and space complexity $\tilde{O}(2^{0.241n})$. In 2018, Helm and May obtained a quantum version of the BCJ algorithm with time and space complexity $\tilde{O}(2^{0.226n})$, which is the best known quantum random subset sum algorithm.
The main contribution of our paper is to combine classical sampling with quantum walks to obtain a new quantum algorithm with running time down to $\tilde{O}(2^{0.209n})$. In the previous quantum walk algorithms, e.g. the quantum HGJ and quantum BCJ algorithms, the key point is that we no longer enumerate the initial lists, but only start with random subsets of the initial lists of some fixed size that has to be optimized. Although these quantum algorithms do not enumerate the initial lists, we know whether a given element belongs to the initial lists. The EM algorithm, however, builds the initial lists by sampling, so we do not know whether a given element belongs to them. Therefore, quantum walks cannot be used directly. One simple way to solve this problem is to first sample the initial lists and then carry out quantum walks on them. Moreover, we need to define an appropriate quantum walk for the EM algorithm within the framework of Magniez et al.
The paper is organized as follows. In Section 2 we introduce the random subset sum problem and some notation. In Section 3 we first describe the EM algorithm with search tree depth , the depth at which its quantum version is optimal. Then we describe the EM algorithm with arbitrary depth. In Section 4, we convert the random subset sum problem to a graph search problem, and we analyze the cost of a random walk on the search space defined by the EM algorithm. Then we define an appropriate data structure and give our quantum algorithm. Finally, we conclude in Section 5.
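The following are standard notations in computational complexity. Let $f(n), g(n)$ be positive-valued functions in $n$. Then
$f(n) = O(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$.
$f(n) = \Omega(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \ge c \cdot g(n)$ for all $n \ge n_0$.
$f(n) = \Theta(g(n))$ if both $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.
$\tilde{O}, \tilde{\Omega}, \tilde{\Theta}$ differ from $O, \Omega, \Theta$ respectively only by a logarithmic factor.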
Throughout the paper, we denote by $|\mathbf{x}|$ the Hamming weight of a vector $\mathbf{x}$. We use the same notation for the cardinality of a set. We denote by $\log$ the logarithm to base $2$.
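[Random Subset Sum] Let $\mathbf{a} \in (\mathbb{Z}_{2^n})^n$ be chosen uniformly at random. For a random $\mathbf{e} \in \{0,1\}^n$ with $|\mathbf{e}| = n/2$, let $t = \langle \mathbf{a}, \mathbf{e} \rangle \bmod 2^n$. Then $(\mathbf{a}, t) \in (\mathbb{Z}_{2^n})^{n+1}$ is called a random subset sum instance, while each $\mathbf{e}' \in \{0,1\}^n$ with $\langle \mathbf{a}, \mathbf{e}' \rangle \equiv t \bmod 2^n$ is called a solution.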
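By $H(\cdot)$ we refer to the binary entropy function, which is defined on input $0 \le \alpha \le 1$ as $H(\alpha) := -\alpha \log \alpha - (1 - \alpha) \log (1 - \alpha)$, where we use the convention $0 \log 0 := 0$. We approximate binomial coefficients by the entropy function, derived from Stirling's formula: $\binom{n}{m} = \tilde{\Theta}\left(2^{n H(m/n)}\right)$.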
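The entropy approximation of binomial coefficients can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and it assumes base-2 logarithms and the convention that $0 \log 0 = 0$.

```python
import math


def binary_entropy(alpha: float) -> float:
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)


def log2_binomial(n: int, m: int) -> float:
    """Exact base-2 logarithm of the binomial coefficient C(n, m)."""
    return math.log2(math.comb(n, m))


def entropy_exponent(n: int, m: int) -> float:
    """Entropy estimate n * H(m / n) of log2 C(n, m)."""
    return n * binary_entropy(m / n)
```

Comparing `entropy_exponent(n, m)` with `log2_binomial(n, m)` for growing `n` shows that the two exponents differ only by a term logarithmic in `n`, which is the polynomial factor absorbed by the tilde notation.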
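Let $X \sim \mathcal{D}$ be a discrete random variable following the distribution $\mathcal{D}$, which is defined on a finite alphabet $\Sigma$. For $x \in \Sigma$ let $p_x := \Pr[X = x]$. We define the entropy of a random variable, or equivalently of its distribution, as
$$H(X) = H(\mathcal{D}) := -\sum_{x \in \Sigma} p_x \log p_x.$$
For $0 \le \alpha \le 1$ we refer by $\mathcal{B}_\alpha$ to the Bernoulli distribution with parameter $\alpha$, that is, for $X \sim \mathcal{B}_\alpha$ we have $\Pr[X = 1] = \alpha$ and $\Pr[X = 0] = 1 - \alpha$. The sum of $n$ iid $\mathcal{B}_\alpha$-distributed random variables is binomially distributed with parameters $n$ and $\alpha$, which we denote by $\mathrm{Bin}_{n,\alpha}$.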
The following lemma guarantees that we do not obtain too many duplicate vectors when sampling a limited number of vectors with iid Bernoulli-distributed entries.
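The effect described here can be probed empirically with a small sampling experiment. This is a sketch under stated assumptions, not the lemma itself: the function name, the parameters `n`, `alpha` and `size`, and the safety cap `max_draws` are all hypothetical. It draws vectors with iid Bernoulli entries until the requested number of distinct vectors has been seen and reports how many draws were needed.

```python
import random
from typing import List, Tuple


def sample_distinct_vectors(
    n: int,
    alpha: float,
    size: int,
    rng: random.Random,
    max_draws: int = 1_000_000,
) -> Tuple[List[Tuple[int, ...]], int]:
    """Sample length-n 0/1 vectors with iid Bernoulli(alpha) entries.

    Stops once `size` distinct vectors were found; returns them with the
    number of draws used. `max_draws` is a safety cap (assumption).
    """
    seen = set()
    draws = 0
    while len(seen) < size:
        if draws >= max_draws:
            raise RuntimeError("too many draws; size may exceed the support")
        draws += 1
        seen.add(tuple(1 if rng.random() < alpha else 0 for _ in range(n)))
    return sorted(seen), draws
```

When the requested list size is small compared with the number of typical vectors, the number of draws stays close to the list size, i.e. duplicates are rare.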
3 The EM Classical Algorithm
The Esser-May (EM) algorithm is search-tree based and solves instances in a divide-and-conquer manner using the representation method. Under an appropriate assumption, the EM algorithm can be implemented for a search tree of arbitrary depth. We denote by the EM algorithm with search tree depth . Whereas the run time of the EM algorithm decreases with increasing search tree depth , our quantum algorithm is optimal at search tree depth . Thus, in this section, we first describe . Then we generalize to .
Let be a subset sum instance with a solution with . That is, .
Before describing , we recall a classical list join operator that we use extensively. The join operator performs the following task: given two lists of numbers $L_1$ and $L_2$ of respective sizes $|L_1|$ and $|L_2|$, together with two integers $M$ and $R$, the algorithm computes the list $L$ such that $L = \{x + y : x \in L_1,\ y \in L_2,\ x + y \equiv R \bmod M\}$. The list $L$ can be constructed as follows. Sort $L_1$ by residue modulo $M$, and then for every $y \in L_2$ find via binary search all elements $x \in L_1$ such that $x \equiv R - y \bmod M$. The complexity of this method is $\tilde{O}(\max(|L_1|, |L_2|, |L|))$. Moreover, assuming that the values of the initial lists modulo $M$ are randomly distributed, the expected size of $L$ is $|L_1| \cdot |L_2| / M$.
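The sort-and-binary-search construction described above can be sketched as follows. The names `L1`, `L2`, `M` and `R` are assumptions standing for the two input lists, the modulus and the target residue; the function name `join` is hypothetical as well.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def join(L1: Sequence[int], L2: Sequence[int], M: int, R: int) -> List[int]:
    """Return all sums x + y with x in L1, y in L2 and x + y = R (mod M)."""
    # Sort the first list by residue modulo M so residues can be bisected.
    keyed = sorted((x % M, x) for x in L1)
    residues = [k for k, _ in keyed]
    out: List[int] = []
    for y in L2:
        want = (R - y) % M
        # Binary search for the block of elements with the wanted residue.
        lo = bisect_left(residues, want)
        hi = bisect_right(residues, want)
        out.extend(x + y for _, x in keyed[lo:hi])
    return out
```

For example, `join([1, 5, 9, 14], [2, 3, 7], 4, 0)` keeps exactly the pairs whose sum is divisible by 4. Up to the logarithmic cost of sorting and searching, the work is linear in the two inputs and the output.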
The basic idea of representation is to represent the solution as a sum where and . If we can effectively (respectively ) construct two lists and , then using the list join operator above, we can effectively get the solution via the joined list . Note that sorting and searching are performed with respect to and , where . Generally speaking, this effectiveness of constructing base lists is not guaranteed. Thus we decompose the original random subset sum problem into two subproblems several times to improve the complexity of the EM algorithm. That is, represent the solution as a sum for some .
In , we represent the solution as a sum with , where , . We call the representation of the form as the level- representation for each . The tree structure of is shown in Figure 3.1. Denote by the -st list of the level , where , . Denote by the elements of , where , . In other words, are candidates for , where , . Define the join operator for as , where , .
Consider the level- representation . By linearity of the inner product any level- representation satisfies Obviously, this equation also holds modulo for any . In Figure 3.1 we construct on level of our search tree in list candidates , where . By the randomness of , the inner products for some randomly distributed modulo and thus also modulo . Thus, if we fix a certain constraint , then we expect that a -fraction of all level- representations satisfies . This enables us to filter out elements of the search space, as well as representations via constraints. Similarly, we construct on level of our search tree in list candidates , where . Eventually, construct on level in list candidates .
Figure 3.1 Tree structure of . The portion covered by the slash represents the result of the join operator.
Our goal is to construct on expectation a single representation of on level . To this end, we initially construct the level lists , , where . For each $j$, we sample iid vectors , and , where . Then, we construct the level lists , where . So that the join operator can be executed, we choose random , and let . By the definition of the join operator, on level we get only those candidates satisfying for some . Note that all level- candidates are vectors from .
Similarly, we construct the level lists , where . Note that must be chosen randomly on satisfying and . Then, we construct the level lists , where . Note that must be chosen randomly on satisfying , and . Finally, we construct by setting . If satisfies , then is a solution of the original random subset sum instance.
Note that any non-binary cannot be part of a valid representation of , and may safely be filtered out. Therefore, after constructing each , we immediately eliminate all non-binary vectors. In fact, the join operator already includes operations for filtering out non-binary vectors.
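A join on candidate vectors with built-in filtering of non-binary sums might look as follows. This is only a sketch under assumptions: the function names, the weight vector `a`, the bit count `r_bits` and the constraint value `c` are hypothetical, and hash buckets are used in place of sorting and binary search. It is not a transcription of Algorithm 3.1.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]


def inner_mod(a: Sequence[int], e: Vector, modulus: int) -> int:
    """Inner product of a and e, reduced modulo `modulus`."""
    return sum(ai * ei for ai, ei in zip(a, e)) % modulus


def join_vectors(
    a: Sequence[int],
    left: Sequence[Vector],
    right: Sequence[Vector],
    r_bits: int,
    c: int,
) -> List[Vector]:
    """Sums e1 + e2 with <a, e1 + e2> = c (mod 2**r_bits), binary only."""
    modulus = 1 << r_bits
    # Bucket the left list by inner product modulo 2**r_bits.
    buckets: Dict[int, List[Vector]] = defaultdict(list)
    for e1 in left:
        buckets[inner_mod(a, e1, modulus)].append(e1)
    out: List[Vector] = []
    for e2 in right:
        want = (c - inner_mod(a, e2, modulus)) % modulus
        for e1 in buckets.get(want, []):
            s = tuple(x + y for x, y in zip(e1, e2))
            # Drop sums with an entry outside {0, 1}.
            if all(v in (0, 1) for v in s):
                out.append(s)
    return out
```

A pair whose two summands overlap in some coordinate produces an entry equal to 2 and is discarded even when it satisfies the modular constraint.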
In our construction we tune the parameters such that we expect to obtain a representation of our solution in . Hence, a linear pass through all elements of yields a subset sum solution (on expectation). A pseudocode description of the algorithm is given by Algorithm 3.1.
Algorithm 1 The Algorithm
The Number of Representations
Let be a subset sum solution. Denote by the expected number of level- representations of . Denote by the expected number of level- representations of . And denote by the expected number of level- representations of . Formally, define
For a fixed
For a fixed
Obviously, and . We define . Let us compute the values and their corresponding
. Note that we ignore for the moment the fact that we impose constraints in Algorithm 3.1 to eliminate representations from level on. Hence the numbers count the total number of representations, without any eliminations.
The elements in the base lists are sampled from respectively . As a consequence, the elements of the level- lists are from . Let , where . Then for each coordinate of we have and . Hence a candidate is a representation of the -weight solution with probability
By construction, Algorithm 3.1 computes every level- list out of elements. Thus, we expect , and .
Let be an arbitrary combination of level- elements. As before, is a representation of with probability . Observe that . By construction, . But . This implies that list , without any constraints, contains at least (not necessarily different) vectors. By heuristically treating level- list elements as independently sampled from , an application of Lemma 2 yields that contains different vectors. Without loss of generality, we may assume that the size of is upper bounded by . Therefore, we expect , and . Similarly, we expect , and .
So, and .
Let be a solution to our subset sum problem. As already mentioned, our goal is to construct on expectation a single representation of on level . The constraints in Algorithm 3.1 will eliminate representations from level on.
Let us start on level , and let be an arbitrary level- representation of . Then , which implies . Therefore, the value of is fully determined by fixing the seven values ,. In Algorithm 3.1, we choose random and enforce . In fact, one constraint eliminates (on expectation) a -fraction of one level- list. Since all level- representations only need to fulfill seven out of eight constraints, we eliminate only a -fraction of all representations. That is, we expect that representations pass the level- constraints.
For the level- join we conclude similarly by imposing four additional constraints , on bits. We say that a constraint is consistent with the level- constraints if . We choose a random consistent constraint , and set . Every level- representation that satisfies constraint automatically satisfies constraint . So, these constraints eliminate another -fraction of the level- representations. Note that here we use the consistency of the constraints. Similarly, the level- constraints eliminate another -fraction of the level- representations. Together with the level- and level- elimination, we expect to have representations left in level . Thus, we need
In the analysis we only guarantee that in a single run (or at most polynomially many runs) of the EM algorithm the expected number of returned representations of the solution is at least one.
Heuristic(). We heuristically assume that the random variable that counts the number of representations per run of the algorithm is sharply centered around its expectation to conclude that a single run (or at most polynomially many runs) suffices to find a solution with good probability.
Note that the heuristic must fail, if the EM algorithm clusters its representations around certain constraints. To prevent representations from clustering we need to ensure that every level- and level- representation is constructed at most once. Thus, we need
Remark. The following remarks supplement the correctness analysis of .
(1). The correctness of is based on the randomness of random subset sum instances. Consider a subset sum instance whose elements are all equal to . It is clear that, unless all the random constraints in Algorithm 3.1 are chosen equal to , the algorithm cannot succeed. Thus, in this case the probability of success is very low. There are many other bad subset sum instances. However, for a random subset sum instance, the expected probability of success is not too small [5, 14].
(2). The reasonableness of Heuristic is supported by experimental data. The experimental results indicate that the sizes of the lists are always very close to the theoretical values at level and smaller at the other levels.
Time and Memory Complexity
Let us start with analyzing the run time for sampling the level- lists in Algorithm 3.1. Note that we stop sampling, when we have found different list elements. Since we sample iid elements from , we conclude by Lemma 1 that this takes only many iterations.
Let us now turn to the computation of the level- to level- lists. Let be an arbitrary list on level . is constructed in a -tree list join manner. Ignoring logarithmic factors, this list join process works in time linear in the two input lists and the output list.
Let for be the expected list size of level- lists (before filtering) . Let for denote the expected size of filtered level- lists. Under Heuristic the total expected run time of Algorithm 3.1 is