# Improved quantum algorithm for the random subset sum problem

We propose a quantum algorithm for solving random subset sum instances, which play a crucial role in cryptographic constructions. In 2013, Bernstein, Jeffery, Lange and Meurer constructed a quantum subset sum algorithm with heuristic time complexity $2^{0.241n}$ by enhancing the classical random subset sum algorithm of Howgrave-Graham and Joux with a quantum walk technique. In 2018, Helm and May improved the heuristic running time and memory to $2^{0.226n}$ by quantizing the classical algorithm of Becker, Coron and Joux. In this paper, we obtain a new quantum algorithm with running time $O(2^{0.209n})$ for all but a negligible fraction of random subset sum instances by combining classical sampling with quantum walks.


## 1 Introduction

The subset sum problem is one of the most fundamental problems in theoretical computer science and one of the most famous NP-hard problems [12]. Due to its difficulty, the subset sum problem is often used to design cryptosystems [8, 10, 17, 21, 22]. An instance of this problem consists of $(a_1, \ldots, a_n, t) \in \mathbb{Z}^{n+1}$. Given an instance, there are two forms of the subset sum problem. The first is the decision subset sum problem, where we need to decide whether there exists a subset of $\{a_1, \ldots, a_n\}$ with sum $t$. The second is the computational subset sum problem, where we need to find a subset of $\{a_1, \ldots, a_n\}$ that sums to $t$. The decision subset sum problem is NP-complete. It is well known that, given access to an oracle that solves the decision problem, the computational problem can be solved using $n$ calls to this oracle.

In 1974, Horowitz and Sahni (HS) [16] introduced a Meet-in-the-Middle algorithm with time and space complexity $\tilde{O}(2^{n/2})$. In 1981, Schroeppel and Shamir (SS) [27] improved this to time complexity $\tilde{O}(2^{n/2})$ with space complexity of only $\tilde{O}(2^{n/4})$. These algorithms are still the fastest known for solving general instances of subset sum.
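The Meet-in-the-Middle idea can be illustrated with a minimal Python sketch. It is not the HS algorithm as published: it uses hash tables instead of sorted lists, and the function names `subset_sums` and `meet_in_the_middle` are chosen here for illustration only.

```python
def subset_sums(indexed_items):
    """Map every achievable sum of the given (index, value) pairs to one index tuple."""
    sums = {0: ()}
    for idx, value in indexed_items:
        extended = {}
        for s, chosen in sums.items():
            extended.setdefault(s + value, chosen + (idx,))
        for s, chosen in extended.items():
            sums.setdefault(s, chosen)
    return sums


def meet_in_the_middle(a, t):
    """Return sorted indices of a subset of `a` summing to `t`, or None."""
    items = list(enumerate(a))
    half = len(items) // 2
    left = subset_sums(items[:half])    # about 2^(n/2) sums
    right = subset_sums(items[half:])   # about 2^(n/2) sums
    for s, chosen in left.items():
        rest = right.get(t - s)         # look up the matching partner sum
        if rest is not None:
            return sorted(chosen + rest)
    return None
```

Each half contributes at most $2^{n/2}$ sums, which is where the time and space bound of the Meet-in-the-Middle approach comes from.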

For random subset sum instances, Brickell [4] and Lagarias and Odlyzko [20] showed that instances with density $d < 0.64$ can be solved given an oracle for the shortest vector problem (SVP) in lattices. The density of a random subset sum instance is defined as $d := n / \log_2 \max_i a_i$. In 1991 this bound was improved by Coster et al. [7] and Joux and Stern [18] to $d < 0.94$. Note that this transformation does not rule out the hardness of the subset sum problem in the low-density regime, since solving SVP is known to be NP-hard [1]. In the high-density regime, dynamic programming solves the subset sum problem efficiently [13].
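As a small illustration of the density notion, the following Python sketch computes $d = n / \log_2 \max_i a_i$ for a list of positive integers. The function name `density` is an assumption made here, and the formula is the standard definition rather than a quotation from the text.

```python
import math


def density(a):
    """Density d = n / log2(max_i a_i) of a subset sum instance with elements a."""
    return len(a) / math.log2(max(a))
```

For example, ten elements of about 20 bits each give density $0.5$, which falls in the low-density regime discussed above.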

However, for the case $d \approx 1$ only exponential-time algorithms are known. In a breakthrough paper at Eurocrypt 2010, Howgrave-Graham and Joux (HGJ) [14] showed that random subset sum instances can be solved in time $\tilde{O}(2^{0.337n})$. At Eurocrypt 2011, Becker, Coron and Joux (BCJ) [5] proposed a modification of the HGJ algorithm with heuristic run time $\tilde{O}(2^{0.291n})$. In 2019, Esser and May (EM) [9] proposed a new heuristic algorithm based on the representation and sampling techniques with run time $\tilde{O}(2^{0.255n})$. The sampling technique introduces variance that increases the number of representations and brings more optimization flexibility. A remarkable property is that the complexity of the EM algorithm improves with increasing search tree depth.

Quantum complexity of subset sum.

In 2013, Bernstein, Jeffery, Lange and Meurer [6] constructed quantum subset sum algorithms inspired by the HS, SS and HGJ algorithms. In detail, Bernstein et al. showed that the quantum HS algorithm achieves run time $\tilde{O}(2^{n/3})$. Moreover, a first quantum version of the SS algorithm with Grover search [11] runs in time $\tilde{O}(2^{3n/8})$ using only space $\tilde{O}(2^{n/8})$. A second quantum version of the SS algorithm using quantum walks [3, 2] achieves time $\tilde{O}(2^{0.3n})$. Finally, Bernstein et al. used the quantum walk framework of Magniez et al. [23] to obtain a quantum version of the HGJ algorithm with time and space complexity $\tilde{O}(2^{0.241n})$. In 2018, Helm and May [15] obtained a quantum version of the BCJ algorithm with time and space complexity $\tilde{O}(2^{0.226n})$, which was the best known quantum random subset sum algorithm prior to this work.

Our result.

The main contribution of our paper is to combine classical sampling with quantum walks to obtain a new quantum algorithm with running time $O(2^{0.209n})$. In the previous quantum walk algorithms, e.g. the quantum HGJ and quantum BCJ algorithms, the key point is that we no longer enumerate the initial lists, but only start with random subsets of the initial lists of some fixed size that has to be optimized. Although these quantum algorithms do not enumerate the initial lists, we can still decide whether a given element belongs to them. The EM algorithm, however, builds the initial lists by sampling, so we cannot decide whether a given element belongs to the initial lists. Therefore, quantum walks cannot be applied directly. A simple way to solve this problem is to first sample the initial lists and then carry out the quantum walk on them. Moreover, we need to define an appropriate quantum walk for the EM algorithm within the framework of Magniez et al. [23].

The paper is organized as follows. In Section 2 we introduce the random subset sum problem and some notation. In Section 3 we first describe the EM algorithm with search tree depth $4$, whose quantum version is optimal at this depth, and then the EM algorithm with arbitrary depth. In Section 4, we convert the random subset sum problem into a graph search problem and analyze the cost of a random walk on the search space defined by the EM algorithm. We then define an appropriate data structure and give our quantum algorithm. Finally, we conclude in Section 5.

Notation.

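The following are standard notations in computational complexity. Let $f(n), g(n)$ be positive-valued functions of $n$. Then

$f(n) = O(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$.

$f(n) = \Omega(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \ge c \cdot g(n)$ for all $n \ge n_0$.

$f(n) = \Theta(g(n))$ if both $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

$\tilde{O}, \tilde{\Omega}, \tilde{\Theta}$ differ from $O, \Omega, \Theta$ respectively only by a logarithmic factor.

Throughout the paper, we denote by $|\mathbf{x}|$ the Hamming weight of a vector $\mathbf{x}$. We use the same notation for the cardinality of a set. We denote by $\log$ the logarithm with base $2$.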

## 2 Preliminaries

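Definition (Random Subset Sum). Let $\mathbf{a} \in (\mathbb{Z}_{2^n})^n$ be chosen uniformly at random. For a random $\mathbf{e} \in \{0,1\}^n$ with $|\mathbf{e}| = n/2$, let $t = \langle \mathbf{a}, \mathbf{e} \rangle \bmod 2^n$. Then $(\mathbf{a}, t) \in (\mathbb{Z}_{2^n})^{n+1}$ is called a random subset sum instance, while each $\mathbf{e}' \in \{0,1\}^n$ with $\langle \mathbf{a}, \mathbf{e}' \rangle = t \bmod 2^n$ is called a solution.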
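The definition can be made concrete with a short Python sketch. It assumes the usual convention that the elements are uniform modulo $2^n$ and that the planted solution has Hamming weight $n/2$. The names `random_instance` and `is_solution` are hypothetical helpers introduced here.

```python
import random


def random_instance(n, rng):
    """Sample (a, t, e): a uniform in (Z_{2^n})^n and a planted e of Hamming weight n/2."""
    modulus = 1 << n
    a = [rng.randrange(modulus) for _ in range(n)]
    ones = set(rng.sample(range(n), n // 2))
    e = [1 if i in ones else 0 for i in range(n)]
    t = sum(ai * ei for ai, ei in zip(a, e)) % modulus
    return a, t, e


def is_solution(a, t, x):
    """Check that x is a binary vector with <a, x> = t mod 2^n."""
    modulus = 1 << len(a)
    binary = all(bit in (0, 1) for bit in x)
    return binary and sum(ai * xi for ai, xi in zip(a, x)) % modulus == t
```

A typical use is `a, t, e = random_instance(16, random.Random(0))`, after which `is_solution(a, t, e)` holds by construction.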

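By $H(\cdot)$ we refer to the binary entropy function, which is defined on input $0 \le \alpha \le 1$ as $H(\alpha) := -\alpha \log \alpha - (1-\alpha) \log (1-\alpha)$, where we use the convention $0 \log 0 := 0$. We approximate binomial coefficients by the entropy function, derived from Stirling's formula: $\binom{n}{m} = \tilde{\Theta}\big(2^{n H(m/n)}\big)$.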
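The quality of the entropy approximation for binomial coefficients can be checked numerically. The following Python sketch compares $\frac{1}{n}\log_2\binom{n}{m}$ with the binary entropy of $m/n$. The helper names are assumptions made for this illustration.

```python
import math


def binary_entropy(alpha):
    """Binary entropy H(alpha) in bits, with the convention 0 * log 0 = 0."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)


def normalized_log_binomial(n, m):
    """(1/n) * log2 of the binomial coefficient C(n, m)."""
    return math.log2(math.comb(n, m)) / n
```

For growing $n$ the two quantities approach each other, since they differ only by a term of order $(\log n)/n$.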

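Let $X \sim \mathcal{D}$ be a discrete random variable following the distribution $\mathcal{D}$, which is defined on a finite alphabet $\Lambda$. For $x \in \Lambda$ let $p_X(x) := \Pr[X = x]$. We define the entropy of a random variable or equivalently its distribution as

$$H(X) = H(\mathcal{D}) := -\sum_{x \in \Lambda} p_X(x) \log p_X(x).$$

For $0 \le \alpha \le 1$ we refer by $\mathcal{B}_\alpha$ to the Bernoulli distribution with parameter $\alpha$, that is, for $X \sim \mathcal{B}_\alpha$ we have $\Pr[X = 1] = \alpha$ and $\Pr[X = 0] = 1 - \alpha$. The sum of $n$ iid $\mathcal{B}_\alpha$-distributed random variables is binomially distributed with parameters $n$ and $\alpha$, which we denote by $\mathrm{Bin}_{n,\alpha}$.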

The following lemma guarantees that we do not obtain too many duplicate vectors when sampling a limited number of vectors $\mathbf{x} \in \{0,1\}^n$ with iid $x_i \sim \mathcal{B}_\alpha$.

[Esser, May[9]] Assume that we sample vectors with iid

. With overwhelming probability we obtain a set of

many different vectors by sampling many vectors.
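The behaviour described by the lemma can be observed experimentally. The Python sketch below samples vectors with iid Bernoulli coordinates and counts how many are distinct. The function name `sample_distinct` and the parameter choices are assumptions for illustration and do not reproduce the exact statement of the lemma.

```python
import random


def sample_distinct(n, alpha, count, rng):
    """Draw `count` vectors with iid Bernoulli(alpha) coordinates and return the distinct ones."""
    seen = set()
    for _ in range(count):
        seen.add(tuple(1 if rng.random() < alpha else 0 for _ in range(n)))
    return seen
```

As long as the number of samples stays well below the number of typical vectors, almost every draw is new, so the number of samples needed is close to the number of distinct vectors wanted.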

## 3 The EM Classical Algorithm

The Esser-May (EM) algorithm is search tree-based and solves instances in a divide-and-conquer manner using the representation technique. Under an appropriate assumption, the EM algorithm can be implemented with a search tree of arbitrary depth. We denote by EM($d$) the EM algorithm with search tree depth $d$. Whereas the run time of the EM algorithm decreases with increasing search tree depth $d$, our quantum algorithm is optimal at search tree depth $4$. Thus, in this section, we first describe EM(4). Then we generalize to EM($d$).

### 3.1 EM(4)

Let $(\mathbf{a}, t)$ be a subset sum instance with a solution $\mathbf{e} \in \{0,1\}^n$ with $|\mathbf{e}| = n/2$. That is, $\langle \mathbf{a}, \mathbf{e} \rangle = t \bmod 2^n$.

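Before describing EM(4), we recall a classical list join operator that we use extensively. The join operator performs the following task: given two lists of numbers $L_1$ and $L_2$ of respective sizes $|L_1|$ and $|L_2|$, together with two integers $M$ and $R$, the algorithm computes the list $L_R$ such that $L_R = \{x + y \mid x \in L_1, y \in L_2, x + y \equiv R \bmod M\}$. The list $L_R$ can be constructed as follows. Sort $L_1$ by the values modulo $M$, and then for every $y \in L_2$ find via binary search all elements $x \in L_1$ such that $x \equiv R - y \bmod M$. The complexity of this method is $\tilde{O}(\max(|L_1|, |L_2|, |L_R|))$ [28]. Moreover, assuming that the values of the initial lists modulo $M$ are randomly distributed, the expected size of $L_R$ is $|L_1| \cdot |L_2| / M$.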
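The sort-and-search procedure just described can be sketched in Python as follows. This is a minimal illustration on plain integers. The function name `join` and its argument names are assumptions introduced here, not the notation of the algorithm itself.

```python
from bisect import bisect_left, bisect_right


def join(list1, list2, modulus, residue):
    """Return all sums x + y with x in list1, y in list2 and x + y = residue (mod modulus)."""
    # Sort list1 by residue class so that matching partners form a contiguous block.
    keyed = sorted((x % modulus, x) for x in list1)
    keys = [k for k, _ in keyed]
    out = []
    for y in list2:
        target = (residue - y) % modulus
        lo = bisect_left(keys, target)    # first element with the target residue
        hi = bisect_right(keys, target)   # one past the last such element
        out.extend(x + y for _, x in keyed[lo:hi])
    return out
```

The cost is dominated by sorting the first list, one binary search per element of the second list, and writing the output, which matches the complexity stated above up to logarithmic factors.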

The basic idea of representation is to represent the solution as a sum where and . If we can effectively (respectively ) construct two lists and , then using the list join operator above, we can effectively get the solution via the joined list . Note that sorting and searching are performed with respect to and , where . Generally speaking, this effectiveness of constructing base lists is not guaranteed. Thus we decompose the original random subset sum problem into two subproblems several times to improve the complexity of the EM algorithm. That is, represent the solution as a sum for some .

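In EM(4), we represent the solution $\mathbf{e}$ as a sum $\mathbf{e} = \sum_{j=1}^{2^{4-i}} \mathbf{e}^{(i)}_j$, where $i = 1, 2, 3$. We call a representation of this form a level-$i$ representation for each $i$. The tree structure of EM(4) is shown in Figure 3.1. Denote by $L^{(i)}_j$ the $j$-th list of level $i$, where $0 \le i \le 4$ and $1 \le j \le 2^{4-i}$. Denote by $\mathbf{e}^{(i)}_j$ the elements of $L^{(i)}_j$. In other words, the $\mathbf{e}^{(i)}_j$ are candidates for the $j$-th summand of a level-$i$ representation. The join operator for these lists is defined so that $L^{(i)}_j$ is obtained by joining $L^{(i-1)}_{2j-1}$ and $L^{(i-1)}_{2j}$, where $1 \le i \le 4$ and $1 \le j \le 2^{4-i}$.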

Consider the level- representation . By linearity of the inner product any level- representation satisfies Obviously, this equation also holds modulo for any . In Figure 3.1 we construct on level of our search tree in list candidates , where . By the randomness of , the inner products for some randomly distributed modulo and thus also modulo . Thus, if we fix a certain constraint , then we expect that a -fraction of all level- representations satisfies . This enables us to filter out elements of the search space, as well as representations via constraints. Similarly, we construct on level of our search tree in list candidates , where . Eventually, construct on level in list candidates .

Figure 3.1: Tree structure of EM(4). The hatched portion represents the result of the join operator.

Our goal is to construct on expectation a single representation of on Level . To this end, we initially construct the the level lists , , where . For each j, we sample iid vectors , and , where . Then, we construct the level lists , where . In order that the join operator can be successfully executed, choose random , and let . By the definition of the join operator, on level we get only those candidates satisfying for some . Note that all level- candidates are vectors from .

Similarly, we construct the level lists , where . Note that be chosen randomly on satisfying and . Then, we construct the level lists , where . Note that be chosen randomly on satisfying , and . Finally, we construct by setting . If satisfying , then is a solution of the origial random subset sum instance.

Note that a non-binary vector cannot be part of a valid representation of $\mathbf{e}$ and may safely be filtered out. Therefore, after constructing each list, we immediately eliminate all non-binary vectors. In fact, the join operator already includes this filtering step.

In our construction we tune the parameters such that we expect to obtain a representation of our solution in the final list. Hence, a linear pass through all elements of the final list yields a subset sum solution (in expectation). A pseudocode description of the algorithm is given by Algorithm 3.1.

Algorithm 1 The EM(4) Algorithm

Input:subset sum instance Output:solution with or Parameters:and
1:  Sample all level- lists for : . for to do repeat until . repeat until .
2:  Compute Joins: (2.1) Compute level lists: Choose random . Compute . for to do compute . (2.2) Compute level lists: Choose random satisfying . Choose random satisfying . Choose random satisfying . Compute . for to do compute . (2.3) Compute level lists: Choose random satisfying . Compute . for to do compute . (2.4) Compute level lists: compute .
3:  If then return else return .

The Number of Representations

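Let $\mathbf{e}$ be a subset sum solution. Denote by $R_{14}$ the expected number of level-1 representations of $\mathbf{e}$, by $R_{24}$ the expected number of level-2 representations of $\mathbf{e}$, and by $R_{34}$ the expected number of level-3 representations of $\mathbf{e}$. Formally, define

$$R_{14} := E[|\mathcal{R}_{14}|] = E\Big[\Big|\Big\{(\mathbf{e}^{(1)}_1, \ldots, \mathbf{e}^{(1)}_8) \in L^{(1)}_1 \times \cdots \times L^{(1)}_8 \;\Big|\; \mathbf{e} = \sum_{j=1}^{8} \mathbf{e}^{(1)}_j\Big\}\Big|\Big]$$

$$R_{24} := E[|\mathcal{R}_{24}|] = E\Big[\Big|\Big\{(\mathbf{e}^{(2)}_1, \ldots, \mathbf{e}^{(2)}_4) \in L^{(2)}_1 \times \cdots \times L^{(2)}_4 \;\Big|\; \mathbf{e} = \sum_{j=1}^{4} \mathbf{e}^{(2)}_j\Big\}\Big|\Big]$$

$$R_{34} := E[|\mathcal{R}_{34}|] = E\Big[\Big|\Big\{(\mathbf{e}^{(3)}_1, \mathbf{e}^{(3)}_2) \in L^{(3)}_1 \times L^{(3)}_2 \;\Big|\; \mathbf{e} = \mathbf{e}^{(3)}_1 + \mathbf{e}^{(3)}_2\Big\}\Big|\Big]$$

For a fixed level-2 representation $(\mathbf{e}^{(2)}_1, \ldots, \mathbf{e}^{(2)}_4)$, define

$$R_{12} := E\Big[\Big|\Big\{(\mathbf{e}^{(1)}_1, \ldots, \mathbf{e}^{(1)}_8) \in \mathcal{R}_{14} \;\Big|\; \bigwedge_{j=1}^{4} \mathbf{e}^{(2)}_j = \mathbf{e}^{(1)}_{2j-1} + \mathbf{e}^{(1)}_{2j}\Big\}\Big|\Big]$$

For a fixed level-3 representation $(\mathbf{e}^{(3)}_1, \mathbf{e}^{(3)}_2)$, define

$$R_{13} := E\Big[\Big|\Big\{(\mathbf{e}^{(1)}_1, \ldots, \mathbf{e}^{(1)}_8) \in \mathcal{R}_{14} \;\Big|\; \mathbf{e}^{(3)}_1 = \mathbf{e}^{(1)}_1 + \mathbf{e}^{(1)}_2 + \mathbf{e}^{(1)}_3 + \mathbf{e}^{(1)}_4 \,\wedge\, \mathbf{e}^{(3)}_2 = \mathbf{e}^{(1)}_5 + \mathbf{e}^{(1)}_6 + \mathbf{e}^{(1)}_7 + \mathbf{e}^{(1)}_8\Big\}\Big|\Big]$$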

Obviously, $R_{14} = R_{12} \cdot R_{24}$ and $R_{14} = R_{13} \cdot R_{34}$. We define $r_{ij} := \frac{1}{n} \log R_{ij}$. Let us compute the values $R_{ij}$ and their corresponding $r_{ij}$. Note that we ignore for the moment the constraints that Algorithm 3.1 imposes to eliminate representations from level $1$ on. Hence the numbers $R_{ij}$ count the total number of representations, without any eliminations.

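The elements in the base lists are sampled from $\mathcal{B}_\alpha^{n/2} \times 0^{n/2}$ respectively $0^{n/2} \times \mathcal{B}_\alpha^{n/2}$. As a consequence, the elements of the level-1 lists are from $\mathcal{B}_\alpha^{n}$. Let $\mathbf{x} = \mathbf{e}^{(1)}_1 + \cdots + \mathbf{e}^{(1)}_8$, where $\mathbf{e}^{(1)}_j \in L^{(1)}_j$. Then for each coordinate $x_i$ of $\mathbf{x}$ we have $\Pr[x_i = 0] = (1-\alpha)^8$ and $\Pr[x_i = 1] = 8\alpha(1-\alpha)^7$. Hence a candidate $\mathbf{x}$ is a representation of the $n/2$-weight solution $\mathbf{e}$ with probability

$$p := \Pr[x_i = 0]^{n/2} \, \Pr[x_i = 1]^{n/2} = \big(8\alpha(1-\alpha)^{15}\big)^{n/2}.$$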
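The closed form for $p$ can be checked by brute force. The Python sketch below assumes that each coordinate of the sum is a sum of eight iid Bernoulli($\alpha$) bits. It enumerates all $2^8$ bit patterns and compares the resulting probabilities with the closed-form expression. The helper names are hypothetical.

```python
import math
from itertools import product


def coordinate_distribution(alpha, summands=8):
    """Exact distribution of a sum of `summands` iid Bernoulli(alpha) bits, by enumeration."""
    dist = {}
    for bits in product((0, 1), repeat=summands):
        prob = 1.0
        for b in bits:
            prob *= alpha if b else 1 - alpha
        dist[sum(bits)] = dist.get(sum(bits), 0.0) + prob
    return dist


def log2_p_per_n(alpha):
    """(1/n) * log2 p, using p = (8 * alpha * (1 - alpha)**15) ** (n/2)."""
    return 0.5 * math.log2(8 * alpha * (1 - alpha) ** 15)
```

The product of the two coordinate probabilities is $(1-\alpha)^8 \cdot 8\alpha(1-\alpha)^7 = 8\alpha(1-\alpha)^{15}$, which is the base of the exponent $n/2$ in the formula for $p$.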

By construction, Algorithm 3.1 computes every level- list out of elements. Thus, we expect , and .

Let be an arbitrary combination of level- elements. As before, is a representation of with probability . Observe that . By construction, . But . This implies that list , without any constraints, contains at least (not necessary different) vectors. By heuristically treating level- list elements as independently sampled from , an application of Lemma 2 yields that contains different vectors. Without loss of generality, we may assume that the size of is upper bounded by . Therefore, we expect , and . Similarly, we expect , and .

So, and .

Correctness of EM(4)

Let be a solution to our subset sum problem. As already mentioned, our goal is to construct on expectation a single representation of on level . The constraints in Algorithm 3.1 will eliminate representations from level on.

Let us start on level , and let be an arbitrary level- representation of . Then , which implies . Therefore, the value of is fully determined by fixing the seven values ,. In Algorithm 3.1, choose random and enforce . In fact, one constraint eliminates (on expectation) a -fraction of one level- lists. Since all level- representations only need to fulfill seven out of eight constraints, we eliminate only a -fraction of all representations. That is, we expect that representations pass the level- constraints.

For the level- join we conclude similarly by imposing four additional constraints , on bits. We define the constraint is consistent with the level- constraints if . Choose random consistent constraint , and setting . Every level- representation that satisfies constraint automatically satisfies constraint . So, these constraints eliminate another -fraction of the level- representations. Note that here we use the consistency of constraint. Similarly, the level- constraints eliminate another -fraction of the level- representations. Together with the level- and level- elimination, we expect to have representations left in level . Thus, we need

$$7l_1 + 3l_2 + l_3 \le r_{14}. \tag{1}$$

In the analysis we only guarantee that in a single run (or at most polynomially many runs) of the EM algorithm the expected number of returned representations of the solution is at least one.

Heuristic([9]). We heuristically assume that the random variable that counts the number of representations per run of the algorithm is sharply centered around its expectation to conclude that a single run (or at most polynomially many runs) suffices to find a solution with good probability.

This treatment is similar to Wagner's original $k$-tree algorithm [28] and its applications [5, 14].

Note that the heuristic must fail if the EM algorithm clusters its representations around certain constraints. To prevent representations from clustering, we need to ensure that every level-2 and level-3 representation is constructed at most once. Thus, we need

$$4l_1 \ge r_{12}, \tag{2}$$

and

$$2l_2 \ge r_{13}. \tag{3}$$
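When optimizing parameters numerically, conditions (1), (2) and (3) are simple linear checks. The following Python sketch evaluates them for given values. The function name `check_constraints` is an assumption, and the numbers in the usage note are placeholders rather than the optimized parameters of the algorithm.

```python
def check_constraints(l1, l2, l3, r14, r12, r13):
    """Return the truth values of conditions (1), (2) and (3), in that order."""
    cond1 = 7 * l1 + 3 * l2 + l3 <= r14   # (1): enough representations survive all constraints
    cond2 = 4 * l1 >= r12                 # (2): anti-clustering condition involving r12
    cond3 = 2 * l2 >= r13                 # (3): anti-clustering condition involving r13
    return cond1, cond2, cond3
```

For instance, `check_constraints(0.05, 0.1, 0.2, r14=0.9, r12=0.15, r13=0.18)` reports all three conditions as satisfied for these placeholder values.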

Remark. The following points supplement the discussion of the correctness of EM(4).

(1). The correctness of EM(4) relies on the randomness of random subset sum instances. Consider a subset sum instance whose elements are all equal to $0$. Clearly, unless all the random constraints in Algorithm 3.1 are chosen equal to $0$, the algorithm cannot succeed. Thus, in this case the probability of success is very low. There are many other bad subset sum instances. However, for a random subset sum instance, the expected probability of success is not too small [5, 14].

(2). The reasonableness of Heuristic is supported by experimental data. The experimental results predict that the size of the lists are always very close to the theoretical values at the level- and smaller at the other levels[5].

Time and Memory Complexity

Let us start with analyzing the run time for sampling the level- lists in Algorithm 3.1. Note that we stop sampling, when we have found different list elements. Since we sample iid elements from , we conclude by Lemma 1 that this takes only many iterations.

Let us now turn to the computation of the level- to level- lists. Let be an arbitrary list on level . is constructed in a -tree list join manner. Ignoring logarithmic factors, this list join process works in time linear in the two input lists and the output list.

Let for be the expected list size of level- lists (before filtering) . Let for denote the expected size of filtered level- lists. Under Heuristic the total expected run time of Algorithm 3.1 is

 T=max(L(0),L(1),L(2),L