 # Quantum Algorithms for the Most Frequently String Search, Intersection of Two String Sequences and Sorting of Strings Problems

We study algorithms for solving three problems on strings. The first one is the Most Frequently String Search Problem: given a sequence of n strings of length k, find the string that occurs in the sequence most often. We propose a quantum algorithm that has query complexity Õ(n √(k)). This algorithm shows a speed-up compared with the deterministic algorithm, which requires Ω(nk) queries. The second problem is finding the intersection of two sequences of strings. All strings have the same length k. The size of the first sequence is n and the size of the second sequence is m. We propose a quantum algorithm that has query complexity Õ((n+m) √(k)). This algorithm shows a speed-up compared with the deterministic algorithm, which requires Ω((n+m)k) queries. The third problem is sorting n strings of length k. On the one hand, it is known that quantum algorithms cannot sort arbitrary objects asymptotically faster than classical ones. On the other hand, we focus on sorting strings, which are not arbitrary objects. We propose a quantum algorithm that has query complexity O(n (log n)^2 √(k)). This algorithm shows a speed-up compared with the deterministic algorithm (radix sort), which requires Ω((n+d)k) queries, where d is the size of the alphabet.

## 1 Introduction

Quantum computing [NC10, AMB17] has been one of the hot topics in computer science in recent decades. There are many problems where quantum algorithms outperform the best known classical algorithms [DE 01, 21, KS19, KKS19].

Problems on strings are among them. Researchers have shown the power of quantum algorithms for such problems in [MON17, BBB+97, RV03].

In this paper, we consider three problems:

• the Most Frequently String Search problem;

• Strings sorting problem;

• Intersection of Two String Sequences problem.

Our algorithms use quantum algorithms as subroutines, and the remaining part is classical. We investigate the problems in terms of query complexity. The query model is one of the most popular models for quantum algorithms. Such algorithms can query a black box that has access to the sequence of strings. By the running time of an algorithm we mean the number of queries to the black box.

The first problem is the following. We have $n$ strings of length $k$. We can assume that the symbols of the strings are letters from any finite alphabet, for example, binary, Latin alphabet or Unicode. The problem is to find the string that occurs in the sequence most often. The problem [CH08] is one of the most well-studied ones in the area of data streams [MUT05, AGG07, BCG11, BLM15]. Many applications in packet routing, telecommunication logging and tracking keyword queries in search engines depend critically on such routines. The best known deterministic algorithms require $\Omega(nk)$ queries because an algorithm should at least test all symbols of all strings. The deterministic solution can use the Trie (prefix tree) [DE 59, BLA98, BRA08, KNU73], which achieves the required complexity.

We propose a quantum algorithm that uses a self-balancing binary search tree for storing strings and a quantum algorithm for comparing strings. As a self-balancing binary search tree we can use the AVL tree [AL62, CLR+01] or the Red-Black tree [GS78, CLR+01]. As a string comparing algorithm, we propose an algorithm that is based on the algorithm for the first one search problem from [KOT14, LL15, LL16]. That algorithm is a modification of Grover’s search algorithm [GRO96, BBH+98]. Another important algorithm for search is described in [LON01]. Our algorithm for the most frequently string search problem has query complexity $O(n (\log n)^2 \sqrt{k}) = \tilde{O}(n\sqrt{k})$, where $\tilde{O}$ hides log factors. If $\log_2 n = o(k^{0.25})$, then our algorithm is better than the deterministic one. Note that this setup makes sense in practical cases.

The second problem is the String Sorting problem. Assume that we have $n$ strings of length $k$. It is known [HNS01, HNS02] that no quantum algorithm can sort arbitrary comparable objects faster than $O(n \log n)$. At the same time, several researchers tried to improve the hidden constant [OEA+13, OA16]. Other researchers investigated the space bounded case [KLA03]. We focus on sorting strings. In the classical case, we can use an algorithm that is better than sorting algorithms for arbitrary comparable objects. It is radix sort, which has query complexity $O((n+d)k)$ [CLR+01], where $d$ is the size of the alphabet. Our quantum algorithm for the string sorting problem has query complexity $O(n (\log n)^2 \sqrt{k})$. It is based on standard sorting algorithms like Merge sort [CLR+01] or Heapsort [WIL64, CLR+01] and the quantum algorithm for comparing strings.

The third problem is the Intersection of Two String Sequences problem. Assume that we have two sequences of strings of length $k$. The size of the first sequence is $n$ and the size of the second one is $m$. The first sequence is given in advance and the second one is given in online fashion, one string at a time. After each requested string from the second sequence, we want to check whether this string belongs to the first sequence. We propose two quantum algorithms for the problem. Both algorithms have query complexity $\tilde{O}((n+m)\sqrt{k})$. The first algorithm uses a self-balancing binary search tree like the solution of the first problem. The second algorithm uses a quantum algorithm for sorting strings and has a better big-$O$ hidden constant. At the same time, the best known deterministic algorithm requires $\Omega((n+m)k)$ queries.

The structure of the paper is the following. We present the quantum subroutine that compares two strings in Section 2. Then we discuss three problems: the Most Frequently String Search problem in Section 3, the Strings Sorting problem in Section 4 and the Intersection of Two String Sequences problem in Section 5.

## 2 The Quantum Algorithm for Two Strings Comparing

Firstly, we discuss a quantum subroutine that compares two strings of length $k$. Assume that this subroutine is $\text{Compare\_strings}(s, t, k)$ and that it compares $s$ and $t$ in lexicographical order. It returns:

• $-1$ if $s < t$;

• $0$ if $s = t$;

• $1$ if $s > t$.

As a base for our algorithm, we will use the algorithm for finding the minimal argument with $1$-result of a Boolean-valued function. Formally, we have:

###### Lemma 1

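[KOT14, LL15, LL16] Suppose we have a function $f : \{1, \dots, N\} \to \{0, 1\}$ for some integer $N$. There is a quantum algorithm for finding $j_0 = \min\{j \in \{1, \dots, N\} : f(j) = 1\}$. The algorithm finds $j_0$ with expected query complexity $O(\sqrt{j_0})$ and error probability that is at most $\frac{1}{2}$.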

Let us choose the function $f(j) = (s_j \neq t_j)$ for the two compared strings $s$ and $t$. So, we search for $j_0$, the index of the first unequal symbol of the strings. Then, we can claim that $s$ precedes $t$ in lexicographical order iff $s_{j_0}$ precedes $t_{j_0}$ in the alphabet $\Sigma$. The claim holds by the definition of lexicographical order. If there are no unequal symbols, then the strings are equal.

We use the standard technique of boosting success probability. So, we repeat the algorithm $3\log_2 n$ times and return the minimal answer, where $n$ is the number of strings in the input sequence. In that case, the error probability is $O\left(\frac{1}{2^{3\log_2 n}}\right) = O\left(\frac{1}{n^3}\right)$, because an error in the whole algorithm means that no invocation finds the minimal index of an unequal symbol.

Let us present the algorithm. We use the quantum algorithm from Lemma 1 as a subroutine, where $f(j) = (s_j \neq t_j)$. Assume that this subroutine returns $k+1$ if it does not find any solution.

The next property follows from the previous discussion.

###### Lemma 2

Algorithm 1 compares two strings of length $k$ in lexicographical order with query complexity $O(\sqrt{k}\log n)$ and error probability $O\left(\frac{1}{n^3}\right)$.
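The comparison logic of this section can be illustrated by the following minimal classical sketch. It is not the quantum algorithm itself: the function `first_one_search` is a hypothetical stand-in that scans linearly, so it uses $O(k)$ queries and never errs, whereas the quantum subroutine of Lemma 1 would be plugged in at that point. The names, the sentinel value `k + 1`, the return convention $-1$/$0$/$1$ and the repetition count are assumptions made for illustration.

```python
import math


def first_one_search(f, k):
    """Classical stand-in (assumption) for the quantum subroutine of Lemma 1.

    Returns the smallest j in 1..k with f(j) true, or k + 1 if there is none.
    """
    for j in range(1, k + 1):
        if f(j):
            return j
    return k + 1


def compare_strings(s, t, k, n=2):
    """Return -1, 0 or 1 if s precedes, equals or follows t lexicographically.

    n is the number of strings in the input sequence; it only sets the
    number of repetitions used for boosting (assumed to be about 3*log2(n)).
    """
    def differ(j):
        # True exactly when the j-th symbols (1-based) are unequal.
        return s[j - 1] != t[j - 1]

    repetitions = max(1, math.ceil(3 * math.log2(max(n, 2))))
    # Keep the minimal index over all repetitions.
    j0 = min(first_one_search(differ, k) for _ in range(repetitions))
    if j0 == k + 1:
        return 0  # no unequal symbol: the strings are equal
    return -1 if s[j0 - 1] < t[j0 - 1] else 1
```

With a randomized subroutine, taking the minimum over repetitions is what reduces the error: a wrong result requires every repetition to miss the first unequal symbol.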

## 3 The Most Frequently String Search Problem

Let us formally present the problem.

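Problem. For some positive integers $n$ and $k$, we have the sequence of strings $s = (s^1, \dots, s^n)$. Each $s^i = (s^i_1, \dots, s^i_k) \in \Sigma^k$ for some finite alphabet $\Sigma$. Let $\#(s^i)$ be the number of occurrences of the string $s^i$ in the sequence. We search for $\text{argmax}_{s^i}\, \#(s^i)$.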

### 3.1 The Quantum algorithm

Firstly, we present an idea of the algorithm.

We use a well-known data structure, the self-balancing binary search tree. As an implementation of the data structure, we can use the AVL tree [AL62, CLR+01] or the Red-Black tree [GS78, CLR+01]. Both data structures allow us to find and add elements in $O(\log N)$ running time, where $N$ is the size of the tree.

The idea of the algorithm is the following. We store pairs $(i, c)$ in vertexes of the tree, where $i$ is an index of a string from $s$ and $c$ is the number of occurrences of the string $s^i$. We assume that a pair $(i, c)$ is less than a pair $(i', c')$ iff $s^i$ precedes $s^{i'}$ in the lexicographical order. So, we use the $\text{Compare\_strings}(s^i, s^{i'}, k)$ subroutine as the comparator of the vertexes. The tree represents the set of unique strings from $s$ together with their numbers of occurrences.

We consider all strings from $s^1$ to $s^n$ and check the existence of each string in our tree. If a string exists, then we increase its number of occurrences. If the string does not exist in the tree, then we add it. At the same time, we store the pair $(i_{max}, c_{max})$ with the maximal number of occurrences seen so far and recalculate it in each step.

Let us present the algorithm formally. Let $BST$ be a self-balancing binary search tree such that:

• $\text{Find}(BST, s^i)$ finds the vertex $(i, c)$ or returns $NULL$ if such a vertex does not exist;

• $\text{Add}(BST, s^i)$ adds a vertex for $s^i$ to the tree and returns the vertex as a result;

• $\text{Init}(BST)$ initializes an empty tree.
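The procedure described above can be sketched classically as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm listing: the self-balancing tree is swapped for a sorted Python list of indices searched by binary search (the same $O(\log n)$ comparisons per string, although list insertion moves elements in memory), and `compare_strings` is a classical stand-in for the quantum comparator. All function names are hypothetical.

```python
def compare_strings(s, t):
    """Classical stand-in (assumption) for the quantum comparator."""
    return (s > t) - (s < t)


def find_position(order, strings, target):
    """Binary search among the distinct strings; returns (position, found)."""
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        c = compare_strings(strings[order[mid]], target)
        if c == 0:
            return mid, True
        if c < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo, False


def most_frequent_string(strings):
    """Return (string, count) for a most frequent string of a non-empty list."""
    order = []   # indices of distinct strings, kept in lexicographic order
    counts = {}  # index of a distinct string -> number of occurrences
    best = None  # index with the maximal count seen so far
    for i, s in enumerate(strings):
        pos, found = find_position(order, strings, s)
        if found:
            rep = order[pos]
        else:
            order.insert(pos, i)
            rep = i
            counts[rep] = 0
        counts[rep] += 1
        # Recalculate the running maximum after every step.
        if best is None or counts[rep] > counts[best]:
            best = rep
    return strings[best], counts[best]
```

For example, `most_frequent_string(["ab", "ba", "ab", "aa", "ba", "ab"])` returns `("ab", 3)`. Only the calls to `compare_strings` touch the symbols of the strings, which is why the query complexity is governed by the number of comparisons.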

Let us discuss the property of the algorithm.

###### Theorem 3.1

Algorithm 2 finds the most frequent string from $s$ with query complexity $O(n (\log n)^2 \sqrt{k})$ and error probability $O\left(\frac{1}{n}\right)$.

###### Proof

The correctness of the algorithm follows from the description. Let us discuss the query complexity. Each of the operations $\text{Find}$ and $\text{Add}$ requires $O(\log n)$ comparing operations $\text{Compare\_strings}$. These operations are invoked $n$ times. Therefore, we have $O(n \log n)$ comparing operations. Due to Lemma 2, each comparing operation requires $O(\sqrt{k} \log n)$ queries. The total query complexity is $O(n \sqrt{k} (\log n)^2)$.

Let us discuss the error probability. Events of error in the algorithm are independent. So, all events should be correct. Due to Lemma 2, the probability of correctness of one event is $1 - \frac{1}{n^3}$. Hence, the probability of correctness of all events is at least $\left(1 - \frac{1}{n^3}\right)^{\alpha \cdot n \log n}$ for some constant $\alpha$.

Note that

$$\lim_{n \to \infty} \frac{1 - \left(1 - \frac{1}{n^3}\right)^{\alpha \cdot n \log n}}{1/n} < 1;$$

Hence, the total error probability is at most $O\left(\frac{1}{n}\right)$.
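One way to check this limit is Bernoulli's inequality, $(1 - x)^r \ge 1 - rx$ for $0 \le x \le 1$ and $r \ge 1$. The short derivation below is a sketch that assumes the success probability of all comparisons has the form $\left(1 - \frac{1}{n^3}\right)^{\alpha n \log n}$ with a constant $\alpha$ and $n$ large enough that $\alpha n \log n \ge 1$:

$$1 - \left(1 - \frac{1}{n^3}\right)^{\alpha n \log n} \le \alpha n \log n \cdot \frac{1}{n^3} = \frac{\alpha \log n}{n^2}, \qquad \frac{\alpha \log n / n^2}{1/n} = \frac{\alpha \log n}{n} \xrightarrow[n \to \infty]{} 0 .$$

So the ratio in the limit tends to $0$, which is below $1$, and the failure probability is bounded by $\frac{1}{n}$ for large $n$.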

The data structure that we use can be considered as a separate data structure. We call it “Multi-set of strings with quantum comparator”. Using this data structure, we can implement

• “Set of strings with quantum comparator” if always $c = 1$ in the pair $(i, c)$ of a vertex;

• “Map with string key and quantum comparator” if we replace $c$ by any data $r \in \Gamma$ for any set $\Gamma$. In that case, the data structure implements a mapping $\Sigma^k \to \Gamma$.

All of these data structures have $O((\log n)^2 \sqrt{k})$ complexity of basic operations (Find, Add, Delete).

### 3.2 On the Classical Complexity of the Problem

The best known classical algorithm stores the strings in a Trie (prefix tree) [DE 59, BLA98, BRA08, KNU73] and performs similar operations. The running time of such an algorithm is $O(nk)$. At the same time, we can show that if an algorithm tests $o(nk)$ variables, then it can return a wrong answer.

###### Theorem 3.2

Any deterministic algorithm for the Most Frequently String Search problem has $\Omega(nk)$ query complexity.

###### Proof

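Suppose we have a deterministic algorithm $\mathcal{A}$ for the Most Frequently String Search problem that uses $o(nk)$ queries.

Let us consider an adversary that constructs the input. The adversary wants to construct an input such that the algorithm $\mathcal{A}$ obtains a wrong answer.

Without loss of generality, we can say that $n$ is even. Suppose $a$ and $b$ are different symbols from the input alphabet. If the algorithm requests a variable $s^i_j$ for $i \le n/2$, then the adversary returns $a$. If the algorithm requests a variable $s^i_j$ for $i > n/2$, then the adversary returns $b$.

Because the algorithm uses $o(nk)$ queries, there are at least one variable $s^{z'}_{j'}$ and one variable $s^{z''}_{j''}$ that are not requested, where $z' \le n/2$, $z'' > n/2$ and $j', j'' \in \{1, \dots, k\}$.

Let $s'$ be a string such that $s'_j = a$ for all $j \in \{1, \dots, k\}$. Let $s''$ be a string such that $s''_j = b$ for all $j \in \{1, \dots, k\}$.

Assume that $\mathcal{A}$ returns $s'$. Then, the adversary assigns $s^{z'}_{j'} = b$ and assigns every other unrequested variable $s^i_j$ the value $a$ for $i \le n/2$ and the value $b$ for $i > n/2$. Now $s''$ occurs $n/2$ times and $s'$ occurs only $n/2 - 1$ times. Therefore, the right answer should be $s''$.

Assume that $\mathcal{A}$ returns a string $s \neq s'$. Then, the adversary assigns $s^{z''}_{j''} = a$ and assigns every other unrequested variable $s^i_j$ the value $a$ for $i \le n/2$ and the value $b$ for $i > n/2$. Now $s'$ occurs $n/2$ times and, for $n \ge 4$, every other string occurs at most $n/2 - 1$ times. Therefore, the right answer should be $s'$.

So, the adversary can construct the input such that $\mathcal{A}$ obtains a wrong answer.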

## 4 Strings Sorting Problem

Let us consider the following problem.

Problem. For some positive integers $n$ and $k$, we have the sequence of strings $s = (s^1, \dots, s^n)$. Each $s^i = (s^i_1, \dots, s^i_k) \in \Sigma^k$ for some finite alphabet $\Sigma$. We search for an order $(i_1, \dots, i_n)$ such that for any $j \in \{1, \dots, n-1\}$ we have $s^{i_j} \le s^{i_{j+1}}$ in lexicographical order.

We use the Heap sort algorithm [WIL64, CLR+01] as a base and the quantum algorithm for comparing strings from Section 2. We can replace Heap sort by any other sorting algorithm, for example, Merge sort [CLR+01]. In the case of Merge sort, the big-$O$ hidden constant in the query complexity will be smaller. At the same time, we need more additional memory.

Let us present Heap sort for completeness of the explanation. We can use a Binary Heap [WIL64]. We store indexes of strings in vertexes. As in the previous section, if we compare vertexes $v$ and $v'$ with corresponding indexes $i$ and $i'$, then $v > v'$ iff $s^i > s^{i'}$ in lexicographical order. We use $\text{Compare\_strings}(s^i, s^{i'}, k)$ for comparing strings. The Binary Heap $BH$ has three operations:

• $\text{Get\_min\_and\_delete}(BH)$ returns the minimal $s^i$ and removes it from the data structure;

• $\text{Add}(BH, s^i)$ adds a vertex with value $s^i$ to the heap;

• $\text{Init}(BH)$ initializes an empty heap.

The operations Get_min_and_delete and Add invoke the Compare_strings subroutine $O(\log t)$ times, where $t$ is the size of the heap.

The algorithm is the following.

If we implement the sequence as an array, then we can store the heap in the same array. In this case, we do not need additional memory.
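The heap-based procedure described above can be illustrated with the following minimal sketch. It relies on Python's standard `heapq` module, which orders items using only the `<` operator, so a small wrapper class can route every comparison through a comparator. Here `compare_strings` is a classical stand-in for the quantum comparator, and `Vertex` and `sort_strings` are hypothetical names; unlike the in-array variant mentioned above, this sketch keeps the heap in a separate list.

```python
import heapq


def compare_strings(s, t):
    """Classical stand-in (assumption) for the quantum comparator."""
    return (s > t) - (s < t)


class Vertex:
    """Heap vertex that stores the index of a string.

    Vertices are ordered by comparing the strings they point to.
    """

    def __init__(self, index, strings):
        self.index = index
        self.strings = strings

    def __lt__(self, other):
        return compare_strings(self.strings[self.index],
                               other.strings[other.index]) < 0


def sort_strings(strings):
    """Return 0-based indices that list the strings in non-decreasing order."""
    heap = []  # empty heap
    for i in range(len(strings)):
        heapq.heappush(heap, Vertex(i, strings))  # add a vertex
    # Repeatedly extract the minimal vertex and record its index.
    return [heapq.heappop(heap).index for _ in range(len(strings))]
```

For example, `sort_strings(["cb", "aa", "ba", "ab"])` returns `[1, 3, 2, 0]`. Each push and pop performs $O(\log t)$ comparisons for a heap of size $t$, matching the operation counts stated above.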

We have the following property of the algorithm, which can be proven in the same way as Theorem 3.1.

###### Theorem 4.1

Algorithm 3 sorts $s$ with query complexity $O(n (\log n)^2 \sqrt{k})$ and error probability $O\left(\frac{1}{n}\right)$.

The lower bound for deterministic complexity can be proven in the same way as in Theorem 3.2.

###### Theorem 4.2

Any deterministic algorithm for the Sorting problem has $\Omega(nk)$ query complexity.

The Radix sort [CLR+01] algorithm almost reaches this bound and has $O((n+d)k)$ complexity.

## 5 Intersection of Two Sequences of Strings Problem

Let us consider the following problem.

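Problem. For some positive integers $n$, $m$ and $k$, we have the sequence of strings $s = (s^1, \dots, s^n)$. Each $s^i = (s^i_1, \dots, s^i_k) \in \Sigma^k$ for some finite alphabet $\Sigma$. Then, we get $m$ requests $t = (t^1, \dots, t^m)$, where $t^i = (t^i_1, \dots, t^i_k) \in \Sigma^k$. The answer to a request $t^i$ is $1$ iff there is $j \in \{1, \dots, n\}$ such that $t^i = s^j$. We should answer $0$ or $1$ to each of the $m$ requests.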

We have two algorithms. The first one is based on the “Set of strings with quantum comparator” data structure from Section 3. We store all strings from $s$ in a self-balancing binary search tree $BST$. Then, we answer each request using the $\text{Find}(BST, t^i)$ operation. Let us present Algorithm 4.

The second algorithm is based on the sorting algorithm from Section 4. We sort the strings from $s$. Then, we answer each request using binary search in the sorted sequence of strings [CLR+01] and the Compare_strings subroutine for comparing strings during the binary search. Let us present Algorithm 5. Assume that the sorting Algorithm 3 is available as a subroutine that returns the order $(i_1, \dots, i_n)$. The binary search algorithm with the Compare_strings subroutine as comparator is the subroutine Binary_search_for_strings, and it searches for a request $t^i$ in the ordered sequence $(s^{i_1}, \dots, s^{i_n})$. Suppose that the subroutine Binary_search_for_strings returns $1$ if it finds $t^i$ and $0$ otherwise.
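The sort-then-search approach described above can be illustrated with the following minimal sketch. Several choices here are assumptions made for illustration: `compare_strings` is a classical stand-in for the quantum comparator, Python's built-in `sorted` (Timsort) with `functools.cmp_to_key` is swapped in for the heap-based sorting algorithm, and `answer_requests` is a hypothetical name. Requests are consumed one at a time to mirror the online setting.

```python
import functools


def compare_strings(s, t):
    """Classical stand-in (assumption) for the quantum comparator."""
    return (s > t) - (s < t)


def binary_search_for_strings(target, strings, order):
    """Return 1 if target occurs among the strings, else 0.

    order lists the indices of strings in non-decreasing lexicographic order.
    """
    lo, hi = 0, len(order) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = compare_strings(strings[order[mid]], target)
        if c == 0:
            return 1
        if c < 0:
            lo = mid + 1
        else:
            hi = mid - 1
    return 0


def answer_requests(strings, requests):
    """Yield 0 or 1 for each request, processing requests one by one."""
    by_string = functools.cmp_to_key(
        lambda i, j: compare_strings(strings[i], strings[j]))
    order = sorted(range(len(strings)), key=by_string)  # sort once
    for t in requests:
        yield binary_search_for_strings(t, strings, order)
```

For example, `list(answer_requests(["ba", "ab", "cc"], ["ab", "aa", "cc", "bb"]))` gives `[1, 0, 1, 0]`. Sorting costs $O(n \log n)$ comparisons once, and each request then costs $O(\log n)$ comparisons.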

The algorithms have the following query complexity.

###### Theorem 5.1

Algorithm 4 and Algorithm 5 solve the Intersection of Two Sequences of Strings Problem with query complexity $O((n+m) \sqrt{k} \cdot \log n \cdot \log(n+m))$ and error probability $O\left(\frac{1}{n+m}\right)$.

###### Proof

The correctness of the algorithms follows from the description. Let us discuss the query complexity of the first algorithm. As in the proof of Theorem 3.1, we can show that constructing the search tree requires $O(n \log n)$ comparing operations. Then, searching for all strings $t^i$ requires $O(m \log n)$ comparing operations. The total number of comparing operations is $O((n+m) \log n)$. We use a slightly modified version of Algorithm 1 where we run it $3\log_2(n+m)$ times. We can prove that a comparing operation then requires $O(\sqrt{k} \log(n+m))$ queries. The proof is similar to the proof of the corresponding claim from Lemma 2. So, the total complexity is $O((n+m) \sqrt{k} \cdot \log n \cdot \log(n+m))$.

The second algorithm has the same complexity because it uses $O(n \log n)$ comparing operations for sorting and $O(m \log n)$ comparing operations for all invocations of the binary search algorithm.

Let us discuss the error probability. Events of error in the algorithm are independent. So, all events should be correct. We can prove that the error probability of a comparing operation is $O\left(\frac{1}{(n+m)^3}\right)$. The proof is like the proof of Lemma 2. So, the probability of correctness of one event is $1 - \frac{1}{(n+m)^3}$. Hence, the probability of correctness of all events is at least $\left(1 - \frac{1}{(n+m)^3}\right)^{\alpha \cdot (n+m) \log n}$ for some constant $\alpha$.

Note that

$$\lim_{n \to \infty} \frac{1 - \left(1 - \frac{1}{(n+m)^3}\right)^{\alpha \cdot (n+m) \log n}}{1/(n+m)} < 1;$$

Hence, the total error probability is at most $O\left(\frac{1}{n+m}\right)$.

Note that Algorithm 5 has a better big-$O$ hidden constant than Algorithm 4, because the height of the Red-Black tree or the AVL tree exceeds $\log_2 n$ by a constant factor. So, adding elements to the tree and checking existence have a bigger big-$O$ hidden constant than the sorting and binary search algorithms.

The lower bound for deterministic complexity can be proven in the same way as in Theorem 3.2.

###### Theorem 5.2

Any deterministic algorithm for the Intersection of Two Sequences of Strings Problem has $\Omega((n+m)k)$ query complexity.

This complexity can be reached if we implement the set of strings using Trie (prefix tree) [DE 59, BLA98, BRA08, KNU73].

Note that we can also use the quantum algorithm for element distinctness [AMB07, AMB04] for this problem. That algorithm solves the problem of finding two identical elements in a sequence. The query complexity of the algorithm is $O(D^{2/3})$, where $D$ is the number of elements in the sequence. The complexity is tight because of [AS04]. The algorithm can be the following. On the $j$-th request, we can add the string $t^j$ to the sequence $s$ and invoke the element distinctness algorithm that finds a collision of $t^j$ with other strings. Such an approach requires $\Omega(n^{2/3})$ queries for each request and $\Omega(m n^{2/3})$ queries for processing all requests. Note that the streaming nature of the requests does not allow us to access all of $t^1, \dots, t^m$ through the oracle. So, each request should be processed separately.

## 6 Conclusion

In the paper we propose a quantum algorithm for comparing strings. Using this algorithm we discussed four data structures: “Multi-set of strings with quantum comparator”, “Set of strings with quantum comparator”, “Map with a string key and quantum comparator” and “Binary Heap of strings with quantum comparator”. We show that the first two data structures work faster than the implementation of similar data structures using a Trie (prefix tree) in the case of $\log_2 n = o(k^{0.25})$. The trie implementation is the best known classical implementation in terms of complexity of simple operations (add, delete or find). Additionally, we constructed a quantum string sorting algorithm that works faster than the radix sort algorithm, which is the best known deterministic algorithm for sorting a sequence of strings.

Using these two groups of results, we propose quantum algorithms for two problems: the Most Frequently String Search and Intersection of Two String Sets. These quantum algorithms are more efficient than deterministic ones.

### Acknowledgement

This work was supported by Russian Science Foundation Grant 19-71-00149. We thank Aliya Khadieva, Farid Ablayev and Kazan Federal University quantum group for useful discussions.

## References

• [AS04] S. Aaronson and Y. Shi (2004) Quantum lower bounds for the collision and the element distinctness problems. Journal of the ACM (JACM) 51 (4), pp. 595–605. Cited by: §5.
• [AL62] G. M. Adel’son-Vel’skii and E. M. Landis (1962) An algorithm for organization of information. In Doklady Akademii Nauk, Vol. 146, pp. 263–266. Cited by: §1, §3.1.
• [AGG07] C. C. Aggarwal (2007) Data streams: models and algorithms. Vol. 31, Springer Science & Business Media. Cited by: §1.
• [AMB17] A. Ambainis (2017) Understanding quantum algorithms via query complexity. arXiv:1712.06349. Cited by: §1.
• [AMB04] A. Ambainis (2004) Quantum walk algorithm for element distinctness. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’04, pp. 22–31. Cited by: §5.
• [AMB07] A. Ambainis (2007) Quantum walk algorithm for element distinctness. SIAM Journal on Computing 37 (1), pp. 210–239. Cited by: §5.
• [BCG11] L. Becchetti, I. Chatzigiannakis, and Y. Giannakopoulos (2011) Streaming techniques and data aggregation in networks of tiny artefacts. Computer Science Review 5 (1), pp. 27 – 46. Cited by: §1.
• [BBB+97] C. H. Bennett, E. Bernstein, G. Brassard, and U. Vazirani (1997) Strengths and weaknesses of quantum computing. SIAM journal on Computing 26 (5), pp. 1510–1523. Cited by: §1.
• [BLA98] P. E. Black (1998) Dictionary of algorithms and data structures— nist. Technical report Cited by: §1, §3.2, §5.
• [BLM15] J. Boyar, K. S. Larsen, and A. Maiti (2015) The frequent items problem in online streaming under various performance measures. International Journal of Foundations of Computer Science 26 (4), pp. 413–439. Cited by: §1.
• [BBH+98] M. Boyer, G. Brassard, P. Høyer, and A. Tapp (1998) Tight bounds on quantum searching. Fortschritte der Physik 46 (4-5), pp. 493–505. Cited by: §1.
• [BRA08] P. Brass (2008) Advanced data structures. Vol. 193, Cambridge University Press Cambridge. Cited by: §1, §3.2, §5.
• [CLR+01] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein (2001) Introduction to algorithms. McGraw-Hill. Cited by: §1, §1, §3.1, §4, §4, §5.
• [CH08] G. Cormode and M. Hadjieleftheriou (2008) Finding frequent items in data streams. Proceedings of the VLDB Endowment 1 (2), pp. 1530–1541. Cited by: §1.
• [DE 59] R. De La Briandais (1959) File searching using variable length keys. In Papers presented at the March 3-5, 1959, western joint computer conference, pp. 295–298. Cited by: §1, §3.2, §5.
• [DE 01] R. De Wolf (2001) Quantum computing and communication complexity. Cited by: §1.
• [GRO96] L. K. Grover (1996) A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pp. 212–219. Cited by: §1.
• [GS78] L. J. Guibas and R. Sedgewick (1978) A dichromatic framework for balanced trees. In Proceedings of SFCS 1978, pp. 8–21. Cited by: §1, §3.1.
• [HNS01] P. Høyer, J. Neerbek, and Y. Shi (2001) Quantum complexities of ordered searching, sorting, and element distinctness. In International Colloquium on Automata, Languages, and Programming, pp. 346–357. Cited by: §1.
• [HNS02] P. Høyer, J. Neerbek, and Y. Shi (2002) Quantum complexities of ordered searching, sorting, and element distinctness. Algorithmica 34 (4), pp. 429–448. Cited by: §1.
•  S. Jordan Bounded error quantum algorithms zoo. Cited by: §1.
• [KKS19] K. Khadiev, D. Kravchenko, and D. Serov (2019) On the quantum and classical complexity of solving subtraction games. In Proceedings of CSR 2019, LNCS, Vol. 11532, pp. 228–236. Cited by: §1.
• [KS19] K. Khadiev and L. Safina (2019) Quantum algorithm for dynamic programming approach for dags. applications for zhegalkin polynomial evaluation and some problems on dags. In Proceedings of UCNC 2019, Unconventional Computation and Natural Computation, LNCS, Vol. 4362, pp. 150–163. Cited by: §1.
• [KLA03] H. Klauck (2003) Quantum time-space tradeoffs for sorting. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 69–76. Cited by: §1.
• [KNU73] D. Knuth (1973) Searching and sorting, the art of computer programming, vol. 3. Reading, MA, Addison-Wesley. Cited by: §1, §3.2, §5.
• [KOT14] R. Kothari (2014) An optimal quantum algorithm for the oracle identification problem. In 31st International Symposium on Theoretical Aspects of Computer Science, pp. 482. Cited by: §1, Lemma 1.
• [LL15] C. Y.-Y. Lin and H.-H. Lin (2015) Upper bounds on quantum query complexity inspired by the elitzur-vaidman bomb tester. In 30th Conference on Computational Complexity (CCC 2015), Cited by: §1, Lemma 1.
• [LL16] C. Y.-Y. Lin and H.-H. Lin (2016) Upper bounds on quantum query complexity inspired by the elitzur–vaidman bomb tester. Theory of Computing 12 (18), pp. 1–35. Cited by: §1, Lemma 1.
• [LON01] G. Long (2001) Grover algorithm with zero theoretical failure rate. Physical Review A 64 (2), pp. 022307. Cited by: §1.
• [MON17] A. Montanaro (2017) Quantum pattern matching fast on average. Algorithmica 77 (1), pp. 16–39. Cited by: §1.
• [MUT05] S. Muthukrishnan (2005) Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science 1 (2), pp. 117–236. Cited by: §1.
• [NC10] M. A. Nielsen and I. L. Chuang (2010) Quantum computation and quantum information. Cambridge univ. press. Cited by: §1.
• [OA16] A. Odeh and E. Abdelfattah (2016) Quantum sort algorithm based on entanglement qubits {00, 11}. In 2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT), pp. 1–5. Cited by: §1.
• [OEA+13] A. Odeh, K. Elleithy, M. Almasri, and A. Alajlan (2013) Sorting n elements using quantum entanglement sets. In Third International Conference on Innovative Computing Technology (INTECH 2013), pp. 213–216. Cited by: §1.
• [RV03] H. Ramesh and V. Vinay (2003) String matching in quantum time. Journal of Discrete Algorithms 1 (1), pp. 103–110. Cited by: §1.
• [WIL64] J. W. J. Williams (1964-06) Algorithm 232 - heapsort. Commun. ACM 7 (6), pp. 347–349. Cited by: §1, §4, §4.