# Longest Increasing Subsequence under Persistent Comparison Errors

We study the problem of computing a longest increasing subsequence in a sequence $S$ of $n$ distinct elements in the presence of persistent comparison errors. In this model, every comparison between two elements can return the wrong result with some fixed (small) probability $p$, and comparisons cannot be repeated. Computing the longest increasing subsequence exactly is impossible in this model; therefore, the objective is to identify a subsequence that (i) is indeed increasing and (ii) has a length that approximates the length of the longest increasing subsequence. We present asymptotically tight upper and lower bounds on both the approximation factor and the running time. In particular, we present an algorithm that computes an $O(\log n)$-approximation in time $O(n \log n)$, with high probability. This approximation relies on the fact that we can approximately sort $n$ elements in $O(n \log n)$ time such that the maximum dislocation of an element is at most $O(\log n)$. For the lower bounds, we prove that (i) there is a set of sequences such that, on a sequence picked randomly from this set, every algorithm must return an $\Omega(\log n)$-approximation with high probability, and (ii) any $O(\log n)$-approximation algorithm for longest increasing subsequence requires $\Omega(n \log n)$ comparisons, even in the absence of errors.

## 1 Introduction

When dealing with complex systems and large volumes of information, it is often the case that at least part of the involved data will be inconsistent. These inconsistencies can be intrinsic, i.e., they might stem from the fact that the data is obtained from an inherently noisy source (this is typically the case in human-produced data), or they might be the result of corruptions caused by random errors (think, for instance, of random memory faults or communication errors). It is therefore important to understand how the classical techniques used to solve basic algorithmic problems can cope with such errors.

In this paper, we consider the problem of computing a longest increasing subsequence in a given sequence $S$ of $n$ distinct elements (a fundamental task that appears naturally in many areas, such as in probability theory and combinatorics [2, 4], scheduling [3, 18], and computational biology [9, 20]) in the presence of random persistent comparison errors.

In this model, every comparison between two elements is wrong with some small fixed probability $p$, and correct with probability $1-p$. The comparison results are independent over all pairs of elements, and comparisons cannot be repeated. Note that this is equivalent to saying that repeating the same comparison multiple times yields the same result each time. Hence, comparison results are persistent: always wrong or always correct. Furthermore, we assume that we cannot inspect the values of the elements, but can only use such element comparisons. Because of these comparison errors, it is impossible to compute $\mathrm{LIS}(S)$ correctly. Instead, we seek to return a sequence that (i) is indeed increasing and that (ii) has some guaranteed minimum length depending on the length of the longest increasing sequence $\mathrm{LIS}(S)$. In particular, we are interested in algorithms that return an increasing sequence of length at least $\frac{1}{\alpha}|\mathrm{LIS}(S)|$, where $\alpha$ is the approximation factor.
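To make the error model concrete, the following is a minimal Python sketch of a comparator with persistent errors. The class name `PersistentNoisyComparator`, its method `less`, and the use of a seeded pseudo-random generator are illustrative assumptions and not part of the paper; the sketch only mirrors the two properties stated above (each pair is wrong independently with probability `p`, and repeating a comparison never changes its outcome).

```python
import random


class PersistentNoisyComparator:
    """Hypothetical helper: compares distinct elements with persistent errors.

    Each unordered pair is wrong with probability p, independently of all
    other pairs, and the decision is remembered so that repeating the same
    comparison always yields the same result.
    """

    def __init__(self, p, seed=None):
        self.p = p
        self._rng = random.Random(seed)
        self._flipped = {}  # unordered pair -> True if its outcome is wrong

    def less(self, a, b):
        """Return the observed outcome of 'a < b' for distinct a and b."""
        key = (a, b) if a < b else (b, a)
        if key not in self._flipped:
            # Decide once, on first use, whether this pair is erroneous.
            self._flipped[key] = self._rng.random() < self.p
        truth = a < b
        return (not truth) if self._flipped[key] else truth
```

Because the decision is stored per unordered pair, asking for `less(a, b)` and `less(b, a)` always gives opposite answers, so the observed outcomes are consistent for each pair even when they are wrong.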

This error model was first employed by Braverman and Mossel [6], who studied the problem of sorting. Other work on sorting followed (see [12, 13, 16]), and the model has also been studied for finding the minimum, searching, and linear programming in two dimensions [16]. In this paper, we will present an algorithm that returns an $O(\log n)$-approximation of the longest increasing subsequence in $O(n \log n)$ time, with high probability. Moreover, we will prove that this approximation factor is the best possible, as $\Omega(\log n)$ is also a lower bound, regardless of the running time, and that any $O(\log n)$-approximation algorithm requires $\Omega(n \log n)$ comparisons, even in the absence of comparison errors.

### 1.1 Related Work

There are several algorithms to compute a longest increasing subsequence of a sequence $S$ if no comparison errors happen. Typically, they are based on a common underlying algorithmic idea: they process the elements one by one and maintain, for each length found so far, the increasing subsequence of this length that ends with the smallest possible element seen so far. We shall call this algorithmic idea the Core-Algorithm to compute a longest increasing subsequence. The running time of the Core-Algorithm is $O(n \log n)$ in the decision-tree model (see for instance [5, 7, 10]). This time complexity is tight, as shown in [10]. In the RAM model, where one can also inspect the values, the algorithm can be implemented to run in $O(n \log \log n)$ time [8, 19]. All the results can be parameterized to $O(n \log k)$ or $O(n \log \log k)$, respectively, where $k$ is the length of the longest increasing subsequence.

The longest increasing subsequence of $S$ is also the longest common subsequence between $S$ and the sorted sequence of the elements in $S$. This implies an $O(n^2)$ time (or $O(n^2 / \log n)$ time if optimized) algorithm to find the longest increasing subsequence when using the standard dynamic programming technique that is used to find longest common subsequences [10, 17].

The model with random persistent comparison errors has been extensively studied for finding the smallest element, for searching, and for sorting (see for instance [6, 12, 13, 16]). A common way to measure the quality of an output sequence in terms of sortedness is to consider the dislocation of the elements. The dislocation of an element is the absolute difference between its position in the output sequence and its position in the correctly sorted sequence (its rank). Typically, one considers the maximum dislocation of any element in the output sequence and the total dislocation (the sum of the dislocations of all elements). It has been shown, for instance in [14], that there is an algorithm with running time $O(n \log n)$ which simultaneously achieves maximum dislocation $O(\log n)$ and total dislocation $O(n)$ with high probability, and that this is indeed the best one can hope for (i.e., there exist matching lower bounds that show that no possibly randomized algorithm can sort such that, with high probability, the maximum dislocation is $o(\log n)$ or the total dislocation is $o(n)$). A maximum dislocation of $\Theta(\log n)$ implies the following: on the positive side, it is possible to derive the correct relative order of two elements whose ranks differ by at least a sufficiently large multiple of $\log n$; on the negative side, this is not possible for two elements whose ranks differ by less than a sufficiently small multiple of $\log n$. The results on the maximum dislocation of sorting are of interest for the problem of finding the longest increasing subsequence, because an increasing subsequence is also a sorted subsequence.

### 1.2 Our Contribution

We prove asymptotically tight upper and lower bounds on both the approximation factor and the running time for longest increasing subsequence under persistent comparison errors. For the upper bounds, we define an Approximation-Algorithm that computes an $O(\log n)$-approximation to the longest increasing subsequence of $S$. In fact, it even finds the longest possible increasing subsequence given that we cannot sort better than obtaining an order with maximum dislocation $O(\log n)$. Formally, we prove the following result:

###### Theorem 1 (Upper Bounds).

For any sequence $S$ that contains $n$ distinct elements, our Approximation-Algorithm computes an $O(\log n)$-approximation to the longest increasing sequence of $S$, in $O(n \log n)$ time, with high probability.

This result on the upper bound can be generalized to other error models. In fact, if we are given or able to obtain an approximately sorted sequence with maximum dislocation $d$, then our Approximation-Algorithm will return a $2d$-approximation to the longest increasing subsequence. We discuss this point in the Conclusion (Section 6).

To prove our lower bound on the approximation factor of any algorithm solving the longest increasing subsequence problem under persistent comparison errors with high probability, we will identify a small collection of sequences that contain a longest increasing sequence of size $\Theta(\log n)$ and that are likely to look the same in our error model. Then, we show for any algorithm that if it succeeds on one sequence of this collection by returning a constant number of elements of this increasing sequence, it must fail on another sequence. In particular, we will prove the following theorem:

###### Theorem 2 (Lower Bound – Approximation Factor).

There exists a collection $\mathcal{S}$ of sequences (permutations of length $n$) and a probability distribution on $\mathcal{S}$, such that no algorithm can return an $O(\log n)$-approximation (for a suitable hidden constant that depends on $p$) of the longest increasing subsequence with probability at least $1 - 1/n$.

We prove a lower bound of $\Omega(n \log n)$ on the number of comparisons (which is a lower bound on the running time) needed to compute an $O(\log n)$-approximation by considering the easier case in which all comparisons are correct, and by adapting the techniques used in [10] for proving a similar lower bound for exact (i.e., 1-approximate) algorithms:

###### Theorem 3 (Lower Bound – Running Time).

Any $O(\log n)$-approximation algorithm for longest increasing subsequence requires $\Omega(n \log n)$ comparisons, even if no errors occur.

## 2 Preliminaries

Since we assume that all elements in the input sequence $S$ are distinct, we can also assume, for easier analysis and readability, that $S$ is a permutation of the numbers (elements) $\{1, \dots, n\}$. By our error model, the elements in $S$ possess a true linear order, i.e., $1 < 2 < \dots < n$; however, this order can only be observed through erroneous comparisons.

For two distinct elements and , we will write to denote that is smaller than according to the true linear order (resp. to denote that is larger than according to the true linear order), and we will write (resp. ) to mean that is observed to be smaller (resp. larger) than in the comparison result. For a given sequence and an element , we define to be the true rank of element in (note that ranks start from 1), and we define to be the position of in (positions also start from 1). The dislocation of in is then , and the maximum dislocation of is . For a given sequence , we let denote the comparison outcomes that we can observe. For , this means that if with and , then (resp. if ). Finally, for , we write for the binary logarithm of .

We continue the preliminaries with some results on sorting that we will use to prove our upper bound on the approximation factor.

###### Theorem 4 (Theorem 3 in [14]).

There is an algorithm that approximately sorts, in $O(n \log n)$ worst-case time, $n$ elements subject to random persistent comparison errors so that the maximum dislocation of the resulting sequence is $O(\log n)$, with high probability.

###### Lemma 1.

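Let $x, y \in S$. If $S_{apx}$ has maximum dislocation at most $d$, then for $\mathrm{pos}(x, S_{apx}) + 2d \le \mathrm{pos}(y, S_{apx})$, $x$ and $y$ are in correct relative order: $x < y$.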

###### Proof.

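Since the maximum dislocation in $S_{apx}$ is at most $d$, the rank of $x$ lies in $[\mathrm{pos}(x, S_{apx}) - d,\ \mathrm{pos}(x, S_{apx}) + d]$ and the rank of $y$ lies in $[\mathrm{pos}(y, S_{apx}) - d,\ \mathrm{pos}(y, S_{apx}) + d]$. These intervals intersect in at most one position, and the claim follows since no two elements can appear in the same position of the sorted order. ∎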
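The dislocation definitions and the statement of Lemma 1 can be checked numerically. The sketch below is only an illustration under the assumption that elements are ordinary comparable Python values; the function names `dislocations`, `max_dislocation`, and `far_pairs_in_order` are hypothetical and do not appear in the paper.

```python
def dislocations(seq):
    """Map each element to |position - rank| (both counted from 1)."""
    rank = {x: r for r, x in enumerate(sorted(seq), start=1)}
    return {x: abs(pos - rank[x]) for pos, x in enumerate(seq, start=1)}


def max_dislocation(seq):
    """Maximum dislocation of the sequence (0 for an empty sequence)."""
    return max(dislocations(seq).values(), default=0)


def far_pairs_in_order(seq, d):
    """Check the claim of Lemma 1 for a given d.

    Returns True if every pair of elements whose positions differ by at
    least 2d (and by at least 1) appears in its true relative order.
    """
    gap = max(2 * d, 1)
    return all(
        seq[i] < seq[j]
        for i in range(len(seq))
        for j in range(i + gap, len(seq))
    )
```

For any sequence `seq` of distinct elements, `far_pairs_in_order(seq, max_dislocation(seq))` is expected to hold; this is exactly the interval argument from the proof, with `d` set to the actual maximum dislocation of `seq`.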

## 3 Upper Bound and Approximation-Algorithm

We will modify the so-called Core-Algorithm (as named in Section 1.1, Related Work) that computes a longest increasing subsequence in the absence of comparison errors, such that it computes an $O(\log n)$-approximation with high probability in our error model. Before we do so, we first show that it is possible to identify a $2d$-approximation by looking at $S$ and a sequence $S_{apx}$ with maximum dislocation $d$. Since we can sort such that the maximum dislocation is $O(\log n)$ (see Theorem 4), this implies an $O(\log n)$-approximation on $\mathrm{LIS}(S)$.

### 3.1 Upper Bound

The proof of the upper bound is based on the following fact and observation:

• Without any comparison errors, the problem of finding $\mathrm{LIS}(S)$ is equivalent to the problem of finding a longest common subsequence between $S$ and $S_{sort}$, where $S_{sort}$ is the correctly sorted order of the elements in $S$.

• This leads to the following observation. Let $S_{apx}$ be the sequence obtained from approximately sorting $S$ with comparison errors and consider now $S_{apx}$ as the total order over all elements, i.e., for each pair of elements, their comparison result is redefined as their relative order in $S_{apx}$. Furthermore, let $A$ be any algorithm that solves longest increasing subsequence in the absence of errors. If $A$ uses the redefined comparison results, it computes the longest common subsequence between $S$ and $S_{apx}$.
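The equivalence in the first bullet can be illustrated with a small Python sketch that computes the length of a longest increasing subsequence as the length of a longest common subsequence with the sorted sequence. The function names `lcs_length` and `lis_length_via_lcs` are hypothetical, and the quadratic dynamic program is the textbook one, chosen here for brevity rather than speed.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, by the classic O(|a||b|) DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lis_length_via_lcs(seq):
    """Length of a longest increasing subsequence of distinct elements."""
    return lcs_length(seq, sorted(seq))
```

Replacing `sorted(seq)` by an approximately sorted sequence gives the quantity discussed in the second bullet: a longest common subsequence with respect to the redefined order, which need not be increasing in the true order.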

The immediate idea of computing the longest common subsequence $\mathrm{LCS}(S, S_{apx})$ comprises some difficulties, since this subsequence is not necessarily increasing and, on top of that, might be shorter than $\mathrm{LIS}(S)$. However, we can still get a first approximation. Assume that $S_{apx}$ has maximum dislocation at most $d$. Lemma 1 implies that we obtain an increasing subsequence when taking every $2d$-th element of $\mathrm{LCS}(S, S_{apx})$. And the maximum dislocation implies that the elements in the subset containing every $2d$-th element of $\mathrm{LIS}(S)$ appear in the same relative order in $S_{apx}$, thus $|\mathrm{LCS}(S, S_{apx})| \ge \frac{1}{2d}|\mathrm{LIS}(S)|$. When put together, we get a $4d^2$-approximation.

This approximation factor can be improved, and it turns out that considering common subsequences whose elements lie (at least) $2d$ positions apart in $S_{apx}$ is actually a good start: By Lemma 1, a common subsequence between $S$ and $S_{apx}$ is increasing if for every pair of adjacent elements in this subsequence their positions in $S_{apx}$ differ by at least $2d$. Therefore, we say that a sequence $S' = \langle s'_1, \dots, s'_k \rangle$ is $2d$-distant in $S_{apx}$ if

$$\mathrm{pos}(s'_i, S_{apx}) + 2d \le \mathrm{pos}(s'_{i+1}, S_{apx}) \quad \text{for } 1 \le i < k.$$

Notice that any (increasing) subsequence of $S$ that is $2d$-distant in $S_{apx}$ is automatically also a common (increasing) subsequence of $S$ and $S_{apx}$. This observation suggests the following easy recipe to obtain a $2d$-approximation on longest increasing subsequence:

• First, partition the elements into $2d$ subsets, such that every $2d$-th element in $S_{apx}$ gets into the same subset, and obtain $2d$ input subsequences of $S$ based on this partition.

• Then, on every input subsequence, run any algorithm that computes a longest increasing subsequence if no comparison errors happen, and return the longest result.

By the pigeonhole principle and since every input subsequence is now $2d$-distant in $S_{apx}$, the longest result must be a $2d$-approximation on $\mathrm{LIS}(S)$. This recipe, however, is not optimal in the sense that in many cases we could do better and find a longer subsequence in $S$ that is still $2d$-distant in $S_{apx}$. In fact, we lose up to a factor $2d$ in the case where $\mathrm{LIS}(S)$ is already $2d$-distant in $S_{apx}$, but these elements are equally distributed among all input subsequences. For this reason, we will define an approximation algorithm that finds the longest increasing subsequence in $S$ that is $2d$-distant in $S_{apx}$. We conclude this section with the obvious lemma.
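The two steps of the recipe can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the names `lis_quadratic` and `simple_recipe` are hypothetical, `S_apx` is assumed to be given as a list with maximum dislocation at most `d` (with `d >= 1`), and the error-free black box is a textbook quadratic dynamic program that is run on positions in `S_apx`, which plays the role of the redefined comparison results.

```python
def lis_quadratic(keys):
    """Indices of a longest strictly increasing subsequence (O(m^2) DP)."""
    if not keys:
        return []
    best_len = [1] * len(keys)
    prev = [None] * len(keys)
    for i in range(len(keys)):
        for j in range(i):
            if keys[j] < keys[i] and best_len[j] + 1 > best_len[i]:
                best_len[i], prev[i] = best_len[j] + 1, j
    i = max(range(len(keys)), key=best_len.__getitem__)
    out = []
    while i is not None:
        out.append(i)
        i = prev[i]
    return out[::-1]


def simple_recipe(S, S_apx, d, lis_solver=lis_quadratic):
    """Return the longest result over the 2d position classes of S_apx."""
    pos = {x: i for i, x in enumerate(S_apx)}
    best = []
    for r in range(2 * d):
        # Input subsequence of S: elements whose position in S_apx is r mod 2d.
        part = [x for x in S if pos[x] % (2 * d) == r]
        # Comparisons are redefined as the relative order in S_apx.
        chosen = lis_solver([pos[x] for x in part])
        candidate = [part[i] for i in chosen]
        if len(candidate) > len(best):
            best = candidate
    return best
```

Within one class, any two positions differ by a nonzero multiple of $2d$, so every subsequence that is increasing in position is $2d$-distant and hence, by Lemma 1, increasing in the true order whenever the dislocation assumption holds.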

###### Lemma 2.

The longest subsequence $S^*$ of $S$ that is $2d$-distant in $S_{apx}$ has length at least

$$|S^*| \ge \frac{1}{2d}\,|\mathrm{LIS}(S)|.$$

### 3.2 Approximation-Algorithm

Consider the Core-Algorithm described in Algorithm 1 that computes the longest increasing subsequence of the input sequence $S = \langle s_1, \dots, s_n \rangle$ in the error-free case. The algorithm processes the input elements one by one, maintaining the longest increasing subsequence found so far. In particular, it maintains a parameter $\ell$ and an array $L$, such that $\ell$ is the length of the longest increasing subsequence found so far and $L$ contains an entry for each length 1 to $\ell$, such that $L[j]$ stores the smallest element processed so far that can be at the end of an increasing subsequence of length $j$.

• The first element $s_1$ is placed to $L[1]$ and $\ell$ is set to 1.

• Each subsequent element $s_i$ is placed to $L[j+1]$, such that $j$ is the largest position where $L[j]$ is smaller than $s_i$ (with $j = 0$ if no such position exists).

• If $s_i$ is placed to $L[\ell+1]$, then $\ell$ is updated to $\ell+1$.

• Whenever a new element $s_i$ is placed, put a pointer from $s_i$ to the element in $L[j]$, that, by construction, has a lower value than $s_i$.

• In the end, follow these pointers from the element in $L[\ell]$ to recover the longest increasing subsequence (in reverse order).

An entry $L[j]$ basically represents the increasing sequence of length $j$ that ends with the smallest possible element processed so far. When an element $s_i$ is inserted into some position $L[j+1]$, this means that it is appended to the sequence represented by $L[j]$. Hence, either $s_i$ extends the longest increasing sequence found so far (case $j = \ell$) or the sequence represented by $L[j+1]$ gets replaced by this new sequence (case $j < \ell$).
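The steps above can be sketched in Python for the error-free case. This is a minimal illustration, not Algorithm 1 verbatim: the function name `core_algorithm` and the dictionary `pred` for the pointers are assumptions, the array is 0-indexed (entry `L[j]` ends a subsequence of length `j + 1`), and the binary search relies on the standard-library `bisect` module.

```python
from bisect import bisect_left


def core_algorithm(S):
    """Longest increasing subsequence of distinct, correctly comparable elements."""
    L = []      # L[j]: smallest element ending an increasing subsequence of length j+1
    pred = {}   # pointer from each element to its predecessor when it was placed
    for x in S:
        j = bisect_left(L, x)            # number of entries of L smaller than x
        pred[x] = L[j - 1] if j > 0 else None
        if j == len(L):
            L.append(x)                  # x extends the longest subsequence so far
        else:
            L[j] = x                     # x is a smaller end for length j+1
    # Follow the pointers from the last entry to recover the subsequence.
    out = []
    x = L[-1] if L else None
    while x is not None:
        out.append(x)
        x = pred[x]
    return out[::-1]
```

Since the entries of `L` are always increasing, each element is placed with one binary search, which is where the logarithmic cost per processed element comes from.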

Our Approximation-Algorithm, as described in Algorithm 2, is obtained by modifying the Core-Algorithm such that it works in our error model.

• We first approximately sort (using the algorithm from [14], see also Theorem 4 in the current paper) the elements of $S$ to obtain $S_{apx}$, and we redefine the comparison outcomes based on this total order, i.e., the result of a comparison between two elements now corresponds to their relative order in $S_{apx}$.

• To compute a suitable subsequence, we change the algorithm so that it remembers the longest subsequences that are $2d$-distant in $S_{apx}$ instead of the longest increasing subsequences. This implies that an element $x$ is only appended to an (intermediate) subsequence that ends with element $y$, if $\mathrm{pos}(y, S_{apx}) + 2d \le \mathrm{pos}(x, S_{apx})$.
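The modification described in the two bullets can be sketched as follows. This is a hedged illustration and not Algorithm 2 itself: the function name `longest_distant_subsequence` is hypothetical, the approximately sorted sequence `S_apx` and the dislocation bound `d` are assumed to be given as inputs (the sorting step of [14] is not reproduced), and the arrays are 0-indexed.

```python
from bisect import bisect_right


def longest_distant_subsequence(S, S_apx, d):
    """Longest subsequence of S that is 2d-distant in S_apx."""
    pos = {x: i for i, x in enumerate(S_apx)}
    gap = 2 * d
    end_pos = []    # end_pos[j]: smallest S_apx position ending a distant subsequence of length j+1
    end_elem = []   # end_elem[j]: the element located at that position
    pred = {}       # predecessor pointers, as in the Core-Algorithm
    for x in S:
        p = pos[x]
        # x may be appended to entries whose position is at most p - 2d.
        j = bisect_right(end_pos, p - gap)
        if j == len(end_pos):
            end_pos.append(p)
            end_elem.append(x)
        elif p < end_pos[j]:
            end_pos[j] = p
            end_elem[j] = x
        else:
            continue                     # x does not improve any entry
        pred[x] = end_elem[j - 1] if j > 0 else None
    out = []
    x = end_elem[-1] if end_elem else None
    while x is not None:
        out.append(x)
        x = pred[x]
    return out[::-1]
```

With `d = 0` the distance condition reduces to the plain order of `S_apx`, so the sketch falls back to the Core-Algorithm run on the redefined comparison results.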

For easier analysis, we introduce some additional notation. We call one execution of the lines 2 to 2 of Algorithm 2 an iteration, and enumerate them such that element is considered in iteration . We also say that line 2 corresponds to the first iteration. Furthermore, we denote by and the state and the value of and after the -th iteration, respectively, and for any , we call the subsequence with length the implied sequence of .

###### Lemma 3.

After every iteration of our Approximation-Algorithm, every implied sequence is a subsequence of $S$ that is $2d$-distant in $S_{apx}$. Moreover, the sequence of entries stored in the array maintained by the algorithm is itself $2d$-distant in $S_{apx}$.

###### Proof.

For any and , let be the implied sequence of . Observe that to every element , such that , the algorithm has assigned as its predecessor. Since the predecessor of any element can only have been processed in an earlier iteration, is a subsequence of .

It follows by induction, that the condition on line 2 in Algorithm 2 ensures that is -distant in : It is trivial to see for , thus, assume that every implied sequence before the -th iteration is -distant in . If is inserted into (nothing changes in the other case), the implied sequence of is equal to appended to the implied sequence of (if it exists). By hypothesis and the condition on line 2, is still -distant, and since the other implied sequences do not change, the claim also holds after iteration .

That is -distant in also follows by induction: If changes (thus does not change for all ), then by hypothesis and the conditions in lines 2, 2, and 2, (for all those entries that exist). ∎

###### Lemma 4.

The sequence that our Approximation-Algorithm returns is a longest subsequence of $S$ that is $2d$-distant in $S_{apx}$.

###### Proof.

Lemma 3 implies that is a subsequence of and -distant in . Let be a longest subsequence of that is -distant in . We now show that . In particular, we show by induction that after iteration , . For the base case, consider iteration , where is processed. Either gets inserted into some position , i.e., , or not. If it gets inserted, then by conditions in lines 2 or 2 in Algorithm 2, . If not, then it must hold that and thus

For the step case, consider iteration , where is processed, and observe that the value of only increases during the algorithm, and for any and it holds that Therefore, and by induction hypothesis and the assumption that is -distant in , And Lemma 3 implies, Thus, if does not get inserted, it is because , and if it gets inserted, it will be in some position . In any case, the hypothesis also holds after the iteration iteration , which means that has indeed maximum length. ∎

### 3.3 Proof of Theorem 1

We now prove the initially stated Theorem 1, which for convenience, we restate here:

###### Theorem 1 (Upper Bounds).

For any sequence $S$ that contains $n$ distinct elements, our Approximation-Algorithm computes an $O(\log n)$-approximation of the longest increasing sequence of $S$, in $O(n \log n)$ time, with high probability.

###### Proof.

Let $S_{apx}$ be obtained according to Theorem 4, such that with high probability, the maximum dislocation in $S_{apx}$ is at most $d = O(\log n)$. If this is true, by Lemmata 1-4, our Approximation-Algorithm returns a subsequence of $S$ that is increasing, and that has length at least $\frac{1}{2d}|\mathrm{LIS}(S)|$.

The running time consists of the initial sorting, which by Theorem 4 takes $O(n \log n)$ time (by modifying this algorithm so that it also returns the mapping from each element in $S$ to its position in $S_{apx}$, we can obtain the new comparison results in the same time), and the $n$ iterations of the algorithm, which take $O(\log n)$ time each if binary search is used to implement line 10. The final construction of the output takes $O(k)$ time, where $k$ is the length of the approximation. ∎

## 4 Lower Bound on the Approximation Factor

We continue this paper with a lower bound on the approximation factor, that implies that the upper bound we showed in Theorem 1 is tight up to constant factors. In particular, we prove Theorem 2, which we restate here:

###### Theorem 2 (Lower Bound – Approximation Factor).

There exists a collection $\mathcal{S}$ of sequences (permutations of length $n$) and a probability distribution on $\mathcal{S}$, such that no algorithm can return an $O(\log n)$-approximation (for some suitable hidden constant that depends on $p$) of the longest increasing subsequence with probability at least $1 - 1/n$.

Our proof can be seen as a generalization of the lower bound on the maximum dislocation for sorting (see proof of Theorem 9 in [12]), where it is shown that two elements whose ranks differ by less than $c \log n$, for a suitable constant $c$ depending on $p$, are likely to be indistinguishable by any algorithm, and hence to appear in the wrong relative order. Intuitively, the argument there is as follows: consider the sorted sequence and the sequence obtained by swapping two elements, and assume that the comparison outcomes on these sequences look identical. It turns out that the probability of this happening is larger than $1/n$ whenever the rank difference is smaller than $c \log n$, since only a small number of comparison outcomes must differ.

This is not enough in our case, since an algorithm could simply ignore two such elements. For instance, consider an increasing sequence of $k$ adjacent elements. If the first and the last element are swapped, the algorithm could simply return the subsequence without these two elements and be almost optimal. A first idea to fix this problem could be to consider the case where one observes the whole increasing sequence to be reversed. However, to have this happen with probability larger than $1/n$, $k$ needs to be smaller than $O(\sqrt{\log n})$, thus implying a weaker lower bound.

Instead, we shall use a collection of similar sequences (more than two), such that if an algorithm succeeds on one of these sequences it must fail on another one.

###### Proof.

We say that an algorithm succeeds if it attains the approximation factor stated in Theorem 2; otherwise we say it fails. We shall first define our collection of similar sequences. Let $\eta = \frac{\log n}{2 \log \frac{1-p}{p}}$. Let $S^*$ denote the sequence in which the $\eta$ largest elements appear first in increasing order and then the remaining elements appear in decreasing order,

$$S^* := \langle n-\eta+1, \dots, n-1, n,\ \ n-\eta, \dots, 1 \rangle.$$

Furthermore, for $1 \le i \le \eta$, let $S^{(i)}$ be the sequence obtained from $S^*$ when the largest element $n$ is moved to position $i$,

$$S^{(i)} := \langle n-\eta+1, \dots, n-\eta+(i-1),\ n,\ n-\eta+i, \dots, n-1,\ \ n-\eta, \dots, 1 \rangle.$$
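The two definitions can be made concrete with a short Python sketch that builds these sequences. The function names `s_star` and `s_i` are hypothetical, and `eta` is treated as a free integer parameter with `1 <= eta <= n` rather than computed from `n` and `p`.

```python
def s_star(n, eta):
    """The eta largest elements in increasing order, then the rest decreasing."""
    return list(range(n - eta + 1, n + 1)) + list(range(n - eta, 0, -1))


def s_i(n, eta, i):
    """S* with the largest element n moved to position i (1 <= i <= eta)."""
    head = list(range(n - eta + 1, n))   # the eta - 1 elements n-eta+1, ..., n-1
    head.insert(i - 1, n)                # positions are counted from 1
    return head + list(range(n - eta, 0, -1))
```

For `i = eta` the element `n` stays in place, so `s_i(n, eta, eta)` coincides with `s_star(n, eta)`; for smaller `i` only the relative order of `n` with the `eta - i` elements it jumps over changes.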

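Now, let $\mathcal{S} = \{S^{(1)}, \dots, S^{(\eta)}\}$ (note that basically $S^{(\eta)} = S^*$) and let $\mathcal{D}$ be the uniform distribution over $\mathcal{S}$. We will show (proof by contradiction) that no algorithm succeeds on this pair ($\mathcal{S}, \mathcal{D}$) with probability at least $1 - 1/n$.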

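Assume towards a contradiction that algorithm $A$ succeeds with high probability on a sequence $S'$ chosen uniformly at random from $\mathcal{S}$, i.e.,

$$\Pr(A(S') \text{ succeeds}) = \sum_{i=1}^{\eta} \Pr(A(S^{(i)}) \text{ succeeds}) \cdot \Pr(S' = S^{(i)}) \ge 1 - \frac{1}{n}. \tag{1}$$

This implies that

$$P := \Pr(A(S^*) \text{ succeeds}) \ge 1 - \frac{\eta}{n}, \tag{2}$$

since by hypothesis and assuming the case where the algorithm succeeds on all the other input sequences (i.e., best case for the algorithm, worst case for the proof), inequality (1) becomes $\frac{1}{\eta}\left(P + \eta - 1\right) \ge 1 - \frac{1}{n}$, which resolves to (2).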

Let $C$ be a set of comparison outcomes; then $A(S, C)$ means that algorithm $A$ runs on sequence $S$ and observes comparison outcomes $C$. Now, consider the set of all comparison outcomes that the algorithm can observe and let $\mathcal{C}$ denote the set of all possible comparison outcomes for which $A$ succeeds on input $S^*$. We define $R(S)$ to be the random variable corresponding to the comparison outcomes as they would be observed by the algorithm when the input sequence is $S$. Then, the probability that $A(S^*)$ succeeds is expressed by the total probability of the events that $A$ observes comparison outcomes in $\mathcal{C}$,

$$P = \Pr(A(S^*) \text{ succeeds}) = \sum_{C \in \mathcal{C}} \Pr(R(S^*) = C). \tag{3}$$

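Before we continue the proof, we shall first show the following lemma.

###### Lemma 5.

For every $S \in \mathcal{S}$ and $C \in \mathcal{C}$,

$$\Pr(R(S) = C) > \Pr(R(S^*) = C) \cdot \left(\frac{p}{1-p}\right)^{\eta}.$$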

###### Proof.

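Consider $S^*$ and $C$ and let $E(S^*, C)$ be the set of wrong comparison results, i.e., the set of pairs $(x, y)$ with $x < y$ for which the outcome recorded in $C$ disagrees with the true order. Thus,

$$\Pr(R(S^*) = C) = (1-p)^{\binom{n}{2} - |E(S^*, C)|} \cdot p^{|E(S^*, C)|} = (1-p)^{\binom{n}{2}} \cdot \left(\frac{p}{1-p}\right)^{|E(S^*, C)|}.$$

Now consider $S = S^{(i)}$ and observe that only the relative order of the pairs $(n, y)$ with $y \in \{n-\eta+i, \dots, n-1\}$ changed compared to $S^*$. This implies that there can be at most $\eta - 1$ additional wrong comparison results, i.e., $|E(S, C)| \le |E(S^*, C)| + \eta - 1$. Therefore, and since $\frac{p}{1-p} < 1$,

$$\Pr(R(S) = C) = (1-p)^{\binom{n}{2}} \cdot \left(\frac{p}{1-p}\right)^{|E(S, C)|} > (1-p)^{\binom{n}{2}} \cdot \left(\frac{p}{1-p}\right)^{|E(S^*, C)| + \eta} = \Pr(R(S^*) = C) \cdot \left(\frac{p}{1-p}\right)^{\eta}. \ ∎$$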

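Continuation of the Proof of Theorem 2. Now notice that in order to succeed, $A$ needs to return at least two of the first $\eta$ elements in $S$. Therefore, we can map every $C \in \mathcal{C}$ to a (not necessarily unique) sequence $S(C)$ of $\mathcal{S} \setminus \{S^*\}$ as follows: for each $C \in \mathcal{C}$, let $i$ be the position of the first element that $A(S^*, C)$ returns and let $S(C) = S^{(i)}$. (Note that $i < \eta$, as otherwise $A(S^*, C)$ does not return at least two of the first $\eta$ elements in $S^*$.) For each $S \in \mathcal{S} \setminus \{S^*\}$,

$$\Pr(A(S) \text{ fails}) \ge \sum_{C \in \mathcal{C} : S = S(C)} \Pr(R(S) = C) > \sum_{C \in \mathcal{C} : S = S(C)} \Pr(R(S^*) = C) \cdot \left(\frac{p}{1-p}\right)^{\eta}.$$

And as a consequence, for $S'$ chosen uniformly at random,

$$\begin{aligned}
\Pr(A(S') \text{ fails}) &\ge \sum_{S \in \mathcal{S} \setminus \{S^*\}} \Pr(S' = S) \cdot \Pr(A(S) \text{ fails}) \\
&> \sum_{S \in \mathcal{S} \setminus \{S^*\}} \frac{1}{\eta} \sum_{C \in \mathcal{C} : S = S(C)} \Pr(R(S^*) = C) \cdot \left(\frac{p}{1-p}\right)^{\eta} \\
&= \frac{1}{\eta} \left(\frac{p}{1-p}\right)^{\eta} \sum_{S \in \mathcal{S} \setminus \{S^*\}} \ \sum_{C \in \mathcal{C} : S = S(C)} \Pr(R(S^*) = C) \\
&\ge \frac{1}{\eta} \left(\frac{p}{1-p}\right)^{\eta} \sum_{C \in \mathcal{C}} \Pr(R(S^*) = C) \\
&\ge \frac{1}{\eta} \left(\frac{p}{1-p}\right)^{\eta} \left(1 - \frac{\eta}{n}\right),
\end{aligned}$$

where from line 3 to line 4 we use that every instance of comparison results is mapped to exactly one sequence, and on the last line we use Equations (2) and (3). Now, observe that $1 - \frac{\eta}{n} > \frac{1}{2}$ for $n$ large enough, and that, by our choice of $\eta$, $\left(\frac{p}{1-p}\right)^{\eta} = \frac{1}{\sqrt{n}}$. Therefore,

$$\Pr(A(S') \text{ fails}) > \frac{2 \log \frac{1-p}{p}}{\log n} \cdot \frac{1}{\sqrt{n}} \cdot \frac{1}{2} > \frac{1}{n}.$$

However, this contradicts our assumption that $A$ succeeds with high probability. ∎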

The lower bound shown in Theorem 2 holds for all deterministic algorithms, but can be expanded to also hold for probabilistic algorithms as explained in the following remark.

###### Remark 1.

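To make the lower bound on the approximation factor work also for any randomized algorithm $A$, we can turn $A$ into a deterministic version by fixing a sequence $\lambda$ of random bits that can be used by the algorithm. Thus, for the resulting deterministic algorithm $A_\lambda$, the lower bound holds. Let $p_\lambda$ be the probability to generate the sequence $\lambda$ of random bits. To lower bound the probability that $A(S')$ fails, where $S'$ is chosen uniformly at random from $\mathcal{S}$, one simply needs to sum over all the probabilities that $A_\lambda(S')$ fails multiplied by $p_\lambda$, i.e.,

$$\Pr(A(S') \text{ fails}) = \sum_{\lambda} p_\lambda \cdot \Pr(A_\lambda(S') \text{ fails}) > \sum_{\lambda} p_\lambda \cdot \frac{1}{n} = \frac{1}{n}.$$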

## 5 Lower Bound on the Running Time

We complement this paper by showing that the running time of our Approximation-Algorithm is asymptotically optimal. In [10], it is shown that (in the error-free model) computing the longest increasing subsequence is at least as hard as sorting. We will use this proof to informally show Theorem 3 which we restate here (we postpone a formal proof to the full version of the paper):

###### Theorem 3 (Lower Bound – Running Time).

Any $O(\log n)$-approximation algorithm for longest increasing subsequence requires $\Omega(n \log n)$ comparisons, even if no errors occur.

The proof techniques of the lower bound in [10] are as follows: Assume that we are in the error-free case. Consider the easier problem of deciding on a given sequence of distinct elements whether , and consider the comparison tree of an algorithm with leaves that tell as an answer to this question either “yes” or “no”. Without loss of generality, assume that no useless comparisons are made on a root to a leaf path (i.e., no comparison twice and no comparisons whose outcome is predictable by the outcomes of previous comparisons).

Every leaf can be associated with a partial order implied by a set of linear orderings on that are consistent with the transitive closure of the comparisons performed on the path from the root to . If the answer in a leaf is “yes”, this implies that there are no elements of that are pairwise incomparable in this partial order (i.e., the relative order of every pair is neither tested in any comparison on the path, nor implied by other comparisons), as otherwise, these elements could possibly form an increasing sequence of length . Such a subset of elements is called antichain, while a chain is a subset of elements that are linearly ordered. An important property of chains and antichains used in the proof is, that in a “yes”-leaf, the elements can be partitioned into less than chains, since in any partial order, the elements can be partitioned into chains, where is the size of the largest antichain. Furthermore, given such a partition into (less than) chains, the elements can be sorted with comparisons (think for instance of natural merge sort).

In order to lower bound the number of comparisons needed to end in a “yes”-leaf, algorithm can be extended to as follows: whenever concludes to be in a “yes”-leaf, continues to completely sort the elements of (which requires no more than further comparisons). Let denote the number of linear orderings of the elements in that end in a “yes”-leaf. Then, since there are different linear orderings and possible subsequences of size each increasing with probability . The comparison tree corresponding to has thus at least leaves, and therefore must perform at least comparisons in its worst case. Therefore, must perform at least comparisons in its worst case to end up in a “yes”-leaf, which is when choosing .

We can use the above proof techniques to show that every algorithm, that computes a -approximation on longest increasing subsequence must perform at least comparisons. Let be an -approximation algorithm for under our error model (i.e., we can always simulate our error model in the error-free case) and consider a relaxation of the problem of determining whether is smaller than . In this relaxation we require the answer to be “yes” (resp. “no”) if (resp. ), while we do not impose any restriction on the range . It is clear that algorithm can be used to solve this relaxed problem without increasing the number of needed comparisons. Therefore, the associated comparison tree must reach a leaf corresponding to answer “yes” for all linear orderings on the elements in that contain no increasing subsequence of length , while the largest antichain in any such an ordering is smaller than . This implies that (still in the error-free case) needs at least further comparisons in the worst case to sort the elements in , and needs at least comparisons in the worst case to end in a “yes”-leaf, which is in if we set .

Finally, we can conclude that our Approximation-Algorithm performs in asymptotically optimal time, since we can always simulate our error model in the error-free case.

## 6 Conclusion

Although a logarithmic approximation ratio might not seem very exciting at first glance, it turns out that this is the best one that can be obtained in the presence of persistent comparison errors. In this respect, it is interesting to see that there exists a simple recipe to compute a logarithmic approximation, one that can use as a black box any algorithm that computes a longest increasing subsequence when no comparison errors occur:

• First, obtain an approximately sorted sequence of the elements such that the maximum dislocation is O(log n), and redefine the comparisons according to this order. Then, partition the elements into subsets, such that every -th element in gets into the same partition, and obtain input subsequences based on this partition. Finally, run the algorithm on every input subsequence and return the longest result.
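
The recipe above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `lis_indices` and `lis_via_recipe` are hypothetical, the approximate order is assumed to be given as a dictionary `approx_rank` whose values differ from the true ranks by at most `d`, and the stride `2 * d + 1` is a choice made here (the stride used in the paper is not recoverable from the text). With that stride, two elements in the same class have approximate ranks that differ by more than `2 * d`, so their approximate order agrees with their true order.

```python
from bisect import bisect_left


def lis_indices(keys):
    """Indices of one longest strictly increasing subsequence of `keys`
    (error-free black box, patience-sorting style, O(m log m))."""
    tails = []      # tails[j]: index of the smallest tail of a run of length j + 1
    tail_keys = []  # keys[tails[j]], kept in parallel for bisect
    prev = [-1] * len(keys)
    for i, key in enumerate(keys):
        j = bisect_left(tail_keys, key)
        prev[i] = tails[j - 1] if j > 0 else -1
        if j == len(tails):
            tails.append(i)
            tail_keys.append(key)
        else:
            tails[j] = i
            tail_keys[j] = key
    out = []
    i = tails[-1] if tails else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]


def lis_via_recipe(sequence, approx_rank, d):
    """Return an increasing subsequence of `sequence` (distinct elements).

    approx_rank: element -> approximate rank, assumed to be off by at most d.
    Elements are grouped by approximate rank modulo k = 2 * d + 1 (assumed
    stride), the black box runs on each group, and the longest result wins.
    """
    k = 2 * d + 1
    best = []
    for c in range(k):
        positions = [i for i, x in enumerate(sequence) if approx_rank[x] % k == c]
        keys = [approx_rank[sequence[i]] for i in positions]
        candidate = [sequence[positions[j]] for j in lis_indices(keys)]
        if len(candidate) > len(best):
            best = candidate
    return best
```

By pigeonhole, some class contains at least a `1 / k` fraction of any longest increasing subsequence, so the returned subsequence is within a factor `k` of optimal whenever the dislocation assumption holds.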

As indicated earlier, our Approximation-Algorithm has the advantage that it performs much better than -approximate on many input sequences and is even optimal in the case where the longest increasing subsequence is already -distant in , whereas this is not necessarily true when using the simple recipe. Moreover, it is easy to observe that the Approximation-Algorithm is never worse than the recipe.

Finally, we would like to explain how the upper bound on the approximation factor can be generalized. Our Approximation-Algorithm actually succeeds whenever the approximately sorted sequence has maximum dislocation at most . This implies that the result can be parametrized and also used in other models with comparison errors.

• Whenever one can obtain a total order with maximum dislocation , the Approximation-Algorithm is -approximative.

Consider for instance the so-called threshold-model [1, 11, 15], where comparisons between numbers that differ by more than some threshold are always correct, while those between numbers that differ by less than can fail persistently (with some probability possibly depending on the difference or even adversarially). If the input sequence is a permutation of the numbers , running Quicksort in this error model yields a sequence with maximum dislocation (see [15]). Thus, our Approximation-Algorithm finds a -approximation of the longest increasing subsequence in .
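
For experimenting with the threshold model described above, a persistent noisy comparator can be simulated as in the following sketch. Everything named here is an assumption for illustration: the class `ThresholdComparator`, the parameters `tau` (threshold) and `p` (error probability for close pairs), and the decision to treat a difference of exactly `tau` as error-prone, which the text leaves unspecified. Persistence is modelled by caching the first answer for each unordered pair.

```python
import random


class ThresholdComparator:
    """Simulated comparator for a threshold error model (illustrative only)."""

    def __init__(self, tau, p, seed=0):
        self.tau = tau                    # pairs further apart than tau are always correct
        self.p = p                        # assumed error probability for close pairs
        self._rng = random.Random(seed)
        self._memo = {}                   # (lo, hi) -> stored answer to "lo < hi"

    def less(self, a, b):
        """Return the (possibly wrong, but persistent) answer to 'a < b'."""
        if a == b:
            return False
        lo, hi = (a, b) if a < b else (b, a)
        if (lo, hi) not in self._memo:
            # Only close pairs can be answered wrongly; the outcome is then frozen.
            wrong = (hi - lo) <= self.tau and self._rng.random() < self.p
            self._memo[(lo, hi)] = not wrong
        answer = self._memo[(lo, hi)]
        # Keep answers antisymmetric: 'b < a' is the negation of 'a < b'.
        return answer if a < b else not answer
```

Repeating a query returns the same answer, so errors cannot be averaged away by re-asking, which is the defining property of persistent errors.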

## References

• [1] Miklós Ajtai, Vitaly Feldman, Avinatan Hassidim, and Jelani Nelson. Sorting and selection with imprecise comparisons. ACM Transactions on Algorithms, 12(2):19, 2016.
• [2] David Aldous and Persi Diaconis. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society, 36(4):413–432, 1999.
• [3] Eitan Bachmat, Daniel Berend, Luba Sapir, Steven Skiena, and Natan Stolyarov. Analysis of aeroplane boarding via spacetime geometry and random matrix theory. Journal of Physics A: Mathematical and General, 39(29):L453, 2006.
• [4] Jinho Baik, Percy Deift, and Kurt Johansson. On the distribution of the length of the longest increasing subsequence of random permutations. Journal of the American Mathematical Society, 12(4):1119–1178, 1999.
• [5] Sergei Bespamyatnikh and Michael Segal. Enumerating longest increasing subsequences and patience sorting. Inf. Process. Lett., 76(1-2):7–11, 2000.
• [6] Mark Braverman and Elchanan Mossel. Noisy sorting without resampling. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 268–276, 2008.
• [7] Badrish Chandramouli and Jonathan Goldstein. Patience is a virtue: revisiting merge and sort on modern processors. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 731–742, 2014.
• [8] Maxime Crochemore and Ely Porat. Fast computation of a longest increasing subsequence and application. Inf. Comput., 208(9):1054–1059, 2010.
• [9] Arthur L. Delcher, Simon Kasif, Robert D. Fleischmann, Jeremy Peterson, Owen White, and Steven L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369–2376, 1999.
• [10] Michael L. Fredman. On computing the length of longest increasing subsequences. Discrete Mathematics, 11(1):29–35, 1975.
• [11] Stefan Funke, Kurt Mehlhorn, and Stefan Näher. Structural filtering: a paradigm for efficient and exact geometric programs. Comput. Geom., 31(3):179–194, 2005.
• [12] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Sorting with recurrent comparison errors. In 28th International Symposium on Algorithms and Computation, ISAAC 2017, December 9-12, 2017, Phuket, Thailand, pages 38:1–38:12, 2017.
• [13] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Optimal dislocation with persistent errors in subquadratic time. In 35th Symposium on Theoretical Aspects of Computer Science, STACS 2018, February 28 to March 3, 2018, Caen, France, pages 36:1–36:13, 2018.
• [14] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Optimal Sorting with Persistent Comparison Errors. ArXiv e-prints, April 2018.
• [15] Barbara Geissmann and Paolo Penna. Inversions from sorting with distance-based errors. In SOFSEM 2018: Theory and Practice of Computer Science - 44th International Conference on Current Trends in Theory and Practice of Computer Science, Krems, Austria, January 29 - February 2, 2018, Proceedings, pages 508–522, 2018.
• [16] Rolf Klein, Rainer Penninger, Christian Sohler, and David P. Woodruff. Tolerant algorithms. In Algorithms - ESA 2011 - 19th Annual European Symposium, Saarbrücken, Germany, September 5-9, 2011. Proceedings, pages 736–747, 2011.
• [17] William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18–31, 1980.
• [18] Chris N. Potts, David B. Shmoys, and David P. Williamson. Permutation vs. non-permutation flow shop schedules. Operations Research Letters, 10(5):281–284, 1991.
• [19] I-Hsuan Yang, Chien-Pin Huang, and Kun-Mao Chao. A fast algorithm for computing a longest common increasing subsequence. Inf. Process. Lett., 93(5):249–253, 2005.
• [20] Hongyu Zhang. Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm. Bioinformatics, 19(11):1391–1396, 2003.