In the classic version of the sorting problem, we are given a set of $n$ comparable items coming from a fixed total order and asked to compute a permutation that places the items into non-decreasing order. It is well known that this can be done using $O(n \log n)$ comparisons, which is asymptotically optimal (e.g., see [6, 8, 14]). However, there are a number of interesting applications where this classic version of the sorting problem doesn't apply.
For instance, consider the problem of maintaining a ranking of a set of sports teams based on the results of head-to-head matches. A typical approach to this sorting problem is to assume there is a fixed underlying total order for the teams, but that the outcomes of head-to-head matches (i.e., comparisons) are “noisy” in some way. In this formulation, the ranking problem becomes a one-shot optimization problem of finding the most-likely fixed total order given the outcomes of the matches (e.g., see [5, 7, 9, 10, 15]). In this paper, however, we study an alternative, complementary scenario: instead of a fixed total order with noisy comparisons, the comparisons are accurate but the underlying total order is evolving. This scenario captures, for instance, the real-world phenomenon where sports teams make mid-season changes to their player rosters and/or coaching staffs that result in improved or degraded competitiveness relative to other teams. That is, we are interested in the sorting problem for evolving data.
1.1 Related Prior Work for Evolving Data
Anagnostopoulos et al. [1] introduce the evolving data framework, where an input data set is changing while an algorithm is processing it. In this framework, instead of an algorithm taking a single input and producing a single output, an algorithm attempts to maintain an output close to the correct output for the current state of the data, repeatedly updating its best estimate of the correct output over time. For instance, Anagnostopoulos et al. [1] mention the motivation of maintaining an Internet ranking website that displays an ordering of entities, such as political candidates, movies, or vacation spots, based on evolving preferences.
Researchers have subsequently studied other interesting problems in the evolving data framework, including the work of Kanade et al. [13] on stable matching with evolving preferences, the work of Huang et al. [12] on selecting top-$k$ elements with evolving rankings, the work of Zhang and Li [18] on shortest paths in evolving graphs, the work of Anagnostopoulos et al. [2] on st-connectivity and minimum spanning trees in evolving graphs, and the work of Bahmani et al. [3] on PageRank in evolving graphs. In each case, the goal is to maintain an output close to the correct one even as the underlying data is changing at a rate commensurate to the speed of the algorithm. By way of analogy, classical algorithms are to evolving-data algorithms as throwing is to juggling.
1.2 Problem Formulation for Sorting Evolving Data
With respect to the sorting problem for evolving data, following the formulation of Anagnostopoulos et al. [1], we assume that we have a set of $n$ distinct items that are properly ordered according to a total order relation, “$<$”. In any given time step, we are allowed to compare any pair of items according to the “$<$” relation, and we learn the correct outcome of this comparison. After we perform such a comparison, some number of pairs of items that are currently consecutive according to the “$<$” relation are chosen uniformly at random and their relative order is swapped. As in previous work, we focus on the case where exactly one such pair is swapped after each comparison, but one can also consider versions of the problem where the ratio between comparisons and random consecutive swaps is something other than one-to-one. Still, this simplified version with a one-to-one ratio already raises some interesting questions.
Since it is impossible in this scenario to maintain a list that always reflects a strict ordering according to the “$<$” relation, our goal is to maintain a list with small Kendall tau distance, which counts the number of inversions, relative to the correct order. (Recall that an inversion is a pair of items $x$ and $y$ such that $x$ comes before $y$ in a list but $x > y$. An inversion in a permutation $\sigma$ is a pair of elements $x \neq y$ with $x < y$ and $\sigma(x) > \sigma(y)$.) Anagnostopoulos et al. [1] show that, when one random swap follows each comparison, the Kendall tau distance between the maintained list and the underlying total order is $\Omega(n)$ both in expectation and with high probability. They also show how to maintain this distance to be $O(n \log\log n)$, with high probability, by performing a multiplexed batch of quicksort algorithms on small overlapping intervals of the list. Recently, Besa Vial et al. [4] empirically show that repeated versions of quadratic-time algorithms such as bubble sort and insertion sort seem to maintain an asymptotically optimal distance of $O(n)$. In fact, this linear upper bound seems to hold even if we allow the number of random swaps at each step to be a much larger constant.
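To make the distance measure concrete, the following is a minimal Python sketch (not taken from the paper) that counts the Kendall tau distance between a maintained list and a reference order using a merge-sort inversion count. The function name `kendall_tau_distance` and its argument names are illustrative assumptions.

```python
def kendall_tau_distance(maintained, truth):
    """Number of item pairs whose relative order differs between the two lists.

    Both arguments are assumed to be lists holding the same distinct items.
    """
    rank = {item: pos for pos, item in enumerate(truth)}
    seq = [rank[item] for item in maintained]

    def sort_and_count(a):
        # Classic merge sort that also counts inversions in `a`.
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_and_count(a[:mid])
        right, inv_right = sort_and_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of `left`.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_and_count(seq)[1]


if __name__ == "__main__":
    print(kendall_tau_distance(["c", "a", "b"], ["a", "b", "c"]))
```

The merge-sort count runs in $O(n \log n)$ time; a direct double loop over all pairs would give the same value in quadratic time.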
1.3 Our Contributions
The main contribution of the present paper is to prove that repeated insertion sort maintains an asymptotically optimal Kendall tau distance, with high probability, for sorting evolving data. This algorithm repeatedly makes in-place insertion-sort passes (e.g., see [6, 8]) over the list, $l_t$, maintained by our algorithm at each step $t$. Each such pass moves the current item to an earlier position in the list so long as it is smaller than its predecessor in the list. With each comparison done by this repeated insertion-sort algorithm, we assume that a consecutive pair of elements in the underlying ordered list, $l'_t$, is chosen uniformly at random and swapped. In spite of the uncertainty involved in sorting evolving data in this way, we prove the following theorem, which is the main result of this paper.
Theorem 1.3. When running the repeated insertion-sort algorithm, for every step $t = \Omega(n^2)$, the Kendall tau distance between the maintained list, $l_t$, and the underlying ordered list, $l'_t$, is $O(n)$, with exponentially high probability.
That is, after an initialization period of $O(n^2)$ steps, the repeated insertion-sort algorithm converges to a steady state having an asymptotically optimal Kendall tau distance between the maintained list and the underlying total order, with exponentially high probability. We also show how to reduce this initialization period to $O(n \log n)$ steps, with high probability, by first performing a quicksort algorithm and then following that with the repeated insertion-sort algorithm.
Intuitively, our proof of Theorem 1.3 relies on two ideas: the adaptivity of insertion sort and the fact that, as time progresses, a constant fraction of the random swaps fix inversions. Ignoring the random swaps for now, when there are $I$ inversions, a complete execution of insertion sort performs roughly $I + n$ comparisons and fixes the $I$ inversions (e.g., see [6, 8]). If an $\epsilon$ fraction of the random swaps fix inversions, then during insertion sort $\epsilon(I + n)$ inversions are fixed by the random swaps and $(1 - \epsilon)(I + n)$ are introduced. Naively, the total change in the number of inversions is then $(1 - 2\epsilon)(I + n) - I = (1 - 2\epsilon)n - 2\epsilon I$, and when $I > (1 - 2\epsilon)n / (2\epsilon)$, the number of inversions decreases. So the number of inversions will decrease until $I = O(n)$.
This simplistic intuition ignores two competing forces involved in the heavy interplay between the random swaps and insertion sort’s runtime, however, in the evolving data model, which necessarily complicates our proof. First, random swaps can cause an insertion-sort pass to end too early, thereby causing insertion sort to fix fewer inversions than normal. Second, as insertion sort progresses, it decreases the chance for a random swap to fix an inversion. Analyzing these two interactions comprises the majority of our proof of Theorem 1.3.
The sorting algorithm we analyze in this paper for the evolving data model is the repeated insertion-sort algorithm whose pseudocode is shown in Algorithm 1.
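As a rough illustration of the process described above, here is a hedged Python simulation sketch. It is not the paper's Algorithm 1 verbatim: the function names, the `steps` budget, the seeded generator, and the shuffled starting list are assumptions made for the example. After every comparison, one uniformly random adjacent pair of the hidden order is swapped.

```python
import random


def repeated_insertion_sort(n, steps, seed=0):
    """Run `steps` comparisons of repeated insertion sort on evolving data (n >= 2).

    Returns the maintained list and the final rank of every item in the
    hidden order. All names here are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    truth = list(range(n))   # truth[p] is the item at position p of the hidden order
    rank = list(range(n))    # rank[item] is that item's position in the hidden order
    maintained = list(range(n))
    rng.shuffle(maintained)  # arbitrary starting list (an assumption)

    def less_then_evolve(a, b):
        answer = rank[a] < rank[b]   # the comparison itself is accurate
        p = rng.randrange(n - 1)     # then one random adjacent swap in the hidden order
        x, y = truth[p], truth[p + 1]
        truth[p], truth[p + 1] = y, x
        rank[x], rank[y] = p + 1, p
        return answer

    done = 0
    while True:                      # each iteration of this loop is one round
        for i in range(1, n):
            j = i
            while j > 0:
                if done == steps:
                    return maintained, rank
                done += 1
                if not less_then_evolve(maintained[j], maintained[j - 1]):
                    break            # active element is not smaller: pass ends
                maintained[j], maintained[j - 1] = maintained[j - 1], maintained[j]
                j -= 1


def inversions(maintained, rank):
    """Quadratic-time count of pairs that are out of order with respect to `rank`."""
    seq = [rank[item] for item in maintained]
    return sum(seq[i] > seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))


if __name__ == "__main__":
    lst, rk = repeated_insertion_sort(n=100, steps=50_000)
    print(inversions(lst, rk))
```

The simulation only reports the inversion count at the moment the step budget runs out; it makes no claim about the constants in the bounds proved in this paper.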
Formally, at time $t$, we denote the sorting algorithm's list as $l_t$ and we denote the underlying total order as $l'_t$. Together these two lists define a permutation, $\sigma_t$, of the indices, where $\sigma_t(x) = y$ if the element at index $x$ in $l_t$ is at position $y$ in $l'_t$. We define the simulated final state at time $t$ to be the state of $l_t$ obtained by freezing the current underlying total order, $l'_t$ (i.e., no more random swaps), and simulating the rest of the current round of insertion sort (we refer to each iteration of the while-true loop in Algorithm 1 as a round). We then define a frozen-state permutation, $\hat{\sigma}_t$, where $\hat{\sigma}_t(x) = y$ if the element at index $x$ in the simulated final state at time $t$ is at index $y$ in $l'_t$.
Let us denote the number of inversions at time $t$, in $\sigma_t$, with $I_t$. Throughout the paper, we may choose to drop time subscripts if our meaning is clear. The Kendall tau distance between two permutations $\pi_1$ and $\pi_2$ is the number of pairs of elements $x \neq y$ such that $\pi_1(x) < \pi_1(y)$ and $\pi_2(x) > \pi_2(y)$. That is, the Kendall tau distance between $l_t$ and $l'_t$ is equal to $I_t$, the number of inversions in $\sigma_t$. Figure 1 shows the state of $l$, $l'$, $\sigma$, and $\hat{\sigma}$ for two steps of an insertion sort (but not in the same round).
As the inner while-loop of Algorithm 1 executes, we can view as being divided into three sets: the set containing just the active element, (which we view as moving to the left, starting from position , as it is participating in comparisons and swaps), the semi-sorted portion, , not including , and the unsorted portion, . Note that if no random adjacent swaps were occurring in (that is, if we were executing insertion-sort in the classical algorithmic model), then the semi-sorted portion would be in sorted order.
We call the path from the root to the rightmost leaf of the Cartesian tree the (right-to-left) minima path as the elements on this path are the right-to-left minima in the list. The minima path is highlighted in Figure 4. For a minimum, , denote with the index of the element in the left subtree of that maximizes , i.e., the index of the largest element in the left subtree.
We use the phrase with high probability to indicate when an event occurs with probability that tends towards $1$ as $n \to \infty$. When an event occurs with probability of the form $1 - e^{-\mathrm{poly}(n)}$, we say it occurs with exponentially high probability. During our analysis, we will make use of the following facts.
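[Poisson approximation (Corollary 5.9 in [16])] Let $X_1, \ldots, X_n$ be the number of balls in each bin when $m$ balls are thrown uniformly at random into $n$ bins. Let $Y_1, \ldots, Y_n$ be independent Poisson random variables with mean $m/n$. Then for any event $\mathcal{E}$:
$$\Pr[\mathcal{E}(X_1, \ldots, X_n)] \le e\sqrt{m}\,\Pr[\mathcal{E}(Y_1, \ldots, Y_n)].$$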
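[Hoeffding’s inequality (Theorem 2 in [11])] If $X_1, \ldots, X_n$ are independent random variables and $a_i \le X_i \le b_i$ for $i = 1, \ldots, n$, then for $t > 0$, writing $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$:
$$\Pr\left[\bar{X} - \mathrm{E}[\bar{X}] \ge t\right] \le \exp\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).$$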
3 Sorting Evolving Data with Repeated Insertion Sort
Let us begin with some simple bounds with respect to a single round of insertion sort.
If a round of insertion sort starts at time and finishes at time , then
, where is the number of inversions fixed (at the time of a comparison in the inner while-loop) by this round of insertion sort.
for any , .
(1): For each iteration of the outer for-loop, each comparison in the inner while-loop either fixes an inversion (at the time of that comparison) or fails to fix an inversion and completes the inner while-loop. Note that this “failed” comparison may not have compared elements of , but may have short circuited due to . Nevertheless, every comparison that doesn’t fail fixes an inversion (at the time of that comparison); hence, each non-failing comparison is counted in .
(2): In any round, there are at most comparisons, by the formulations of the outer for-loop and inner while-loop.
(3): At time , the round of insertion sort will have executed steps. Of those steps, at least comparisons resulted in a swap that removed an inversion and at most comparisons did not result in a change to . The random swaps occurring during these comparisons introduced at most inversions. So . ∎
We next assert the following two lemmas, which are used in the next section and proved later.
 There exists a constant, , such that, for a round of insertion sort that takes time , at least of the random adjacent swaps in decrease during the round, with exponentially high probability.
See Appendix A. ∎
If a round of insertion sort starts at time with and finishes at time , then, with exponentially high probability, , i.e., the insertion sort round takes at least steps.
See Section 4. ∎
3.1 Proof of Theorem 1.3
Theorem 1.3. There exists a constant, , such that, when running the repeated insertion-sort algorithm, for every step , the Kendall tau distance between the maintained list, , and the underlying ordered list, , is , with exponentially high probability.
By Lemma 3, there exists a constant $\epsilon$ such that at least an $\epsilon$ fraction of all of the random swaps during a round of insertion sort fix inversions. Consider an epoch of the last steps of the repeated insertion-sort algorithm, that is, from time to . During this epoch, some number, , of complete rounds of insertion sort are performed from start to end (by Lemma 3). Denote with the time at which insertion-sort round ends (and round begins), and let denote the end time of the final complete round, during this epoch. By construction, observe that and . Furthermore, because the insertion-sort rounds running before and after take fewer than steps (by Lemma 3), .
The remainder of the proof consists of two parts. In the first part, we show that for some complete round of insertion sort ending at time , is , with exponentially high probability. In the second part, we show that once we achieve being , for , then is , with exponentially high probability.
For the first part, suppose, for the sake of a contradiction, , for all . Then, by a union bound over the polynomial number of rounds, Lemma 3 applies to every such round of insertion sort. So, with exponentially high probability, each round takes at least steps. Moreover, by Lemma 3, with exponentially high probability, an fraction of the random swaps from to will decrease the number of inversions. That is, these random swaps increase the number of inversions by at most
with exponentially high probability. Furthermore, by Lemma 3, at least a fraction of the insertion-sort steps fix inversions (at the time of a comparison). Therefore, with exponentially high probability, we have the following:
But, since , the above bound implies that , which is a contradiction. Therefore, with exponentially high probability, there is a such that .
For the second part, we show that the probability for a round to have is exponentially small, by considering two cases (and their implied union-bound argument):
If , then Lemma 3 implies .
If , then, similar to the argument given above, during a round of insertion sort, , at least a fraction of the steps fix an inversion, and an fraction of the steps do nothing. Also at least an fraction of the random swaps fix inversions, while a fraction add inversions. Finally, the total length of the round is . Thus, with exponentially high probability, the total change in inversions is at most and .
Therefore, by a union bound over the polynomial number of insertion-sort rounds, the probability that any for is exponentially small. By Lemma 3, . So, with exponentially high probability, and , completing the proof. ∎
3.2 Improved Convergence Rate
In this subsection, we provide an algorithm that converges to $O(n)$ inversions more quickly. To achieve the steady state of $O(n)$ inversions, repeated insertion sort performs $O(n^2)$ comparisons. But this running time to reach a steady state is a worst-case bound based on the fact that the running time of insertion sort is $O(n + I)$, where $I$ is the number of initial inversions in the list, and, in the worst case, $I$ is $\Theta(n^2)$. By simply running a round of quicksort on the list first, we can achieve a steady state of $O(n)$ inversions after just $O(n \log n)$ comparisons. See Algorithm 2. That is, we have the following.
When running Algorithm 2, for every , is with high probability.
By the results of Anagnostopoulos et al. , the initial round of quicksort takes comparisons and afterwards the number of inversions (that is, the Kendall tau distance between the maintained list and the true total order) is , with high probability. Using a nearly identical argument to the proof of Theorem 1.3, and the fact that an insertion-sort round takes time to resolve inversions, the repeated insertion-sort algorithm will, with high probability, achieve inversions in an additional steps. From that point on, it will maintain a Kendall tau distance of , with high probability. ∎
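The two-phase idea above (one quicksort pass, then repeated insertion-sort rounds) can be sketched in Python as follows. This is a hedged illustration rather than the paper's Algorithm 2: the `EvolvingOrder` class, the list-partitioning quicksort with a random pivot, and the fixed `rounds` count are all assumptions introduced for the example.

```python
import random


class EvolvingOrder:
    """Hidden total order (n >= 2) that swaps one random adjacent pair after each comparison."""

    def __init__(self, n, seed=0):
        self.rng = random.Random(seed)
        self.truth = list(range(n))  # truth[p] is the item at position p
        self.rank = list(range(n))   # rank[item] is the item's position
        self.comparisons = 0

    def less(self, a, b):
        answer = self.rank[a] < self.rank[b]  # accurate at the time of the query
        self.comparisons += 1
        p = self.rng.randrange(len(self.truth) - 1)
        x, y = self.truth[p], self.truth[p + 1]
        self.truth[p], self.truth[p + 1] = y, x
        self.rank[x], self.rank[y] = p + 1, p
        return answer


def quicksort(items, order):
    """Out-of-place quicksort; it terminates even though answers may become stale."""
    if len(items) <= 1:
        return list(items)
    pivot = items[order.rng.randrange(len(items))]
    smaller, larger = [], []
    for x in items:
        if x != pivot:
            (smaller if order.less(x, pivot) else larger).append(x)
    return quicksort(smaller, order) + [pivot] + quicksort(larger, order)


def insertion_round(lst, order):
    """One in-place insertion-sort round over `lst`."""
    for i in range(1, len(lst)):
        j = i
        while j > 0 and order.less(lst[j], lst[j - 1]):
            lst[j], lst[j - 1] = lst[j - 1], lst[j]
            j -= 1


def quicksort_then_insertion(n, rounds, seed=0):
    order = EvolvingOrder(n, seed)
    lst = list(range(n))
    order.rng.shuffle(lst)           # arbitrary starting list (an assumption)
    lst = quicksort(lst, order)      # phase 1: a single quicksort pass
    for _ in range(rounds):          # phase 2: repeated insertion-sort rounds
        insertion_round(lst, order)
    return lst, order


if __name__ == "__main__":
    lst, order = quicksort_then_insertion(n=200, rounds=5)
    print(order.comparisons)
```

The paper's algorithm runs insertion-sort rounds indefinitely; the sketch stops after a fixed number of rounds only so that it terminates.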
4 Proof of Lemma 3
Recall Lemma 3, which establishes a lower bound for the running time of an insertion-sort round, given a sufficiently large number of inversions relative to the underlying total order.
Lemma 3. If a round of insertion sort starts at time with and finishes at time , then, with exponentially high probability, , i.e., the insertion sort round takes at least steps.
The main difficulty in proving Lemma 3 is understanding how the adjacent random swaps in affect the runtime of the current round of insertion sort on . Let be the number of steps left to perform in the current round of insertion sort if there were no more random adjacent swaps in . In essence, can be thought of as an estimate of the remaining time in the current insertion sort round. If a new round of insertion sort is started at time , then and . Each step of an insertion sort round decreases by one and the following random swap may increase or decrease by some amount. Figure 2 illustrates an example where one random adjacent swap in decreases by a non-constant amount (relative to ).
A random adjacent swap in involving two elements in the unsorted portion of will either increase or decrease by one depending on whether it introduces or removes an inversion. Random adjacent swaps involving elements in the semi-sorted portion have more complex effects on .
An inversion currently in the list will be fixed by insertion sort if and will be compared and the two are swapped. Because , must be the active element during this comparison. An inversion will not be fixed by insertion sort if was already inserted into the semi-sorted portion or there is some element in the semi-sorted portion with and . We call an inversion with in the semi-sorted portion a stuck inversion and an inversion with a smaller semi-sorted element between the pair a blocked inversion. We say an element in the semi-sorted portion of blocks an inversion with and either the active element or in the unsorted portion of , if is in the semi-sorted portion of with and . Note that there may be multiple elements blocking a particular inversion. Figure 3 shows examples of these two types of inversions.
We denote the number of “bad” inversions at time that will not be fixed with . That is, is the sum of the blocked and stuck inversions. At the end of an insertion-sort round every inversion present at the start was either fixed by the insertion sort, fixed by a random adjacent swap in , or is currently stuck. No elements can be blocked at the end of an insertion-sort round, because the semi-sorted portion is the entire list. Stuck inversions are either created by random adjacent swaps in or were blocked inversions and insertion sort finished inserting the right element of the pair. Blocked inversions are only introduced by the random adjacent swaps in . Thus is unaffected by the steps of insertion sort.
Every inversion present at the start must be fixed by a step of insertion sort, be fixed by a random swap, or it will end up “bad”. Therefore, for any given time, , by using naive upper bounds based on the facts that every insertion sort step can fix an inversion and every random adjacent swap can remove an inversion, we can immediately derive the following:
For an insertion sort round that starts at time and ends at time , if , then .
Since, when an insertion sort round finishes, , Lemma 4 implies . If we understand how changes with each random adjacent swap in , then we can bound how long insertion sort needs to run for this inequality to be true.
We associate the blocked and stuck inversions with elements that we say are blamed for the inversions. A blocked inversion blames the element with and minimum . Note that is on the minima path of the modified Cartesian tree (see Appendix B), and is in the left subtree of . A stuck inversion either blames the element on the minima path whose subtree contains both and or if they appear in different subtrees, the inversion blames the element with and minimum . Again note that the blamed element is on the minima path and is in the blamed element’s left subtree. The bad inversions in Figure 3 blame the red element.
Whether stuck or blocked, every inversion blames an element on the minima path and the left element of the inverted pair appears in that minimum’s subtree. If is on the minima path, is the index of the element in ’s subtree with maximum , and an inversion has in ’s subtree, then both and are in the range to . So we can upper bound by , where we extend to non-minima indices with if is not the index of a minima in .
4.1 Bounding the Number of Blocked and Stuck Inversions with Counters
For the purposes of bounding , we conceptually associate two counters, and , with each element, . The counters are initialized to zero at the start of an insertion sort round. When an element is increased by a random swap in , we increment and when is decreased by a random swap in , we increment . After the random swap occurs, we may choose to exchange some of the counters between pairs of elements, but we will always maintain the following invariant:
Invariant 1. For an element, , on the minima path,
This invariant allows us to prove the following Lemma: If and , then .
|By Invariant 1||(1)|
By the assumptions of this lemma, interpreting Inc and Dec as two
-dimensional vectors, we know their lengths are both less than. Equation 1 is the squared length of the sum of the Dec and Inc vectors with the entries of Inc permuted by the function . By the triangle inequality, the length of their sum is at most and so the squared length of their sum is at most .
Therefore, . ∎
In the appendix we prove the following lemma for these increment and decrement counters.  There is a counter maintenance strategy that maintains Invariant 1 such that after each random adjacent swap in , the corresponding counters are incremented and then some counters are exchanged between pairs of elements.
4.2 Bounding the Counters with Balls and Bins
We model the Inc and Dec counters each with a balls and bins process and analyze the sum of squares of balls in each bin. Each element in is associated with one of bins. When an element’s Inc counter is increased, throw a ball into the corresponding bin. If a pair of Inc counters are exchanged, exchange the set of balls in the two corresponding bins. The Dec counters can be modeled similarly.
This process is almost identical to throwing balls into bins uniformly at random. Note that the exchanging of balls in pairs of bins takes place after a ball has been placed in a chosen bin, effectively permuting two bin labels in between steps. If every bin was equally likely to be hit at each time step, then permuting the bin labels in this way would not change the final sum of squares and the exchanging of counters could be ignored entirely. Unfortunately the bin for the element at in the case of Inc counters or in the case of Dec counters cannot be hit, i.e., there is a forbidden bin controlled by the counter swapping strategy. However, even when in each round the forbidden bin is adversarially chosen, the sum of squares of the number of balls in each bin will be stochastically dominated by a strategy of always forbidding the bin with the lowest number of balls. Therefore, the sum of squares of balls being thrown uniformly at random into bins stochastically dominates the sum of squares of the Inc (or Dec) counters after steps.
If balls are each thrown uniformly at random into bins with , then the sum over the bins of the square of the number of balls in each bin is at most with exponentially high probability.
Let be random variables where is the number of balls in bin and let be independent Poisson random variables with .
By the Poisson approximation, Lemma 1,
Let be the event that and be the event that at least one occurs.
Given , . So we can apply Hoeffding’s inequality, Lemma 1, to get:
Setting , we have:
Because , we have .
Thus, we can conclude . ∎
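For readers who want to experiment with the quantity bounded in this lemma, the following hedged Python sketch throws $m$ balls uniformly at random into $n$ bins and reports the sum of squared bin loads. The function name and parameters are illustrative assumptions. It simulates only the unrestricted process, without the forbidden bin or the counter exchanges discussed above. For reference, the expected value of this sum under uniform throws is $m + m(m-1)/n$.

```python
import random


def sum_of_squared_loads(m, n, seed=0):
    """Throw m balls uniformly at random into n bins.

    Returns (sum of squared loads, list of loads).
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return sum(load * load for load in loads), loads


if __name__ == "__main__":
    m, n = 1000, 1000
    total, _ = sum_of_squared_loads(m, n)
    expected = m + m * (m - 1) / n  # exact expectation for uniform throws
    print(total, expected)
```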
Recall that when the insertion sort round finishes, . If fewer than steps have been performed, the left hand side of this inequality is less than with exponentially high probability. Therefore, if we started with inversions, the current round of insertion sort must perform at least steps with exponentially high probability; otherwise, there are unfixed but still “good” inversions. This completes the proof of Lemma 3.
We have shown that, although it is much simpler than quicksort and only fixes at most one inversion in each step, repeated insertion sort leads to the asymptotically optimal number of inversions in the evolving data model. We have also shown that by using a single round of quicksort before our repeated insertion sort, we can get to this steady state after an initial phase of $O(n \log n)$ steps, which is also asymptotically optimal.
For future work, it would be interesting to explore whether our results can be composed with other problems involving algorithms for evolving data, where sorting is a subcomponent. In addition, our analysis in this paper is specific to insertion sort, and only applies when exactly one random swap is performed after each comparison. We would like to extend this to other sorting algorithms that have been shown to perform well in practice and to the case in which the number of random swaps per comparison is a larger constant. Finally, it would also be interesting to explore whether one can derive a much better value than we derived in the proof of Lemma 3.
- [1] Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian, and Eli Upfal. Sorting and selection on dynamic data. Theoretical Computer Science, 412(24):2564–2576, 2011. Special issue on selected papers from 36th International Colloquium on Automata, Languages and Programming (ICALP 2009). doi:10.1016/j.tcs.2010.10.003.
- [2] Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian, Eli Upfal, and Fabio Vandin. Algorithms on evolving graphs. In 3rd ACM Innovations in Theoretical Computer Science Conference (ITCS), pages 149–160, 2012. doi:10.1145/2090236.2090249.
- [3] Bahman Bahmani, Ravi Kumar, Mohammad Mahdian, and Eli Upfal. PageRank on an evolving graph. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 24–32, 2012. doi:10.1145/2339530.2339539.
- [4] Juan Jose Besa Vial, William E. Devanny, David Eppstein, Michael T. Goodrich, and Timothy Johnson. Quadratic time algorithms appear to be optimal for sorting evolving data. In Proc. Algorithm Engineering & Experiments (ALENEX 2018), pages 87–96, 2018. doi:10.1137/1.9781611975055.8.
- [5] Mark Braverman and Elchanan Mossel. Noisy sorting without resampling. In 19th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 268–276, 2008.
- [6] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
- [7] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018, 1994. doi:10.1137/S0097539791195877.
- [8] Michael T. Goodrich and Roberto Tamassia. Algorithm Design and Applications. Wiley Publishing, 1st edition, 2014.
- [9] Benoit Groz and Tova Milo. Skyline queries with noisy comparisons. In 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 185–198, 2015. doi:10.1145/2745754.2745775.
- [10] Dorit S. Hochbaum. Ranking sports teams and the inverse equal paths problem. In Paul Spirakis, Marios Mavronicolas, and Spyros Kontogiannis, editors, 2nd Int. Workshop on Internet and Network Economics (WINE), volume 4286 of Lecture Notes in Computer Science, pages 307–318, Berlin, Heidelberg, 2006. Springer. doi:10.1007/11944874_28.
- [11] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963. doi:10.1080/01621459.1963.10500830.
- [12] Qin Huang, Xingwu Liu, Xiaoming Sun, and Jialin Zhang. Partial sorting problem on evolving data. Algorithmica, 79(3):1–24, 2017. doi:10.1007/s00453-017-0295-3.
- [13] Varun Kanade, Nikos Leonardos, and Frédéric Magniez. Stable Matching with Evolving Preferences. In Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), volume 60 of LIPIcs, pages 36:1–36:13, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. doi:10.4230/LIPIcs.APPROX-RANDOM.2016.36.
- [14] Donald Ervin Knuth. The Art of Computer Programming: Sorting and Searching, volume 3. Pearson Education, 2nd edition, 1998.
- [15] Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. Sorting noisy data with partial information. In 4th ACM Conference on Innovations in Theoretical Computer Science (ITCS), pages 515–528, 2013. doi:10.1145/2422436.2422492.
- [16] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005.
- [17] Jean Vuillemin. A unifying look at data structures. Commun. ACM, 23(4):229–239, 1980. doi:10.1145/358841.358852.
- [18] Jialin Zhang and Qiang Li. Shortest paths on evolving graphs. In H. Nguyen and V. Snasel, editors, 5th Int. Conf. on Computational Social Networks (CSoNet), volume 9795 of Lecture Notes in Computer Science, pages 1–13, Berlin, Heidelberg, 2016. Springer. doi:10.1007/978-3-319-42345-6_1.
Appendix A Proof of Lemma 4
We call a random adjacent swap that decreases the number of inversions, , during the insertion-sort round a good swap.
Break the time interval for this round of insertion sort into epochs, each of size between and (this is possible because , by Lemma 3) and let be the start of epoch . Denote the length of epoch by . Given the values of and at , only the elements in the ranges and will be involved in insertion sort comparisons during epoch . This set of potentially compared elements has size at most .
Consider the set of adjacent disjoint 4-tuples in , for . There are of these tuples and so there are at least tuples whose elements cannot be involved in comparisons during a given epoch. Call such a tuple of elements an untouchable tuple.
We now examine just the swaps during one specific epoch. Let be the number of random adjacent swaps that swap with for . Let be independent identically distributed Poisson random variables with parameter for . Note that for large enough . Let be the function that counts how many there are such that the tuple is untouchable and , , and .
By the Poisson approximation, Lemma 1, for any ,
As previously stated, there are at least untouchable tuples. Because the are independent, for an untouchable 4-tuple ,
is the sum of at least
independent indicator random variables that each have probability at leastof being . Thus . Therefore, by a Chernoff bound from :
Therefore, within each epoch of the insertion sort round there are at least untouchable tuples where the middle pair of indices are swapped twice and the other two pairs are not swapped, with exponentially high probability. In each of these tuples one of the two swaps must have been a good swap.
So we can conclude that for each epoch, with exponentially high probability, there are good swaps. Because there are at least epochs, setting implies there are at least good swaps during the entire insertion sort round, with exponentially high probability. ∎
Appendix B Counter swapping
Given a list, $L$, of $n$ numbers with no two equal numbers, the Cartesian tree [17] of $L$ is a binary rooted tree on the numbers where the root is the minimum element $L[m]$, the left subtree of the root is the Cartesian tree of $L[0 : m-1]$, and the right subtree of the root is the Cartesian tree of $L[m+1 : n-1]$. In our analysis, we will primarily consider the Cartesian tree of the simulated final state at time where in the frozen-state permutation . We also choose to include two additional elements, and , for boundary cases. Figure 4 shows an example Cartesian tree we might consider. The Cartesian trees we consider are only for the sake of analysis. They are not explicitly constructed.
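Although the analysis never builds these trees, a small Python sketch may help fix the definitions. It constructs a Cartesian tree recursively, reads off the minima path by following right children from the root, and checks that this path equals the right-to-left minima of the list. The function names and the tuple representation are illustrative assumptions, and the two extra boundary elements mentioned above are omitted.

```python
def cartesian_tree(seq):
    """Return the Cartesian tree of `seq` as nested (value, left, right) tuples."""
    if not seq:
        return None
    m = min(range(len(seq)), key=seq.__getitem__)  # index of the minimum
    return (seq[m], cartesian_tree(seq[:m]), cartesian_tree(seq[m + 1:]))


def minima_path(tree):
    """Values met when walking from the root through right children only."""
    path = []
    while tree is not None:
        value, _left, right = tree
        path.append(value)
        tree = right
    return path


def right_to_left_minima(seq):
    """Elements smaller than everything to their right, in left-to-right order."""
    result, best = [], None
    for value in reversed(seq):
        if best is None or value < best:
            result.append(value)
            best = value
    return result[::-1]


if __name__ == "__main__":
    example = [3, 1, 4, 2, 6, 5]
    print(minima_path(cartesian_tree(example)))  # [1, 2, 5]
    print(right_to_left_minima(example))         # [1, 2, 5]
```

The recursive construction takes quadratic time in the worst case; it is chosen here for clarity rather than efficiency.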
Recall that is the number of bad inversions, which is the sum of the blocked and stuck inversions. For the purposes of bounding , we conceptually associate two counters, and , with each element, . The counters are initialized to zero at the start of an insertion sort round. When an element is increased by a random swap in , we increment and when is decreased by a random swap in , we increment . After the random swap occurs, we may choose to exchange some of the counters between pairs of elements.
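The bookkeeping above can be sketched as follows. This is an illustrative reading, not the paper's definition: it assumes that "increased" means the element moves up one rank in the underlying order, it calls the two counters `inc` and `dec`, and the class and method names are hypothetical.

```python
class SwapCounters:
    """Per-element Inc/Dec counters, all zero at the start of a round."""

    def __init__(self, elements):
        self.inc = {x: 0 for x in elements}
        self.dec = {x: 0 for x in elements}

    def record_random_swap(self, order, r):
        """Swap the elements at ranks r and r + 1 of `order` in place.

        The element that moves up in rank has its Inc counter incremented;
        the element that moves down has its Dec counter incremented.
        """
        up, down = order[r], order[r + 1]
        order[r], order[r + 1] = down, up
        self.inc[up] += 1
        self.dec[down] += 1

    def exchange_inc(self, x, y):
        """Exchange the Inc counters of two elements."""
        self.inc[x], self.inc[y] = self.inc[y], self.inc[x]
```

A call such as `record_random_swap(order, 0)` on `['a', 'b', 'c']` leaves `order` as `['b', 'a', 'c']`, with the Inc counter of `'a'` and the Dec counter of `'b'` both equal to one.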
Maintaining Invariant 1 in the face of the random swaps in can be difficult, because new minima could be added to the path or old minima could be removed from the path. To handle these challenges, we pair up each element with degree three in the Cartesian tree with a descendant leaf. First, as a special case, the element in the Cartesian tree is paired with the element. To find pairs for the degree-three elements, we consider traversing the tree in depth first order starting at the root. Below a degree-three element in the Cartesian tree there are two subtrees. When a degree-three element is encountered in the traversal, the larger of the maximum leaf element in the left subtree and the maximum leaf element in the right subtree will have already been paired up. So we pair the degree-three element with the unpaired (and smaller) of the two maximum leaves (Figure 5). For a degree-three element, , denote the index in of its pair with . We enforce the following stronger invariant:
Invariant 2. For every element with degree three in the Cartesian tree, .
Invariant 2 implies Invariant 1, because each minimum along the path is paired with the maximum leaf element in its left subtree, if it has one.
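The pairing rule can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: it treats every node with two children as a degree-three element (which matches the text only when the root has a boundary element above it), it omits the special boundary pair, and it computes the pairs bottom-up rather than by the top-down traversal described above. The helper names are hypothetical, and the recursion is only suitable for small inputs.

```python
def build_tree(values):
    """Min-rooted Cartesian tree as (root, left, right) index arrays."""
    n = len(values)
    left, right, stack = [None] * n, [None] * n, []
    for i in range(n):
        last = None
        while stack and values[stack[-1]] > values[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    return (stack[0] if stack else None), left, right


def pair_degree_three(values):
    """Map each two-child node index to the index of its paired leaf."""
    root, left, right = build_tree(values)
    pairs = {}

    def max_leaf(i):
        # Index of the largest-valued leaf in the subtree rooted at i.
        kids = [c for c in (left[i], right[i]) if c is not None]
        if not kids:
            return i
        best = [max_leaf(c) for c in kids]
        if len(best) == 2:
            small, large = sorted(best, key=lambda k: values[k])
            pairs[i] = small   # the smaller maximum leaf is paired here
            return large       # the larger one is left for an ancestor
        return best[0]

    if root is not None:
        max_leaf(root)
    return pairs
```

On `[3, 1, 4, 2, 5]` this yields `{3: 2, 1: 0}`: the node holding 2 is paired with the leaf holding 4, the node holding 1 is paired with the leaf holding 3, and the largest leaf (5) remains for the boundary case.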
We now consider how to maintain Invariant 2 after each random swap in . Suppose and are the swapped pair and for now assume neither is the active element. After the swap and and the two counters and are incremented. However, the slight upward and downward movement of elements may have changed how elements are paired up either by a structural change in the Cartesian tree or exchanging the relative value of two leaf elements. There are several cases to analyze based on how the random swap affected the modified Cartesian tree.
First, we observe that if the random swap did not affect the pairing of elements, then the incrementing of counters maintains the invariant. For example, if has a pair , then is increased by one and if there is an element with , then is increased by one. Each of these increases is offset by the incrementing of and , respectively.
If the random adjacent swap did affect the pairing of elements, then either and are adjacent in the tree or and are leaf elements with least common ancestor . In this second case, there is an ancestor of paired with before the swap which is paired with after and is paired with before the swap and is paired with after. For both pairing changes, the distance between the paired elements is unchanged, but the Inc counter of the leaf element in the pairs may be incorrect. So we exchange and .
In the case where and are adjacent in the tree, before the swap is the parent of and afterwards is the parent of . When this happens, if either or is an unsorted element, then both elements must lie on the minima path and the swap simply exchanges their order on the minima path. So while there is a change in the tree structure, there is no change in the pairing of elements.
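The parent and child exchange described above can be observed on a three-element example. The sketch below assumes that the tree is built over a list of ranks 1..n and that a random swap exchanges two consecutive ranks; this is a simplified reading for illustration, and both helper names are hypothetical.

```python
def parents(values):
    """Parent index of every position in the min-rooted Cartesian tree."""
    parent = [None] * len(values)
    stack = []  # indices on the rightmost path, with increasing values
    for i, v in enumerate(values):
        last = None
        while stack and values[stack[-1]] > v:
            last = stack.pop()
        if last is not None:
            parent[last] = i        # the popped chain hangs to the left of i
        if stack:
            parent[i] = stack[-1]   # i is the right child of the stack top
        stack.append(i)
    return parent


def swap_consecutive_ranks(values, v):
    """Return a copy in which the ranks v and v + 1 trade places."""
    return [v + 1 if x == v else v if x == v + 1 else x for x in values]
```

For `[2, 1, 3]`, position 1 is the parent of position 0. After exchanging ranks 1 and 2 the list becomes `[1, 2, 3]` and position 0 is the parent of position 1, so the two elements have traded places in the tree.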
We can now assume both elements are semi-sorted which leads to some case analysis based on the degrees of and which determines how they are paired with other elements. In these cases, the random swap acts almost like a tree rotation.
If and both have degree three, then together there are three subtrees below and . For the largest elements in these three subtrees, one is paired with , one is paired with , and the third is paired with an ancestor of and . After the random swap, the ancestor will have the same paired element, but and may have had their pairs exchanged. In this case, to maintain our invariant if the pairings changed, we exchange and .
This case is shown in Figure 6.
If either or has degree three and the other has degree two, then there are two subtrees below and in the subtree. Out of the two maximums in the subtrees, one is associated with whichever of and has two children and one is associated with an ancestor of and . Notice that when a swap happens, the degree of and will not change if there is a subtree “between them”, i.e., there are descendants of and with index between and (or equivalently ).
When there is no subtree between and , then the swap exchanges the degrees of the two elements. In this case, to maintain the invariant we also exchange and .
If has degree one and has degree three, then there is only one subtree below and . Because , that subtree’s maximum must be larger than . So . After the swap, this pairing relationship is destroyed, because both elements will have degree two. In this case, no additional work is needed to maintain the invariant.
If and both have degree two, then there is only one subtree below and . Again we condition on whether or not there is a subtree between and .
If there is a subtree between them, then the swap simply reorders and on the path leading to that subtree causing no change in pairings and maintaining the invariant.
When there is no such subtree, after the swap, one of will now be a leaf, will have degree three, and . In this case, a new pairing relationship was created between and . The swap incremented and so and the invariant holds.
If has degree one and has degree two, then there are no subtrees below and . After the swap, they will switch which element is the leaf. An ancestor was paired with and is now paired with . In this case, to maintain the invariant we exchange and .
When the random adjacent swap in involves the active element, the effect on the Cartesian tree can be somewhat more complicated. Issues might arise because is not yet slotted into its simulated final horizontal position in the Cartesian tree. We need to make sure the horizontal movements of the active element do not invalidate the invariant. Suppose there is a maximal index such that , i.e., index is where the insertion of will stop. When there is no such , will be inserted at the front of the list and so we set to be . If swaps with an element outside the range , then no horizontal movement of will occur and we can handle the case as though is semi-sorted.
So suppose is swapped with with and before the swap. After the swap, will be moved immediately to the right of in the Cartesian tree and is the right child of . Because is smaller than for , must be the right child of before the swap. So has degree two and is unpaired before the swap.
If had a right child before the swap, then now subdivides the edge from to its old right child and has degree two. So the invariant is maintained.
If had only a left child before the swap, then is now paired with , which is a leaf after the swap. The invariant requires . This inequality is satisfied, because the swap incremented .
If was a leaf paired with before the swap, then is now paired with . Exchanging the Inc counters for and guarantees the invariant is maintained.
Now we consider the final case where is swapped with with and before the swap. Because , . Additionally we observe that is the right child of in the Cartesian tree before the swap. After the swap, is the right child of and has degree two. So is unpaired after the swap.
If had a right child before the swap, then now subdivides the edge from to its old parent and has degree two. So the invariant is maintained.
If is a leaf and has a left child, then was paired with before the swap. After the swap, and both have degree two with subdividing the old edge between and its parent.
If is a leaf and does not have a left child, then there is some ancestor paired with . The pairing will switch to after the swap. Exchanging the Inc counters for and maintains the invariant.