Loops are among the most widely used programming constructs, featured in almost all programming languages. A loop is a productivity amplifier. With nominal overhead (e.g., state-registering variables), the static body of a loop can be reused an unlimited number of times.
For a building block as fundamental as the loop, one would suppose its best practices are widely known. On the contrary, the best practices for loops have been so largely ignored that haphazardly constructed loops with duplication issues are not uncommon even in production code. Common problems in loop programming include, but are not limited to, duplicate code, nested loops, leaky loop variables, and oversized initialization. I will explain each of them next.
Duplicate code here refers to duplication between code in the loop body and code before (or after) the loop. It is the biggest problem in loop programming and one of the most common root causes of bugs. Duplicate code anywhere is bad, but duplicate code of this type is harder to notice and harder to get rid of.
Uses of nested loops are sometimes controversial, and many readers will be ready to argue about this. So what is wrong with nested loops? In almost all cases, nested loops bring complexity rather than convenience and obstruct readability rather than facilitating it. They turn what could have been coded as separate components into a monolithic mess, deepen coupling rather than reducing it, and discourage code reuse rather than encouraging it.
Loop variables refer to variables used to register loop state. Many developers rely on exposed loop variables to implicitly pass information from the loop to subsequent code. However, uncontrolled exposure of loop variables to subsequent code is a violation of the encapsulation principle. While sometimes convenient, it usually does more harm than good.
Loop initialization is the prelude code needed to instantiate the initial loop state. It should be small and light-weight. However, unseasoned programmers may write disproportionately heavy initializations. With due skill and perseverance, loop initializations can often be made succinct one-liners.
It is beyond the scope of this article to address all of these topics. Instead we will focus on duplicate code and nested loops. We will present the two pillar techniques of loop programming, 'loop rotation' and 'nested loop thinning', which I have found effective in fighting the programming foes above. Proper use of them helps developers avoid commonly made mistakes.
We have tried these techniques on several case studies. In particular, we will apply them to simplify the traditional quicksort algorithm to prove their effectiveness. First invented in 1960, quicksort has been studied and analyzed thoroughly over the years[1, 2, 3, 4]. As one of the earliest 'divide-and-conquer' algorithms, quicksort has become the de facto sorting algorithm in practice for its excellent expected performance. Named one of the top algorithms of the 20th century[5], quicksort has had a profound influence on the history of programming.
We will walk you through the multi-stage refactoring process that leads to the discovery of a brand-new implementation of Hoare's quicksort algorithm. We use C++ as the working language in this article.
2 Loop Rotation
With the productivity amplification power of loops comes coding complexity. There are two key observations about loops. First, a loop seldom lives in a vacuum. Loops are often embedded subunits in a program, just like organelles in a living cell, so a loop has to get along with its neighbors. As a result, a large part of loop programming is to "fit it in". Second, there are many moving parts in a loop construct and they are tightly coupled; changes in one necessarily demand changes in others. As we discussed in section 1, the biggest problem with loop programming is duplicate code in and around the loop. The foremost goal in implementing a loop is to reduce duplicate code. 'Loop rotation' is an important technique for that purpose.
It is beneficial to visualize a loop construct as a spool. Loops are to programs as spools are to yarn: just as spools are used to organize yarn, loops were invented to organize programs. Behind this analogy is an important symmetry with respect to circular shifts, the rotational symmetry shared by both. Aside from the point of entrance and one or more exits, a loop or a spool may be mathematically represented by a circular sequence, that is, a sequence with rotational symmetry. One can thus take this rotational degree of freedom to one's advantage in deciding where to enter and exit the loop. Coalescing duplicate code in a loop is analogous to winding loose and messy yarn into a tidy spool (Figure 1).
Let's take a board game as an example to illustrate loop rotation. Imagine that the following subroutines are ready to use: Init: initialize game; BD: draw board; PR: print legends and prompts; UI: take user inputs; EX: execute user inputs; CM: compute moves; PO: poll game status; End: end of game. Listing 1 shows pseudo-code for the driver program. The duplicate code between the loop body and the code in the vicinity of the loop is conspicuous. With a "loop rotation" procedure (to be explained below), the code may be refactored into Listing 2.
Looking at Figure 2, as we rotate the loop, the duplicate code can be aligned and coalesced line by line. As can be seen in Figure 2(a), the last statement in the loop and the last statement before the loop are verbatim duplicates and aligned. As such, we 'roll up' the spool so that the two can be coalesced. This process can be repeated until all the duplicates are coalesced (Figure 2(b)).
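To make the rotation concrete, here is a minimal runnable sketch of the rotated driver, with toy stand-ins for the subroutines named above (the game logic here is, of course, hypothetical):

```cpp
#include <string>
#include <vector>
#include <algorithm>
#include <cassert>

std::vector<std::string> trace;  // records the call sequence
int moves = 0;                   // toy game state (assumption for the sketch)

void Init() { trace.push_back("Init"); }
void BD()   { trace.push_back("BD"); }   // draw board
void PR()   { trace.push_back("PR"); }   // print legends and prompts
void UI()   { trace.push_back("UI"); }   // take user inputs
void EX()   { trace.push_back("EX"); }   // execute user inputs
void CM()   { trace.push_back("CM"); }   // compute moves
bool PO()   { return ++moves >= 3; }     // poll game status: true = game over
void End()  { trace.push_back("End"); }

// Before rotation, BD(); PR(); appears once before the loop and once more
// at the end of the body. After rotation, the loop entry point is shifted
// so that the duplicated pair appears exactly once, at the top of the body.
void driver() {
    Init();
    while (true) {
        BD(); PR();              // formerly duplicated, now coalesced
        if (PO()) break;         // rotated exit point
        UI(); EX(); CM();
    }
    End();
}
```

Calling `driver()` draws the board exactly once per turn, with no duplicated draw-and-prompt code outside the loop.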
Getting rid of duplicate code is one of the main application scenarios for loop rotation. Another application is shifting code within a loop, for example, moving a portion of the code from the beginning of the body to the end. This is often called 'reverse rotation' of the loop. As a side effect, doing so introduces duplicate code. Given that it prepares the way for subsequent refactors, this adverse effect is usually paid off by larger optimizations. In practice, one may start with the more malleable "while (true)" or "for (;;)" loops, which helps one focus on getting a correct program first. Once working and flexible code has been secured, the loop rotation technique can be used to fine-tune the program.
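A small sketch of reverse rotation (the loop shapes are our own illustration): statement A is moved from the head of the loop body to the tail, which unwinds one duplicated copy of A in front of the loop:

```cpp
#include <vector>
#include <cassert>

std::vector<int> out_before, out_after;

// Original loop: statement A (record i) at the head, B (advance i) at the tail.
void plain_loop(int n) {
    int i = 0;
    while (i < n) {
        out_before.push_back(i);    // A
        ++i;                        // B
    }
}

// Reverse rotation: A is shifted from the head of the body to the tail;
// as a side effect, one copy of A is unwound in front of the loop.
void reverse_rotated(int n) {
    int i = 0;
    if (i < n) {
        out_after.push_back(i);     // duplicated copy of A
        while (true) {
            ++i;                    // B, now at the head
            if (!(i < n)) break;
            out_after.push_back(i); // A, now at the tail
        }
    }
}
```

Both loops produce identical behavior; the rotated form trades a duplicated statement for the freedom to refactor what now sits at the head of the body.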
3 Nested Loop Thinning Technique
Every developer writes nested loops now and then. Many times, nested loops appear compelling, even inevitable. While they may solve our problem, nested loops inflict unnecessary complexity on a program, obstruct code readability, and bring in 'soft duplication'. The presence of nested loops also thwarts optimization at the compiler level. For these reasons, explicit use of nested loops in high-level programming should be avoided altogether. Moreover, getting rid of nested loops may not in itself seem a significant improvement; rather, the optimizations made accessible afterwards dwarf the immediate gain. As in the game 'Candy Crush Saga', at certain critical points a single move may unlock an avalanche of advantageous moves.
In this section, we present a process that coalesces nested loops into single-layered loops, which we will call the 'nested loop thinning' technique. For the sake of discussion, the body of the outer loop is divided into three sections, the pre-inner-loop section, the inner-loop section, and the post-inner-loop section, depending on their relative position with respect to the inner loop (as shown in Figure 3). Cramming the functionality of nested loops into a single loop is no easy task. As a price, the process almost always results in one or more of the following:
extra conditional constructs
extra tests added to existing conditional constructs
more dynamic loop pacing
more boundary-condition-handling logic
auxiliary data structures, such as queues or stacks
The gist of the nested loop thinning process is as follows:
Loop rotation to shift around components in the loop body
Reconstruction of loop body using conditionals
We will go through these steps one by one. One will see this pattern again and again when thinning nested loops in practice.
First, some preparatory measures may be taken beforehand to reduce friction during the refactoring process. Compound statements are a good example of what to deal with in this step. They may get in the way of refactoring for multiple reasons. The most obvious one is that a compound statement needs to be broken up and its parts sent to different places after refactoring. State-changing, non-idempotent conditional expressions, such as if (--i < 0), are even more lethal, because each evaluation of the condition ratchets the state of the loop. For example, if (--i < 0) shall be expanded into a stand-alone decrement --i followed by if (i < 0) before the start of the loop thinning refactor. Other types of compound statements, such as "if v := a[i]; v < LIMIT" in Golang, shall be preprocessed similarly. Because of the structural similarity between while-loops and conditionals, while-loops readily lend themselves to the nested loop thinning process. For this reason, for-loops are often converted to while-loops during preprocessing.
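The expansion can be sketched as follows; the two function shapes (names are ours, for illustration) are behaviorally identical, which is the point of the preprocessing step:

```cpp
#include <cassert>

// Original shape: the decrement is buried inside the condition, so every
// evaluation of the condition changes the loop state.
int original(int i) {
    if (--i < 0) return -1;
    return i;
}

// Expanded shape: the side effect is hoisted out, leaving an idempotent
// condition that can be moved around safely during loop thinning.
int expanded(int i) {
    --i;                 // state change now stands alone
    if (i < 0) return -1;
    return i;
}
```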
In the second stage, one moves the pre-inner-loop statements, if any, out of the way. To that end, reverse loop rotation may be used to unwind the pre-inner-loop statements (see section 2). The next step depends on the form of the inner loop construct.
Finally comes the reconstruction of the loop body using conditionals rather than nested loops. If the inner loop is an unconditional loop such as 'while (true)', break statements (or similar) are almost always present, most likely inside a conditional statement somewhere in the loop body, unless a non-typical loop construct is intended. One should replace the break statement with the post-inner-loop statements; the inner loop can then be stripped away. Otherwise, if the inner loop comes with a non-trivial termination condition, the inner loop can be converted to a conditional directly, while the post-inner-loop statements are wrapped in an else-clause of it.
Regardless of the avenue taken, new conditional statements are inevitably formed or extended, and a 'cascading conditional construct' is the best way to organize them. A cascading conditional construct consists of an ordered sequence of mutually exclusive conditional branches, such as "if .. elif .. else". For more information, please refer to the relevant chapters in the references. Throughout the process, one shall pay special attention to execution-path-shunting statements, such as break and continue, if any.
After ‘nested loop thinning’, some cleanup may be performed to comply with convention, code style, or just for cosmetic reasons.
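Putting the whole process together, here is a minimal before-and-after sketch (our own toy example) of thinning a doubly nested traversal:

```cpp
#include <vector>
#include <utility>
#include <cassert>

std::vector<std::pair<int, int>> nested_trace, thinned_trace;

// Before: a plain nested loop.
void nested(int rows, int cols) {
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            nested_trace.push_back(std::make_pair(i, j));
}

// After thinning: a single loop. The inner loop survives as the if-branch
// of a cascading conditional, and the former post-inner-loop section
// (reset j, advance i) lands in the else-branch.
void thinned(int rows, int cols) {
    int i = 0, j = 0;                    // loop state made explicit
    while (i < rows) {
        if (j < cols) {
            thinned_trace.push_back(std::make_pair(i, j));  // inner body
            ++j;
        } else {
            j = 0;                       // post-inner-loop section
            ++i;
        }
    }
}
```

As predicted above, the price is an extra conditional and slightly more dynamic loop pacing; the payoff is a single flat loop.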
One disclaimer is that the nested loop thinning process does not always prevail; there are cases where it is not applicable. Certain criteria must be met. First, there must not be intermediate layers, such as a conditional, between the inner and outer loops. Second, execution-path-shunting statements, such as break, cannot be present in the pre-inner-loop section. In what follows, we demonstrate the application of the nested loop thinning technique to the quicksort algorithm.
4 Case Study: Quicksort
Quicksort is one of the pivotal sorting algorithms widely used by modern software. Hoare's scheme was the first partition scheme, arriving with the original invention of the algorithm[1, 2, 3]. Traditionally, Hoare's partition scheme has been implemented with nested loops. Later on, Lomuto's partitioning scheme was invented, whose implementation is much simpler, with only one loop. However, for certain edge cases, Lomuto's quicksort does not perform well. It is natural to ask whether one can implement Hoare's partition scheme with simplicity equal or close to Lomuto's. Armed with the loop programming techniques presented in this article, let us give it a try. First, let us lay the foundation of the quicksort implementation.
4.1 Recursive Implementation of Quicksort
At the high level, a recursive quicksort implementation may be as follows
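A minimal sketch of such a driver follows. The `part` stub here is a placeholder of our own (a simple first-element partition) so that the sketch runs; the real partition schemes are developed in the rest of this section.

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// Placeholder partition so that the sketch runs end to end; the schemes
// discussed below replace this. It uses the first element as the pivot.
int* part(int* s, int* e) {
    int* p = s;                        // boundary of the "smaller" region
    for (int* i = s + 1; i != e; ++i)
        if (*i < *s) std::swap(*i, *++p);
    std::swap(*s, *p);                 // pivot into its final position
    return p;
}

// Recursive driver: s and e are the starting and (exclusive) ending
// pointers to the input array; single-element ranges are the base case.
void quicksort(int* s, int* e) {
    if (e - s < 2) return;             // base case: fewer than two elements
    int* p = part(s, e);               // p: the pivot's final position
    quicksort(s, p);                   // left subarray
    quicksort(p + 1, e);               // right subarray
}
```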
where arguments s and e are the starting and ending pointers to the input array. This function invokes 'int* part(int* s, int* e)', a function stub for array partition that will be discussed in detail below. For single-element arrays, the quicksort function is a no-op; this is the base case. For naïve implementations, this is the only base case and it is sufficient. More sophisticated implementations add base cases for undersized arrays to reduce partition overhead.
In plain English, this is how quicksort algorithm works:
If the array contains fewer than two elements (the base case), return it as is. Otherwise, invoke the partition function to partition the array into two subarrays separated by one element (the pivot). Each subarray is then subjected to the quicksort function again, and so on, until all subarrays are reduced to the base case. When the function returns, the entire array is sorted.
The implementation of the partition function is key to the quicksort algorithm. The partition function does three things:
Pick a pivot from the array and set it aside;
Using the pivot as a benchmark, partition the rest of the array into two subarrays, with smaller (or equal) elements on the left and greater (or equal) elements on the right;
Put the pivot element back in between the subarrays.
The subarrays resulting from a partition may not be equal in size; this is called partition skewness. Partition skewness has an adverse impact on the performance of the quicksort algorithm. In all practical implementations of the partition function, some pivot selection strategy is needed to prevent partition skewness. Common and proven practices are random selection and the "median-of-three" technique[3, 8]. Of course, to apply the median-of-three technique, the array must exceed a minimum size.
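As an illustration, a median-of-three pivot selection might be sketched as follows (the helper name `select_pivot` is ours): sample the first, middle, and last elements and swap the median to the front, where the partition function expects the pivot to be parked.

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// Median-of-three sketch: order the three samples in place, then move
// the median to the front of the range [s, e).
void select_pivot(int* s, int* e) {
    int* m = s + (e - s) / 2;
    int* r = e - 1;
    if (*m < *s) std::swap(*m, *s);   // order the three samples ...
    if (*r < *s) std::swap(*r, *s);
    if (*r < *m) std::swap(*r, *m);   // ... so that *m holds the median
    std::swap(*s, *m);                // median to the front
}
```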
With this said, we are ready to discuss the implementation of quicksort and its partition function. Admittedly, implementing quicksort is quite tricky. Among the quicksort partition schemes, the best known are Hoare's and Lomuto's[1, 4]. The main difference between them lies in how the array is traversed. In Hoare's scheme, two pointers, one from each end of the array, step toward each other; in Lomuto's, two pointers, each at its own pace, start at the left end of the array and step rightward.
Among the quicksort algorithms, there are two main variants, those by Tony Hoare and by Nico Lomuto, respectively. Hoare's scheme has robust and optimal performance, but its implementation has been quite involved. Lomuto's scheme, on the other hand, is straightforward to implement and easier to follow; however, its performance may degrade catastrophically for certain edge cases. Comparing the two, one cannot help but wonder if there is an implementation that is as robust and performant as the traditional implementation of Hoare's quicksort and at the same time as succinct as Lomuto's. That is the focus of the rest of this article.
4.2 Implementation of Lomuto’s Partition Scheme
Let's start with the relatively simpler Lomuto partition scheme. In Lomuto's scheme, the two pointers have distinct tasks. The one running in front, variable i, is responsible for discovering out-of-place elements. The one behind, p, guards the partition boundary. When i discovers an out-of-place element, p makes room and places it with a swap, and the partition process continues. The code is shown in Listing 4. Once one understands this, the implementation becomes highly consistent and intuitive; one seldom fails to implement it, even for customized applications.
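In the spirit of that scheme, a hedged sketch might look like this (the name `part_lomuto` is ours, and using the last element as the pivot is a common convention we assume here):

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// Lomuto-style partition of [s, e): i runs in front discovering
// out-of-place elements; p guards the partition boundary.
int* part_lomuto(int* s, int* e) {
    int pivot = *(e - 1);              // last element as the pivot
    int* p = s;                        // boundary of the "smaller" region
    for (int* i = s; i != e - 1; ++i)
        if (*i < pivot)
            std::swap(*i, *p++);       // p makes room and places it
    std::swap(*p, *(e - 1));           // pivot into its final position
    return p;
}
```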
Notably, implementing this partition scheme requires only one loop. The simplicity is no free lunch, however. For certain edge cases, e.g., arrays with a large number of identical elements, Lomuto's partition scheme suffers a severe performance penalty: the quicksort degrades to close to quadratic runtime. The root cause of this degradation lies in the asymmetric traversal of the array, which inevitably leads to partition skewness. With Lomuto's partition function in hand, let us come back and study the more sophisticated implementation, the symmetric Hoare partition scheme.
4.3 Implementation of Hoare’s Partition Scheme
Invented along with the quicksort algorithm itself, Hoare's partition scheme predates Lomuto's[1, 2, 3]. Because of its symmetric traversal, Hoare's partition scheme avoids the drawback of Lomuto's.
Visually speaking, Hoare's partition scheme employs two pointers, s and e, which start at opposite ends of the array and push toward each other, given that a pivot element has been placed at the beginning of the array. As in Lomuto's partition function, these pointers stop at out-of-place elements. When both have stopped, the two out-of-place elements are swapped to where they belong. Then the pointers are on their way again, and so on, until they meet or cross. At last the pivot element is swapped into its final position and a pointer to this position is returned.
While it appears a minor change from Lomuto's scheme, Hoare's partition comes with immense implementation complexity. So much so that, for a long time, how Hoare's partition scheme works remained an enigma. There are so many changing variables and so much coupling among them that each attempt at implementing it may end up with a different solution. Worse, when something goes wrong, one is often clueless as to what is wrong. It is also extremely hard, if not impossible, to devise a test case that hits an elusive bug.
In stark contrast to the numerous slightly differing implementations, all known implementations have so far unanimously used nested loops: one outer loop and two sequential inner loops. This feature is so commonplace that it has become the stereotype of quicksort.
With the Hoare’s partition scheme, a commonly found implementation for the partition function is as follows.
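A hedged sketch of one commonly seen nested-loop shape follows (details assumed; the article's own Listing 5 may differ). The caller is expected to have swapped the chosen pivot to *s already.

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// Hoare-style partition of [s, e): one outer loop, two sequential
// inner loops, pivot parked at *s by the caller's pivot selection.
int* part_hoare(int* s, int* e) {
    int* pivot = s;
    int* i = s;
    int* j = e;
    while (true) {                       // outer loop
        while (*++i < *pivot)            // inner loop 1: scan rightward
            if (i == e - 1) break;       // boundary check for i
        while (*--j > *pivot) {}         // inner loop 2: scan leftward;
                                         // *s == pivot stops it at the left end
        if (i >= j) break;               // pointers met or crossed
        std::swap(*i, *j);               // both elements out of place: swap
    }
    std::swap(*pivot, *j);               // pivot into its final position
    return j;
}
```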
While we have given an outline of the workings of Hoare's partition, many choices remain to be made in regard to "how, when, and what". As such, pitfalls lie in wait throughout the implementation process. We will leave the discussion of the pitfalls encountered during the implementation of Listing 5 to Appendix A.
4.4 Thinning of Nested Loops
Now we are going to use the techniques presented earlier to transform the traditional implementation of Hoare’s partition scheme and get rid of the nested loops.
The immediate difficulty is how to take apart the densely packed conditional construct:
The loop conditions here are awkwardly complicated and make the nested loop thinning (outlined in section 3) nontrivial. The main difficulty lies in a dilemma: when the condition is met, we need to switch to the alternative execution path; but by the time we do, the pre-increment statements will have ratcheted the state variables (s, e) one step too far.
Measures must be taken to break up the pre-increment statements before we can proceed any further. We follow a two-step conversion procedure: first unfold the loop into 'do-while' form and then, in turn, rotate it into 'while' form, as shown in Listings 6 and 7.
After these changes, our code becomes the listing on the left-hand side below. We have made a slight adjustment so that the ++s and --e statements are gathered into the pre-inner-loop section.
After these preparation steps, we are ready to follow the prescribed nested loop thinning procedure (section 3): relocate the pre-inner-loop statements (shown as the listing on the right-hand side above), convert the inner loops into cascading conditionals (note that the second inner loop must be converted to an 'else if' clause because the first inner loop becomes the 'if' clause), wrap the post-inner-loop statements into an else clause, and make other cosmetic changes.
Shown in Listing 10 is Hoare's partition function after completing the loop thinning procedure. During the refactoring we relied heavily on the 'loop rotation' and 'nested loop thinning' techniques, and consciously employed skills for constructing cascading conditionals. Compared with Listing 5, where we started, the new implementation in Listing 10 retains the optimal and robust runtime of Hoare's algorithm yet consists of just one loop, as Lomuto's partition function does. That is almost too good to believe. Our quest for a simple and performant partition scheme finally pays off.
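As a hedged reconstruction, a single-loop Hoare-style partition in this spirit can look as follows (the article's actual Listing 10 may differ in detail). The pivot is parked at the left end; s and e scan toward each other, and the former inner loops survive only as branches of a cascading conditional.

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// Single-loop Hoare-style partition of [s, e); assumes at least two
// elements and the pivot already swapped to the front by pivot selection.
int* part_single(int* s, int* e) {
    int* pivot = s++;                  // pivot parked at the left end
    --e;                               // e: last unexamined element
    while (s < e) {
        if (*s < *pivot)      ++s;     // s's element already in place
        else if (*e > *pivot) --e;     // e's element already in place
        else { std::swap(*s, *e); ++s; --e; }  // both out of place: swap
    }
    int* m = (*s <= *pivot) ? s : s - 1;  // last element of the left part
    std::swap(*pivot, *m);             // pivot into its final position
    return m;
}
```

Note the shape predicted in section 3: a cascading conditional, slightly more dynamic loop pacing, and a little extra boundary logic after the loop.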
Additionally, one may use sentinels to simplify the cluttered conditions in the cascading conditional construct of Listing 10.
The end result is listed below.
Note that this partition function requires a minimum number of elements in the array to work properly. Interested readers may refer to a more detailed discussion in Appendix B.
Note that Listings 10 and 11 are simply other ways to implement the quicksort algorithm with Hoare's partitioning scheme. Their runtime complexity is expected to be the same as that of the traditional implementations. We have designed experiments to test this hypothesis. To prevent pivot skewness, we use the "median-of-three" technique in all the quicksort implementations used in this experiment[3, 8].
Table 1 shows the runtime analysis and comparisons. The simplified implementation indeed comes with an overhead and is thus slower than its traditional counterpart. For sorted data sets (either ascending or descending), there is a slowdown; for randomly shuffled data sets, the slowdown is consistent across array sizes. The experimental data indicates that the simplified implementation of Hoare's quicksort shares the same runtime complexity as its traditional counterpart; their performance differs by a constant multiplier.
Table 1. Runtime comparison of the traditional and simplified Hoare implementations, in three panels: integer arrays in ascending order, in descending order, and randomly shuffled. Columns: array size, Traditional, Simplified, Percent difference.
In this article, we presented two techniques for optimizing loop constructs in high-level programming languages. Taking advantage of the circular symmetry of loop constructs, loop rotation may be applied to a loop either to reduce code duplication (forward rotation) or to shift a portion of the code within the loop body (reverse rotation). The other technique, nested loop thinning, simplifies nested loop complications. These are two empirical techniques that help programmers achieve software development best practices. As an example, we applied them to simplify a traditional implementation of Hoare's quicksort algorithm, and provided the simplified implementation in C++. More generally, the techniques developed in this article are applicable to all programming languages that support loop and conditional constructs.
Appendix A Hoare’s Partitioning Functions
Below I have compiled a partial list of frequently made bugs.
In pivot election, failing to swap the pivot element to the beginning of the array.
For the pointers s and e, there are two competing options: 'check-then-increment' or 'increment-then-check'. For the implementation shown in Listing 5, the first option leads to an infinite loop in certain cases.
There is also a choice between '<= pivot' and '< pivot' for s, and between '>= pivot' and '> pivot' for e. An incorrect choice may inadvertently cause pointer incrementation to be skipped under obscure circumstances, which will, in turn, cause an infinite loop. (Remember that under no circumstances should either pointer stop approaching the other in any outer-loop iteration before they meet or cross.)
In the second inner loop, for pointer e, failing to check the boundary condition causes an out-of-bounds exception. For certain solutions, the correct condition is pivot < e. An incorrect boundary condition, such as s < e, causes pointer e to stall in the middle of the right partition, which in turn causes the pivot to be placed in the wrong position when the function returns.
The outer-loop termination logic also needs deliberation. The key decision is 'where to break or return' rather than 'when to break or return'. For Listing 5, the choice of placing the return at the end of the loop body was made after quite a few failed attempts.
Failing to swap the pivot element to its final position. This step often poses a stumbling block for beginners as well as unsuspecting veterans.
The choice of subarray semantics (inclusive or exclusive end pointers) when using start and end pointers to denote a subarray.
Any of the items in this list can easily take a good chunk of debugging time.
A large part of the complication comes from the fact that this implementation consists of many moving parts whose design decisions are intimately coupled; changing one necessitates corresponding changes in others. For example, suppose we change the subarray semantics of e from exclusive to inclusive. We know that the return pointer will have to be different, but what should it be changed to: s, s+1, e, or e-1? Also, with these changes, the recursive calls in Listing 5 need corresponding changes. What would they be? Would it be the LHS or RHS below?
As one may have observed, most of these frequently made bugs come with a multiple-choice question. The collection of them forms a tree structure, in which each leaf node represents either garbage code or a legitimate solution. From an existing solution, one may thus 'tunnel' to nearby variant solutions, as the listed implementation variations of the partition function show. While not all of these variations can be simplified into a single-loop implementation, we have had success with a few. Exactly which ones can, which cannot, and why is not completely clear and remains to be investigated.
All these listings are to be understood with some pivot-selection mechanism in place to avoid pivot skewness.
Appendix B Sentinels in quicksort
To show the stark effect of the sentinels, we will first revisit the traditional implementation of Hoare's partition scheme: Listing 3 gives the outline of the program and Listing 5 gives a fully implemented partition function. As mentioned earlier, one can get bogged down in the thick of implementing Hoare's quicksort; not only is the problem itself tricky, but so is the way we implement it. By picking Hoare's scheme over its alternatives (such as Lomuto's), we insist on robust expected runtime.
We attempt to get rid of the boundary checks using sentinels. These checks are there to ensure that the pointers s and e do not slip off the ends of the array. In large arrays, these boundary checks may get in the way of the performance of the algorithm. But a large part of the motivation is simply to remove clutter from the code.
The main idea is to make boundary checking redundant by planting artifacts that catch the pointers before an 'out of boundary' error can happen. This is exactly what sentinels are good at! By deploying sought-after values, the 'sentinels', at the ends of the array before control reaches the loop, the following will happen:
the pointers, s and e, will have to stop when they hit the ends of the array;
the subsequent logic will guarantee the termination of the outer loop and thus that the inner loops will not be executed again.
What are the values sought after by s and e? The out-of-place values! In particular, s is looking for values no less than the pivot, and e for values no greater than the pivot. So instead of randomly picking one value for the pivot, we now randomly pick three values, of which the median is elected as the pivot as usual, the minimum is deployed to the left end, and the maximum to the right end. We group these operations into a function called 'init_swap':
Now back to the function 'part'. As mentioned before, the deployment of sentinels in init_swap makes the conditional expressions 's < e' and 'pivot < e' semantically redundant. They can now be safely removed. Below is the function 'part' after all the refactoring is done.
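A sketch of one workable arrangement follows (assumed details, including parking the pivot in the second slot; the article's own listings may differ, and this arrangement needs at least three elements):

```cpp
#include <algorithm>
#include <utility>
#include <cassert>

// init_swap samples the first, middle, and last elements of [s, e):
// the minimum goes to the left end, the maximum to the right end, and
// the median, the pivot, is parked at *(s + 1). Requires >= 3 elements.
void init_swap(int* s, int* e) {
    int* m = s + (e - s) / 2;
    int* r = e - 1;
    if (*m < *s) std::swap(*m, *s);
    if (*r < *s) std::swap(*r, *s);
    if (*r < *m) std::swap(*r, *m);
    // now *s <= *m <= *r: min and max sit in sentinel position
    std::swap(*m, *(s + 1));          // park the pivot next to the left end
}

// With the sentinels planted, the inner scans need no boundary checks:
// the maximum at *(e - 1) stops the rightward scan, and the pivot itself
// (backed by the minimum at *s) stops the leftward scan.
int* part(int* s, int* e) {
    int* pivot = s + 1;
    int* i = pivot;
    int* j = e;
    while (true) {
        while (*++i < *pivot) {}      // no 's < e' check needed
        while (*--j > *pivot) {}      // no 'pivot < e' check needed
        if (i >= j) break;
        std::swap(*i, *j);
    }
    std::swap(*pivot, *j);            // pivot into its final position
    return j;
}
```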
Compared with the implementation in Listing 5, this code clears the clutter from the loop conditions of the inner loops. We 'float' the control all the way to the termination of the function on a 'touchless' rail made of sentinels, without ever checking a boundary condition. Of course, the onus is shifted to the initialization before the loop; compared with the inner loops, that section is non-critical. Not only is the code less cluttered, but the number of such checks is also reduced. More importantly, by not checking boundary conditions in the most critical section, we avoid bugs that would otherwise cost many hours of debugging time.
So by the use of sentinels on Hoare's quicksort, whether the traditional nested-loop implementation or the single-loop implementation of Listing 10, we shift the complexity of the nested loops to a non-critical part of the program, effectively reducing its coding complexity. Applied to the latter, we arrive at a new level of simplicity for implementations of Hoare's quicksort (as shown in Listing 11).
-  Charles Antony Richard Hoare. Partition (algorithm 63); quicksort (algorithm 64); find (algorithm 65). Communications of the ACM, 4:321–322, 1961.
-  Charles Antony Richard Hoare. Quicksort. The Computer Journal, 5:10–16, 1962.
-  Robert Sedgewick. Implementing quicksort programs. Commun. ACM, 21:847–857, 1978.
-  Jon Bentley. Programming Pearls. ACM, New York, NY, USA, 1986.
-  Jack Dongarra and Francis Sullivan. Guest editors introduction: The top 10 algorithms. Computing in Science & Engineering, 2:22–23, 2000.
-  Machine Learning Explained: Vectorization and matrix. Enhance Data Science, 2018.
-  Shoupu Wan. Lean Code. TBD, approx. 2019.
-  Robert Sedgewick. The analysis of quicksort programs. Acta Informatica, 7:327–355, 1977.
-  Jon Bentley. The most beautiful code I never wrote. O’Reilly Media, 2007.
-  Burak Karakan. Benchmarking sorting algorithms. https://github.com/karakanb/sorting-benchmark, 2017.
-  William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C (2Nd Ed.): The Art of Scientific Computing. Section “Quicksort”. Cambridge University Press, 1992.