1 Introduction
Finding a longest palindromic substring (LPS) in a given string is a fundamentally important question, as it has widespread applications in mathematics, physics, chemistry, genetics, music, etc. [5, 6, 4, 3, 2]. To one who is familiar with the song “Rhythm of the Rain”, the prelude music might be very impressive. That is an example of a musical palindrome. In genetics, palindromic sequences have an important capability: forming hairpins [1]. It is amazing to learn that palindromes have played such an important role in life from the very beginning. But here I pick the LPS problem for two reasons. First, this problem is closely related to the study of symmetry. Oftentimes, uncovering the underlying symmetry is the key to great solutions. This problem exemplifies how conscious application of mathematical analysis can help devise an algorithm. Second, this problem is a perfect case study for demonstrating how to refactor, step by step, messy and monolithic code bloated with duplication into a succinct and modular solution free from duplication. Most of the techniques are discussed in depth in the book [7].
Here is the structure of this article. In section 2, we will set the problem statement. In section 3, reflection symmetry, with the necessary mathematical context, will be explained with a view to its application to the LPS problem. In section 4, Manacher’s algorithm is presented together with some intuition. Then, in section 5, we will discuss existing solutions with the string-augmentation preprocessing. In section 6 and section 7, we will present the new approach of index mapping to implement Manacher’s algorithm. Also in these sections, we will perform a multistage refactoring process that eventually leads to a modular solution to the LPS problem with high readability. Finally, in section 8, we will put all the implementations presented in this article to the test and compare the performance results of the different approaches. All solutions provided in this article will be implemented in Java.
2 The Problem Statement
The LPS problem takes various forms in the literature. For the sake of this article, we state the problem as
“Given an input string, find the longest palindromic substring in it (or one of them if there are more).”
According to the Merriam-Webster dictionary, a palindrome is “a word, phrase, or sequence that reads the same backward as forward”. The length of a palindromic string can be either odd or even. Accordingly, we may classify palindromic strings as odd or even. For an odd palindrome, its center of symmetry, i.e., palindromic center or simply center, falls on a character. For a nonempty even palindrome, its center falls between two characters, which in this book will be referred to as left center and right center, respectively. Obviously, an empty string is also palindromic; it is the trivial case. A palindromic substring (PSS) of a string is any substring that is a palindrome. For a string of length $n$, there are $2n + 1$ palindromic centers, albeit some of them may be trivial. The sole PSS of an empty string is trivial. The first and last PSS’s of a nonempty string are trivial. Apparently, for a specific nontrivial palindromic center, there may be a series of co-centered palindromic substrings. We call the longest among these co-centered palindromic substrings the ‘prime palindromic substring’. Without loss of generality, we will limit our discussion to prime palindromic substrings only.
To some, solving the problem is not “hard”, so to speak, if optimality is not a concern. One possible solution, for example, may be the following:
Iterate through each possible center and for each center, calculate the length of PSS. To calculate the length of PSS at a specific center, one can dispatch two indexes off the center outwardly in opposite directions symmetrically. If, at any step, a mismatch is encountered, stop. The substring lies between the two indexes.
The runtime complexity of such a naive solution is $O(n^2)$. The difficulty of this problem is how to beat the quadratic runtime. In his paper of 1975, Glenn Manacher presented an algorithm with linear runtime. It was later found that his method works not only for prefix PSS’s but for all PSS’s. This algorithm is now the so-called Manacher’s algorithm [5]. We will dedicate the next few sections to gaining a thorough understanding of this algorithm. First, we need a little bit of math about symmetry.
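The naive solution described above can be sketched in Java as follows. This is only an illustration, not a listing from this article: the class name `NaiveLongestPalindrome` and its method are hypothetical. It enumerates all $2n + 1$ centers, where an odd center number falls on a character and an even one falls between two characters, and it expands two indexes outward until a mismatch occurs.

```java
// A minimal sketch of the quadratic "expand around each center" approach.
// The class and method names are illustrative assumptions.
final class NaiveLongestPalindrome {
    private NaiveLongestPalindrome() {}

    // Returns one longest palindromic substring of s (the leftmost one on ties).
    static String longestPalindrome(String s) {
        int n = s.length();
        int bestStart = 0;
        int bestLength = 0;
        // Centers 0..2n: odd centers fall on a character,
        // even centers fall between two characters (or at either end).
        for (int center = 0; center <= 2 * n; center++) {
            int right = center / 2;
            int left = (center % 2 == 1) ? right : right - 1;
            // Dispatch the two indexes outward while the characters match.
            while (left >= 0 && right < n && s.charAt(left) == s.charAt(right)) {
                left--;
                right++;
            }
            // The palindrome lies strictly between left and right.
            int length = right - left - 1;
            if (length > bestLength) {
                bestLength = length;
                bestStart = left + 1;
            }
        }
        return s.substring(bestStart, bestStart + bestLength);
    }
}
```

For example, `NaiveLongestPalindrome.longestPalindrome("bananas")` yields `"anana"`. Each of the $2n + 1$ centers may trigger up to about $n/2$ comparisons, which is where the quadratic bound comes from.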
3 Reflection symmetry
Reflection symmetry, a.k.a. mirror-image symmetry (https://en.wikipedia.org/wiki/Reflection_symmetry), refers to spatial invariance under a reflection transformation. A reflection transformation is the operation that transforms coordinates to their mirror image w.r.t. a fixed point, which we will refer to as the axis of symmetry or center of symmetry. In one-dimensional (1D) space, a coordinate, $x$, and its transformed coordinate, $x'$, w.r.t. axis $c$ are related by the equation:

$$x' = 2c - x \qquad (1)$$
As shown in Figure 1(a), an axis of symmetry partitions the entire 1D space into two half-spaces about itself, one on the left and one on the right. There is a one-to-one mapping between the points in the two half-spaces. A second axis does similarly (Figure 1(b)). It is truly magical when two axes of reflection symmetry are present near each other along the line. By repeatedly applying the reflection transformation, one may find infinitely many axes of symmetry alternating along the line, and collectively they partition the entire space into periodic regions with period $2d$, where $d$ is the distance between the two axes of symmetry (see Figure 1(c)). This effect may not be unfamiliar to you if you have ever stepped in between two parallel mirrors: an array of clones of ‘you’ appears, alternately facing toward and away from you, aligned and coordinated. This symmetric configuration in a discretized 1D space resulting from multiple reflections is the key intuition behind the LPS problem.
Figure 1(d) shows an infinite palindromic string. In reality, however, the aforementioned symmetry does not exist, as there is no infinite space or string. Nevertheless, the argument still holds for a finite string in the overlapping regions. Of prime interest to us is a collection of ‘crowded’, overlapping PSS’s. Under the wings of some large PSS’s, some shorter palindromes may take shelter. On the other hand, the larger PSS that cloaks the shorter ones will project the latter’s mirror image to the opposite wing, because of reflection symmetry (see Figure 2). This reflective projection may be applied recursively as many times as there are enclosing palindromes. As a result, a substring may be projected to its mirror image, which, in turn, may be projected to its own image, and so on.
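The periodicity can be checked with a short derivation. The symbols $c_1$ and $c_2$ (the two axes) and $d = c_2 - c_1$ (their distance) are names assumed here for illustration. Reflecting a point first about $c_1$ and then about $c_2$ gives:

```latex
\begin{aligned}
x'  &= 2c_1 - x, \\
x'' &= 2c_2 - x' = 2c_2 - (2c_1 - x) = x + 2(c_2 - c_1) = x + 2d .
\end{aligned}
```

Two successive reflections thus amount to a translation by $2d$, so any pattern that is symmetric about both axes repeats with period $2d$.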
4 Manacher’s Algorithm
Equipped with this understanding of reflection symmetry, we are in a better position to crack the mystery of Manacher’s algorithm. Manacher’s algorithm leans heavily on cached PSS’s. The reflective projection relates the to-be-calculated palindromic center to its mirror image in the cache, and this is the key step to avoid repeated character comparisons.
Taking the string “bananas” as an example, as shown in Figure 2, and using the indexes of the cache array shown below: knowing that PSS(5) has length 3 and that PSS(7) mirrors (part of) PSS(5) to (part of) PSS(9), we can skip all but the outermost pair, which is ‘n’ and ‘s’. Upon seeing that they do not match, the length of PSS(9) is finally pinned at 3. So with one additional character comparison, we obtain the length of PSS(9). That is where the savings come from.
To sum up, let us iterate through the string from left to right and cache the result in an array, e.g.,
index:  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10,  11,  12,  13,  14 
length:  0,  1,  0,  1,  0,  3,  0,  5,  0,  3,  0,  1,  0,  1,  0 
At each step, we also keep a reference PSS, namely the one that reaches farthest to the right. When examining a new center, we first check its mirror image with respect to the reference. Depending on the relationship between its mirror-image PSS and the reference PSS, some or all character comparisons for the new center may be spared, just as in the case of “bananas”. Without further ado, Manacher’s algorithm is listed below.

1. Initialize an array pss of size 2 * n + 1, in which the element at index i stores the length of PSS(i), the prime PSS centered at index i of the augmented string.

2. Initialize refCenter, which stores the palindromic center whose right wing reaches farthest to the right in each iteration of the main loop.

3. For each index j = 0, 1, ..., 2n in the augmented array:

(a) If j lies outside of PSS(refCenter) to the right, calculate PSS(j) from scratch; update refCenter accordingly; skip to the next iteration.

(b) Otherwise, find the mirror image, k, of j w.r.t. refCenter.

(i) If PSS(k) is completely contained in PSS(refCenter) without touching its boundary, then pss(j) = pss(k);

(ii) Otherwise, we need to calculate PSS(j), but only the portion outside of PSS(refCenter), if any.

5 Augmented-String Implementations
I hope you have already grasped the gist of Manacher’s algorithm before we talk about its implementations. Traditionally, implementations of Manacher’s algorithm assume augmenting the original string by inserting a dummy character between each adjacent pair of characters in the original string. For uniformity, we also add dummy characters at the ends (insert one dummy character in the case of an empty string). By doing so, we establish a one-to-one mapping between the PSS’s in the original string and the odd PSS’s in the augmented string. So for “bananas”, the augmented string would be “ b a n a n a s ” if the blank space is chosen as the augmenting character. It is important that the chosen dummy character be absent from the original string. Otherwise, spurious results may arise.
With the literally augmented string, implementing Manacher’s algorithm becomes straightforward. One can refer to several published implementations in different programming languages. There is a Java version by the CS department of Princeton University (https://algs4.cs.princeton.edu/53substring/Manacher.java.html). There is a Python version by Fred Akalin (https://www.akalin.com/longest-palindrome-linear-time). There is even a Haskell implementation, along with a discussion of the algorithm itself, in Johan Jeuring’s blog (http://finding-palindromes.blogspot.com/2012/05/finding-palindromes-efficiently.html) and also in the book [4]. Lastly, an implementation of my own is also provided for the sake of reference (Listing 1).
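For readers without the listings at hand, here is a minimal sketch of the augmented-string approach. It is not the author’s Listing 1 and not any of the cited implementations: the class name `AugmentedManacher`, the `dummy` parameter, and the layout are assumptions made for illustration. Following the article’s assumption, the caller should pass a dummy character that is absent from the input.

```java
// A sketch of Manacher's algorithm on a literally augmented string.
// Names are illustrative assumptions, not taken from the article's listings.
final class AugmentedManacher {
    private AugmentedManacher() {}

    static String longestPalindrome(String s, char dummy) {
        int n = s.length();
        // Build dummy, s[0], dummy, s[1], ..., s[n-1], dummy (length 2n + 1).
        char[] t = new char[2 * n + 1];
        for (int i = 0; i < n; i++) {
            t[2 * i] = dummy;
            t[2 * i + 1] = s.charAt(i);
        }
        t[2 * n] = dummy;

        // pss[i]: radius of the palindrome centered at i in t,
        // which equals the length of the corresponding PSS in s.
        int[] pss = new int[t.length];
        int refCenter = 0; // center whose right wing reaches farthest right
        for (int j = 0; j < t.length; j++) {
            int refRight = refCenter + pss[refCenter];
            int radius = 0;
            if (j < refRight) {
                // Reuse the cached mirror image, clipped to the reference PSS.
                int k = 2 * refCenter - j;
                radius = Math.min(pss[k], refRight - j);
            }
            // Compare only the characters that the cache cannot vouch for.
            while (j - radius - 1 >= 0 && j + radius + 1 < t.length
                    && t[j - radius - 1] == t[j + radius + 1]) {
                radius++;
            }
            pss[j] = radius;
            if (j + radius > refRight) {
                refCenter = j;
            }
        }

        // Pick the leftmost center with the largest cached length.
        int best = 0;
        for (int i = 1; i < pss.length; i++) {
            if (pss[i] > pss[best]) {
                best = i;
            }
        }
        int start = (best - pss[best]) / 2; // map back to the original string
        return s.substring(start, start + pss[best]);
    }
}
```

Calling `AugmentedManacher.longestPalindrome("bananas", ' ')` returns `"anana"`, and the intermediate `pss` array reproduces the cached lengths tabulated in section 4. Note the extra `char[]` of size $2n + 1$, which is the cost discussed in the next section.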
6 Virtual Augmentation
Even though the string-augmentation approach has found widespread use in implementations of Manacher’s algorithm, it is neither convenient nor necessary. For large strings such as DNA chains in genome sequencing, it is costly to construct the augmented string with its doubled memory footprint [6]. Furthermore, it is onerous and sometimes quite annoying to have to identify a suitable dummy character for the augmentation process. In this section, we seek a more concise and economical way to implement Manacher’s algorithm, one that spares the string augmentation process.
Table 1: Arithmetic expressions of index mapping and their semantics (i is an index in the augmented string).

| Arithmetic | Semantic | Helper Function |
|---|---|---|
| i / 2 | The right center in the original string (only for even i) | |
| (i - 1) / 2 | The center in the original string (only for odd i) | |
| 2 * i - x | Mirror image of x about the center i in the augmented string | toMirrorImage |
| i - pss[i] | Left bounding index in the augmented string | getLeftBound |
| i + pss[i] | Right bounding index in the augmented string | getRightBound |
| (i - pss[i]) / 2 | Left bounding index in the original string (see note) | |
| (i + pss[i]) / 2 | Right bounding index in the original string (see note) | |

Note: The last two divisions are exact because if i is even, the length of the PSS centered on i can only be even. Conversely, if i is odd, the length of the PSS centered on i can only be odd. In both cases, i - pss[i] and i + pss[i] are even.
The key is to establish an index mapping between the original and the augmented string. Consider the string “bananas” (Figure 2) and its augmented string “ b a n a n a s ”. Let us try to establish the correspondence rules between the original string and the augmented string. First, each character in the original string corresponds to a character in the augmented string with an odd index. So there is a one-to-one correspondence between the indexes of the original string and the odd indexes of the augmented string. The even indexes of the augmented string, however, correspond to the boundaries between adjacent characters and to the two ends of the string. Together, they establish a one-to-one mapping between PSS’s of the original string and odd PSS’s in the augmented string. It is easy to see that a PSS in the augmented string is completely determined by the corresponding PSS in the original string. The added augmentation characters do not interfere at all. Therefore, if we can formulate the PSS’s of the original string in terms of the indexes of the hypothetically augmented string, we will be freed from the need to construct the augmented string in memory. We name this method “index mapping”. Accordingly, the process of relying on mapped indexes for calculating palindromic substrings is named “virtualized augmentation”. Not only does virtualized augmentation make the double-sized memory consumption obsolete, but it also frees us from the burden of choosing dummy characters. Some arithmetic expressions and their semantic meanings have been tabulated in Table 1 for reference. Based on the idea of virtual augmentation, we come up with a new approach to implement Manacher’s algorithm (Listing 2). Note that the solution listed in Listing 2 has $O(n)$ runtime as well as memory complexity.
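To make the index-mapping idea concrete, here is a hedged sketch of how virtual augmentation might look in Java. It is not the author’s Listing 2: the class name `VirtualManacher` and the helper `matches` are hypothetical. The only assumption beyond the text is the observation that two positions symmetric about a center always have the same parity, so an even pair is a pair of virtual dummies (always equal) and an odd pair maps to original characters via `(index - 1) / 2`.

```java
// A sketch of Manacher's algorithm with a virtual (never materialized)
// augmented string. Names are illustrative assumptions.
final class VirtualManacher {
    private VirtualManacher() {}

    static String longestPalindrome(String s) {
        int size = 2 * s.length() + 1; // length of the virtual augmented string
        int[] pss = new int[size];     // pss[i]: PSS length at virtual center i
        int refCenter = 0;
        int best = 0;
        for (int j = 0; j < size; j++) {
            int refRight = refCenter + pss[refCenter];
            int radius = 0;
            if (j < refRight) {
                // Mirror image of j about refCenter: 2 * refCenter - j.
                radius = Math.min(pss[2 * refCenter - j], refRight - j);
            }
            while (matches(s, j - radius - 1, j + radius + 1, size)) {
                radius++;
            }
            pss[j] = radius;
            if (j + radius > refRight) {
                refCenter = j;
            }
            if (pss[j] > pss[best]) {
                best = j;
            }
        }
        // Map the bounds back to the original string (see Table 1).
        return s.substring((best - pss[best]) / 2, (best + pss[best]) / 2);
    }

    // Compares two virtual positions that are symmetric about some center.
    private static boolean matches(String s, int left, int right, int size) {
        if (left < 0 || right >= size) {
            return false; // ran off either end of the virtual string
        }
        if (left % 2 == 0) {
            return true;  // both positions are virtual dummy characters
        }
        return s.charAt((left - 1) / 2) == s.charAt((right - 1) / 2);
    }
}
```

`VirtualManacher.longestPalindrome("bananas")` returns `"anana"` without allocating an augmented string or choosing a dummy character; only the integer cache of size $2n + 1$ remains.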
7 Solutions for Readability
By virtual augmentation, we resolved the memory footprint issue and the shadowy dummy character. The mission is well accomplished. But by no means should we settle here. The code in Listing 2 is like a bowl of spaghetti, isn’t it? Even though the code is divided up into three functions, its cleanliness still suffers. It takes some courage for me to read it and try to figure out what each line does just a couple of days after I wrote it. I cannot imagine it would be easier for one who has just come across the code. A principle that all developers should stick to is “If it is not readable, it is not acceptable”. Our next goal is to seek a more readable way to implement it.
At a glance, the code is packed with arithmetic expressions, some for symmetry transformations, some for key lookups, etc. These all result from the virtual augmentation. But if you look carefully, you may spot bloated repetitions of some arithmetic operations. Some are hard “copy-n-paste” of others, while more belong to the category of so-called “soft duplicates” [7]. Another problem with the implementation is that it is monolithic. The same function does too many things at once, violating the Single Responsibility Principle (https://en.wikipedia.org/wiki/Single-responsibility_principle). A well-organized solution in this context should consist of a group of meaningful, single-purpose, and reusable functions.
So we have two tasks that are somewhat related: one is to eliminate duplicate code, and the other is to make the solution more modular. Our starting point is Listing 2. We may approach both tasks first with an understanding of the semantics of some operations, especially those that are repeated. If it helps, we can factor them out into helper functions.
Take line 2 in Listing 2 as an example.
It may be obscure to untrained eyes. But the pattern is “Given an index, get the result as the index minus the element at the index”. So we can factor that out into a function, e.g., getLeftBound (see Table 1). All occurrences of the same logic may be replaced by a call to the function getLeftBound. This alone helps get rid of several duplications. In fact, similar refactoring may be performed for other entries in Table 1. By doing so, we first modularize the solution by creating succinct, easily understandable helper functions. The helper functions, in turn, may be reused to reduce code duplication. Two birds with one stone. The technique is discussed at length in the book [7].
Our refactored solution, the class LongestPalindromeSolver, is listed in Listing 3 and Listing 4. The class has two private fields, one for input and the other for output. The only point of entry is the public method longestPalindrome, which helps with bookkeeping and dispatching. All others are helper functions listed in Listing 4 and explained below.
The solve function implements the large part of Manacher’s algorithm. Methods getLeftBound, getRightBound, palength, and toMirrorImage each correspond to an index-mapping expression in Table 1. Method isMismatch checks for pairwise character mismatch. Method substring helps construct the final result, the longest palindromic substring. Method argmax is, as suggested by its name, a quick-and-dirty implementation of the mathematical function arg max.
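The sketch below shows one way the described class could be organized. It is an illustration under assumptions, not a reproduction of Listings 3 and 4: only the class and method names come from the text, whereas every signature, the field names `input` and `pss`, and the method bodies are guesses made for this example.

```java
// A hypothetical modular layout using the helper names mentioned in the text.
// Signatures and bodies are assumptions, not the author's listings.
final class LongestPalindromeSolver {
    private String input; // the original, un-augmented string
    private int[] pss;    // output cache: PSS length at each virtual center

    // The single point of entry: bookkeeping and dispatching.
    public String longestPalindrome(String s) {
        input = s;
        pss = new int[2 * s.length() + 1];
        solve();
        return substring(argmax(pss));
    }

    // The main loop of Manacher's algorithm over the virtual centers.
    private void solve() {
        int refCenter = 0;
        for (int j = 0; j < pss.length; j++) {
            int refRight = getRightBound(refCenter);
            if (j < refRight) {
                int mirror = toMirrorImage(refCenter, j);
                pss[j] = Math.min(palength(mirror), refRight - j);
            }
            while (!isMismatch(getLeftBound(j) - 1, getRightBound(j) + 1)) {
                pss[j]++;
            }
            if (getRightBound(j) > refRight) {
                refCenter = j;
            }
        }
    }

    private int getLeftBound(int i) { return i - pss[i]; }

    private int getRightBound(int i) { return i + pss[i]; }

    private int palength(int i) { return pss[i]; }

    private int toMirrorImage(int center, int x) { return 2 * center - x; }

    // True if the two virtual positions are out of range or hold different
    // characters; even positions are virtual dummies and never mismatch.
    private boolean isMismatch(int left, int right) {
        if (left < 0 || right >= pss.length) {
            return true;
        }
        if (left % 2 == 0) {
            return false;
        }
        return input.charAt((left - 1) / 2) != input.charAt((right - 1) / 2);
    }

    // Maps the bounds of the PSS at the given center back to the input.
    private String substring(int center) {
        return input.substring(getLeftBound(center) / 2, getRightBound(center) / 2);
    }

    // Index of the first maximum element.
    private static int argmax(int[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With this layout, `new LongestPalindromeSolver().longestPalindrome("bananas")` returns `"anana"`. Each arithmetic expression of Table 1 appears exactly once, inside a named helper, so the main loop in `solve` reads close to the prose description of the algorithm.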
8 Experiment
To test the performance of the implementations and catch possible regressions, we designed a simple experiment to compare the implementations listed in this article. We randomly generated strings of various lengths using alphabets as the testing benchmarks for and . To reduce error, each run is repeated three times and the average is taken. We found no strong correlation between runtime and or the length of the longest palindromic substrings. The linearity of the runtime vs. the size of the input string stands out quite clearly in Table 2, which is expected. In summary, our new index-mapping-based implementations perform similarly to the approach based on string augmentation but are more efficient in terms of memory footprint. Where the implementation with the augmented string fails, the virtualized augmentation approach still runs successfully. Besides the readability, there is also a slight improvement in runtime in our modularized solution.
9 Conclusion
In conclusion, we went through the longest palindromic substring problem as a case study. We discussed the reflection symmetry required to understand Manacher’s algorithm. We presented a novel implementation of Manacher’s algorithm that uses index mapping to avoid the tedious and costly string augmentation. We compared the performance of the new approach against that of string augmentation in terms of memory and runtime complexities. Using the techniques presented in previous chapters of this book, we refactored the monolithic solution, bloated with duplication, into a more modular and readable one.
References
 [1] (2008) Chromosome evolution with naked eye: palindromic context of the life origin. Chaos 18 (2), pp. 013105.
 [2] (1994) Text algorithms. Maxime Crochemore.
 [3] (1981) String matching in real time. Journal of the ACM (JACM) 28 (1), pp. 134–149.
 [4] (1993) Theories for algorithm calculation. Utrecht University.
 [5] (1975-07) A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM 22 (3), pp. 346–351. ISSN 0004-5411.
 [6] (2010) Data hiding methods based upon DNA sequences. Information Sciences 180 (11), pp. 2196–2208.
 [7] (approx. 2020) Lean code. TBD.