1 Introduction
A vast literature exists on algorithms for locating regularities in strings. One of the most natural notions of regularity is that of an exact repetition (also called power or tandem repeat), that is, a substring formed by two or more contiguous identical blocks — the number of these identical blocks is called the order of the repetition. Often, the efficiency of such algorithms derives from combinatorial results on the structure of the strings. The reader is pointed to Ba15 for a survey on combinatorial results about text redundancies and algorithms for locating them.
Recently, a new notion of regularity for strings based on diversity rather than on equality has been introduced: an anti-power of order $k$ FRSZ16 (see Fi18 for the extended version) is a string that can be decomposed into $k$ pairwise-distinct strings of identical length. This new notion underlies a new unavoidable property. Indeed, regardless of the alphabet size, every infinite string must contain powers of any order or anti-powers of any order FRSZ16 ; Fi18 . Defant Def17 (see also Narayanan N17 ) studied the sequence of lengths of the shortest prefixes of the Thue-Morse word that are $k$-anti-powers, and proved that this sequence grows linearly in $k$.
In this paper, we focus on the problem of finding efficient algorithms to locate anti-powers in a finite string. While there exist several algorithms for locating repetitions in strings (see for example Cr09 ), we present here the first algorithm that locates anti-power substrings in a given input string. Furthermore, we exhibit a lower bound on the number of distinct substrings that are anti-powers of a specified order, which allows us to prove that the time complexity of our algorithm is optimal.
2 Preliminaries
Let $S = S[1..n]$ be a string of length $|S| = n$ over an alphabet $\Sigma$ of size $\sigma$. The empty string $\varepsilon$ is the string of length $0$. For $1 \le i \le j \le n$, $S[i]$ denotes the $i$th symbol of $S$, and $S[i..j]$ the contiguous sequence of symbols (called factor or substring) $S[i]S[i+1]\cdots S[j]$. A substring $S[i..j]$ is a suffix of $S$ if $j = n$ and it is a prefix of $S$ if $i = 1$. A power of order $k$ (or $k$-power) is a string that is the concatenation of $k$ identical strings. An anti-power of order $k$ (or $k$-anti-power) is a string that can be decomposed into $k$ pairwise-distinct strings of identical length FRSZ16 . The period of a $k$-power (resp. the anti-period of a $k$-anti-power) of length $n$ is the integer $n/k$.
For example, $aabaab$ is a $2$-power (also called a square) of period $3$, while $abcaba$ is a $3$-anti-power of anti-period $2$ (but also a $2$-anti-power of anti-period $3$).
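The two definitions can be checked mechanically. The following is a minimal sketch, not part of the original presentation: the helper names `blocks`, `is_power` and `is_anti_power` are hypothetical, and the test strings are illustrative choices.

```python
def blocks(s: str, k: int) -> list[str]:
    """Split s into k consecutive blocks of equal length (requires k to divide len(s))."""
    if k <= 0 or len(s) == 0 or len(s) % k != 0:
        raise ValueError("len(s) must be a positive multiple of k")
    p = len(s) // k  # the period / anti-period
    return [s[i:i + p] for i in range(0, len(s), p)]


def is_power(s: str, k: int) -> bool:
    """True if s is the concatenation of k identical blocks."""
    b = blocks(s, k)
    return all(x == b[0] for x in b)


def is_anti_power(s: str, k: int) -> bool:
    """True if the k equal-length blocks of s are pairwise distinct."""
    return len(set(blocks(s, k))) == k


print(is_power("aabaab", 2), is_anti_power("abcaba", 3), is_anti_power("abcaba", 2))
```

Both checks take time linear in the length of the string (expected, because of hashing), which is enough for small experiments but not for locating all anti-powers of a long string.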
In this paper, we consider the following problem:
Problem 1.
Given a string $S$ and an integer $k > 0$, locate all the substrings of $S$ that are anti-powers of order $k$.
3 Lower Bound on the Number of Anti-Powers
Over an unbounded alphabet, it is easy to see that a string of length $n$ can contain $\Omega(n^2/k)$ anti-powers of order $k$ (think of a string consisting of all-distinct letters). However, somewhat more surprisingly, this bound also holds over a finite alphabet, as we now show.
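The all-distinct case can be made concrete. In a string of $n$ pairwise-distinct letters, every substring of length $kp$ is a $k$-anti-power of anti-period $p$, so with $m = \lfloor n/k \rfloor$ the count is $\sum_{p=1}^{m} (n - kp + 1) = m(n+1) - k\,m(m+1)/2$, which is at least $n^2/(4k)$ once $k \le n/2$. The sketch below compares this closed form with a brute-force count; both function names are hypothetical.

```python
def count_anti_powers_brute(s: str, k: int) -> int:
    """Count occurrences of k-anti-powers in s by direct comparison of blocks."""
    n = len(s)
    count = 0
    for p in range(1, n // k + 1):            # candidate anti-period
        for start in range(n - k * p + 1):    # 0-based starting position
            blks = {s[start + t * p: start + (t + 1) * p] for t in range(k)}
            if len(blks) == k:                # all k blocks pairwise distinct
                count += 1
    return count


def count_all_distinct(n: int, k: int) -> int:
    """Closed form for a string of n pairwise-distinct letters."""
    m = n // k
    return m * (n + 1) - k * m * (m + 1) // 2


print(count_anti_powers_brute("abcdefghijkl", 3), count_all_distinct(12, 3))
```

The brute-force routine is only meant as a reference for small inputs; it compares blocks explicitly and is far slower than the algorithm of Section 4.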
For every positive integer , we let denote the string obtained by concatenating the binary expansions of integers from to followed by a symbol . So for example . We have that . Let us write .
Lemma 1.
Every string of length contains antipowers of order .
Proof.
As mentioned before, we have . Let denote the number of antipowers of order in with antiperiod .
The number of antipowers of order is at least the sum of the number of antipowers of order with antiperiod greater than . It is readily verified that if the antiperiod is such that then at every position in there is a antipower of antiperiod . This is because there are at least two ’s in every factor of of length , and every factor of containing at least two ’s has, by construction, only one occurrence in .
Hence we have:
Thus we have , as claimed. ∎
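The uniqueness property used in the proof can be tested empirically. The sketch below builds a string in the spirit of the construction, under two assumptions made only for this illustration: the integers range over $0, \ldots, m$, and the separator symbol is `$`. It then checks that every factor containing at least two separators occurs exactly once. The function names are hypothetical.

```python
from collections import Counter


def binary_word(m: int, sep: str = "$") -> str:
    """Concatenate the binary expansions of 0..m, each followed by sep (assumed layout)."""
    return "".join(format(j, "b") + sep for j in range(m + 1))


def two_separator_factors_unique(w: str, sep: str = "$") -> bool:
    """Check that every factor of w with at least two separators occurs exactly once."""
    occurrences = Counter(
        w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)
    )
    return all(c == 1 for f, c in occurrences.items() if f.count(sep) >= 2)


w = binary_word(20)
print(len(w), two_separator_factors_unique(w))
```

The reason the check succeeds is that a factor with two separators contains a complete binary expansion delimited on both sides, and each expansion appears in that delimited form only once.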
4 Computing Anti-Powers of Order $k$
This section is devoted to establishing the following theorem. We assume $S$ is over an integer alphabet $\Sigma = \{1, \ldots, \sigma\}$.
Theorem 2.
Given a string $S[1..n]$ and an integer $k > 0$, the locations of all substrings of $S$ that are $k$-anti-powers can be determined in $O(n^2/k)$ time and $O(n)$ space.
In light of the lower bound established in the previous section on the number of anti-powers of a given order that can occur in a string, this solution to Problem 1 is optimal.
4.1 Computing anti-powers having anti-period 1
We begin with a lemma that we will use in our algorithm.
Lemma 3.
Given a string $S[1..n]$, the longest substring of $S$ that consists of pairwise-distinct symbols can be computed in $O(n)$ time and $O(\sigma)$ space.
Proof.
We scan $S$ left to right, and maintain two pointers $i \le j$ into it. Through the scan, both $i$ and $j$ are monotonically nondecreasing. We maintain the invariant that the symbols in the substring delineated by $i$ and $j$, i.e., $S[i..j]$, are all distinct. In order to maintain this invariant, we keep an array $P[1..\sigma]$, initially all 0s, such that immediately before we increment $j$, $P[c]$ is the rightmost position of symbol $c$ in $S[1..j]$ (or 0 if $c$ does not appear in $S[1..j]$). Clearly, for the invariant to hold after the increment, we must have that $P[S[j+1]] < i$, otherwise there are (at least) two occurrences of $S[j+1]$ in $S[i..j+1]$. In other words, if $S[i..j]$ contains distinct letters then so will $S[i..j+1]$, provided $P[S[j+1]] < i$. Initially, $i = j = 1$ and the invariant holds. We increment $j$ until $P[S[j+1]] \ge i$ (or $j = n$), at which point we know that the symbols of $S[i..j]$ were distinct. If $j - i + 1$ is the length of the longest such substring we have seen so far, we record $i$ and $j$. We then restore the invariant by setting $i = P[S[j+1]] + 1$, which has the effect of dropping the left occurrence of the repeated symbol $S[j+1]$, so that $S[i..j+1]$ again contains distinct symbols. Since neither pointer ever moves backwards, the runtime is clearly linear in $n$. The only nonconstant space usage is for $P$. ∎
Obviously, the above algorithm can be used to efficiently compute $k$-anti-powers having anti-period 1, since these are exactly the substrings of length $k$ whose symbols are pairwise distinct. We will use it as a building block for finding anti-powers of all anti-periods.
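The scan from the proof of Lemma 3 can be sketched as follows. This is an illustrative version rather than a definitive implementation: it swaps the array of rightmost positions for a Python dictionary, reports 1-based inclusive positions, and the function names are hypothetical. The second function is the fixed-length variant that reports every window of length $k$ with pairwise-distinct symbols.

```python
def longest_distinct_substring(s):
    """Return 1-based (start, end) of a longest substring with pairwise-distinct symbols."""
    last = {}        # symbol -> rightmost 1-based position seen so far
    best = (1, 0)    # represents the empty substring
    i = 1            # left pointer; s[i..j] always has distinct symbols
    for j, c in enumerate(s, start=1):
        if last.get(c, 0) >= i:      # c already occurs inside the window
            i = last[c] + 1          # drop the earlier occurrence
        last[c] = j
        if j - i > best[1] - best[0]:
            best = (i, j)
    return best


def distinct_windows(seq, k):
    """Return all 1-based (start, end) windows of length k with pairwise-distinct symbols."""
    last, i, out = {}, 1, []
    for j, c in enumerate(seq, start=1):
        if last.get(c, 0) >= i:
            i = last[c] + 1
        last[c] = j
        if j - i + 1 >= k:           # the last k symbols lie inside the distinct window
            out.append((j - k + 1, j))
    return out


print(longest_distinct_substring("abcabcbb"), distinct_windows("aabab", 2))
```

Applied directly to $S$, `distinct_windows` lists the $k$-anti-powers of anti-period 1; it works on any sequence of hashable items, so it can equally be run on a list of integer names.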
4.2 Optimal algorithm for computing anti-powers
Let us now describe our algorithm. Firstly, observe that the maximum anti-period of a $k$-anti-power within $S$ is $\lfloor n/k \rfloor$. Our algorithm works in rounds, $p = 1, 2, \ldots, \lfloor n/k \rfloor$. In a generic round $p$ we will determine if $S$ contains (as a substring) a $k$-anti-power of anti-period $p$.

Let $N_p[i]$ be an integer name for substring $S[i..i+p-1]$ amongst all substrings of length $p$ in $S$: two substrings $S[i..i+p-1]$ and $S[j..j+p-1]$ have the same name if and only if the substrings are equal. Note that the number of names for any substring length is bounded above by $n$, the length of the string. We can determine a suitable $N_p[i]$ for all $i$ and a given $p$ in linear time from the names of substrings of length $p-1$ as follows. We create an array of pairs, $(i, N[i])$, one for each position $i$ in the string. Initially, $N[i] = 0$ for all pairs. In round $p$, we are computing the names of the substrings of length $p$. We stably radix sort the pairs in $O(n)$ time using $(N[i], S[i+p-1])$ as the sort key for pair $i$. We then scan the sorted list of pairs, and for every run of adjacent pairs for which both $N[i]$ and $S[i+p-1]$ are equal, we assign them the same new name, overwriting their $N$ fields. After this scan, clearly only substrings $S[i..i+p-1]$ and $S[j..j+p-1]$ of length $p$ that are equal will have the same name because they had the same name at length $p-1$ and their last letters ($S[i+p-1]$ and $S[j+p-1]$) are equal. We can now assign $N_p[i]$ by scanning the list of pairs again and for each pair $(i, N[i])$ encountered setting $N_p[i] = N[i]$.
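The naming step can be sketched as below. Two things are assumptions of this sketch rather than of the text: Python's comparison-based `sorted` stands in for the linear-time radix sort (so each round costs $O(n \log n)$ here), and positions are 0-based inside the code. The function names are hypothetical.

```python
def next_names(s, prev, p):
    """Names for substrings of length p, given prev[i] = name of s[i:i+p-1] (0-based i)."""
    count = len(s) - p + 1                       # number of substrings of length p
    order = sorted(range(count), key=lambda i: (prev[i], s[i + p - 1]))
    names = [0] * count
    current, prev_key = 0, None
    for i in order:                              # equal keys are adjacent after sorting
        key = (prev[i], s[i + p - 1])
        if key != prev_key:                      # a new run starts: allocate a new name
            current += 1
            prev_key = key
        names[i] = current
    return names


def all_names(s, max_len):
    """Return {p: names for length p} for p = 1..max_len, built round by round."""
    result = {}
    prev = [0] * len(s)                          # every empty substring has the same name
    for p in range(1, max_len + 1):
        prev = next_names(s, prev, p)
        result[p] = prev
    return result


print(all_names("aabababbbabb", 3)[3])
```

Writing the new names into a fresh list, instead of overwriting fields in place, keeps the old names available while runs are being compared.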
To find a $k$-anti-power of anti-period $p$, we must find a set of $k$ distinct substrings of length $p$, whose starting positions are spaced exactly $p$ positions apart and so are all equal modulo $p$.
Let $P_{i,p}$ be the set of positions in $S$ that are equal to $i$ modulo $p$, i.e., $P_{i,p} = \{ j \in [1..n-p+1] : j \equiv i \pmod{p} \}$.
Let $S'_{i,p}$ be the string of length $|P_{i,p}|$ formed by concatenating the values $N_p[j]$ (in increasing order of $j$) for which $j \in P_{i,p}$. We can form $S'_{i,p}$ in $O(|P_{i,p}|)$ time by visiting each $j \in P_{i,p}$ and looking up $N_p[j]$ in constant time. As observed above, any substring of length $k$ in $S'_{i,p}$ that contains all-distinct letters corresponds to a $k$-anti-power. In particular, if $S'_{i,p}[j..j+k-1]$ is made up of distinct letters, then $S[i+(j-1)p \,..\, i+(j+k-1)p-1]$ is a $k$-anti-power.
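The following sketch shows how a metastring is read off the names of one round and how a window of the metastring maps back to a substring of $S$. The function names and the representation of the names as a 0-based Python list are assumptions of this illustration; the name values are those of the length-3 substrings of `aabababbbabb`.

```python
def metastring(names_p, i, p):
    """Names of the substrings starting at positions i, i+p, i+2p, ... (1-based i).

    names_p[j-1] holds the name of the length-p substring starting at position j.
    """
    return names_p[i - 1::p]


def window_to_substring(i, p, j, k):
    """Map the window [j..j+k-1] of the metastring for (i, p) to 1-based (start, end) in S."""
    start = i + (j - 1) * p
    end = i + (j + k - 1) * p - 1
    return start, end


names3 = [1, 2, 4, 2, 4, 3, 6, 5, 4, 3]   # names of length-3 substrings of aabababbbabb
print(metastring(names3, 1, 3), window_to_substring(1, 3, 2, 3))
```

The window starting at the second letter of the first metastring has three distinct names, so the substring at positions 4 to 12 is a 3-anti-power of anti-period 3.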
Thus, in round $p$ of our algorithm we compute $S'_{i,p}$ for each $i \in [1..p]$. The total space and time required is $O(n)$. We then scan each of these strings in turn and detect substrings of length $k$ containing distinct letters, using the algorithm in the proof of Lemma 3. This process is denoted by function Distinct in our algorithm. Function Distinct outputs a set of starting and ending positions of $k$-anti-powers whose anti-periods are $p$ and whose starting positions are equal to $i$ modulo $p$. The time required to scan each string $S'_{i,p}$ is $O(|S'_{i,p}|)$ and so is $O(n)$ in total for round $p$. The extra space needed for each scan is $O(n)$ for the array of previous positions.
Because each round takes $O(n)$ time, and there are $\lfloor n/k \rfloor$ rounds, the total running time to output all anti-powers of order $k$ is $O(n^2/k)$. Since we can reuse space between rounds, the total space usage is $O(n)$.
Algorithm AntiPowers(S, k)
    AP ← ∅
    for p ← 1 to ⌊n/k⌋ do
        for i ← 1 to p do
            S' ← S'_{i,p}
            AP ← AP ∪ Distinct(S', k)
    return AP
Example for S = aabababbbabb and k = 3, rounds p = 1, 2, 3:

p             1             2       2      3             3       3
i             1             1       2      1             2       3
S'_{i,p}      aabababbbabb  133434  22242  1263          245     434
anti-powers   none          none    none   (1,9),(4,12)  (2,10)  none
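Putting the rounds together gives the following end-to-end sketch. It is an illustration under stated assumptions, not the authors' implementation: `sorted` replaces the radix sort and a dictionary replaces the array of previous positions, so the running time here is $O((n^2/k)\log n)$ expected rather than $O(n^2/k)$; the function name `anti_powers` is hypothetical, and results are 1-based inclusive (start, end) pairs.

```python
def anti_powers(s, k):
    """List all (start, end) occurrences of k-anti-powers in s, 1-based inclusive."""
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(s)
    result = []
    prev = [0] * n                                  # names of the empty substrings
    for p in range(1, n // k + 1):                  # round p: anti-period p
        count = n - p + 1
        order = sorted(range(count), key=lambda t: (prev[t], s[t + p - 1]))
        names = [0] * count
        current, prev_key = 0, None
        for t in order:                             # rename runs of equal keys
            key = (prev[t], s[t + p - 1])
            if key != prev_key:
                current += 1
                prev_key = key
            names[t] = current
        for i in range(1, p + 1):                   # one metastring per residue class
            meta = names[i - 1::p]
            last, left = {}, 1
            for j, c in enumerate(meta, start=1):   # sliding window of distinct names
                if last.get(c, 0) >= left:
                    left = last[c] + 1
                last[c] = j
                if j - left + 1 >= k:
                    start = i + (j - k) * p         # position in s of window start j-k+1
                    result.append((start, start + k * p - 1))
        prev = names
    return result


print(sorted(anti_powers("aabababbbabb", 3)))
```

For small inputs the output can be cross-checked against a direct comparison of blocks, which is a convenient sanity test when modifying the sketch.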
5 Conclusions and Open Problems
The algorithm of the previous section is optimal in the sense that there are strings for which we must spend $\Omega(n^2/k)$ time simply to list the anti-powers of order $k$, because there are that many of them (as established in Section 3). One wonders, though, whether an output-sensitive algorithm is possible, one that takes, say, $O(n + c)$ time, where $c$ is the number of anti-powers of order $k$ actually present in the input. Alternatively, do conditional lower bounds on anti-power computation exist?
Many interesting algorithmic problems concerning anti-powers remain. For example, suppose we are to preprocess $S$ and build a data structure so that later, given queries of the form $(i, j, k)$, we have to determine quickly whether the substring $S[i..j]$ is an anti-power of order $k$. Using suffix trees w1973 and weighted ancestor queries GawrychowskiLN14 it is fairly straightforward to achieve $O(k)$ query time, in $O(n)$ space. Alternatively, by storing metastrings for all possible anti-periods, it is not difficult to arrive at a data structure that requires $O(n^2)$ space and answers queries in $O(1)$ time. Is it possible to achieve a space-time tradeoff between the extremes defined by these two solutions, or even better, to simultaneously achieve the minima of the space and query bounds?
Acknowledgements
Our sincere thanks go to the anonymous reviewers, whose comments materially improved our initial manuscript. Golnaz Badkobeh is partially supported by the Leverhulme Trust on the Leverhulme Early Career Scheme. Simon J. Puglisi is supported by the Academy of Finland via grant 294143.
References
 [1] Golnaz Badkobeh, Maxime Crochemore, Costas S. Iliopoulos, and Marcin Kubica. Text redundancies. In Valerie Berthé and Michel Rigo, editors, Combinatorics, Words and Symbolic Dynamics, pages 151–174. Cambridge University Press, 2015.
 [2] Maxime Crochemore, Lucian Ilie, and Wojciech Rytter. Repetitions in strings: Algorithms and combinatorics. Theoretical Computer Science, 410(50):5227 – 5235, 2009.
 [3] Colin Defant. Anti-Power Prefixes of the Thue-Morse Word. Electronic Journal of Combinatorics, 24(1):#P1.32, 2017.
 [4] Gabriele Fici, Antonio Restivo, Manuel Silva, and Luca Q. Zamboni. Anti-powers in infinite words. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP), volume 55 of LIPIcs, pages 124:1–124:9. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
 [5] Gabriele Fici, Antonio Restivo, Manuel Silva, and Luca Q. Zamboni. Anti-powers in infinite words. J. Comb. Theory, Ser. A, 157:109–119, 2018.
 [6] Pawel Gawrychowski, Moshe Lewenstein, and Patrick K. Nicholson. Weighted ancestors in suffix trees. In Proc. 22nd Annual European Symposium on Algorithms (ESA), volume 8737 of Lecture Notes in Computer Science, pages 455–466. Springer, 2014.
 [7] Shyam Narayanan. Functions on antipower prefix lengths of the Thue-Morse word. https://arxiv.org/abs/1705.06310.

 [8] P. Weiner. Linear pattern matching algorithms. In IEEE 14th Annual Symposium on Switching and Automata Theory, pages 1–11. IEEE, 1973.