Counting, ordering, and generating basic discrete structures such as strings, permutations, and set partitions are fundamental tasks in computer science. A variety of such algorithms are assembled in the fourth volume of the series "The Art of Computer Programming" by D. Knuth [Knuth4a]. Nevertheless, this research direction remains very active [Mutze20].
If the structures under consideration are linearly ordered, e.g. a set of words under the dictionary (lexicographic) order, then a unique integer can be assigned to every structure. The rank (or index) of a structure is the number of structures that are smaller than it. The ranking problem asks to compute the rank of a given structure, while the unranking problem corresponds to its reverse: compute the structure of a given rank. Ranking has been studied for various objects including partitions [RankPartition], permutations [RankPerm1, RankPerm2], combinations [RankComb], etc. Unranking has similarly been studied for objects such as permutations [RankPerm2] and trees [Gupta1983, Pallo1986].
Both ranking and unranking are straightforward for the set of all words over a finite alphabet (assuming the standard lexicographic order), but they cease to be so as soon as additional symmetry is introduced. One such example is the class of necklaces [Graham1994]. A necklace, also known as a cyclic word, is an equivalence class of words under the cyclic rotation operation, also known as a cyclic shift. Necklaces are classical combinatorial objects and they remain an object of study in other contexts such as total search problems [SplitNeck19] or circular splicing systems [CircularSplicing].
The rank of a word $w$ for a given set $S$ and its ordering is the number of words in $S$ that are smaller than $w$. Often the set is a class of words, for instance all words of a given length over some alphabet. The first class of cyclic words to be ranked were Lyndon words (fixed-length aperiodic cyclic words) by Kociumaka et al. [Kociumaka2014], who provided a polynomial-time algorithm. An algorithm for ranking necklaces (fixed-length cyclic words) was given by Kopparty et al. [Kopparty2016], without tight bounds on the complexity. A quadratic algorithm for ranking necklaces was provided by Sawada et al. [Sawada2017].
This paper answers the open problem of ranking bracelets, posed by Sawada and Williams [Sawada2017]. Bracelets are necklaces that are minimal under both cyclic shifts and reflections. Figure 1 provides an example of the ranks of length 8 bracelets over a binary alphabet. Bracelets have been studied extensively, with results for counting and generation in both the normal and fixed content cases [Karim2013, Sawada2001].
This paper presents the first algorithm for ranking bracelets of length $n$ over an alphabet of size $k$ in polynomial time, with a time complexity of $O(k^2 \cdot n^4)$. This algorithm is further used to unrank bracelets in $O(n^5 \cdot k^2 \cdot \log k)$ time. These polynomial-time algorithms improve upon the exponential-time brute-force algorithm.
We briefly mention our additional interest in this problem. Combinatorial necklaces and bracelets provide discrete representations of periodic motifs in crystals. Finding diverse and representative samples of languages of necklaces and bracelets can speed up the exploration of the space of crystal structures [Collins17]. Building such representative samples requires efficient procedures for ranking bracelets.
2.1 Definitions and Notation
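Let $\Sigma$ be a finite alphabet. We denote by $\Sigma^*$ the set of all words over $\Sigma$ and by $\Sigma^n$ the set of all words of length $n$. For the remainder of this paper, let $k = |\Sigma|$. The notation $\bar{w}$ is used to clearly denote that the variable $\bar{w}$ is a word. The length of a word $\bar{w}$ is denoted $|\bar{w}|$. We use $\bar{w}_i$, for any $1 \le i \le |\bar{w}|$, to denote the $i$th symbol of $\bar{w}$. The reversal operation on a word $\bar{w} = \bar{w}_1 \bar{w}_2 \dots \bar{w}_n$, denoted by $\bar{w}^R$, returns the word $\bar{w}_n \dots \bar{w}_2 \bar{w}_1$.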
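In the present paper we assume that $\Sigma$ is linearly ordered. Let $[i, j]$ return the ordered set of integers from $i$ to $j$ inclusive. Given 2 words $\bar{u}, \bar{v} \in \Sigma^*$ where $|\bar{u}| \le |\bar{v}|$, $\bar{u} = \bar{v}$ if and only if $|\bar{u}| = |\bar{v}|$ and $\bar{u}_i = \bar{v}_i$ for every $i \in [1, |\bar{u}|]$. A word $\bar{u}$ is lexicographically smaller than $\bar{v}$ if there exists an $i \in [1, |\bar{u}|]$ such that $\bar{u}_1 \bar{u}_2 \dots \bar{u}_{i-1} = \bar{v}_1 \bar{v}_2 \dots \bar{v}_{i-1}$ and $\bar{u}_i < \bar{v}_i$. For example, given the alphabet $\Sigma = \{a, b\}$ where $a < b$, the word $aaaba$ is smaller than $aabaa$ as the first 2 symbols are the same and $a$ is smaller than $b$. For a given set of words $S$, the rank of $\bar{v}$ with respect to $S$ is the number of words in $S$ that are smaller than $\bar{v}$.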
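To make the notion of rank concrete, the following minimal sketch ranks and unranks a word among all words of the same length under lexicographic order, which is the easy, symmetry-free case. The function names `rank_word` and `unrank_word` are chosen for this illustration only, and the alphabet is assumed to be given as a string listing its symbols in increasing order.

```python
def rank_word(word, alphabet):
    """Number of words of the same length that are lexicographically smaller.

    The word is read as a base-k numeral, where k is the alphabet size.
    """
    k = len(alphabet)
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    rank = 0
    for symbol in word:
        rank = rank * k + index[symbol]
    return rank


def unrank_word(rank, length, alphabet):
    """Inverse of rank_word: the word of the given length with the given rank."""
    k = len(alphabet)
    symbols = []
    for _ in range(length):
        rank, digit = divmod(rank, k)
        symbols.append(alphabet[digit])
    return "".join(reversed(symbols))
```

For words without symmetry both directions take time linear in the word length; the difficulty addressed in this paper arises only once rotations and reflections are factored out.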
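The rotation of a word $\bar{w} \in \Sigma^n$ by $r$, where $0 \le r < n$, returns the word $\bar{w}_{r+1} \dots \bar{w}_n \bar{w}_1 \dots \bar{w}_r$, and is denoted by $\langle \bar{w} \rangle_r$, i.e. $\langle \bar{w}_1 \bar{w}_2 \dots \bar{w}_n \rangle_r = \bar{w}_{r+1} \dots \bar{w}_n \bar{w}_1 \dots \bar{w}_r$. Under the rotation operation, $\bar{u}$ is equivalent to $\bar{v}$ if $\bar{v} = \langle \bar{u} \rangle_r$ for some $r$. The $t$th power of a word $\bar{w}$, denoted $\bar{w}^t$, is equal to $\bar{w}$ repeated $t$ times. For example $(aab)^3 = aabaabaab$. A word $\bar{w}$ is periodic if there is some word $\bar{u}$ and integer $t \ge 2$ such that $\bar{u}^t = \bar{w}$. Equivalently, word $\bar{w}$ is periodic if there exists some rotation $0 < r < n$ where $\bar{w} = \langle \bar{w} \rangle_r$. A word is aperiodic if it is not periodic. The period of a word $\bar{w}$ is the length of the smallest word $\bar{u}$ for which there exists some value $t$ for which $\bar{w} = \bar{u}^t$.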
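The rotation, power and period operations can be illustrated with a short sketch. It assumes that a rotation by r moves the first r symbols to the end of the word (the direction is an assumption of this sketch) and that words are non-empty Python strings; the helper names are hypothetical.

```python
def rotate(word, r):
    """Rotation by r: the first r symbols are moved to the end (assumed direction)."""
    r %= len(word)
    return word[r:] + word[:r]


def period(word):
    """Length of the shortest word u such that word equals u repeated t times."""
    n = len(word)
    for p in range(1, n + 1):
        if n % p == 0 and word[:p] * (n // p) == word:
            return p


def is_periodic(word):
    """A word is periodic if it is a power u^t with t >= 2."""
    return period(word) < len(word)
```

With these helpers, a word is periodic exactly when some non-trivial rotation maps it to itself, for instance `rotate("aabaab", 3) == "aabaab"`.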
A cyclic word, also called a necklace, is the equivalence class of words under the rotation operation. For notation, a word is written as when treated as a necklace. Given a necklace , the necklace representative is the lexicographically smallest element of the set of words in the equivalence class . The necklace representative of is denoted , and the shift of the necklace representative is denoted . The reversal operation on a necklace returns the necklace containing the reversal of every word , i.e. . Given a word , will denote the necklace representative of the necklace containing , i.e. the representative of where .
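As a small illustration of the necklace representative, the sketch below takes the minimum over all rotations. This is a deliberately naive quadratic-time version for exposition, not the procedure used in the paper, and the function names are hypothetical.

```python
def necklace_representative(word):
    """Lexicographically smallest rotation of a non-empty word."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def same_necklace(u, v):
    """Two words lie in the same necklace iff their representatives coincide."""
    return len(u) == len(v) and necklace_representative(u) == necklace_representative(v)
```

Comparing representatives gives a simple membership test for the equivalence class, at the cost of recomputing all rotations on each call.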
A subword of the cyclic word , denoted is the word of length such that . For notation denotes that is a subword of . Further, denotes that is a subword of of length . If , then is a prefix and is a suffix. A prefix or suffix of a word is proper if its length is smaller than . For notation, the tuple is defined as the set of all subwords of of length . Formally let . Further, is assumed to be in lexicographic order, i.e. .
A bracelet is the equivalence class of words under the combination of the rotation and the reversal operations. In this way a bracelet can be thought of as the union of two necklace classes and , hence . Given a bracelet , the bracelet representative of , denoted by , is the lexicographically smallest word .
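The bracelet representative can be sketched in the same naive way, by minimising over the rotations of both the word and its reversal. The name `bracelet_representative` is chosen for this illustration, and the quadratic running time is a simplification.

```python
def bracelet_representative(word):
    """Smallest word among all rotations of a non-empty word and of its reversal."""
    mirrored = word[::-1]
    candidates = []
    for i in range(len(word)):
        candidates.append(word[i:] + word[:i])
        candidates.append(mirrored[i:] + mirrored[:i])
    return min(candidates)
```

For example, the words `abc` and `acb` belong to different necklaces but to the same bracelet, whose representative is `abc`.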
A necklace is palindromic if . This means that the reflection of every word in is in , i.e. given . Note that for any word , where is a palindromic necklace, either , or there exists some rotation for which .
Let and be a pair of necklaces belonging to the same bracelet class. For simplicity assume that . The bracelet encloses a word if . An example of this is the bracelet which encloses the word as . The set of all bracelets which enclose are referred to as the set of bracelets enclosing .
2.2 Bounding Subwords
For both the palindromic and enclosing cases the number of necklaces smaller than is computed by iteratively counting the number of words of length for which no subword is smaller than . The set of such words, denoted by , will be analysed iteratively as well, since it can have an exponential size. In order to relate to , we will split into parts using the positions of length subwords of with respect to the lexicographic order on . Informally, every can be associated with the unique lower bound from , which will be used to identify the parts leading us to the following definition.
Definition 1. Let where . The word is bounded (resp. strictly bounded) by , if (resp. ) and there is no such that .
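The idea of a bounding subword can be sketched as a lookup in the sorted list of cyclic subwords. This sketch assumes the definition is read as follows: a length-l subword s of the cyclic word w bounds v when s is the largest such subword with s ≤ v, and strictly bounds v when s is the largest with s < v. That reading, and the helper names, are assumptions of this illustration.

```python
from bisect import bisect_left, bisect_right


def cyclic_subwords(w, length):
    """Sorted distinct subwords of the given length of the cyclic word w (length <= len(w))."""
    doubled = w + w
    return sorted({doubled[i:i + length] for i in range(len(w))})


def bounding_subword(w, v, strict=False):
    """Largest cyclic subword of w of length len(v) that is <= v (or < v if strict).

    Returns None when every subword of that length is larger than v.
    """
    subwords = cyclic_subwords(w, len(v))
    position = bisect_left(subwords, v) if strict else bisect_right(subwords, v)
    return subwords[position - 1] if position > 0 else None
```

Because the subwords are kept in sorted order, each lookup is a binary search, which matches the way the lookup arrays discussed in this section are filled.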
The aforementioned parts contain all words such that . The key observation is that words of the form for all and some fixed symbol belong to the same set , where . The same holds true for words of the form . Thus, we can compute the corresponding for all pairs of and in order to derive sizes of . Moreover, this relation between , and is independent of allowing us to store this information in two arrays and . Both arrays will be indexed by the words and characters . Given a word strictly bounded by , will contain the word strictly bounding . Similarly, will contain the word strictly bounding . By precomputing these arrays, the cost of determining these words can be avoided during the ranking process. In order to compute these arrays, the following technical Lemmas are needed.
Lemma 1. Let , , let and let be the subword of that bounds . The word bounds if and only if bounds .
Let bound . Since , we have . For the sake of contradiction assume that is bounded by . If then as for any smaller value of , would not bound . Under this assumption , in which case would bound , contradicting this assumption. If , then again , in which case bounds contradicting the original assumption that bounds .
In the other direction, let bound . If does not bound then there must exist some word bounding . As , hence . Therefore bounds , contradicting our original assumption. Hence bounds if and only if bounds where bounds . ∎
Lemma 2. Let , let and let be the subword of that bounds . Let bound . Either bounds , or for .
Let bound . If then as , if then must bound , contradicting the assumption that bounds . Therefore the only possible value of is when for some . ∎
Lemma 3. Let , let and let be the subword of that strictly bounds both and . The word which bounds will also bound .
For the sake of contradiction assume is bounded by . This implies that . Following Lemma 2, for . However, as , must be less than and hence would be a better bound for . ∎
Proposition 1. Let . The array such that strictly bounds for every strictly bounded by can be computed in time .
Given some pair of arguments , the word bounding can be found through a binary search on . As each comparison will take at most operations, and at most comparisons are needed, each entry can be computed in operations. As there are subwords of and characters in , at most operations are needed. ∎
Proposition 2. Let . The array , such that strictly bounds for every strictly bounded by , can be computed in time.
For some pair of arguments , let be the smallest word greater than . The word bounding can be found through a binary search of . Following Lemma 3, given any word strictly bounded by , will also be bounded by the same word bounding . As in Proposition 1, each comparison will take at most operations, with the search requiring at most comparisons. As there are arguments, at most operations are needed to compute every value of . ∎
3 Ranking Bracelets
The main result of the paper is the first algorithm for ranking bracelets. In this paper, we tacitly assume that we are ranking a word $\bar{w}$ of length $n$. The time complexity of the ranking algorithm is $O(k^2 \cdot n^4)$, where $k$ is the size of the alphabet and $n$ is the length of the considered bracelets. The key part of the algorithm is to compute the rank of the word $\bar{w}$ with respect to the set of bracelets by finding three other ranks: the rank over all necklaces, the rank over palindromic necklaces, and the rank over enclosing apalindromic necklaces.
A bracelet can correspond to two apalindromic necklaces, or to exactly one palindromic necklace. If a bracelet corresponds to two necklaces $\tilde{l}$ and $\tilde{r}$, then it is important to take into account the lexicographical positions of these two necklaces with respect to a given word $\bar{w}$. There are three possibilities: both $\tilde{l}$ and $\tilde{r}$ are less than $\bar{w}$; the bracelet encloses $\bar{w}$, e.g. $\tilde{l} < \bar{w} < \tilde{r}$; or both necklaces are greater than $\bar{w}$. This is visualised in Figure 2. Therefore the number of bracelets smaller than a given word $\bar{w}$ can be calculated by adding the number of palindromic necklaces less than $\bar{w}$, the number of enclosing bracelets smaller than $\bar{w}$, and half of all other apalindromic and non-enclosing necklaces smaller than $\bar{w}$. The following notation is used for the rank of $\bar{w}$ for sets of bracelets and necklaces.
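$RN(\bar{w})$ denotes the rank of $\bar{w}$ with respect to the set of necklaces of length $n$ over $\Sigma$.
$RP(\bar{w})$ denotes the rank of $\bar{w}$ with respect to the set of palindromic necklaces over $\Sigma$.
$RB(\bar{w})$ denotes the rank of $\bar{w}$ with respect to the set of bracelets of length $n$ over $\Sigma$.
$RE(\bar{w})$ denotes the rank of $\bar{w}$ with respect to the set of bracelets enclosing $\bar{w}$.
In Lemma 4 below, we show that $RB(\bar{w})$ can be expressed via $RN(\bar{w})$, $RP(\bar{w})$ and $RE(\bar{w})$. The problem of computing $RN(\bar{w})$ has been solved in quadratic time [Sawada2017], so the goal of the paper is to design efficient procedures for computing $RP(\bar{w})$ and $RE(\bar{w})$.
Lemma 4. The rank of a word $\bar{w} \in \Sigma^n$ with respect to the set of bracelets of length $n$ over the alphabet $\Sigma$ is given by $RB(\bar{w}) = \frac{1}{2}\left(RN(\bar{w}) + RP(\bar{w}) + RE(\bar{w})\right)$.
Simply dividing the number of necklaces by 2 will undercount the number of bracelets, while doing nothing will overcount. Therefore to get the correct number of bracelets, those bracelets corresponding to only 1 necklace must be accounted for. A bracelet will correspond to 2 necklaces smaller than $\bar{w}$ if and only if it does not enclose $\bar{w}$ and is apalindromic. Therefore the number of bracelets corresponding to 2 necklaces is $\frac{1}{2}\left(RN(\bar{w}) - RP(\bar{w}) - RE(\bar{w})\right)$. The number of bracelets enclosing $\bar{w}$ is equal to $RE(\bar{w})$. The number of bracelets corresponding to palindromic necklaces is equal to $RP(\bar{w})$. Therefore the total number of bracelets is $\frac{1}{2}\left(RN(\bar{w}) - RP(\bar{w}) - RE(\bar{w})\right) + RP(\bar{w}) + RE(\bar{w}) = \frac{1}{2}\left(RN(\bar{w}) + RP(\bar{w}) + RE(\bar{w})\right)$. ∎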
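The counting identity of Lemma 4 can be checked by brute force on small instances. The sketch below enumerates all necklaces of a given length and compares both sides of the identity for every word. The variable names `rn`, `rp`, `re_` and `rb` are chosen for this sketch, and one boundary convention is an assumption: a bracelet is treated as enclosing w when its smaller necklace is below w and its larger necklace is at or above w, so that the identity also holds when w is itself a necklace representative.

```python
from itertools import product


def necklace_rep(word):
    """Lexicographically smallest rotation of a word."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def lemma4_holds(n, alphabet):
    """Brute-force check that 2 * rb == rn + rp + re_ for every word of length n."""
    words = ["".join(p) for p in product(alphabet, repeat=n)]
    necklaces = sorted({necklace_rep(w) for w in words})
    # Representative of the reversed necklace, for each necklace.
    mirror = {x: necklace_rep(x[::-1]) for x in necklaces}
    bracelets = {min(x, mirror[x]) for x in necklaces}
    for w in words:
        rn = sum(1 for x in necklaces if x < w)
        rp = sum(1 for x in necklaces if x < w and mirror[x] == x)
        # Enclosing (assumed convention): one necklace below w, its mirror at or above w.
        re_ = sum(1 for x in necklaces if x < w <= mirror[x])
        rb = sum(1 for b in bracelets if b < w)
        if 2 * rb != rn + rp + re_:
            return False
    return True
```

The enumeration is exponential in n and serves only as a sanity check; the point of the paper is to obtain the palindromic and enclosing ranks without enumerating necklaces.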
Lemma 4 provides the basis for ranking bracelets. Theorem 1 uses Lemma 4 to get the complexity of the ranking process. The remainder of this paper will prove Theorem 1, starting with the complexity of ranking among palindromic necklaces in Section 4 followed by the complexity of ranking enclosing bracelets in Section 5.
Theorem 1. Given a word $\bar{w} \in \Sigma^n$, the rank of $\bar{w}$ with respect to the set of bracelets of length $n$ over the alphabet $\Sigma$, $RB(\bar{w})$, can be computed in $O(k^2 \cdot n^4)$ time.
For simplicity, the word $\bar{w}$ is assumed to be a necklace representative. It is well established how to find the lexicographically largest necklace smaller than or equal to some given word; such a word can be found in quadratic time using an algorithm from [Sawada2017]. Note that the number of necklaces less than or equal to $\bar{w}$ corresponds to the number of necklaces less than or equal to the lexicographically largest necklace smaller than $\bar{w}$. From Lemma 4 it follows that to rank $\bar{w}$ with respect to the set of bracelets, it is sufficient to rank $\bar{w}$ with respect to the sets of necklaces, palindromic necklaces, and enclosing bracelets. The rank with respect to the set of palindromic necklaces can be computed using the techniques given in Theorem 3 in Section 4. The rank with respect to the set of enclosing bracelets can be computed as shown in Theorem 4 in Section 5. As each of these steps can be done independently of each other, the total complexity is $O(k^2 \cdot n^4)$.
This complexity bound is a significant improvement over the naive method of enumerating all bracelets, requiring exponential time in the worst case. New intuition is provided to rank the palindromic and enclosing cases. The main source of complexity for the problem of ranking comes from having to consider the lexicographic order of the word under reflection. New combinatorial results and algorithms are needed to count the bracelets in these cases.
Before showing in detail the algorithmic results that allow bracelets to be efficiently ranked, it is useful to discuss the high-level ideas. Lemma 4 shows our approach to ranking bracelets by dividing the problem into the problems of ranking necklaces, palindromic necklaces and enclosing bracelets. For both palindromic necklaces and enclosing bracelets, we derive a canonical form using the combinatorial properties of these objects.
Using these canonical forms, the number of necklaces smaller than is counted in an iterative manner. In the palindromic case, this is done by counting the number of necklaces greater than , and subtracting this from the total number of palindromic necklaces. In the enclosing case, this is done by directly counting the number of necklaces smaller than . For both cases, the counting is done by way of a tree comprised of the set of all prefixes of words of the canonical form. By partitioning the internal vertices of the trees based on the number of children of the vertices, the number of words of the canonical form may be derived in an efficient manner, forgoing the need to explicitly generate the tree. This allows the size of these partitions to be computed through a dynamic programming approach. It follows from these partitions how to count the number of leaf nodes, corresponding to the canonical form.
Theorem 2. The $i$th bracelet of length $n$ over $\Sigma$ can be computed in $O(n^5 \cdot k^2 \cdot \log k)$ time.
The unranking process is done through a binary search using the ranking algorithm as a black box. Let be a word which is the bracelet representation of the bracelet. The value of is determined iteratively, starting with the first symbol and working forwards. The first symbol of is determined by performing a binary search over . For , the words and are generated, where is the smallest symbol in and the largest. If , then the first symbol of is , otherwise the new value of is chosen by standard binary search, being greater than if and less than if . The symbol of is determined in a similar manner, generating the words and , converting to a necklace representation using Algorithm 1 due to Sawada and Williams [Sawada2017]. Repeating this for all symbols leaves as being the bracelet representation of the smallest bracelet, i.e. the bracelet with smaller bracelets. The binary search takes $O(\log k)$ ranking queries for each of the $n$ symbols, and each query requires $O(k^2 \cdot n^4)$ time. Therefore the total complexity is $O(n^5 \cdot k^2 \cdot \log k)$ time. ∎
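The way unranking reduces to ranking can be sketched as follows. The code fixes the symbols from left to right and, at each position, binary-searches for the largest symbol whose smallest completion still has rank at most i. This is a simplified variant of the idea described above rather than the paper's exact procedure: it uses 0-based ranks, and the black-box `rank` here is a brute-force stand-in built by enumeration (an assumption of this sketch, not the polynomial-time algorithm).

```python
from bisect import bisect_left
from itertools import product


def bracelet_rep(word):
    """Smallest word among rotations of the word and of its reversal."""
    mirrored = word[::-1]
    return min(min(word[i:] + word[:i], mirrored[i:] + mirrored[:i])
               for i in range(len(word)))


def make_brute_force_rank(n, alphabet):
    """Stand-in ranking oracle: counts bracelet representatives smaller than w."""
    reps = sorted({bracelet_rep("".join(p)) for p in product(alphabet, repeat=n)})
    return (lambda w: bisect_left(reps, w)), reps


def unrank_bracelet(i, n, alphabet, rank):
    """Representative of the bracelet with exactly i smaller bracelets (0-based)."""
    prefix = ""
    for position in range(n):
        padding = alphabet[0] * (n - position - 1)
        low, high = 0, len(alphabet) - 1
        # Largest symbol c such that rank(prefix + c + smallest padding) <= i.
        while low < high:
            mid = (low + high + 1) // 2
            if rank(prefix + alphabet[mid] + padding) <= i:
                low = mid
            else:
                high = mid - 1
        prefix += alphabet[low]
    return prefix
```

The search relies only on the rank being monotone in the lexicographic order, so any correct ranking procedure can be substituted for the brute-force oracle. It issues at most n times ⌈log2 k⌉ rank queries, in line with the complexity argument above.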
4 Computing the rank among palindromic necklaces
To rank palindromic necklaces, it is crucial to analyse their combinatorial properties. This section focuses on providing results on determining unique words representing palindromic necklaces. We study two cases depending on whether the length $n$ of a palindromic necklace is even or odd. The reason for this division can be seen by considering examples of palindromic necklaces. If equivalence under the rotation operation is not taken into account, then a word $\bar{w}$ is palindromic if $\bar{w} = \bar{w}^R$. If the length of $\bar{w}$ is odd, then if $\bar{w} = \bar{w}^R$, $\bar{w}$ can be written as $\phi x \phi^R$, where $\phi \in \Sigma^{(n-1)/2}$ and $x \in \Sigma$. For example, the word $aabaa$ is equal to $\phi x \phi^R$, where $\phi = aa$ and $x = b$. If the length of $\bar{w}$ is even, then if $\bar{w} = \bar{w}^R$, $\bar{w}$ can be written as $\phi \phi^R$, where $\phi \in \Sigma^{n/2}$. For example the word $aabbaa$ is equal to $\phi \phi^R$, where $\phi = aab$.
Once rotations are taken into account, the characterisation of palindromic necklaces becomes more difficult. It is clear that any necklace that contains a word of the form or is palindromic. However this check does not capture every palindromic necklace. Let us take, for example, the necklace , which contains two words and . While can neither be written as nor , it is still palindromic as . Therefore a more extensive test is required. As the structure of palindromic words without rotation is different depending on the length being either odd or even, it is reasonable to split the problem of determining the structure of palindromic necklaces into the cases of odd and even length.
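The gap between plain palindromes and palindromic necklaces can be seen in a short sketch. A necklace is tested for being palindromic by checking whether the reversal of a word is one of its rotations. The word `ab` used below is an example chosen for this sketch, and the function names are hypothetical.

```python
def rotations(word):
    """All rotations of a word, in order of shift."""
    return [word[i:] + word[:i] for i in range(len(word))]


def is_palindromic_necklace(word):
    """True if the necklace containing the word equals its own reversal."""
    return word[::-1] in rotations(word)


def contains_plain_palindrome(word):
    """True if some rotation of the word reads the same in both directions."""
    return any(r == r[::-1] for r in rotations(word))
```

The necklace of `ab` is palindromic, since reversing `ab` gives the rotation `ba`, yet neither of its two words is a plain palindrome, so checking for a palindromic rotation alone would miss it.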
The number of palindromic necklaces is counted by computing the number of these characterisations. This is done by constructing trees containing every prefix of these characterisations. As each vertex corresponds to the prefix of a word, the leaf nodes of these trees correspond to the words in the characterisations. By partitioning the tree appropriately, the number of leaf nodes, and therefore the number of these characterisations, can be computed. In the odd case this corresponds directly to the number of palindromic necklaces, while in the even case a small transformation of these sets is needed.
4.1 Odd Length Palindromic Necklaces
Starting with the odd-length case, Proposition 3 shows that every palindromic necklace of odd length $n$ contains exactly one word that can be written as $\phi x \phi^R$ where $\phi \in \Sigma^{(n-1)/2}$ and $x \in \Sigma$. This fact is used for ranking by constructing a tree representing every prefix of a word of the form $\phi x \phi^R$ that belongs to a bracelet greater than $\bar{w}$.
Proposition 3. A necklace of odd length $n$ is palindromic if and only if it contains exactly one word of the form $\phi x \phi^R$, where $\phi \in \Sigma^{(n-1)/2}$ and $x \in \Sigma$.
Let . If is of the form , then clearly we have that . In the other direction, for the sake of contradiction assume is a palindromic necklace of odd length such that no word is of the form . Note that the cardinality of is equal to the period of the words in . As the length of the words in is odd, so too must be the length of the period. Given a word , if then the size of is equal to . As the size of is odd, there must be at least one word where . For , . Therefore this word can be expressed as where and .
For the remainder of this proof is used to denote the character at position in the word . For the sake of contradiction, assume that there exists some pair of words such that and both and . As both and belong to the same necklace class, there must exist some rotation such that . Further, as , . Therefore, , , and . Further and . Therefore . Therefore implying that . Therefore the period of must be equal to some common divisor of and . As the length of is odd, the greatest divisor equals to . As such the period must be a factor of , meaning that , contradicting the assumption that . Therefore there is exactly one word in of the form . ∎
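Corollary 1. The number of palindromic necklaces of odd length $n$ over $\Sigma$ equals $k^{(n+1)/2}$.
It follows from Proposition 3 that for every palindromic necklace of length $n$, there exists exactly one word $\phi \in \Sigma^{(n-1)/2}$ and symbol $x \in \Sigma$ such that $\phi x \phi^R$ belongs to the necklace. Hence, the number of palindromic necklaces equals the number of words of the form $\phi x \phi^R$ with length $n$. Note that for the length of $\phi x \phi^R$ to be $n$, the length of $\phi$ must be $(n-1)/2$. Therefore the number of values of $\phi$ is $k^{(n-1)/2}$. As there are $k$ values of $x$, the number of values of $\phi x \phi^R$ is $k \cdot k^{(n-1)/2} = k^{(n+1)/2}$. ∎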
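Proposition 3 and Corollary 1 can be checked by brute force for small odd lengths. The sketch below groups all words into palindromic necklaces and counts how many distinct plain palindromes each class contains. The helper name `palindromic_classes` is hypothetical, and the enumeration is exponential in the length, so it is only a sanity check.

```python
from itertools import product


def palindromic_classes(n, alphabet):
    """Map each palindromic necklace (keyed by its representative) to its set of words."""
    classes = {}
    for symbols in product(alphabet, repeat=n):
        word = "".join(symbols)
        rots = {word[i:] + word[:i] for i in range(n)}
        if word[::-1] in rots:
            classes[min(rots)] = rots
    return classes


def palindromes_per_class(n, alphabet):
    """Set of counts of plain palindromes found in each palindromic necklace."""
    return {sum(1 for v in rots if v == v[::-1])
            for rots in palindromic_classes(n, alphabet).values()}
```

For odd n the number of classes should equal k to the power (n + 1) / 2, and every class should contain exactly one plain palindrome, which is the word of the form used as the canonical representative in this section.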
The problem now becomes to rank a word with respect to the odd length palindromic necklaces utilising their combinatorial properties. Let be a word of odd length . We define the set , where stands for palindromic odd length. The set contains one word representing each palindromic bracelet of odd length that is greater than .
As each word will correspond to a unique palindromic necklace of length greater than , and every palindromic necklace greater than will correspond to a word in , the number of palindromic necklaces greater than is equal to . Using this set the number of necklaces less than can be counted by subtracting the size of from the total number of odd length palindromic necklaces, equal to (Corollary 1).
High level idea for the Odd Case. Here we provide a high level idea for the approach we follow for computing . Let have a length . Since only contains words of the form , where and , we have that for every . As the lexicographically smallest rotation of every must be greater than , it follows that any rotation of must be greater than and therefore every subword of must also be greater than or equal to the prefix of of the same length. This property is used to compute the size of by iteratively considering the set of prefixes of each word in in increasing length, representing them with the tree . As generating directly would require an exponential number of operations, a more sophisticated approach is needed for the calculation of based on partial information.
As the tree is a tree of prefixes, vertices in are referred to by the prefix they represent. So refers to the unique vertex in representing . The root vertex of corresponds to the empty word. Every other vertex corresponds to a word of length , where is the distance between and the root vertex. Given two vertices , is the parent vertex of a child vertex if and only if for some symbol . The layer of refers to all representing words of length in . The size of is equivalent to the number of unique prefixes of length of words of the palindromic form in . This set of prefixes corresponds to the vertices in the layer of . Therefore the maximum depth of is .
To speed up computation, each layer of is partitioned into sets that allow the size of to be efficiently computed. This partition is chosen such that the size of the sets in layer can be easily derived from the size of the sets in layer . As these sets are tied to the tree structure, the obvious property to use is the number of children each vertex has. As each vertex represents a prefix of some word , the number of children of is the number of symbols such that is a prefix of some word in . Recall that every word in has the form , and that there is no subword of that is less than . Therefore if , there must be no subword of that is less than . Hence the number of children of is the number of symbols such that no subword of is less than the prefix of of the same length. As has no subword less than , will only have a subword that is less than if either (1) or (2) there exists some suffix of length such that and . For the first condition, let . By the definition of strictly bounding subwords (Definition 1), if and only if . Note that this ignores any word