Palindromic Ziv-Lempel and Crochemore Factorizations of m-Bonacci Infinite Words

05/03/2019 ∙ by Marieh Jahannia, et al. ∙ University of Tehran ∙ University of Liège ∙ The University of Winnipeg

We introduce a variation of the Ziv-Lempel and Crochemore factorizations of words by requiring each factor to be a palindrome. We compute these factorizations for the Fibonacci word, and more generally, for all m-bonacci words.

1 Introduction

The Ziv–Lempel [9] and Crochemore [4] factorizations are two well-known factorizations of words used in text compression and other text algorithms. Here we apply them to infinite words. Let denote the length of a finite word . In this paper, we start indexing words at , i.e., if is a finite word over the alphabet , then we write where for all . If is an infinite word and is a finite word, we say there is an occurrence of at position in if for some word of length and some infinite word . Given an infinite word , the Ziv–Lempel or -factorization of is the factorization

where is the shortest prefix of such that there is no occurrence of in at any position . The Crochemore or -factorization of is the factorization

where is the longest prefix of such that there is an occurrence of in at some position , or, if this prefix does not exist, the factor is just a single letter.

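For instance, if $\mathbf{f}$ is the Fibonacci word, we have

$$z(\mathbf{f}) = (0, 1, 00, 101, 00100, 10100101, \ldots)$$

and

$$c(\mathbf{f}) = (0, 1, 0, 010, 10010, 01001010, \ldots).$$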
Note that if $\mathbf{w}$ is ultimately periodic, the $z$-factorization is not well-defined, since eventually there will be no factors that do not occur previously in $\mathbf{w}$. Similarly, if $\mathbf{w}$ is ultimately periodic, the definition of the $c$-factorization will result in some factor being an infinite word. We are not interested in ultimately periodic words in this paper and will therefore ignore this possibility and assume that any infinite word considered in this paper is aperiodic.

In the context of combinatorics on words, these factorizations have been computed for certain important families of words. Berstel and Savelli [2] computed the $c$-factorizations of all standard Sturmian words. They also observed that the $z$-factorization of the Fibonacci word coincides with the singular factorization of the Fibonacci word introduced by Wen and Wen [13]. Fici [5] has given an excellent survey of these and other factorizations of the Fibonacci word. Ghareghani, Mohammad-Noori, and Sharifani [6] determined the $z$- and $c$-factorizations of standard episturmian words. Constantinescu and Ilie [3] used the $z$-factorization to define the Lempel–Ziv complexity of an infinite word.

We introduce the palindromic -factorization and palindromic -factorization by requiring that each of the factors in the previous definitions be palindromes. That is, the palindromic -factorization of is the factorization

where is the shortest palindromic prefix of such that there is no occurrence of in at any position . The palindromic -factorization may not exist for certain infinite words . For instance, if only contains palindromes of bounded length, then the palindromic -factorization will not exist. This type of factorization is therefore only interesting when applied to infinite words with arbitrarily long palindromic factors. The palindromic -factorization of is the factorization

where is the longest palindromic prefix of such that there is an occurrence of in at some position , or, if this prefix does not exist, the factor is just a single letter.

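For instance, if $\mathbf{f}$ is the Fibonacci word, we have

$$pz(\mathbf{f}) = (0, 1, 00, 101, 00100, 10100101, \ldots)$$

and

$$pc(\mathbf{f}) = (0, 1, 0, 010, 1001, 0010100, \ldots).$$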
It turns out that $z(\mathbf{f})$ and $pz(\mathbf{f})$ are the same, and in fact are equal to the singular factorization of $\mathbf{f}$ (which we define later). However, the factorizations $c(\mathbf{f})$ and $pc(\mathbf{f})$ are not the same. We show that the factors of $pc(\mathbf{f})$ can also be written in terms of the singular words, and that the factorization $pc(\mathbf{f})$ (except for the first few factors) coincides with a factorization of $\mathbf{f}$ that appears in [5].
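The palindromic variants can be explored in the same way on a finite prefix. The sketch below is illustrative only and makes the same assumptions as the definitions suggest: alphabet {0, 1}, morphism 0 → 01, 1 → 0, and an earlier occurrence is one that starts strictly to the left of the current factor. All function names are hypothetical. For the longest-palindrome variant, the code first finds how far the suffix can be matched by an earlier occurrence, then takes the longest palindromic prefix within that reach.

```python
def fibonacci_prefix(n):
    """Return the first n letters of the Fibonacci word over {0, 1}."""
    word = "0"
    while len(word) < n:
        word = "".join("01" if letter == "0" else "0" for letter in word)
    return word[:n]


def is_palindrome(u):
    return u == u[::-1]


def occurs_before(w, u, pos):
    """True if u occurs in w starting at some index j < pos (overlap allowed)."""
    return w.find(u, 0, pos - 1 + len(u)) != -1


def pz_factorization(w):
    """Shortest palindromic prefixes with no earlier occurrence."""
    factors, pos = [], 0
    while pos < len(w):
        for end in range(pos + 1, len(w) + 1):
            u = w[pos:end]
            if is_palindrome(u) and not occurs_before(w, u, pos):
                factors.append(u)
                pos = end
                break
        else:
            break  # no admissible palindrome found inside the prefix
    return factors


def pc_factorization(w):
    """Longest palindromic prefixes with an earlier occurrence."""
    factors, pos = [], 0
    while pos < len(w):
        # reach = length of the longest prefix of w[pos:] occurring earlier.
        reach = 0
        while pos + reach < len(w) and occurs_before(w, w[pos:pos + reach + 1], pos):
            reach += 1
        if pos + reach == len(w):
            break  # the match reaches the end of the prefix: undecided
        length = 1  # fallback: a single letter
        for k in range(reach, 0, -1):
            if is_palindrome(w[pos:pos + k]):
                length = k
                break
        factors.append(w[pos:pos + length])
        pos += length
    return factors


w = fibonacci_prefix(60)
print(pz_factorization(w)[:6])
print(pc_factorization(w)[:6])
```

On this prefix, the shortest-palindrome variant returns 0, 1, 00, 101, 00100, 10100101, while the longest-palindrome variant returns 0, 1, 0, 010, 1001, 0010100, so the two already differ at the fifth factor.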

We believe that it could be of interest to compare the ordinary $z$- and $c$-factorizations of certain infinite words with their palindromic $z$- and $c$-factorizations, in the same way that one can compare the ordinary complexity function of an infinite word with its palindromic complexity function (see [1]).

The main results of this paper give a description of the palindromic $z$- and $c$-factorizations of the Fibonacci word and, more generally, the $m$-bonacci word for $m \geq 2$.

2 Basics from combinatorics on words

Let be a finite alphabet, i.e., a finite set made of letters. A (finite) word over is a finite sequence of letters belonging to . If with and for all , then the length of is , i.e., it is the number of letters that contains. We let denote the empty word. This special word is the neutral element for concatenation of words, and its length is set to be . The set of all finite words over is denoted by , and we let denote the set of non-empty finite words over . An infinite word over is any infinite sequence over . The set of all infinite words over is denoted by . Note that in this paper infinite words are written in bold.

A finite word is a prefix (resp., suffix) of another finite word if there exists such that (resp., ). The word is said to be a factor of if there exist such that . If is a finite word over , we write and . Observe that if with , then and . In particular, for any words , we have .

In the same way, a finite word is a prefix of an infinite word if there exist such that . The word is said to be a factor of if there exist and such that .

Let with and for all . The mirror image, or reversal, of is the word over , i.e., the word obtained by reading from right to left. We say that a word over is a palindrome if .

A factorization of a finite word is a finite sequence of finite words over such that

Similarly, a factorization of an infinite word is a sequence of finite words over such that

A morphism on is a map such that for all , we have . In order to define a morphism, it suffices to provide the image of letters belonging to . A morphism is said to be prolongable on a letter if with and is non-erasing, i.e., the image of no letter is the empty word. If is prolongable on , then is a proper prefix of for all . Therefore, the sequence of finite words defines an infinite word that is a fixed point of .
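As an illustration of prolongable morphisms, the sketch below iterates a morphism given by the images of its letters and checks that each iterate is a proper prefix of the next. The dictionary representation and the function names are assumptions made for this example; the morphism used is the Fibonacci morphism 0 → 01, 1 → 0.

```python
def apply_morphism(images, word):
    """Apply the morphism defined by `images` (letter -> word) letter by letter."""
    return "".join(images[letter] for letter in word)


def iterates(images, letter, count):
    """Return the first `count` words letter, f(letter), f(f(letter)), ..."""
    words = [letter]
    for _ in range(count - 1):
        words.append(apply_morphism(images, words[-1]))
    return words


fibonacci_morphism = {"0": "01", "1": "0"}

# A morphism is compatible with concatenation: f(uv) = f(u) f(v).
u, v = "010", "01"
assert apply_morphism(fibonacci_morphism, u + v) == (
    apply_morphism(fibonacci_morphism, u) + apply_morphism(fibonacci_morphism, v)
)

# Prolongable on 0: every iterate is a proper prefix of the next one,
# so the iterates converge to an infinite fixed point.
words = iterates(fibonacci_morphism, "0", 6)
for shorter, longer in zip(words, words[1:]):
    assert longer.startswith(shorter) and len(shorter) < len(longer)
print(words)
```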

In combinatorics on words, given an alphabet , a set of non-empty words is a code on if any word has at most one factorization using words of . For more on this topic, see, for instance, [10, Chapter 6]. The following result can be found in [10, Chapter 6].

Proposition 1.

Let be two finite alphabets, and let be an injective morphism. If is a code on , then is a code on .

In the following definition, we introduce two new factorizations of interest.

Definition 2.

Let be an infinite word over . The palindromic Ziv–Lempel or palindromic -factorization of is the factorization

where is the shortest palindromic prefix of such that there is no occurrence of in at any position . The palindromic Crochemore or palindromic -factorization of is the factorization

where is the longest palindromic prefix of such that there is an occurrence of in at some position , or, if this prefix does not exist, the factor is just a single letter.

3 The Fibonacci case

3.1 Some known results and preliminaries

Before establishing the two palindromic factorizations of the Fibonacci word, we gather some definitions and necessary results. Some of them are well known and can be found in [5, 13]. In the following definition, we follow the lines of [5].

Definition 3.

Let be the (infinite) Fibonacci word, i.e., the fixed point of the morphism , starting with . For all , define the finite word to be the th iteration of on . The first few words of the sequence are . It is well known that the Fibonacci word is the limit of . Let be the sequence of the palindromic prefixes of , which are also called central words. The first few terms of this sequence are . The singular words satisfy , and, for all , and . The first few singular words are .
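The objects in this definition can be listed for small indices. The sketch below is only an illustration under stated assumptions: the alphabet is {0, 1}, the morphism is 0 → 01, 1 → 0, the empty word is counted among the palindromic prefixes, and the function names are hypothetical. It prints the first iterates, whose lengths are Fibonacci numbers, and the palindromic prefixes found inside a finite prefix.

```python
def fibonacci_iterates(count):
    """Return the first `count` iterates of 0 -> 01, 1 -> 0 applied to 0."""
    words = ["0"]
    for _ in range(count - 1):
        words.append("".join("01" if x == "0" else "0" for x in words[-1]))
    return words


def palindromic_prefixes(word):
    """All prefixes of `word` (including the empty one) that are palindromes."""
    return [word[:k] for k in range(len(word) + 1) if word[:k] == word[:k][::-1]]


words = fibonacci_iterates(7)
print([len(h) for h in words])      # lengths of the iterates
central = palindromic_prefixes(words[-1])
print(central[:5])

# Each iterate (from the second on) with its last two letters removed
# is a palindromic prefix of the Fibonacci word.
for h in words[1:]:
    assert h[:-2] in central
```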

The following properties of the singular words can be found in [13].

Proposition 4.

Let be the sequence of Fibonacci numbers with initial conditions and .

  • For all , is a palindrome.

  • For all , .

  • For all , .

  • For all , is not a factor of .

  • For all , is not a factor of .

  • Let and let where and . If with , then .

  • Let and define to be if is odd, or if is even. Then .

The following result can be found in [5]. Note that the first factorization of the Fibonacci word also appears in [13].

Proposition 5.

We have the following two factorizations of the Fibonacci word

Moreover, the Ziv–Lempel factorization of the Fibonacci word is given by the sequence of singular words, i.e.,

In fact, the palindromic $z$-factorization of $\mathbf{f}$ is easily deduced from the previous result, as shown in the next section. However, the palindromic $c$-factorization of $\mathbf{f}$ cannot be obtained from known results; to that end, we define a sequence of specific prefixes of $\mathbf{f}$.

Definition 6.

For all , define

From (5), observe that, for all , we have

Interestingly, the prefix of can be factorized as a particular product of singular words.

Proposition 7.

For all , we have

(3)
Proof.

Proceed by induction on . The result holds for because . For , we get by Definition 6 and therefore

as desired. Assume that . Now we suppose the result holds up to and we show it still holds for . Using Definition 6, we have

By the induction hypothesis, we get

Since , Proposition 4 implies that , and we deduce that

which ends the proof. ∎

3.2 The palindromic $z$-factorization of the Fibonacci word

In this short section, we obtain the palindromic $z$-factorization of the Fibonacci word, which follows easily from known results.

Theorem 8.

The palindromic -factorization of the Fibonacci word is

Proof.

From Proposition 5, . Since the factors are all palindromes by Proposition 4, this factorization is also . ∎

3.3 The palindromic $c$-factorization of the Fibonacci word

In this section, we show that, after the prefix of length , the factorization (5) coincides with the factorization . Note that in this case and are not the same, since the factors in are not palindromes.

Lemma 9.

For all , the only suffix of that is also a prefix of is the empty word.

Proof.

We proceed by induction on . From Definition 3, the first two singular words are and , so the result can be checked by hand for .

Now suppose that , and that the only suffix of that is also a prefix of is the empty word, for all . We show that the result still holds for . Proceed by contradiction and suppose there exists a word which is a non-empty suffix of and a non-empty prefix of . We have . Using Proposition 4(6), starts and ends with .

If , then is a prefix of (recall that is a prefix of ). Consequently, is a non-empty suffix of and a non-empty prefix of . This contradicts the inductive assumption.

If , then is a prefix of (recall that is a prefix of ). In particular, is a factor of , and also a factor of (recall that is a suffix of ). This contradicts Proposition 4(4). ∎

In the following lemma, recall that we start indexing words at .

Lemma 10.

Let . There are exactly two occurrences of the factor inside the word : one at position , the other at position .

Proof.

If , then and the factor occurs in at positions and . If , then with for all . There are exactly two occurrences of in starting either at position or .

Suppose that . Using (3), let us write with . Thanks to this factorization, we immediately see that occurs at least twice as a factor of : one starting at position , the other beginning at position . We now show that there are no other occurrences of as a factor of . There are several cases to consider.

Case 1. The word cannot be a factor of , otherwise it contradicts Proposition 4(5).

Case 2. The word cannot be a factor of , otherwise it contradicts Proposition 4(4).

Case 3. The word cannot be a factor of since by Proposition 4(2) (note that ).

Case 4. Suppose that is a factor of , overlapping and . Using Proposition 4(2) (), we know that

Consequently, is a factor of . If starts somewhere within , or if starts with the first letter of , then is a factor of , which contradicts Proposition 4(4). Therefore must be a factor of , i.e., there exist a non-empty suffix of and a non-empty prefix of such that . Then is also a non-empty prefix of , which contradicts Lemma 9.

Case 5. Suppose that is a factor of , overlapping and . This case is similar to the fourth case above. Indeed, observe that, since , Proposition 4(3) gives

Using Proposition 4 again, we know that . Consequently, is a factor of , so is a factor of , which is impossible due to the fourth case.

Case 6. Suppose that is a factor of , overlapping and . In this case, is a factor of since the singular words are palindromes. As in the fifth case, we reach a contradiction.

Case 7. Suppose that is a factor of , overlapping and . In this case, is a factor of since the singular words are palindromes. As in the fourth case, we reach a contradiction. ∎

We prove a technical result before obtaining the palindromic $c$-factorization of $\mathbf{f}$.

Proposition 11.

Let . Let be a non-empty common finite prefix of the infinite words

and

Then is not a palindrome.

Proof.

Let us define

where is taken as in the statement. Using Proposition 4, since and , we know that . Now proceed by contradiction and suppose that is a palindrome. Then we have

(4)

The bounds on the length of lead to an overlap between the occurrence of at position (in the leftmost word in (4)), and the occurrence at position (in the rightmost word in (4)). This is impossible due to either Proposition 4(4), or Lemma 9. ∎

Theorem 12.

Let denote the palindromic -factorization of the Fibonacci word . Then, we have , , and, for all ,

Proof.

By definition of the palindromic -factorization of the Fibonacci word , we clearly have , and . For the second part of the result, proceed by induction on . Suppose . Let us find the factor of the palindromic -factorization of . We have

and the longest palindrome starting with and occurring before is

as expected.

For the induction step, suppose and assume that, for all , we have . We show it is still true for . On the one hand, by the induction hypothesis, we have

(5)

and the goal is to find the next factor of the palindromic -factorization of , i.e., the word . On the other hand, using (5) first and then (3) since is large enough, we get

(6)

Using (6), it is clear that since is a palindrome occurring before. Therefore, there exists a word such that . We claim that is in fact the empty word and proceed by contradiction.

By Lemma 10, we know that there are exactly two occurrences of in : one starts at position , and the other at position .

Case 1. Let us deal with the occurrence of in at position . In this case, must be a common prefix of the infinite words

and

By Proposition 11, we know that is not a palindrome if is non-empty, a contradiction.

Case 2. Let us consider the occurrence of in at position . In this case, must be a common prefix of the infinite words

and

Using Proposition 4, we know that

Consequently, , which violates Proposition 4 (items (4) or (6)).

As a conclusion, the longest palindrome starting with the first letter of and occurring before is

as required. ∎

4 The $m$-bonacci case

In this section, we extend the results obtained for the Fibonacci word to any $m$-bonacci word, namely we get the palindromic $z$- and $c$-factorizations of any $m$-bonacci word. The strategy is similar to the one adopted in the previous case: we define a particular sequence of finite words that we will call p-singular words, and we write the palindromic $z$- and $c$-factorizations of any $m$-bonacci word in terms of this sequence. In the case $m = 2$, the p-singular words turn out to be the singular words (see Proposition 22).

4.1 Preliminaries

Definition 13.

Let . We define the morphism on by

When , then , and we fall into the Fibonacci case above.

Let be the (infinite) -bonacci word, i.e., the fixed point of the morphism , starting with . For all , define to be the th iteration of on . It is well known that the -bonacci word is the limit of . For the sake of simplicity, when the context is clear, we write instead of .
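The iterates of the $m$-bonacci morphism are easy to generate. The sketch below assumes the standard form of this morphism on the alphabet {0, 1, ..., m−1}, namely i → 0(i+1) for i < m−1 and m−1 → 0; this form and the function names are assumptions made for the example, not taken verbatim from the definition above. Letters are stored as integers so that any m works, and `as_string` is meant for display when m ≤ 10.

```python
def m_bonacci_iterates(m, count):
    """First `count` iterates of the assumed m-bonacci morphism applied to 0.

    Assumed morphism: i -> 0 (i+1) for i < m-1, and (m-1) -> 0.
    """
    images = {i: [0, i + 1] for i in range(m - 1)}
    images[m - 1] = [0]
    words = [[0]]
    for _ in range(count - 1):
        words.append([x for letter in words[-1] for x in images[letter]])
    return words


def as_string(word):
    """Render a word over single-digit letters as a string."""
    return "".join(str(letter) for letter in word)


for m in (2, 3, 4):
    print(m, [as_string(w) for w in m_bonacci_iterates(m, 5)])

# From index m on, each iterate is the concatenation of the m previous ones,
# taken from the most recent to the oldest.
tribonacci = m_bonacci_iterates(3, 8)
for n in range(3, 8):
    assert tribonacci[n] == tribonacci[n - 1] + tribonacci[n - 2] + tribonacci[n - 3]
```

For m = 2 this gives the Fibonacci iterates 0, 01, 010, 01001, and for m = 3 the Tribonacci iterates 0, 01, 0102, 0102010.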

From now on, is a fixed integer greater than unless otherwise specified.

Example 14.

If , then is the Fibonacci word. See also Definition 3. If , then , and the infinite word is called the Tribonacci word. If , then , and the infinite word is called the Quadribonacci word. In Table 1, the first few words of the sequences