The undirected repetition threshold and undirected pattern avoidance

06/12/2020
by James D. Currie, et al.
The University of Winnipeg

For a rational number r such that 1<r≤ 2, an undirected r-power is a word of the form xyx', where the word x is nonempty, the word x' is in {x,x^R}, and we have |xyx'|/|xy|=r. The undirected repetition threshold for k letters, denoted URT(k), is the infimum of the set of all r such that undirected r-powers are avoidable on k letters. We first demonstrate that URT(3)=7/4. Then we show that URT(k)≥(k-1)/(k-2) for all k≥ 4. We conjecture that URT(k)=(k-1)/(k-2) for all k≥ 4, and we confirm this conjecture for k∈{4,5,…,21}. We then consider related problems in pattern avoidance; in particular, we find the undirected avoidability index of every binary pattern. This is an extended version of a paper presented at WORDS 2019, and it contains new and improved results.


1 Introduction

A square is a word of the form $xx$, where $x$ is a nonempty word. An Abelian square is a word of the form $xx'$, where $x'$ is an anagram (or permutation) of $x$. The notions of square and Abelian square can be extended to fractional powers in a natural way. Let $1<r\le 2$ be a rational number. An (ordinary) $r$-power is a word of the form $xyx$, where $x$ is a nonempty word, and $|xyx|/|xy|=r$. An Abelian $r$-power is a word of the form $xyx'$, where $x$ is a nonempty word, the word $x'$ is an anagram of $x$, and $|xyx'|/|xy|=r$. Here, we use the definition of Abelian $r$-power given by Cassaigne and Currie [7]. We note that several distinct definitions exist (see [33, 20], for example).

In general, if $\sim$ is an equivalence relation on words that respects length (i.e., we have $|x|=|x'|$ whenever $x\sim x'$), then an $r$-power up to $\sim$ is a word of the form $xyx'$, where $x$ is a nonempty word, and we have both $x\sim x'$ and $|xyx'|/|xy|=r$. The notion of $r$-power up to $\sim$ generalizes ordinary $r$-powers and Abelian $r$-powers, where the equivalence relations are equality and "is an anagram of", respectively.

Let $\sim$ be an equivalence relation on words that respects length. For a real number $1<\alpha\le 2$, a word $w$ is called $\alpha$-free up to $\sim$ if no factor of $w$ is an $r$-power up to $\sim$ for $r\ge\alpha$. The word $w$ is called $\alpha^+$-free up to $\sim$ if no factor of $w$ is an $r$-power up to $\sim$ for $r>\alpha$. For every integer $k\ge 1$, we say that $\alpha$-powers up to $\sim$ are $k$-avoidable if there is an infinite word on $k$ letters that is $\alpha$-free up to $\sim$, and $k$-unavoidable otherwise. For every integer $k\ge 2$, the repetition threshold up to $\sim$ for $k$ letters, denoted $\mathrm{RT}_\sim(k)$, is defined as

$$\mathrm{RT}_\sim(k)=\inf\{\alpha\in(1,2] : \alpha\text{-powers up to }\sim\text{ are }k\text{-avoidable}\}.$$

Since we have only defined $r$-powers up to $\sim$ for $1<r\le 2$, it follows that $\mathrm{RT}_\sim(k)\le 2$ or $\mathrm{RT}_\sim(k)=\infty$ for any particular value of $k$.

It is well-known that squares are 3-avoidable [2]. Thus, for $k\ge 3$, we have that $\mathrm{RT}_=(k)$ is the usual repetition threshold, denoted simply $\mathrm{RT}(k)$. Dejean [16] proved that $\mathrm{RT}(3)=7/4$, and conjectured that $\mathrm{RT}(4)=7/5$ and $\mathrm{RT}(k)=k/(k-1)$ for all $k\ge 5$. This conjecture has been confirmed through the work of many authors [15, 31, 30, 16, 4, 27, 14, 26].

Let $\sim_A$ denote the equivalence relation "is an anagram of". It is well-known that Abelian squares are 4-avoidable [25]. Thus, for all $k\ge 4$, we see that $\mathrm{RT}_{\sim_A}(k)$ is equal to the Abelian repetition threshold (or commutative repetition threshold) for $k$ letters, introduced by Cassaigne and Currie [7], and denoted $\mathrm{ART}(k)$. Relatively little is known about the Abelian repetition threshold. Cassaigne and Currie [7] give (weak) upper bounds on $\mathrm{ART}(k)$ in demonstrating that $\lim_{k\to\infty}\mathrm{ART}(k)=1$. Samsonov and Shur [33] conjecture that $\mathrm{ART}(4)=9/5$ and $\mathrm{ART}(k)=(k-2)/(k-3)$ for all $k\ge 5$, and give a lower bound matching this conjecture. (Note that Samsonov and Shur define weak, semi-strong, and strong Abelian $r$-powers for all real numbers $r>1$. For any rational number $1<r\le 2$, their definitions of semi-strong Abelian $r$-power and strong Abelian $r$-power are both equivalent to our definition of Abelian $r$-power.)

For every word $w=w_1w_2\cdots w_n$, where the $w_i$ are letters, we let $w^R$ denote the reversal of $w$, defined by $w^R=w_n\cdots w_2w_1$. For example, if $w=123$, then $w^R=321$. Let $\sim_R$ be the equivalence relation on words defined by $x\sim_R y$ if $y\in\{x,x^R\}$. In this article, we focus on determining $\mathrm{RT}_{\sim_R}(k)$. We simplify our notation and terminology as follows. We refer to $r$-powers up to $\sim_R$ as undirected $r$-powers. These come in two types: words of the form $xyx$ are ordinary $r$-powers, while we refer to words of the form $xyx^R$ as reverse $r$-powers. For example, the English words edited and render are undirected $3/2$-powers; edited is an ordinary $3/2$-power, while render is a reverse $3/2$-power. We say that a word is undirected $\alpha$-free if it is $\alpha$-free up to $\sim_R$. The definition of an undirected $\alpha^+$-free word is analogous. We let $\mathrm{URT}(k)=\mathrm{RT}_{\sim_R}(k)$, and refer to this as the undirected repetition threshold for $k$ letters.

It is clear that $\sim_R$ is coarser than $=$ and finer than $\sim_A$. Thus, for every rational $1<r\le 2$, an $r$-power is an undirected $r$-power, and an undirected $r$-power is an Abelian $r$-power. As a result, we immediately have

$$\mathrm{RT}(k)\le\mathrm{URT}(k)\le\mathrm{ART}(k)$$

for all $k\ge 4$.
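These definitions are easy to test mechanically on short words. The following Python sketch is our own illustration, not code from the paper, and the function names `undirected_powers`, `max_undirected_exponent`, and `is_undirected_free` are hypothetical. It enumerates every factor $xyx'$ with $x$ nonempty, $x'\in\{x,x^R\}$, and $1<r\le 2$, and reports the exponent $r=|xyx'|/|xy|$ as an exact fraction.

```python
from fractions import Fraction


def undirected_powers(word):
    """Yield (start, end, m, r, kind) for every factor word[start:end] = x y x'
    with |x| = m >= 1, x' in {x, x^R}, and r = |x y x'| / |x y| in (1, 2]."""
    n = len(word)
    for start in range(n):
        for end in range(start + 2, n + 1):
            factor = word[start:end]
            length = end - start
            # m <= length // 2 keeps x and x' disjoint, so that r <= 2.
            for m in range(1, length // 2 + 1):
                x, x_prime = factor[:m], factor[length - m:]
                if x_prime == x:
                    kind = "ordinary"
                elif x_prime == x[::-1]:
                    kind = "reverse"
                else:
                    continue
                yield start, end, m, Fraction(length, length - m), kind


def max_undirected_exponent(word):
    """Largest r such that some factor of `word` is an undirected r-power (None if none)."""
    return max((r for _, _, _, r, _ in undirected_powers(word)), default=None)


def is_undirected_free(word, alpha, plus=False):
    """Undirected alpha-free (plus=False) or undirected alpha^+-free (plus=True)."""
    for _, _, _, r, _ in undirected_powers(word):
        if r > alpha or (r == alpha and not plus):
            return False
    return True


print(max_undirected_exponent("edited"))   # 3/2
print(max_undirected_exponent("render"))   # 3/2
```

The brute-force enumeration is cubic in the length of the word, which is adequate for checking small examples such as edited and render, but not for long prefixes of infinite words.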

We now describe the layout of the remainder of the article. In Section 2, we show that $\mathrm{URT}(3)=7/4$ using a standard morphic construction. In Section 3, we demonstrate that $\mathrm{URT}(k)\ge\frac{k-1}{k-2}$ for all $k\ge 4$. In Section 4, we use a variation of the encoding introduced by Pansiot [30] to prove that $\mathrm{URT}(k)=\frac{k-1}{k-2}$ for $4\le k\le 21$. In Section 5, we consider some related problems in pattern avoidance. In particular, we find the "undirected avoidability index" of every binary pattern.

In light of our results on the undirected repetition threshold, we propose the following conjecture.

Conjecture 1.

For every $k\ge 4$, we have $\mathrm{URT}(k)=\frac{k-1}{k-2}$.

We note that words of the form $xyx$ are sometimes referred to as gapped repeats, and that words of the form $xyx^R$ are sometimes referred to as gapped palindromes. In particular, an ordinary (reverse, respectively) $r$-power $xyx$ ($xyx^R$, respectively) satisfying $|xy|\le\alpha|x|$ is called an $\alpha$-gapped repeat ($\alpha$-gapped palindrome, respectively). From this perspective, the undirected repetition threshold is a measure of how large we can make the "gaps" of the gapped repeats and gapped palindromes in an infinite word over an alphabet of size $k$. Algorithmic questions concerning the identification and enumeration of $\alpha$-gapped repeats and palindromes in a given word, along with some related questions, have recently received considerable attention; see [24, 19, 23, 8] and the references therein. Gapped repeats and palindromes are important in the context of DNA and RNA structures, and this has been the primary motivation for their study.
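To relate the two viewpoints, assume (as is common in the gapped-repeat literature, and as an assumption on our part here) that an $\alpha$-gapped repeat or palindrome $xyx'$ is one with $|xy|\le\alpha|x|$. Since $|x'|=|x|$, a short computation translates this gap condition into a lower bound on the exponent:

$$r=\frac{|xyx'|}{|xy|}=\frac{|xy|+|x|}{|xy|}=1+\frac{|x|}{|xy|},\qquad\text{so}\qquad |xy|\le\alpha|x|\iff\frac{|x|}{|xy|}\ge\frac{1}{\alpha}\iff r\ge\frac{\alpha+1}{\alpha}.$$

In particular, allowing larger gap ratios $\alpha$ corresponds to admitting undirected $r$-powers of smaller exponent $r$.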

We now introduce some notation and terminology that will be used in the sequel. For every integer $k\ge 1$, we let $\Sigma_k$ denote the alphabet $\{1,2,\ldots,k\}$. Let $A$ and $B$ be alphabets, and let $f\colon A^*\to B^*$ be a morphism. Using the standard notation for images of sets, we have $f(A)=\{f(a) : a\in A\}$, which we refer to as the set of blocks of $f$. A set of words $S$ is called a prefix code if no element of $S$ is a prefix of another. If $S$ is a prefix code and $v$ is a nonempty factor of some element of $S^*$, a cut of $v$ over $S$ is a pair $(x,y)$ such that (i) $v=xy$; and (ii) for every pair of words $p,s$ with $pvs\in S^*$, we have $px\in S^*$ and $ys\in S^*$. (Note that it suffices to check condition (ii) for every pair of words $p,s$ where $p$ is a prefix of a block and $s$ is a suffix of a block.) We use vertical bars to denote cuts. For example, over the prefix code $\{12,21\}$, the word 11 has the cut $1|1$. The prefix code that we work over will always be the set of blocks of a given morphism, and should be clear from context if it is not explicitly stated.
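Cuts can be computed by brute force directly from this definition. The sketch below is a hypothetical illustration (the names `in_star` and `cuts` are ours). Following the parenthetical remark in the definition, it only enumerates contexts $p,s$ where $p$ is a proper prefix and $s$ is a proper suffix of a block. It assumes that the blocks form a prefix code of nonempty words, so that membership in $S^*$ can be decided by greedy parsing.

```python
def in_star(word, blocks):
    """Decide whether `word` is a concatenation of blocks, assuming the blocks form a
    prefix code of nonempty words (so at most one block can match at each position)."""
    pos = 0
    while pos < len(word):
        match = next((b for b in blocks if word.startswith(b, pos)), None)
        if match is None:
            return False
        pos += len(match)
    return True


def cuts(v, blocks):
    """All cuts (x, y) of the nonempty word v over the prefix code `blocks`."""
    prefixes = {b[:i] for b in blocks for i in range(len(b))}         # proper prefixes
    suffixes = {b[i:] for b in blocks for i in range(1, len(b) + 1)}  # proper suffixes
    # Contexts (p, s) with p v s in S*; p and s can be restricted to pieces of blocks.
    contexts = [(p, s) for p in prefixes for s in suffixes
                if in_star(p + v + s, blocks)]
    if not contexts:
        raise ValueError("v is not a factor of any concatenation of blocks")
    return [(v[:i], v[i:]) for i in range(len(v) + 1)
            if all(in_star(p + v[:i], blocks) and in_star(v[i:] + s, blocks)
                   for p, s in contexts)]


print(cuts("11", ["12", "21"]))   # [('1', '1')]
```

Over $\{12,21\}$ the single letter 2 has no cut, since it can occur either at the end of 12 or at the start of 21; the function returns an empty list in that case.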

2 $\mathrm{URT}(3)=7/4$

Dejean [16] demonstrated that $\mathrm{RT}(3)=7/4$, and hence we must have $\mathrm{URT}(3)\ge 7/4$. In order to show that $\mathrm{URT}(3)=7/4$, it suffices to find an infinite ternary word that is undirected $(7/4)^+$-free. We provide a morphic construction of such a word. Let $f$ be the uniform morphism defined by

1
2
3

The morphism $f$ is similar in structure to the morphism of Dejean [16] whose fixed point avoids ordinary $(7/4)^+$-powers (but not undirected $(7/4)^+$-powers). Note, in particular, that $f$ is "symmetric" as defined by Frid [21].

The following theorem was also verified by one of the anonymous reviewers of the conference version of this paper using the automatic theorem proving software Walnut [28].

Theorem 2.

The word $f^\omega(1)$ is undirected $(7/4)^+$-free.

Proof.

We first show that $f^\omega(1)$ has no factors of the form $xyx^R$ with $|xyx^R|/|xy|>7/4$ (which is equivalent to $|x|>3|y|$). By exhaustively checking all factors of a fixed length of $f^\omega(1)$, we find that $f^\omega(1)$ has no reversible factors of length greater than some bound $L$. So if $f^\omega(1)$ has a factor of the form $xyx^R$ with $|x|>3|y|$, then $|x|\le L$, and in turn $|y|<L/3$. So $|xyx^R|<7L/3$. Every factor of length less than $7L/3$ appears in some finite prefix of $f^\omega(1)$, so by checking this prefix exhaustively we conclude that $f^\omega(1)$ has no factors of this form.
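The first step of this argument bounds the length of reversible factors, that is, factors $u$ whose reversal $u^R$ is also a factor. A hypothetical helper for that exhaustive check on a finite prefix might look as follows; it is a sketch, not the authors' verification code. It uses the fact that every prefix of a reversible factor is reversible (if $u^R$ is a factor, then so is every suffix of $u^R$), so the search can stop at the first length with no reversible factor.

```python
def longest_reversible_factor(word):
    """Length of the longest factor u of `word` whose reversal is also a factor of `word`."""
    n = len(word)
    best = 0
    for length in range(1, n + 1):
        factors = {word[i:i + length] for i in range(n - length + 1)}
        if not any(u[::-1] in factors for u in factors):
            # Reversibility is inherited by prefixes, so no longer factor can be reversible.
            break
        best = length
    return best


print(longest_reversible_factor("1231"))    # 1
print(longest_reversible_factor("12321"))   # 5
```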

It remains to show that is (ordinary) -free. Suppose towards a contradiction that has a factor with . Let be the smallest number such that a factor of this form appears in . By exhaustive check, we have . First of all, if , then . Every factor of of length at most appears in , so we may assume that . Then contains at least one of the factors 12131, 23212, or By inspection, each one of these factors determines a cut in over the blocks of , say , where is a possibly empty proper suffix of a block of , and is a possibly empty proper prefix of a block of . If is properly contained in a single block, then

In this case, one verifies that the preimage of contains a square, which contradicts the minimality of . Otherwise, if is not properly contained in a single block of , then , where is a possibly empty proper suffix of a block, and is a possibly empty proper prefix of a block. Then

which appears internally as

The preimage of this factor is where , , , and Then or equivalently which contradicts the minimality of . ∎

Thus, we conclude that $\mathrm{URT}(3)=7/4$. We will see in the next section that $\mathrm{URT}(k)$ is strictly greater than $\mathrm{RT}(k)$ for every $k\ge 4$.

3 A lower bound on $\mathrm{URT}(k)$ for $k\ge 4$

Here, we prove that $\mathrm{URT}(k)\ge\frac{k-1}{k-2}$ for $k\ge 4$.

Theorem 3.

If $k\ge 4$, then $\mathrm{URT}(k)\ge\frac{k-1}{k-2}$, and there is a longest word over $\Sigma_k$ that is undirected $\frac{k-1}{k-2}$-free.

Proof.

For the smallest values of $k$, the statement is checked by a standard backtracking algorithm, which we performed both by hand and by computer. We now provide a general backtracking argument for all larger $k$.
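A standard backtracking search of the kind mentioned above can be sketched as follows. This is our own minimal illustration, not the authors' program, and the names `has_bad_suffix` and `longest_free_word` are hypothetical. It grows words over $\{1,\ldots,k\}$ one letter at a time and prunes a branch as soon as some suffix is an undirected $r$-power with $r\ge\alpha$. It terminates only when there are finitely many undirected $\alpha$-free words over $k$ letters, which Theorem 3 asserts for $\alpha=\frac{k-1}{k-2}$ and $k\ge 4$.

```python
from fractions import Fraction


def has_bad_suffix(word, alpha):
    """True if some suffix of `word` is an undirected r-power x y x' with r >= alpha."""
    n = len(word)
    for start in range(n - 1):
        length = n - start
        for m in range(1, length // 2 + 1):
            x, x_prime = word[start:start + m], word[n - m:]
            if (x_prime == x or x_prime == x[::-1]) and Fraction(length, length - m) >= alpha:
                return True
    return False


def longest_free_word(k, alpha):
    """Depth-first search over words on 1..k avoiding undirected r-powers with r >= alpha.
    Returns one longest such word; terminates only if there are finitely many of them."""
    best = ()

    def extend(word):
        nonlocal best
        if len(word) > len(best):
            best = word
        for letter in range(1, k + 1):
            candidate = word + (letter,)
            # Any new undirected power must end at the letter just appended.
            if not has_bad_suffix(candidate, alpha):
                extend(candidate)

    extend(())
    return best


w = longest_free_word(4, Fraction(3, 2))
print(len(w), w)
```

Because freeness is inherited by prefixes, extending only free words and checking only the new suffixes visits every free word exactly once.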

Fix $k$, and suppose that $w$ is a word over $\Sigma_k$ that is undirected $\frac{k-1}{k-2}$-free. It follows that at least $k-2$ letters must appear between any two repeated occurrences of the same letter in $w$, so that any length $k-1$ factor of $w$ must contain $k-1$ distinct letters. So we may assume that $w$ has prefix $12\cdots(k-1)$. Further, given any prefix of $w$ of length at least $k-2$, there are only two possibilities for the next letter in $w$, as it must be distinct from the $k-2$ distinct letters preceding it. These possibilities are enumerated in the tree of Figure 1.
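The first step of this argument is a one-line computation. Taking $x=a$ to be a single letter (so that $x^R=x$), a factor $aya$ is an undirected $r$-power with

$$r=\frac{|aya|}{|ay|}=\frac{|y|+2}{|y|+1}\ge\frac{k-1}{k-2}\iff(|y|+2)(k-2)\ge(|y|+1)(k-1)\iff|y|\le k-3.$$

So an undirected $\frac{k-1}{k-2}$-free word cannot contain $aya$ with $|y|\le k-3$; that is, at least $k-2$ letters separate any two occurrences of the same letter.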

Figure 1: The tree of undirected $\frac{k-1}{k-2}$-power-free words on $k$ letters.

We now explain why each word corresponding to a leaf of the tree contains an undirected -power for some . We examine the leaves from top to bottom, and use the fact that when .

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power.∎

Conjecture 1 proposes that the value of $\mathrm{URT}(k)$ matches the lower bound of Theorem 3 for all $k\ge 4$. In the next section, we confirm Conjecture 1 for $4\le k\le 21$.

4 $\mathrm{URT}(k)=\frac{k-1}{k-2}$ for all $4\le k\le 21$

First we explain why we rely on a different type of construction than the one we used to prove that in Section 2. A morphism is called -free (-free, respectively) if it maps every -free (-free, respectively) word in to an -free (-free, respectively) word in . The morphism is called growing if for all . Brandenburg [3] demonstrated that for every , there is no growing -free morphism from to . By a minor modification of his proof, one can show that there is no growing -free morphism from to . While this does not entirely rule out the possibility that there is a morphism from to whose fixed point is -free, it suggests that a different type of construction may be required. Our constructions rely on a variation of the encoding introduced by Pansiot [30] in showing that . Pansiot’s encoding was later used in all subsequent work on Dejean’s Conjecture.

4.1 A ternary encoding

We first describe an alternate definition of ordinary $r$-powers which will be useful in this section. A word $w=w_1w_2\cdots w_n$, where the $w_i$ are letters, is periodic if for some positive integer $p$, we have $w_{i+p}=w_i$ for all $1\le i\le n-p$. In this case, the integer $p$ is called a period of $w$. The exponent of $w$, denoted $\exp(w)$, is the ratio between its length and its minimal period. If $\exp(w)=r$, then $w$ is an $r$-power.¹ For example, the English word alfalfa has minimal period 3 and exponent 7/3, so it is a 7/3-power. We can write any $r$-power $w$ as $w=xe$, where $|x|$ is the minimal period of $w$ and $e$ is a prefix of $w$. In this case, we say that $e$ is the excess of the $r$-power $w$.

¹If $1<r\le 2$, then $w$ is an $r$-power as we have defined it in Section 1. If $r>2$, then we take this as the definition of an (ordinary) $r$-power.
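The minimal period, exponent, and excess can be computed directly from these definitions. The sketch below is illustrative only; the function names are ours, and it assumes the convention $w=xe$ with $|x|$ equal to the minimal period, so that the excess is the suffix $e$ of length $|w|-|x|$, which by periodicity is also a prefix of $w$.

```python
from fractions import Fraction


def minimal_period(word):
    """Smallest p >= 1 such that word[i + p] == word[i] whenever both indices exist."""
    if not word:
        raise ValueError("the empty word has no period")
    n = len(word)
    for p in range(1, n + 1):
        if all(word[i + p] == word[i] for i in range(n - p)):
            return p


def exponent_and_excess(word):
    """Return (exp(w), e) where w = x e, |x| is the minimal period and e is the excess."""
    p = minimal_period(word)
    # By periodicity, word[p:] equals word[:len(word) - p], so e is a prefix of w.
    return Fraction(len(word), p), word[p:]


print(exponent_and_excess("alfalfa"))   # (Fraction(7, 3), 'alfa')
```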

Suppose that $w$ is an undirected $\left(\frac{k-1}{k-2}\right)^+$-free word over $\Sigma_k$ that contains at least $k-1$ distinct letters. Write $w=w_1w_2\cdots$ with $w_i\in\Sigma_k$. Certainly, every length $k-2$ factor of $w$ contains $k-2$ distinct letters, and it is easily checked that every length $k$ factor of $w$ contains at least $k-1$ distinct letters.

Now let $w$ be any word containing at least $k-1$ distinct letters and satisfying these two properties:

  • Every length $k-2$ factor of $w$ contains $k-2$ distinct letters; and

  • Every length $k$ factor of $w$ contains at least $k-1$ distinct letters.
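For a word that is undirected $\left(\frac{k-1}{k-2}\right)^+$-free (our reading of the hypothesis above), both properties follow from short computations. For a letter $a$,

$$\frac{|aya|}{|ay|}=\frac{|y|+2}{|y|+1}>\frac{k-1}{k-2}\iff|y|<k-3\iff|y|\le k-4,$$

so two occurrences of a letter are separated by at least $k-3$ letters, and every length $k-2$ factor consists of distinct letters. If a length $k$ factor $w_1\cdots w_k$ had only $k-2$ distinct letters, then, since $w_1\cdots w_{k-2}$, $w_2\cdots w_{k-1}$, and $w_3\cdots w_k$ each consist of distinct letters, we would need $w_{k-1}=w_1$ and $w_k=w_2$. This gives the ordinary power

$$w_1w_2\,(w_3\cdots w_{k-2})\,w_1w_2\quad\text{with period }k-2\text{ and exponent }\frac{k}{k-2}>\frac{k-1}{k-2},$$

a contradiction.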

Let be the shortest prefix of containing distinct letters. We see immediately that has length or . Write , where with . Define and for all . For all , the prefix determines a permutation

of the letters of , which ranks the letters of by the index of their final appearance in . In other words, the word is the length suffix of , and of the two letters in , the letter is the one that appears last in . Note that the final letter may not even appear in . For example, on the prefix 123416 gives rise to the permutation

Since every factor of length in contains distinct letters, for any , the letter must belong to the set This allows us to encode the word over a ternary alphabet, as described explicitly below.

For , define , where for all , we have

For example, on , for the word the shortest prefix containing distinct letters is , and has encoding Given the shortest prefix of containing distinct letters, and the encoding , we can recover . Moreover, if has period , then so does . The exponent of corresponds to an exponent of .
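The exact ranking convention for the permutations is not fully recoverable from the text above, so the following Python sketch fixes one hypothetical convention of our own: letters are ranked from least to most recently used, letters absent from the initial prefix are ranked first, and each subsequent letter is encoded by its 1-based rank, after which it becomes the most recently used letter. Under the two properties listed earlier, the next letter always lies among the three least recently used letters, so only the symbols 1, 2, and 3 occur. All function names are ours.

```python
def shortest_prefix_length(word, k):
    """Length of the shortest prefix of `word` containing k - 1 distinct letters."""
    seen = set()
    for i, letter in enumerate(word, start=1):
        seen.add(letter)
        if len(seen) == k - 1:
            return i
    raise ValueError("word has fewer than k - 1 distinct letters")


def initial_order(prefix, k):
    """Letters 1..k from least to most recently used in `prefix` (absent letters first)."""
    last = {letter: i for i, letter in enumerate(prefix)}
    absent = [a for a in range(1, k + 1) if a not in last]
    return absent + sorted(last, key=last.get)


def encode(word, k):
    """Return (prefix, code): the code lists the recency rank of each letter after the prefix."""
    n0 = shortest_prefix_length(word, k)
    order = initial_order(word[:n0], k)
    code = []
    for letter in word[n0:]:
        rank = order.index(letter) + 1
        code.append(rank)
        order.append(order.pop(rank - 1))  # the letter becomes the most recent one
    return word[:n0], code


def decode(prefix, code, k):
    """Inverse of `encode`: rebuild the word from its prefix and its code."""
    order = initial_order(prefix, k)
    word = list(prefix)
    for rank in code:
        letter = order.pop(rank - 1)
        word.append(letter)
        order.append(letter)
    return word


prefix, code = encode([1, 2, 3, 1, 4, 2, 1, 3], 4)
print(prefix, code)   # [1, 2, 3] [2, 1, 1, 2, 1]
```

Decoding only needs the prefix and the code, which mirrors the remark that $w$ can be recovered from its shortest prefix with $k-1$ distinct letters together with its encoding.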

Let denote the symmetric group on with left multiplication. Define a morphism by

One proves by induction that . It follows that if has period , and contains at least distinct letters, then the length prefix of lies in the kernel of . In this case, the word is called a kernel repetition. For example, over , the word

has period , and excess 12324. Hence, the encoding is a kernel repetition; one verifies that .
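The paper's morphism into the symmetric group is not recoverable from the text above. As a hypothetical stand-in (our assumption, not the authors' definition), suppose letters are kept in a list ranked from least to most recently used, and each code symbol $e$ moves the entry in position $e$ to the end. Each code symbol then acts as a permutation of positions, and a code word lies in the "kernel" exactly when the composite permutation is the identity, that is, when the recency ranking returns to where it started.

```python
def apply_rank(order, rank):
    """Move the entry in (1-based) position `rank` of `order` to the end."""
    order = list(order)
    order.append(order.pop(rank - 1))
    return tuple(order)


def in_kernel(code, k):
    """True if the composite of the rank-moves in `code` is the identity on k positions."""
    identity = tuple(range(1, k + 1))
    order = identity
    for rank in code:
        order = apply_rank(order, rank)
    return order == identity


print(in_kernel([1, 1, 1, 1], 4))   # True
print(in_kernel([1, 1], 4))         # False
```

Under this convention, a word with period 4 over four letters, such as 12341234, is encoded by a run of 1s, and four consecutive 1s compose to the identity.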

The following straightforward lemma will be used to bound the length of reversible factors in the words that we construct.

Lemma 4.

Let , and let be a word with encoding . Suppose that neither 312 nor 322 is a factor of . Let be a factor of whose encoding contains the factor 1231. Then is not a factor of .

Proof.

Since contains the factor 1231, the word contains some permutation of the factor . By inspection, the reversal of this word, namely , has encoding 312 or 322, neither of which is a factor of the encoding by assumption. We conclude that is not a factor of . ∎

4.2 Constructions

For , define the morphism as follows:

For all , define by

For , define as follows:

Theorem 5.

Fix . Let be the word over with prefix and encoding . Then is undirected -free.

The remainder of this section is devoted to proving Theorem 5. Essentially, we adapt and extend the technique first used by Moulin-Ollagnier [27]. A simplified version of Moulin-Ollagnier’s technique, which we follow fairly closely, is exhibited by Currie and Rampersad [15].

For the remainder of this section, we use notation as in Theorem 5, but we omit the subscripts on , , and for convenience. We let and , i.e., we say that is -uniform, and is -uniform. We use the following properties of and several times:

  • Every factor of of length contains a cut over the blocks of , and every factor of of length contains a cut over the blocks of .

  • The blocks and end in different letters, and the blocks and end in different letters.

The first property was verified by computer.

Before proceeding with the proof of Theorem 5, we discuss the kernel repetitions that appear in . Let the factor of be a kernel repetition with period ; say . Let be the maximal period extension of the occurrence of . Write and so that . Write where . By the periodicity of , the factor is conjugate to , and hence is in the kernel of . Suppose that contains a cut over the blocks of . Then we may write uniquely in the form , where the word , the word is a proper suffix of or , and the word is a proper prefix of or . Similarly, we may write , where the word and the word is a proper suffix of or . Since is a prefix of , and since and end in different letters, it follows from the maximality of that . Finally, by the maximality of , we have that , the longest common prefix of and . So we have and . Since is a prefix of , we have that is a prefix of . We see that and .

Let be the composite morphism . Since is in the kernel of , we see that

i.e., the word is in the kernel of .

Now set and . By the maximality of the repetition must be a maximal repetition with period (i.e., it cannot be extended). If has a cut, then it follows by arguments similar to those used above that and , where is a prefix of and is the longest common prefix of and . One checks that there is an element such that

for every , i.e., the morphism satisfies the “algebraic property” described by Moulin-Ollagnier [27]. It follows that is in the kernel of . We can repeat this process until we reach a repetition whose excess has no cut. Recalling that is an -uniform morphism, we have

and

Proof of Theorem 5.

Let be the word with prefix and encoding . Note that is a prefix of , and hence is a prefix of . We begin by verifying computationally that is undirected -free. This fact will be used several times in the proof.

We first show that contains no reverse -power with . We verify computationally that there is a finite number such that every factor of of length contains the factor 1231. (While the exact value of depends on , we have for every .) Further, since 312 and 322 are not factors of , we conclude by Lemma 4 that no factor of of length is reversible. Thus, if is a factor of with , then . In turn, we have . We verify computationally that every factor of of length less than appears in , and hence some permutation of every factor of appears in . Since is undirected -free, we conclude that contains no reverse -power with .
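The computational step "every factor of length at least $N$ of the encoding contains 1231" can be checked on a finite prefix with a direct sliding-window search. The following helper (our hypothetical name, `window_bound`) returns the smallest such $N$ for a finite word. Extending the conclusion from a finite prefix to the whole infinite word still requires an argument like the one in the text.

```python
def window_bound(word, u):
    """Smallest N such that every length-N factor of the finite word `word` contains u,
    or None if u does not occur in `word` at all."""
    n, m = len(word), len(u)
    occurrences = [j for j in range(n - m + 1) if word.startswith(u, j)]
    for size in range(m, n + 1):
        # Check every window word[i:i + size] for an occurrence lying entirely inside it.
        if all(any(i <= j and j + m <= i + size for j in occurrences)
               for i in range(n - size + 1)):
            return size
    return None


print(window_bound("12311231", "1231"))   # 7
```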

It remains to show that is ordinary -free. Suppose to the contrary that is a factor of such that is a prefix of and . We may assume that is maximal with respect to having period . If has fewer than distinct letters, then . In turn, we have . Since every factor of of length less than occurs in , and is undirected -free, we may assume that has at least distinct letters. In this case, let , and let be the length prefix of . So , where is a prefix of , and is in the kernel of . Evidently, we have . If does not contain a cut, then , and hence . It follows that . Since every factor of of length less than occurs in , and is undirected -free, we may assume that contains a cut.

Since contains a cut, by the discussion immediately preceding this proof, we can find a factor of such that is a prefix of , the word is in the kernel of , and does not contain a cut. Then we have

Thus, we have

(1)

Note also that if , then we have

(2)

Since every factor of length in contains a cut, we must have . Putting this together with (1), we find that

This is a constant bound on