The undirected repetition threshold

04/22/2019
by James D. Currie, et al.
The University of Winnipeg

For rational 1<r≤ 2, an undirected r-power is a word of the form xyx', where x≠ε, x'∈{x,x^R}, and |xyx'|/|xy|=r. The undirected repetition threshold for k letters, denoted URT(k), is the infimum of the set of all r such that undirected r-powers are avoidable on k letters. We first demonstrate that URT(3)=7/4. Then we show that URT(k)≥(k-1)/(k-2) for all k≥ 4. We conjecture that URT(k)=(k-1)/(k-2) for all k≥ 4, and we confirm this conjecture for k∈{4,8,12}.


1 Introduction

A square is a word of the form $xx$, where $x$ is a nonempty word. An Abelian square is a word of the form $xx'$, where $x'$ is an anagram (or permutation) of $x$. The notions of square and Abelian square can be extended to fractional powers in a natural way. Let $1<r\le 2$ be a rational number. An (ordinary) $r$-power is a word of the form $xyx$, where $x$ is a nonempty word, and $|xyx|/|xy|=r$. An Abelian $r$-power is a word of the form $xyx'$, where $x$ is a nonempty word, $x'$ is an anagram of $x$, and $|xyx'|/|xy|=r$.¹

¹ We use the definition of Abelian $r$-power of Cassaigne and Currie [5]. We note that several distinct definitions exist (see [28, 17], for example).

In general, if $\sim$ is an equivalence relation on words that respects length (i.e., we have $|x|=|x'|$ whenever $x\sim x'$), then an $r$-power up to $\sim$ is a word of the form $xyx'$, where $x$ is nonempty, $x\sim x'$, and $|xyx'|/|xy|=r$. The notion of $r$-power up to $\sim$ generalizes ordinary $r$-powers and Abelian $r$-powers, where the equivalence relations are equality and "is an anagram of", respectively.

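Let $\sim$ be an equivalence relation on words that respects length. For a real number $1<\alpha\le 2$, a word $w$ is called $\alpha$-free up to $\sim$ if no factor of $w$ is an $r$-power up to $\sim$ for $r\ge\alpha$. Moreover, the word $w$ is called $\alpha^+$-free up to $\sim$ if no factor of $w$ is an $r$-power up to $\sim$ for $r>\alpha$. For every integer $k\ge 2$, we say that $\alpha$-powers up to $\sim$ are $k$-avoidable if there is an infinite word on $k$ letters that is $\alpha$-free up to $\sim$, and $k$-unavoidable otherwise. For every integer $k\ge 2$, the repetition threshold up to $\sim$ for $k$ letters, denoted $\mathrm{RT}_\sim(k)$, is defined as

$$\mathrm{RT}_\sim(k)=\inf\{r : 1<r\le 2 \text{ and } r\text{-powers up to } \sim \text{ are } k\text{-avoidable}\}.$$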

Since we have only defined $r$-powers for $1<r\le 2$, it follows that $1\le\mathrm{RT}_\sim(k)\le 2$ or $\mathrm{RT}_\sim(k)=\infty$ for any particular value of $k$.

It is well-known that squares are $3$-avoidable [1]. Thus, for $k\ge 3$, we have that $\mathrm{RT}_=(k)$ is the usual repetition threshold, denoted simply $\mathrm{RT}(k)$. Dejean [14] proved that $\mathrm{RT}(3)=7/4$, and conjectured that $\mathrm{RT}(4)=7/5$ and $\mathrm{RT}(k)=k/(k-1)$ for all $k\ge 5$. This conjecture has been confirmed through the work of many authors [13, 26, 25, 14, 3, 23, 12, 22].

It is also known that Abelian squares are $4$-avoidable [21]. Let $\approx$ denote the equivalence relation "is an anagram of". Thus, for all $k\ge 4$, we see that $\mathrm{RT}_\approx(k)$ is equal to the Abelian repetition threshold (or commutative repetition threshold) for $k$ letters, introduced by Cassaigne and Currie [5], and denoted $\mathrm{ART}(k)$. Relatively less is known about the Abelian repetition threshold. Cassaigne and Currie [5] give (weak) upper bounds on $\mathrm{ART}(k)$ in demonstrating that $\lim_{k\to\infty}\mathrm{ART}(k)=1$. Samsonov and Shur [28] conjecture that $\mathrm{ART}(4)=9/5$ and $\mathrm{ART}(k)=\frac{k-2}{k-3}$ for all $k\ge 5$, and give a lower bound matching this conjecture.²

² Samsonov and Shur define weak, semi-strong, and strong Abelian $r$-powers for all real numbers $r>1$. For rational $1<r\le 2$, their definitions of semi-strong Abelian $r$-power and strong Abelian $r$-power are both equivalent to our definition of Abelian $r$-power.

For every word $w=w_1w_2\cdots w_n$, where the $w_i$ are letters, we let $w^R$ denote the reversal of $w$, defined by $w^R=w_n\cdots w_2w_1$. For example, if $w=123$ then $w^R=321$. Let $\sim_R$ be the equivalence relation on words defined by $x\sim_R x'$ if $x'=x$ or $x'=x^R$. In this article, we focus on determining $\mathrm{RT}_{\sim_R}(k)$. We simplify our notation and terminology as follows. We refer to $r$-powers up to $\sim_R$ as undirected $r$-powers. These come in two types: words of the form $xyx$ are ordinary $r$-powers, while we refer to words of the form $xyx^R$ as reverse $r$-powers.³ For example, the English words edited and render are undirected $3/2$-powers; edited is an ordinary $3/2$-power, while render is a reverse $3/2$-power.

³ We note that words of the form $xyx$ are sometimes referred to as gapped repeats, and that words of the form $xyx^R$ are sometimes referred to as gapped palindromes. In particular, an ordinary (reverse, respectively) $r$-power satisfying $|xy|\le\alpha|x|$ is called an $\alpha$-gapped repeat ($\alpha$-gapped palindrome, respectively). Algorithmic questions concerning the identification and enumeration of $\alpha$-gapped repeats and palindromes in a given word, along with some related questions, have recently received considerable attention; see [20, 16, 19, 6] and the references therein. Gapped repeats and palindromes are important in the context of DNA and RNA structures, and this has been the primary motivation for their study.
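Because the definition of an undirected $r$-power is purely combinatorial, short words can be checked by brute force. The Python sketch below is ours: the function names `undirected_powers`, `max_undirected_exponent` and `is_undirected_free` are hypothetical and do not come from the article. It enumerates every factor $xyx'$ with $x$ nonempty and $x'\in\{x,x^R\}$ and reports exact exponents $|xyx'|/|xy|$. It runs in roughly cubic time, so it is only meant for small examples such as edited and render.

```python
from fractions import Fraction


def undirected_powers(word):
    """Yield (start, |x|, |y|, kind, r) for every factor x y x' of `word`
    with x nonempty and x' equal to x ("ordinary") or to x reversed ("reverse")."""
    n = len(word)
    for start in range(n):
        for x_len in range(1, (n - start) // 2 + 1):
            x = word[start:start + x_len]
            # y may be empty; the second copy x' must still fit inside the word
            for y_len in range(0, n - start - 2 * x_len + 1):
                j = start + x_len + y_len
                x2 = word[j:j + x_len]
                r = Fraction(2 * x_len + y_len, x_len + y_len)  # always 1 < r <= 2
                if x2 == x:
                    yield start, x_len, y_len, "ordinary", r
                elif x2 == x[::-1]:
                    yield start, x_len, y_len, "reverse", r


def max_undirected_exponent(word):
    """Return (r, kind) for a largest undirected power in `word`, or None."""
    best = None
    for *_, kind, r in undirected_powers(word):
        if best is None or r > best[0]:
            best = (r, kind)
    return best


def is_undirected_free(word, alpha, plus=False):
    """alpha-free: no exponent r >= alpha; alpha^+-free (plus=True): no r > alpha."""
    for *_, r in undirected_powers(word):
        if r > alpha or (r == alpha and not plus):
            return False
    return True


print(max_undirected_exponent("edited"))  # (Fraction(3, 2), 'ordinary')
print(max_undirected_exponent("render"))  # (Fraction(3, 2), 'reverse')
```

For render, the witness is $x=\texttt{re}$, $y=\texttt{nd}$, $x'=\texttt{er}=x^R$. The checker reports exponent $6/4=3/2$, which agrees with the text.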

We say that a word is undirected $\alpha$-free if it is $\alpha$-free up to $\sim_R$. The definition of an undirected $\alpha^+$-free word is analogous. We let $\mathrm{URT}(k)=\mathrm{RT}_{\sim_R}(k)$, and refer to this as the undirected repetition threshold for $k$ letters.

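It is clear that $\sim_R$ is coarser than $=$ and finer than $\approx$. Thus, for every rational $1<r\le 2$, an $r$-power is an undirected $r$-power, and an undirected $r$-power is an Abelian $r$-power. As a result, we immediately have

$$\mathrm{RT}(k)\le\mathrm{URT}(k)\le\mathrm{ART}(k)$$

for all $k\ge 4$.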

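Since only a weak upper bound on $\mathrm{ART}(k)$ is currently known, we provide an alternate upper bound on $\mathrm{URT}(k)$ for large enough $k$. For words $u=u_1u_2\cdots$ and $v=v_1v_2\cdots$ of the same length (possibly infinite) over alphabets $A$ and $B$, respectively, the direct product of $u$ and $v$, denoted $u\times v$, is the word on alphabet $A\times B$ defined by

$$u\times v=(u_1,v_1)(u_2,v_2)\cdots.$$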

A word $x$ is called a reversible factor of $w$ if both $x$ and $x^R$ are factors of $w$.
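The direct product and reversible factors are easy to compute on finite prefixes. The sketch below uses hypothetical helper names of our own. It uses a prefix of $(012)^\omega$ only as an illustrative second coordinate, because this extraction does not show which word the article pairs with in Theorem 1. The sketch illustrates the general principle. If $f$ and $f^R$ are both factors of $u\times v$, then their second coordinates give a reversible factor of $v$. So a direct product with a word whose reversible factors are short has short reversible factors too.

```python
def direct_product(u, v):
    """Letterwise pairing u x v of two words of the same (finite) length."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return [(a, b) for a, b in zip(u, v)]


def factors(w):
    """All nonempty factors of w, as tuples of letters."""
    return {tuple(w[i:j]) for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


def reversible_factors(w):
    """Factors x of w such that the reversal of x is also a factor of w."""
    fs = factors(w)
    return {f for f in fs if f[::-1] in fs}


def longest_reversible(w):
    return max((len(f) for f in reversible_factors(w)), default=0)


v = "012" * 4          # a prefix of the periodic word (012)^omega
print(longest_reversible(v))  # 1: 01, 12, 20 occur, but 10, 21, 02 do not
u = "abbabaab" + "baab"      # an arbitrary 12-letter word
print(longest_reversible(direct_product(u, v)))  # also 1, by projection onto v
```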

Theorem 1.

For every , we have .

Proof.

Fix , and let . Evidently, we have and thus . Let be an infinite -free word on letters. We claim that the word on letters is undirected -free, from which the theorem follows. Since the only reversible factors of have length at most , any reverse -power in satisfies , and hence is an ordinary -power as well. Since is ordinary -free, so is . This completes the proof of the claim. ∎

We now describe the layout of the remainder of the article. In Section 2, we discuss related problems in pattern avoidance, and give some implications of our main results in that setting. In Section 3, we show that $\mathrm{URT}(3)=7/4$ using a standard morphic construction. In Section 4, we demonstrate that $\mathrm{URT}(k)\ge\frac{k-1}{k-2}$ for all $k\ge 4$. In Section 5, we use a variation of the encoding introduced by Pansiot [25] to prove that $\mathrm{URT}(k)=\frac{k-1}{k-2}$ for $k\in\{4,8,12\}$. In light of our results, we propose the following.

Conjecture 2.

For all $k\ge 4$, we have $\mathrm{URT}(k)=\frac{k-1}{k-2}$.

We briefly place this conjecture in context. We know that for all we conjecture that for all , and Samsonov and Shur [28] conjecture that for all . Let us fix . In [29], Shur proposes splitting all exponents greater than into levels as follows444Shur considers exponents belonging to the “extended rationals”. This set includes all rational numbers and all such numbers with a , where covers , and the inequalities and are equivalent.:

st level nd level rd level

For , Shur provides evidence that the language of -free -ary words and the language of -free -ary words exhibit similar behaviour (e.g., with respect to growth) if and are in the same level, and quite different behaviour otherwise; see [29, 30]. If the conjectured values of and are correct, then the undirected repetition threshold and the Abelian repetition threshold provide further evidence of the distinction between levels.

We now introduce some terminology that will be used in the sequel. Let and be alphabets, and let be a morphism. Using the standard notation for images of sets, we have which we refer to as the set of blocks of . A set of words is called a prefix code if no element of is a prefix of another. If is a prefix code and is a nonempty factor of some element of , a cut of over is a pair such that (i) ; and (ii) for every pair of words with we have . We use vertical bars to denote cuts. For example, over the prefix code the word 11 has cut . The prefix code that we work over will always be the set of blocks of a given morphism, and should be clear from context.

2 Related problems in pattern avoidance

Let $p=p_1p_2\cdots p_n$ be a word over alphabet $\Sigma$, where the $p_i$ are letters called variables. In this context, the word $p$ is called a pattern. If $\sim$ is an equivalence relation on words, then we say that the word $w$ encounters $p$ up to $\sim$ if $w$ contains a factor of the form $X_1X_2\cdots X_n$, where each word $X_i$ is nonempty and $X_i\sim X_j$ whenever $p_i=p_j$. Otherwise, we say that $w$ avoids $p$ up to $\sim$. A pattern $p$ is $k$-avoidable up to $\sim$ if there is an infinite word on a $k$-letter alphabet that avoids $p$ up to $\sim$. Otherwise, the pattern $p$ is $k$-unavoidable up to $\sim$. Finally, the pattern $p$ is avoidable up to $\sim$ if it is $k$-avoidable for some $k$, and unavoidable up to $\sim$ otherwise.
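The definition of "encounters $p$ up to $\sim$" translates directly into a backtracking search. The Python sketch below is hypothetical and not from the article. It tries every start position and every way of cutting the following letters into nonempty blocks $X_1,\dots,X_n$. Whenever a variable recurs, the new block is compared with that variable's first image. This suffices because $\sim$ is an equivalence relation. The running time is exponential, so it is only suitable for short words and patterns.

```python
def encounters(word, pattern, equiv):
    """True if `word` has a factor X_1 ... X_n (n = len(pattern)) with every
    X_i nonempty and equiv(X_i, X_j) whenever pattern[i] == pattern[j]."""

    def extend(pos, i, images):
        if i == len(pattern):
            return True
        var = pattern[i]
        for end in range(pos + 1, len(word) + 1):
            block = word[pos:end]
            if var in images:
                # comparing with the first image suffices: equiv is an equivalence relation
                if equiv(images[var], block) and extend(end, i + 1, images):
                    return True
            elif extend(end, i + 1, {**images, var: block}):
                return True
        return False

    return any(extend(start, 0, {}) for start in range(len(word)))


def equal(x, y):
    return x == y


def up_to_reversal(x, y):
    return x == y or x == y[::-1]


print(encounters("abcbac", "xyxy", equal))           # False
print(encounters("abcbac", "xyxy", up_to_reversal))  # True: ab . c . ba . c
```

The word abcbac shows the difference between the two relations. It contains no ordinary instance of $xyxy$. It does encounter $xyxy$ up to $\sim_R$ with $X=\texttt{ab}$, $Y=\texttt{c}$.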

When $\sim$ is equality, we recover the ordinary notion of pattern avoidance (see [4]). When $\sim$ is $\approx$ (i.e., "is an anagram of"), we recover the notion of Abelian pattern avoidance (see [8, 9, 27], for example). One could also explore pattern avoidance up to $\sim_R$, or undirected pattern avoidance. We discuss some initial results in this direction. While there are patterns that are avoidable in the ordinary sense but not in the Abelian sense [8, Lemma 3], every avoidable pattern is in fact avoidable up to $\sim_R$, as we show below.

Theorem 3.

Let $p$ be a pattern. Then $p$ is avoidable in the ordinary sense if and only if $p$ is avoidable up to $\sim_R$.

Proof.

If is unavoidable in the ordinary sense, then clearly is unavoidable up to If is avoidable in the ordinary sense, then let be an -word avoiding . The direct product avoids up to by an argument similar to the one used in Theorem 1. ∎

Questions concerning the -avoidability of patterns up to appear to be more interesting. The avoidability index of a pattern up to , denoted , is the least positive integer such that is -avoidable up to , or if is unavoidable. In general, for any pattern , we have

The construction of Theorem 3 can be used to show that , though we suspect that this bound is not tight.

The study of the undirected repetition threshold has immediate implications for avoiding patterns up to $\sim_R$. For example, we can easily resolve the avoidability index of unary patterns up to $\sim_R$ using known results along with a result proven later in this article.

Theorem 4.

Proof.

We prove that in Section 3, from which it follows that . Backtracking by computer, one finds that the longest binary word avoiding in the undirected sense has length , so . Since  [15], we conclude that . Finally, since  [15], we have for all . ∎

We plan to determine the avoidability index of all binary patterns up to $\sim_R$ in future work.

Finally, we remark that the study of $k$-avoidability of patterns up to $\sim_R$ has implications for $k$-avoidability of patterns with reversal (see [7, 10, 11] for definitions and examples). In particular, if pattern $p$ is $k$-avoidable up to $\sim_R$, then all patterns with reversal that are obtained by swapping any number of letters in $p$ with their mirror images are simultaneously $k$-avoidable; that is, there is an infinite word on $k$ letters avoiding all such "decorations" of $p$.

3 $\mathrm{URT}(3)=7/4$

Dejean [14] demonstrated that $\mathrm{RT}(3)=7/4$, and hence we must have $\mathrm{URT}(3)\ge 7/4$. In order to show that $\mathrm{URT}(3)=7/4$, it suffices to find an infinite ternary word that is undirected $\frac{7}{4}^+$-free. We provide a morphic construction of such a word. Let be the -uniform morphism defined by

0
1
2

The morphism is similar in structure to the morphism of Dejean [14] whose fixed point avoids ordinary -powers (but not undirected -powers). Note, in particular, that is “symmetric” in the sense of [18].

The following theorem was also verified by one of the anonymous reviewers using the automatic theorem proving software Walnut [24].

Theorem 5.

The word is undirected -free.

Proof.

We first show that has no factors of the form with (which is equivalent to ). By exhaustively checking all factors of length of , we find that has no reversible factors of length greater than . So if has a factor of the form with , then , and in turn . So . Every factor of length at most appears in , so by checking this prefix exhaustively we conclude that has no factors of this form.
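The condition on reverse powers used at the start of this proof reduces to a one-line computation. Writing $a=|x|\ge 1$ and $b=|y|\ge 0$ (our notation), and clearing the positive denominator $4(a+b)$, gives:

```latex
% a = |x| >= 1, b = |y| >= 0, so |xy| = a + b > 0 and |xyx^R| = 2a + b
\frac{|xyx^R|}{|xy|} > \frac{7}{4}
\iff \frac{2a+b}{a+b} > \frac{7}{4}
\iff 8a + 4b > 7a + 7b
\iff a > 3b
\iff |x| > 3|y|.
```

The same computation applies verbatim to ordinary powers $xyx$, since only the lengths matter.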

So it suffices to show that is (ordinary) -free. Suppose towards a contradiction that has factor with . Let be the smallest number such that a factor of this form appears in . By exhaustive check, we have . First of all, if , then . Every factor of of length at most appears in , so we may assume that . Then contains at least one of the factors 01020, 12101, or By inspection, each one of these factors determines a cut in (over the prefix code ), say , where is a possibly empty proper suffix of a block of , and is a possibly empty proper prefix of a block of . If is properly contained in a single block, then

In this case, one verifies that the preimage of contains a square, which contradicts the minimality of . Otherwise, if is not properly contained in a single block of , then , where is a possibly empty proper suffix of a block, and is a possibly empty proper prefix of a block. Then

which appears internally as

The preimage of this factor is where , , , and Then or equivalently which contradicts the minimality of . ∎

Thus, we conclude that $\mathrm{URT}(3)=7/4$. We will see in the next section that $\mathrm{URT}(k)$ is strictly greater than $\mathrm{RT}(k)$ for every $k\ge 4$.

4 A lower bound on $\mathrm{URT}(k)$ for $k\ge 4$

Here, we prove that $\mathrm{URT}(k)\ge\frac{k-1}{k-2}$ for $k\ge 4$.

Theorem 6.

If , then , and the longest -ary word that is undirected -free has length .

Proof.

For , the statement is checked by a standard backtracking algorithm, which we performed by computer. We now provide a general backtracking argument for all .
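The following is a minimal sketch of such a backtracking search in Python, assuming only the definitions above. The names `ends_with_power` and `longest_free_word` and the safety parameter `cap` are ours. The sketch does not reproduce the exact lengths reported in the article; it simply returns whatever longest word it finds. Each extension only needs to test the new suffixes, because all shorter prefixes were already checked. Letters are restricted to a canonical order (a new letter is always the smallest unused one), which skips relabellings without losing any maximal word.

```python
from fractions import Fraction


def ends_with_power(word, alpha):
    """True if some suffix x y x' of `word` (x nonempty, x' in {x, x reversed})
    has exponent |xyx'|/|xy| >= alpha."""
    n = len(word)
    for x_len in range(1, n // 2 + 1):
        last = word[n - x_len:]
        for y_len in range(0, n - 2 * x_len + 1):
            if Fraction(2 * x_len + y_len, x_len + y_len) < alpha:
                break  # the exponent only decreases as y grows
            start = n - 2 * x_len - y_len
            first = word[start:start + x_len]
            if first == last or first == last[::-1]:
                return True
    return False


def longest_free_word(k, cap=500):
    """Depth-first search for a longest k-ary word (letters 0..k-1) that is
    undirected ((k-1)/(k-2))-free; stops descending at length `cap`."""
    alpha = Fraction(k - 1, k - 2)
    best = []

    def extend(word):
        nonlocal best
        if len(word) > len(best):
            best = word[:]
        if len(word) >= cap:
            return
        next_new = max(word) + 1 if word else 0
        for letter in range(min(k, next_new + 1)):
            word.append(letter)
            if not ends_with_power(word, alpha):
                extend(word)
            word.pop()

    extend([])
    return best


w = longest_free_word(4)
print(len(w), w)
```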

Fix , and suppose that is a -ary word of length that is undirected -free. It follows that at least letters must appear between any two repeated occurrences of the same letter in , so that any length factor of must contain distinct letters. So we may assume that has prefix . Further, given any prefix of of length at least , there are only two possibilities for the next letter in , as it must be distinct from the distinct letters preceding it. These possibilities are enumerated in the tree of Figure 1.
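The spacing claim in this argument comes from the shortest undirected powers, those with $|x|=1$. Suppose a letter occurs twice at distance $d\ge 1$ (our notation). Then the factor from one occurrence to the next has $|xy|=d$ and $|xyx|=d+1$. For $k\ge 3$:

```latex
% x is a single letter, |xy| = d >= 1, |xyx| = d + 1, and k - 2 > 0
\frac{d+1}{d} \ge \frac{k-1}{k-2}
\iff (d+1)(k-2) \ge d(k-1)
\iff dk - 2d + k - 2 \ge dk - d
\iff d \le k - 2.
```

So an undirected $\frac{k-1}{k-2}$-free word needs $d\ge k-1$ for every repeated letter. Equivalently, at least $k-2$ letters separate repeated occurrences, and every factor of length $k-1$ consists of distinct letters.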

Figure 1: The tree of undirected $\frac{k-1}{k-2}$-power-free words on $k$ letters.

We now explain why each word corresponding to a leaf of the tree contains an undirected -power for some . We examine the leaves from top to bottom, and use the fact that when .

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power.

  • The factor is a reverse -power.

  • The factor is an ordinary -power. ∎

Conjecture 2 proposes that the value of $\mathrm{URT}(k)$ matches the lower bound of Theorem 6 for all $k\ge 4$. In the next section, we confirm Conjecture 2 for several values of $k$.

5 $\mathrm{URT}(k)=\frac{k-1}{k-2}$ for $k\in\{4,8,12\}$

First we explain why we rely on a different technique than in Section 3. Fix , and let . A morphism is called -free (-free, respectively) if it maps every -free (-free, respectively) word in to an -free (-free, respectively) word in . The morphism is called growing if for all . Brandenburg [2] demonstrated that for every , there is no growing -free morphism from to . By a minor modification of his proof, one can show that there is no growing -free morphism from to . While this does not entirely rule out the possibility that there is a morphism from to whose fixed point is -free, it suggests that different techniques may be required. Our technique relies on an encoding similar to the one introduced by Pansiot [25] in showing that $\mathrm{RT}(4)=7/5$. Pansiot’s encoding was later used in all subsequent work on Dejean’s Conjecture.

5.1 A ternary encoding

We first describe an alternate definition of ordinary $r$-powers which will be useful in this section. A word $w=w_1w_2\cdots w_n$, where the $w_i$ are letters, is periodic if for some positive integer $p$, we have $w_{i+p}=w_i$ for all $1\le i\le n-p$. In this case, the integer $p$ is called a period of $w$. The exponent of $w$, denoted $\exp(w)$, is the ratio between its length and its minimal period. If $\exp(w)=r$, then $w$ is an $r$-power.⁵ For example, the English word alfalfa has minimal period $3$ and exponent $7/3$, so it is a $7/3$-power. We can write any -power as , where and is a prefix of . In this case, we say that is the excess of the -power .

⁵ If $1<r\le 2$, then $w$ is an $r$-power as we have defined it in Section 1. If $r>2$, then we take this as the definition of an (ordinary) $r$-power.
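The minimal period and exponent can be computed directly from the definition. The short Python sketch below uses helper names of our own and 0-based indices. It confirms the alfalfa example: period $3$ and exponent $7/3$.

```python
from fractions import Fraction


def minimal_period(w):
    """Smallest p >= 1 with w[i + p] == w[i] for every valid (0-based) i."""
    if not w:
        raise ValueError("the empty word has no period")
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i + p] == w[i] for i in range(n - p)):
            return p  # p = n always succeeds, so this loop always returns


def exponent(w):
    """exp(w) = |w| / (minimal period of w), as an exact fraction."""
    return Fraction(len(w), minimal_period(w))


print(minimal_period("alfalfa"), exponent("alfalfa"))  # 3 7/3
```

Using `Fraction` keeps exponents exact, so comparisons against thresholds such as $\frac{k-1}{k-2}$ are not affected by floating-point rounding.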

Suppose that is an undirected -free word that contains at least distinct letters. Write with . Certainly, every length factor of contains distinct letters, and it is easily checked that every length factor of contains at least distinct letters.

Now let be any word containing at least distinct letters and satisfying these two properties:

  • Every length factor of contains distinct letters; and

  • Every length factor of contains at least distinct letters.

Let be the shortest prefix of containing distinct letters. We see immediately that has length or . Write , where with . Define and for all . For all , the prefix determines a permutation

of the letters of , which ranks the letters of by the index of their final appearance in . In other words, the word is the length suffix of , and of the two letters in , the letter is the one that appears last in . Note that the final letter may not even appear in . For example, on the prefix 123416 gives rise to the permutation

Since every factor of length in contains distinct letters, for any , the letter must belong to the set This allows us to encode the word over a ternary alphabet, as described explicitly below.

For , define , where for all , we have

For example, on , for the word the shortest prefix containing distinct letters is , and has encoding Given the shortest prefix of containing distinct letters, and the encoding , we can recover . Moreover, if has period , then so does . The exponent of corresponds to an exponent of .
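The idea of ranking letters by their final appearance can be made concrete with a generic "least recently used" rank encoding. The Python sketch below uses a hypothetical convention of our own and is not the article's exact permutation $\sigma_i$ or its ternary alphabet. Rank 1 is the least recently used letter, and letters absent from the prefix are ranked first. The sketch enforces none of the structural properties that make the article's code alphabet ternary. It does illustrate one claim made above: the prefix together with the code sequence determines the word.

```python
def initial_order(prefix, k):
    """Rank letters 0..k-1 by final appearance in `prefix`: letters absent
    from the prefix come first, then least recently used to most recent."""
    last = {c: i for i, c in enumerate(prefix)}
    unseen = [c for c in range(k) if c not in last]
    return unseen + sorted(last, key=last.get)


def encode(word, prefix_len, k):
    """Replace each letter after the prefix by its 1-based rank in the
    current ordering, then move that letter to the most-recent end."""
    order = initial_order(word[:prefix_len], k)
    codes = []
    for c in word[prefix_len:]:
        j = order.index(c)
        codes.append(j + 1)
        order.append(order.pop(j))
    return codes


def decode(prefix, codes, k):
    """Invert `encode`: the prefix plus the rank sequence determine the word."""
    order = initial_order(prefix, k)
    word = list(prefix)
    for j in codes:
        c = order.pop(j - 1)
        word.append(c)
        order.append(c)
    return word


w = [0, 1, 2, 3, 0, 2, 1, 3, 0, 2]
codes = encode(w, 4, 4)
print(codes)
print(decode(w[:4], codes, 4) == w)  # True
```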

Let denote the symmetric group on with left multiplication. Define a morphism by

One proves by induction that . It follows that if has period , and contains at least distinct letters, then the length prefix of lies in the kernel of . In this case, the word is called a kernel repetition. For example, over , the word

has period , and excess 12324. Hence, the encoding is a kernel repetition; one verifies that .
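Under the same hypothetical least-recently-used convention, each code letter acts on the ranking by a fixed permutation of positions. A block of codes then lies "in the kernel" exactly when the composed permutation is the identity, that is, when reading the block returns every ranking to itself. The self-contained sketch below composes these permutations. It does not reproduce the article's morphism into the symmetric group or its permutations; the function names are ours.

```python
def rank_update(j, k):
    """Position map for moving the letter of rank j (1-based) to the end:
    new_order[i] == old_order[perm[i]]."""
    idx = list(range(k))
    return idx[:j - 1] + idx[j:] + [j - 1]


def code_product(codes, k):
    """Compose the updates for a sequence of codes into one position map."""
    total = list(range(k))
    for j in codes:
        step = rank_update(j, k)
        total = [total[step[i]] for i in range(k)]
    return total


def apply(perm, order):
    """Apply a position map to a ranking of the letters."""
    return [order[perm[i]] for i in range(len(order))]


def in_kernel(codes, k):
    """True if reading `codes` returns every ranking to itself."""
    return code_product(codes, k) == list(range(k))


print(in_kernel([1, 1, 1, 1], 4))  # True: cycling through all 4 letters once
print(in_kernel([1, 2], 4))        # False
```

Composing position maps, rather than simulating a particular ranking, makes the kernel test independent of which letters occupy the ranking. This mirrors the role of the kernel condition described above.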

Suppose that is even. Then and

are odd, while

is even. It follows that is even, and hence the subgroup of generated by and is a subgroup of the alternating group . This simple observation leads to the following important lemma, which will be used to bound the length of reversible factors in the words we construct.

Lemma 7.

Let satisfy . Let be a word with prefix and encoding . Suppose that is a factor of , where are distinct letters. Then is not a factor of .

Proof.

Suppose towards a contradiction that and are both factors of . Assume without loss of generality that appears before in . Then contains a factor with prefix and suffix . Consider the encoding which is a factor of .

Immediately after reading , the ranking of the letters in is

where is the unique letter in . Immediately after reading , the ranking of the letters in is

Evidently, we have

Since we observe that is an odd permutation. We claim that does not begin in 1 or end in 3, so that . But and are both even, which contradicts the fact that is odd.

The fact that does not end in 3 follows immediately from the fact that . It remains to show that does not begin with 1. If is a prefix of , then begins in 3 or 2, so we may assume that with . Then has prefix . If began in 1, then would necessarily end in , and this is impossible since . This completes the proof of the claim, and the lemma. ∎

5.2 Constructions

Define morphisms as follows:

Define by

A key property of each of the morphisms and is that the images of 1 and 2 end in different letters.

Theorem 8.

Fix and let . Let be the word over with prefix and encoding . Then is undirected -free.

The remainder of this section is devoted to proving Theorem 8. Essentially, we adapt the technique first used by Moulin-Ollagnier [23]. A simplified version of Moulin-Ollagnier’s technique, which we follow fairly closely, is exhibited by Currie and Rampersad [13]. For the remainder of this section, we use notation as in Theorem 8. We let , i.e., we say that is -uniform.

We first discuss kernel repetitions appearing in . Let factor of be a kernel repetition with period ; say . Let be the maximal period extension of the occurrence of . Write and so that . Write where . By the periodicity of , the factor is conjugate to , and hence is in the kernel of . Write where is a proper suffix of or , and is a prefix of or . Analogously, write . Since and end in different letters, it follows from the maximality of that . In particular, the word begins in 3. It follows that begins in 3, and thus we may assume that . Finally, by the maximality of , we have , the longest common prefix of and . Altogether, we can write

where and is a prefix of . We see that and .

Let be the composite morphism . Evidently, we have

Since was in the kernel of , we see that

i.e., the word is in the kernel of .

Now set and . By the maximality of the repetition must be a maximal repetition with period (i.e., it cannot be extended). If has a cut, then it follows by arguments similar to those used above that , where is a prefix of and is the longest common prefix of and . One checks that there is an element such that

for every , i.e., the morphism satisfies the “algebraic property” described by Moulin-Ollagnier [23]. It follows that is in the kernel of . We can repeat this process until we reach a repetition whose excess has no cut. Recalling that is an -uniform morphism, we have

and

Note that if , while if . Thus, we have

It follows that .

Proof of Theorem 8.

We first show that contains no reverse -power with . Since 33 is not a factor of , every factor of length in contains a factor of the form , where are distinct letters. Thus, by Lemma 7, if is a factor of with , then . In turn, we have . Therefore, we conclude by a finite check that contains no reverse -power with .

It remains to show that is ordinary -free. Suppose to the contrary that is a factor of such that is a prefix of and . We may assume that is maximal with respect to having period . If has less than distinct letters, then . In turn, we have . By a finite check, the word has no such factors.

So we may assume that has at least distinct letters. Let , and let be the length prefix of . So , where is a prefix of . Hence is a kernel repetition, i.e., the word is in the kernel of . By the maximality of , we see that begins in 3. Hence, the length prefix of contains distinct letters, and . We can find a factor of as described above, such that is a prefix of , the word is in the kernel of , and does not contain a cut. Now